[storage] s3 provider PR

rosscdh · September 17, 2020, 7:49am

Hi there started on my first zammad pr with a view to adding s3 provider support to the application for cloud/docker/kube deployments where persistence is not on the FS and you dont want to store binary objects in the database.

github.com/zammad/zammad

[WIP] #660 start of PR to get s3 into zammad

zammad:develop ← rosscdh:faeture/s3-stateless-file-persistence

opened 03:02PM - 16 Sep 20 UTC

rosscdh

+91 -1

start of PR to get s3 into zammad needs migrations and the provider interface to… work and be hooked up to the stored accesskeyid accesskeysecret and endpoint todo * create migration * create interface for getting accessid/secret and endpoint (maybe even profile) * get provider interface working * get store mover working * tests

Its been a while since i did some rails, and am looking for a pointer as to how to get the settings coffee to take fields access_key_id access_key_secret and _endpoint_url

Its also made slightly more complicated by the fact that DB and FS dont currently take any input so i dont have an example of how to implement this (ie does it need a new model?) how to change the ui in an acceptable manner to take the new fields.

another perhaps better option would be to rely on ENV vars rather than db configuration for the s3 provider but again id like some direction from the devs in the community

I realise this is more a rails general question but as this is a community project I would prefer direction from the owners.

Thanks and looking forward to your feedback!

anon29869905 · September 17, 2020, 12:03pm

Hey @rosscdh - what a first post in the community. I love it Welcome to the Zammad community.

I checked the current implementation and the Setting storage_provider JS controller is inheriting from App.SettingsAreaItem and renders a custom template (you’ve already updated in the PR). A Setting Item is handled as a simple one value entity without any relation to other Settings. This helps us to add simple Settings without much hassle.
In your context you want to have a more rich Setting with logic and dependencies. So instead of using the App.SettingsAreaItem as a controller you want to build a custom one like e.g. App.SettingsAreaLogo. This way you get the full control of rendering your setting block. App.SettingsAreaLogo actually is a pretty good example here with a rich functionality.

We have no perfect solution for storing access variables yet and relied on dedicated Settings-Records with custom rendering in the past.

So what you could do is migrate the Setting “element” from inheriting from App.SettingsAreaItem to a custom version of App.SettingsAreaLogo e.g. App.SettingsAreaStorage. This would extend the current UI and would render a nice UI where the Admin could choose between the storage providers the same way as it is right now. There would be no visual difference from the current input except that S3 would be available. But when the Admin selects S3 the additionally required attribute input fields are shown. Once the Admin clicks submit the new App.SettingsAreaStorage controller kicks in, checks which provider is selected and also checks for additional attributes if needed. These custom attributes/additional access variables could then be stored via Setting.set('key', 'value').

We could let the S3 storage backend fall back on ENVs if the access variable Settings are blank.

Some more backround information you might be interested in:

For the frontend we rely on an outdated framework called SpineJS with some additional custom logic. You can find more information about it here: https://spine.github.io/
I’d also recommend to open up a bottle of wine and spend some time on looking how other JS controllers work.
The JS template engine we use is called and outdated as well. You can find more information about it here: https://github.com/sstephenson/eco

Hope that helps and gives you a direction. Let me know if you have any further questions.

Looking forward to merge your S3 contribution

rosscdh · September 17, 2020, 2:54pm

Hey there, thanks for the info and of course the welcome!

Yep env vars are def the way to go; its pretty much how most(all?) other cloud oriented systems work so perhaps dont even need to change the form (tho should add info/checks about whether the env vars are set or not). there are actually security concerns about storing this kind of info in a db and not injecting as envs (12 factor app)

no probs re outdated libs, it happens; difficult to keep up with js libs in the last 5 years; am starting to recommend https://svelte.dev/ to anyone listening on that topic tho.

Im really impressed by zammad its a fantastic system but needs a little work on the getting it cloud ready epic; and am keen to assist if i can

May I ask another question about database and database migrations or rather another entry?

anon29869905 · September 17, 2020, 3:28pm

As it’s part of the S3 topic feel free to ask here

rosscdh · September 17, 2020, 5:52pm

Thanks nevermind the db migration (i wanted to make sure there were no anti-patterns with dbs like there are with discourse forum) but my test migration went smoothly.

I imagine as the app has a js frontend we woudl want to upload the file with a presigned url? instead of serverside?

This simplifies many items and puts the onus on the frontend and the backend simply provides an endpoint that calculates a presigned upload url that the client then can upload directly to. Saving a base64 encoded request to the server->upload to s3?

Does that make sense? (also implies im going to have to get to know spine a bit better i guess)

I’ve updated the PR to cater to this; and once I’ve gotten my dev env setup ill write some tests.

Ideally (as i have 0 xp with spine and the implementation here) Id like to provide the endpoint and get some help with the FE work?

anon29869905 · September 21, 2020, 8:54am

Actually I’m not that experienced with S3 and the workflow. Would you mind sharing/pointing me to a resource that matches the Zammad use case or elaborate it to me on what the steps are. That would be a great help to me. The goal would be to provide developers a transparent API that works the same for any storage backend without the need to check for the currently active one.

JFI: There should be no business logic in the JS App / Frontend. This makes it harder to implement different clients (e.g. Mobile/API integrations etc.).

JFI2: We planned to evaluate Rails ActiveStorage and see how/if it covers our requirements (duplicate prevention etc.) but haven’t had the chance yet. AFAIK it also supports S3 and other cloud storage providers.

rosscdh · September 21, 2020, 11:30am

No prob, so the usecase is such

user wants to upload file
file with backend client goes client → POST fat binary → backend → s3 (this is kinda inefficient especially since the destination is actually s3 not the server)
instead client → POST get presigned url → client uploads binary direct to s3 (skipping the network and overhead of having to upload the file 2x once to the server and the server again to s3 double network traffic
its very common in most places where a javascript frontend is used (https://medium.com/@aakashbanerjee/upload-files-to-amazon-s3-from-the-browser-using-pre-signed-urls-4602a9a90eb5 Upload direct to S3 with Pre-signed POST request - BigBinary Blog many resources on this topic)

the store provider would then provide all the “normal” interfaces so that data can be moved from DB and File to S3 but… AS WELL AS and endpoint to calculate the presigned url (ive already laid the basis for this n my PR) the web client would POST request a presigned url and then upload direct to s3 instead of via the server backend (saving bandwith and time)

no business logic in the front end - client selects file as normal; and on change a presigned url is requested from backend. client then transfers data straight to s3 using the presigned url and extra fields provided by the presigned endpoint
as with current the incoming request provides things like filename, guessed content type and can be prefixed with random sha as is currently done
Uploading from Rails to AWS S3 with PreSigned URLs - Stuff… And Things… is also pretty common in the rails world from what i remember

https://kirshatrov.com/2015/08/19/s3-direct-uploads/

rosscdh · September 21, 2020, 11:41am

ActiveStorage seems to be the current best practice for this kind of thing … seems to be a good choice and also appears to support presigning of urls transparently.

https://applaudostudios.com/blog/implementing-direct-upload-in-a-rails-api/

anon29869905 · September 22, 2020, 10:03am

Hm while the benefits of direct upload look appealing I see major changes required to integrate it in the generic storage backend architecture we currently have. I estimate a required rewrite of about 70-80% of the current implementation. We have made bad experience with such “follow up” or “extended scope” changes. Resulting in broken or never finished implementation. Therefore I’d recommend to take the direct route first and add the S3 integration without direct upload first. Afterwards we can pick up the work on the ActiveStorage migration or change the generic storage backend implementation to support direct upload as well.

What do you think?

rosscdh · September 29, 2020, 7:03am

Not sure where you get the 70-80% number from?

also not sure where you get extended scope from?

Naja, cest la vie I guess

anon29869905 · September 29, 2020, 1:24pm

Sorry, that might have sounded different than I meant it. English is not my mother tongue and I was a bit in a hurry.

Next try – please don’t get me wrong

70-80% is just a rough estimation based on my experience with the codebase, the implementation and past experience with such changes. Let me know what’s unclear and I try to elaborate in more detail.

By “extended scope” I mean that we’ve made bad experience when tasks started with e.g. “Let’s add S3 support” and ended with “we should refactor/rewrite the storage logic/layer”. However, I don’t intend to judge your skill based on the experience I/we made with other people. If you think you can make it don’t let a random stranger on the internet tell you that you can’t.

I’m happy to help and try to free as much time as I can for giving guidance and answer questions. Again based on past experience with other people, I spend much time on guidance and the changes were never finished. That can be frustrating. It’s just that we need to think about various use cases and setups – and future maintenance. Don’t want to sound rude or picky.

Thanks for keeping it up!

rosscdh · September 29, 2020, 5:05pm

Hi Thorsten, kein Thema, du kannst gerne auf Deutsch mir schrieben.

Aber schriftlich ist mein Deutsch schrecklich, dazu entschuldigung wenn ich auf English zuruck schreiben.

Yeah sure I can understand your concerns, working on quite a few projects myself I quite well know the feeling.

While I dont quite agree its a 70% rewrite as its basically the same interfaces as for the other 2 backends plus a single presign route to grant access to upload to the js client.

The standard interface methods woull be used only to migrate from db/file to s3

so really its only 1 new route; and some javascript of which there are tonnes of working examples in the wild. and the store interfaces for move and delete which are made quite simple by the aws-sdk s3 gem

rosscdh · September 29, 2020, 5:07pm

Please take a look at the existing MR ive laid the framework for it already.

https://github.com/rosscdh/ is my profile to give you an idea of the range i work on in my private time

anon29869905 · October 5, 2020, 1:10pm

Hey @rosscdh - thanks for the offer but since english is the primary language around here I’d rather stick to it – so that as much people as possible can have their take aways… hopefully

As you already pointed out there needs to be a dedicated logic to generate the presign URLs. That collides with the specifications of our backend/module/plugin/abstraction layer. We have various places where we have those kind of layers (Channel/Authentication/Import/etc.). The goal is to have one interface that work for all implementations/integrations/provider/plugins/etc. of the corresponding area.
As we now noticed the current implementation of the storage layer would work for a “double upload” implementation but wouldn’t work for the “direct upload” approach. That’s too bad. Therefore we’d need a new layer architecture that would also cover the “direct upload”. Having a layer that covers only 80% of the implementation of a backend/provider/etc. is not an option for us. It would require ifs and other conditional logic all over the place. So whenever new backends/providers/etc. would be added those conditionals would grow and grow. Covering new cases all over the application. That would make the (part of the) application unmaintainable over time. A typical symptom of this anti-pattern is a Shotgun surgery. We have had enough of those in the past and want to avoid them for the future. To do this we’d need a new module/backend/provider layer.

Looking at the implementation of the current module/backend/provider layer I see the need of a major rewrite. We’d need to change the implementation of the frontend and backend. That’s why I think it’s a 70-80% rewrite.

It maybe helps if you imagine adding/building the S3 without adding if statements or dedicated checks for S3, class names or method checks.

system · February 2, 2021, 1:10pm

This topic was automatically closed 120 days after the last reply. New replies are no longer allowed.