I’d like to handle user uploads and downloads by proxying to S3, wrapped in an authentication/authorization layer. I’d prefer an async/streaming solution so I don’t have to deal with memory issues or manage threads. From a little bit of research, there are a lot of moving parts that have to align to make this happen... Any pointers on this?
I mean, Cognitect’s AWS API doesn’t handle streaming yet. The popular HTTP libraries are blocking. Most HTTP servers support async responses, but making sure a slow client doesn’t starve everything else is also tricky to get right.
I’m using Pedestal/Jetty as the web server at the moment. Not wedded to Jetty, but keeping Pedestal around would be nice.
@orestis AFAIK the best way to do this is to use pre-signed upload/download URLs. You can have an endpoint on your server that checks authentication and authorization, generates the URL, and returns it to the client. The client can then push the file directly to S3 without you needing to proxy it through your servers. The official AWS SDKs have utilities for creating the URLs, but I don’t know about Cognitect’s AWS lib.
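If you’re on the JVM you can drop down to the AWS Java SDK just for the URL generation, even if you use Cognitect’s lib for everything else. A minimal sketch with the v1 Java SDK (untested; bucket/key names are made up):
```clojure
(ns example.presign
  (:import (com.amazonaws.services.s3 AmazonS3ClientBuilder)
           (com.amazonaws HttpMethod)
           (java.util Date)))

;; The default client picks up credentials and region from the
;; standard AWS provider chain (env vars, profile, instance role).
(def s3-client (AmazonS3ClientBuilder/defaultClient))

(defn presigned-url
  "Time-limited URL for bucket/object-key. `method` is HttpMethod/GET
  for downloads, HttpMethod/PUT for uploads."
  [bucket object-key minutes method]
  (let [expiration (Date. (+ (System/currentTimeMillis) (* minutes 60 1000)))]
    (str (.generatePresignedUrl s3-client bucket object-key expiration method))))

;; In your authenticated endpoint, after the authz check passes:
;; (presigned-url "my-bucket" "reports/q3.pdf" 1 HttpMethod/GET)
```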
@valtteri Right, and I presume these URLs expire after some time? I wonder how that works in practice. This content is embedded in a web application and I don’t want to risk having broken links... But for the upload it makes sense: you’re essentially making a write-only URL that the browser can just POST to, which then expires after some time. I’ll dig more into that.
Yes, you can set the expiration to whatever suits your use case.
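The upload case works with the same sketch from above, just with HttpMethod/PUT. One caveat: a presigned PUT URL means the browser uploads with PUT, not POST; browser form POSTs go through S3’s separate presigned-POST policy mechanism.
```clojure
;; Write-only URL valid for 15 minutes; the client PUTs the file body to it.
(presigned-url "my-bucket" "uploads/incoming/123.bin" 15 HttpMethod/PUT)
```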
This is an interesting approach for the downloads: https://www.mediasuite.co.nz/blog/proxying-s3-downloads-nginx/
Essentially: generate the presigned URL, pass it to nginx, and let nginx do the proxying for you. The presigned URL is never visible to the end user, and you can also control the headers.
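For reference, the core of that pattern is an internal nginx location plus an X-Accel-Redirect response from the app. Roughly like this (untested sketch; the location name and resolver are placeholders, see the blog post for details):
```nginx
# Internal-only location: clients can't request /s3_redirect/... directly.
location ~* ^/s3_redirect/(?<s3_host>[^/]+)/(?<s3_path>.+)$ {
    internal;
    resolver 8.8.8.8 valid=30s;        # required because proxy_pass uses variables
    proxy_set_header Host $s3_host;
    proxy_hide_header x-amz-id-2;      # hide AWS-specific response headers
    proxy_hide_header x-amz-request-id;
    proxy_pass https://$s3_host/$s3_path$is_args$args;
}
```
On the app side, after the auth check, the handler just points nginx at the presigned URL (reusing the presigned-url sketch from above):
```clojure
(let [url (java.net.URL. (presigned-url "my-bucket" "reports/q3.pdf" 1 HttpMethod/GET))]
  ;; nginx swallows this response and streams the S3 object instead.
  {:status  200
   :headers {"X-Accel-Redirect" (str "/s3_redirect/" (.getHost url) (.getFile url))}})
```
Watch out for percent-encoded characters in the signed query string when building that header value; they’re a known gotcha with X-Accel-Redirect.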
Interesting indeed! However, I’d still use the simple “hand the pre-signed S3 download URL to the client” approach unless the files contain something super sensitive.
There’s quite a lot of hassle with all the headers, retries, etc. if you go down that route. I once wrote a thing that streams several files from S3 and aggregates them into a single zip file on the fly while the client is downloading. It eventually worked amazingly well, but it required several days of fiddling with semi-low-level transport stuff.
The files do contain sensitive stuff - our clients are not technical, but they make us jump through a thousand hoops before they trust us with their data. So even though I think a 60-second pre-signed URL would be acceptable, if it comes up in a review there will be questions.
The one thing I don’t want to do is write the proxying code - I’ve done similar things in the past, and it was always a pain in the butt to fine-tune and harden everything for production. For development it’s not too bad, though.