Re: AWS S3 big files. I've seen multi-gigabyte object uploads, so single large files can be done, but the size of the object needs to be known beforehand, which might be a reason that libs tend to keep the data in memory
for multipart uploads, the size of each part probably needs to be known beforehand
the Java libs have a nice TransferManager that can do parallel multipart downloads, if the object was uploaded in multipart fashion
we'd need to add such support to aws-api too :)
If I remember correctly, the current aws-api wants to keep the whole file in memory, at least when downloading. The HTTP client used by aws-api does not support streaming. The Amazon Java SDK does not have that limitation.
Indeed, with the Java SDK it doesn't keep all content in memory during upload
I think S3 has a 5 GB limit for a single-request upload. If the file is bigger than that, you must use multipart upload.
what would get around all this big-file stuff would be support for pre-signed upload requests in the aws-api client. I'm hoping to see this soon
Ditto. I think the new SDK for Java (from AWS) will solve this, but maybe it's just wishful thinking.
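for reference, the v2 Java SDK already ships an S3Presigner, so you can presign a PUT today via interop. Untested sketch; assumes the v2 S3 artifact is on the classpath, and the bucket/key are placeholders:
```clojure
(import '(software.amazon.awssdk.services.s3.presigner S3Presigner)
        '(software.amazon.awssdk.services.s3.presigner.model PutObjectPresignRequest)
        '(software.amazon.awssdk.services.s3.model PutObjectRequest)
        '(java.time Duration))

;; build a presigned PUT URL a client can upload to directly,
;; so the big file never has to flow through this process
(let [presigner (S3Presigner/create)
      put-req   (-> (PutObjectRequest/builder)
                    (.bucket "my-bucket")
                    (.key "big-file.bin")
                    (.build))
      presign   (-> (PutObjectPresignRequest/builder)
                    (.signatureDuration (Duration/ofMinutes 15))
                    (.putObjectRequest put-req)
                    (.build))]
  (str (.url (.presignPutObject presigner presign))))
```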
so we need a name for a support lib :)
yeah, I think the limit used to be lower, but got raised at some point
the bring-your-own http client will help :)
CreateMultipartUpload -> UploadPart (many, you can do it concurrently) -> CompleteMultipartUpload
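roughly like this with aws-api. Untested sketch; the bucket/key are placeholders and `chunks` is assumed to be a seq of byte arrays, each at least 5 MB except possibly the last:
```clojure
(require '[cognitect.aws.client.api :as aws])

(def s3 (aws/client {:api :s3}))

(defn multipart-upload
  "chunks: a seq of byte arrays, each >= 5 MB except possibly the last."
  [bucket object-key chunks]
  (let [{:keys [UploadId]} (aws/invoke s3 {:op :CreateMultipartUpload
                                           :request {:Bucket bucket :Key object-key}})
        parts (doall
               (map-indexed
                (fn [i chunk]
                  (let [n    (inc i)
                        resp (aws/invoke s3 {:op :UploadPart
                                             :request {:Bucket bucket :Key object-key
                                                       :UploadId UploadId
                                                       :PartNumber n
                                                       :Body chunk}})]
                    ;; S3 needs the part number + ETag of every part
                    ;; to stitch the final object together
                    {:PartNumber n :ETag (:ETag resp)}))
                chunks))]
    (aws/invoke s3 {:op :CompleteMultipartUpload
                    :request {:Bucket bucket :Key object-key
                              :UploadId UploadId
                              :MultipartUpload {:Parts parts}}})))
```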
ah, true
have used only multipart download via the Java libs in the past, since I had a Redshift cluster write data to S3 in parallel :)
there is no multipart download
yup
you can do byte range requests on S3 objects in parallel, though. Similar effect :)
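e.g. something like this with aws-api. Untested sketch; it buffers every piece in memory, which is exactly the limitation discussed above, and the names are placeholders:
```clojure
(require '[cognitect.aws.client.api :as aws]
         '[clojure.java.io :as io])
(import '(java.io ByteArrayOutputStream))

(def s3 (aws/client {:api :s3}))

(defn parallel-download
  "Fetch an object as parallel ranged GETs, stitched back together in order."
  [bucket object-key chunk-size]
  (let [{:keys [ContentLength]}
        (aws/invoke s3 {:op :HeadObject
                        :request {:Bucket bucket :Key object-key}})
        ;; inclusive byte ranges covering the whole object
        ranges (for [start (range 0 ContentLength chunk-size)]
                 (format "bytes=%d-%d"
                         start
                         (min (dec ContentLength) (dec (+ start chunk-size)))))
        ;; one future per range; mapv forces them all to start
        pieces (mapv (fn [r]
                       (future
                         (:Body (aws/invoke s3 {:op :GetObject
                                                :request {:Bucket bucket
                                                          :Key object-key
                                                          :Range r}}))))
                     ranges)
        out    (ByteArrayOutputStream.)]
    (doseq [p pieces]
      (io/copy @p out))   ;; :Body is an InputStream
    (.toByteArray out)))
```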
got carried away remembering old project :)
nice point
we've thought about some "userspace helpers" for aws-api, like paginators, etc. but so far we're focusing on the raw operations
(presigning URLs is in that ballpark too)
if there were 3rd-party "userspace" libs, would it be ok to use aws-api as a name prefix?
I can't stop you, but viesti.aws-api would be better :)
:D
liking that already :)
would have to do some explaining on that particular name though :)
we have some code at work to do this for staging build artifacts. It reduces over the chunks of a file, starts a future uploading a part for each one, then waits for all the futures to complete and completes the multipart upload. Works great.
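fwiw that shape is roughly this. Untested sketch, reusing the aws-api client from the earlier snippet; `chunks` and the names are placeholders:
```clojure
;; future-per-part variant of the earlier multipart sketch:
;; kick off all UploadPart calls, then block on the results
(defn concurrent-multipart-upload
  [bucket object-key chunks]
  (let [{:keys [UploadId]} (aws/invoke s3 {:op :CreateMultipartUpload
                                           :request {:Bucket bucket :Key object-key}})
        part-futures (into []
                           (map-indexed
                            (fn [i chunk]
                              (future
                                (let [n    (inc i)
                                      resp (aws/invoke s3 {:op :UploadPart
                                                           :request {:Bucket bucket
                                                                     :Key object-key
                                                                     :UploadId UploadId
                                                                     :PartNumber n
                                                                     :Body chunk}})]
                                  {:PartNumber n :ETag (:ETag resp)}))))
                           chunks)]
    ;; deref blocks until every part has finished uploading
    (aws/invoke s3 {:op :CompleteMultipartUpload
                    :request {:Bucket bucket :Key object-key
                              :UploadId UploadId
                              :MultipartUpload {:Parts (mapv deref part-futures)}}})))
```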