aws

http://status.aws.amazon.com/ https://www.expeditedssl.com/aws-in-plain-english
viesti 2020-01-15T08:17:58.013300Z

Re: AWS S3 big files. I've seen multi-gigabyte object uploads, so single large files can be done, but the size of the object needs to be known beforehand, which might be a reason that libs tend to keep the data in memory

viesti 2020-01-15T08:18:23.013800Z

for multipart uploads, the size of each part probably needs to be known beforehand

viesti 2020-01-15T08:20:57.014800Z

the Java libs have a nice TransferManager that can do parallel multipart downloads, if the object was uploaded in multipart fashion

viesti 2020-01-15T08:21:07.015100Z

would be nice to add such support to aws-api too 🙂

jsyrjala 2020-01-15T08:34:28.017Z

If I remember correctly, the current aws-api wants to keep the whole file in memory, at least when downloading. The http client used by aws-api does not support streaming. The Amazon Java SDK does not have that limitation. http://status.aws.amazon.com/ https://www.expeditedssl.com/aws-in-plain-english

kirill.salykin 2020-01-15T08:46:22.017100Z

Indeed, with the Java SDK it doesn't keep all the content in memory during upload

jsyrjala 2020-01-15T08:57:57.017300Z

I think that S3 has a limit of 5 GB for a file transfer in one request. If the file is bigger than that, then you must use multipart upload.
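
The size-limit logic above can be sketched as follows. This is illustrative Python, not aws-api or boto3 code; the function name is made up, but the constants are AWS's documented limits (5 GiB per single PUT, 5 MiB minimum part size, 10,000 parts max per multipart upload):

```python
# Illustrative sketch: deciding between a single PUT and a multipart upload.
SINGLE_PUT_LIMIT = 5 * 1024**3   # 5 GiB cap on one PutObject request
MIN_PART_SIZE = 5 * 1024**2      # 5 MiB minimum part size (except the last part)
MAX_PARTS = 10_000               # at most 10,000 parts per multipart upload

def plan_upload(object_size: int, part_size: int = 64 * 1024**2):
    """Return ('single', None) or ('multipart', part_count)."""
    if object_size <= SINGLE_PUT_LIMIT:
        return ("single", None)
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, MIN_PART_SIZE, -(-object_size // MAX_PARTS))
    part_count = -(-object_size // part_size)  # ceiling division
    return ("multipart", part_count)
```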

jsyrjala 2020-01-15T08:59:01.017500Z

https://github.com/cognitect-labs/aws-api/issues/107

steveb8n 2020-01-15T09:01:51.018500Z

what would get around all this big-file stuff would be support for pre-signed upload requests in the aws-api client. I'm hoping to see this soon

Linus Ericsson 2020-01-15T14:58:29.020400Z

Ditto. I think the new SDK for Java (from AWS) will solve this, but maybe it's just wishful thinking.

viesti 2020-01-15T17:30:22.020600Z

so we need a name for a support lib :)

viesti 2020-01-15T17:31:01.020800Z

yeah, I think the limit used to be lower, but got raised at some point

viesti 2020-01-15T17:32:56.021800Z

the bring-your-own http client will help :)

ghadi 2020-01-15T17:35:47.022900Z

@viesti @jsyrjala multipart uploads are already supported: call the various Multipart operations with your input split into chunks. Downloading large files is problematic because of no streaming

ghadi 2020-01-15T17:36:51.023600Z

CreateMultipartUpload -> UploadPart (many, you can do it concurrently) -> CompleteMultipartUpload
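
The three-call flow above can be sketched like this. It's a hedged, illustrative Python sketch, not aws-api or S3 client code: the real operations are CreateMultipartUpload, UploadPart, and CompleteMultipartUpload, and the method names on the stand-in in-memory `client` here are made up:

```python
# Stand-in "S3 client" that just collects parts and reassembles them,
# mimicking the CreateMultipartUpload -> UploadPart -> CompleteMultipartUpload flow.
class FakeS3Client:
    def __init__(self):
        self.parts = {}

    def create_multipart_upload(self):
        return {"UploadId": "upload-1"}

    def upload_part(self, upload_id, part_number, body):
        self.parts[part_number] = body
        return {"ETag": f"etag-{part_number}"}

    def complete_multipart_upload(self, upload_id, etags):
        return b"".join(self.parts[n] for n in sorted(self.parts))

def multipart_upload(client, data: bytes, part_size: int) -> bytes:
    upload_id = client.create_multipart_upload()["UploadId"]
    etags = []
    # Parts are numbered from 1; each UploadPart returns an ETag that
    # CompleteMultipartUpload needs in order to stitch the object together.
    for n, start in enumerate(range(0, len(data), part_size), start=1):
        part = data[start:start + part_size]
        etags.append({"PartNumber": n,
                      "ETag": client.upload_part(upload_id, n, part)["ETag"]})
    return client.complete_multipart_upload(upload_id, etags)
```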

viesti 2020-01-15T17:37:37.024Z

ah, true

viesti 2020-01-15T17:38:33.026100Z

I've only used multipart download via the Java libs in the past, since I had a Redshift cluster writing data to S3 in parallel :)

ghadi 2020-01-15T17:39:08.026900Z

there is no multipart download

viesti 2020-01-15T17:39:17.027400Z

yup

ghadi 2020-01-15T17:39:28.028Z

you can do byte range requests on S3 objects in parallel, though. Similar effect πŸ™‚
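
Splitting an object of known length into byte ranges for parallel GetObject calls can be sketched like this (illustrative Python; the function name is made up, but the `bytes=start-end` syntax is the standard HTTP Range header S3 accepts):

```python
# Illustrative sketch: cover an object of content_length bytes with
# Range header values, so each piece can be fetched in parallel.
def byte_ranges(content_length: int, piece_size: int):
    """Yield 'bytes=start-end' Range header values covering the object."""
    for start in range(0, content_length, piece_size):
        end = min(start + piece_size, content_length) - 1  # Range ends are inclusive
        yield f"bytes={start}-{end}"
```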

viesti 2020-01-15T17:39:34.028200Z

got carried away remembering old project :)

viesti 2020-01-15T17:39:43.028600Z

nice point

ghadi 2020-01-15T17:40:11.029200Z

we've thought about some "userspace helpers" for aws-api, like paginators, etc. but so far we're focusing on the raw operations

ghadi 2020-01-15T17:40:56.030800Z

(presigning URLs is in that ballpark too)

viesti 2020-01-15T17:41:36.031600Z

if there were 3rd party "userspace" libs, would it be ok to use aws-api as a name prefix?

ghadi 2020-01-15T17:42:04.032800Z

I can't stop you, but viesti.aws-api would be better πŸ™‚

viesti 2020-01-15T17:42:12.033100Z

:D

viesti 2020-01-15T17:42:41.033700Z

liking that already :)

viesti 2020-01-15T17:43:16.034600Z

would have to do some explaining on that particular name though :)

2020-01-15T17:55:11.034700Z

we have some code at work that does this for staging build artifacts. It reduces over the chunks of a file, starts a future uploading a part for each one, then waits for all the futures to complete and completes the multipart upload. Works great.
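
That chunk-per-future pattern can be sketched like this (illustrative Python, not the code described above; `upload_part` here is a made-up stand-in that just returns the chunk it was given, where the real thing would call UploadPart):

```python
from concurrent.futures import ThreadPoolExecutor

def upload_part(part_number: int, chunk: bytes) -> tuple:
    # Stand-in for the real UploadPart call.
    return (part_number, chunk)

def parallel_multipart(data: bytes, part_size: int) -> bytes:
    # Chunk the file, start a future per part, wait for all of them,
    # then "complete" the upload by assembling the results in part order.
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(upload_part, n, c)
                   for n, c in enumerate(chunks, start=1)]
        results = [f.result() for f in futures]
    # Completion must see parts in order, whatever order they finished in.
    return b"".join(chunk for _, chunk in sorted(results))
```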