aws

http://status.aws.amazon.com/ https://www.expeditedssl.com/aws-in-plain-english
kennytilton 2020-02-27T22:22:50.054200Z

Has anyone had any luck computing a :content-md5 satisfactory to Amazon S3, via Amazonica?

One twist is that I am gzipping my content, but I tried computing the MD5 off both the gzipped and ungzipped content, no luck. 

The google got me this far computing the md5 param:

(-> (util/json-to-gzip-bytearray "helloworld")
    digest/md5
    bs/to-byte-array
    (bt/encode :base64 {:url-safe? false})
    bs/to-string)

FWIW, I stored once without the MD5 and then got back S3's MD5, "gwD+BwffF71J92+mTz2LPA==", and that works dandy. My code above yields "ODMwMGZlMDcwN2RmMTdiZDQ5Zjc2ZmE2NGYzZDhiM2M=". Just a hair off.

I also saw somewhere that the MD5 digest must be converted from hex to integer and _that_ converted to base64. Tried that, no luck.

Any tips, links, guesses are welcome! Thx.
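For the record, the two strings above turn out to encode the same digest: S3's value is the base64 of the raw 16 digest bytes, while the "hair off" value is the base64 of the 32-character hex string, which suggests digest/md5 handed back a hex string rather than raw bytes. A plain-JDK sketch that decodes both and shows they match:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Md5Mismatch {
    public static void main(String[] args) {
        // S3's value decodes to the raw 16-byte digest...
        byte[] raw = Base64.getDecoder().decode("gwD+BwffF71J92+mTz2LPA==");
        StringBuilder hex = new StringBuilder();
        for (byte b : raw) hex.append(String.format("%02x", b));

        // ...while the "hair off" value decodes to that digest's hex rendering.
        String hexRendering = new String(
                Base64.getDecoder().decode("ODMwMGZlMDcwN2RmMTdiZDQ5Zjc2ZmE2NGYzZDhiM2M="),
                StandardCharsets.US_ASCII);

        System.out.println(hex);          // 8300fe0707df17bd49f76fa64f3d8b3c
        System.out.println(hexRendering); // 8300fe0707df17bd49f76fa64f3d8b3c
    }
}
```

In other words, S3 wants base64 over the digest bytes themselves, not over their hex rendering.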

2020-02-27T22:28:09.055200Z

I would be very suspicious of all that code

2020-02-27T22:28:19.055700Z

very easy to get encoding, or bytes vs. characters wrong

2020-02-27T22:28:52.056600Z

the java code for getting an md5sum should just return a byte array

shaun-mahood 2020-02-27T22:29:25.057100Z

@hiskennyness I've been able to calculate matching MD5s using https://github.com/tebeka/clj-digest, comparing my local files with what comes back from https://github.com/cognitect-labs/aws-api/

kennytilton 2020-02-28T09:37:13.060600Z

OMG.

(:import [com.amazonaws.util Md5Utils])
.....
(Md5Utils/md5AsBase64 data-gzipped)
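Md5Utils/md5AsBase64 is documented as returning the base64-encoded MD5 of the input bytes, so for anyone without the AWS SDK on the classpath, the same result is available from the JDK alone. A minimal sketch (the md5AsBase64 name here just mirrors the SDK's):

```java
import java.security.MessageDigest;
import java.util.Base64;

public class ContentMd5 {
    // Roughly what Md5Utils/md5AsBase64 does: MD5 over the raw bytes,
    // then base64 of the 16-byte digest (not of its hex rendering).
    static String md5AsBase64(byte[] data) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5AsBase64("hello".getBytes("UTF-8")));
        // XUFAKrxLKna5cZ2REBfFkg==
    }
}
```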

kennytilton 2020-02-28T09:41:20.061Z

After a day of googling "Clojure S3 content MD5" and not doing very well, it occurred to me to try other languages. Including Java. 🙂 https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/util/Md5Utils.html Bingo.

kennytilton 2020-02-28T09:41:54.061200Z

Too easy. 🙂 Thanks for pitching in, all!

kennytilton 2020-02-28T09:44:59.061500Z

Oh, one more thing. I was wondering if I should compute the MD5 off the gzip I was uploading or the data I gzipped. Turned out it was the gzip.

shaun-mahood 2020-02-27T22:31:43.057900Z

Here's my comparison code

(defn duplicate-file? [s3-file local-file]
  (when (some? s3-file)
    (let [s3-md5 (str/replace (:ETag s3-file) "\"" "")
          local-md5 (digest/md5 local-file)]
      (= local-md5 s3-md5))))
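A rough Java equivalent of that check, for anyone outside Clojure (the names are illustrative). One caveat worth hedging: comparing the ETag to a local MD5 only holds for plain single-part uploads; multipart ETags look like "<hex>-<partCount>" and are not the MD5 of the whole object.

```java
import java.security.MessageDigest;

public class EtagCompare {
    // Strip S3's surrounding quotes from the ETag, hex-encode the local
    // MD5, and compare. Only valid for single-part uploads.
    static boolean duplicate(String etag, byte[] localBytes) throws Exception {
        String s3Md5 = etag.replace("\"", "");
        byte[] digest = MessageDigest.getInstance("MD5").digest(localBytes);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString().equals(s3Md5);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(duplicate("\"5d41402abc4b2a76b9719d911017c592\"",
                                     "hello".getBytes("UTF-8"))); // true
    }
}
```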

kennytilton 2020-02-27T22:36:41.058100Z

Interesting. I will explore that. But S3 wants a base64 encoding passed to it if I want it to validate my upload, and I think that is where I am stuck. Thx, tho! I will learn sth playing with that.

shaun-mahood 2020-02-27T22:42:16.058300Z

I think you should be able to use digest to calculate the correct md5, at least assuming that AWS wants it in the same format as they pass back to you. I struggled with this quite a bit and it was all related to how I was reading and writing my files - when I brought it into Clojure to calculate the md5 I was screwing something up and calculating the hash on the wrong data. Had to get it pulling out a byte-array properly, and for most of my troubleshooting I thought I was performing an incorrect hash on the correct data.

ghadi 2020-02-27T22:51:44.059200Z

@shaun-mahood do you have to strip those quotes in other libraries, or just aws-api?

shaun-mahood 2020-02-27T22:53:42.059400Z

I've only used aws-api to get that data, so I'm not sure how the etag data is returned in other libraries

2020-02-27T23:06:03.059600Z

I recall s3 adding extra quotes around the etag header you get back

2020-02-27T23:08:35.059800Z

https://github.com/hiredman/propS3t/blob/master/src/propS3t/core.clj#L189 is where I had to slice them off for multipart uploads when doing the s3 thing via the rest api (but that is super old code, I haven't looked at the s3 rest api in a long time)

ghadi 2020-02-27T23:17:47.060100Z

ok cool... I know aws-api handles all string datatypes in the same way...

ghadi 2020-02-27T23:18:05.060300Z

so it probably wasn't the lib