Has anyone had any luck computing a :content-md5 satisfactory to Amazon S3, via Amazonica?
One twist is that I am gzipping my content, but I tried computing the MD5 off both the gzipped and ungzipped content, no luck.
The google got me this far computing the md5 param:
(-> (util/json-to-gzip-bytearray "helloworld")
    digest/md5
    bs/to-byte-array
    (bt/encode :base64 {:url-safe? false})
    bs/to-string)
FWIW, I stored once without the MD5 and then got back S3's MD5, "gwD+BwffF71J92+mTz2LPA==", and that works dandy. My code above yields "ODMwMGZlMDcwN2RmMTdiZDQ5Zjc2ZmE2NGYzZDhiM2M=". Just a hair off.
I also saw somewhere that the MD5 digest must be converted from hex to integer and _that_ converted to base64. Tried that, no luck.
Any tips, links, guesses are welcome! Thx.
I would be very suspicious of all that code
very easy to get encoding, or bytes vs. characters wrong
the java code for getting an md5sum should just return a byte array
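A note on the bytes-vs-characters point: the 44-character value from the question is the base64 of the text "8300fe0707df17bd49f76fa64f3d8b3c", which is the hex spelling of the same digest S3 returned. In other words, the hex string was base64-encoded instead of the raw 16 digest bytes. A minimal sketch of the difference, using only java.security.MessageDigest and java.util.Base64 (the function names are made up for illustration and are not from clj-digest, byte-streams, or Amazonica):

```clojure
(import '(java.security MessageDigest)
        '(java.util Base64))

(defn md5-bytes
  "Raw 16-byte MD5 digest of a byte array."
  ^bytes [^bytes data]
  (.digest (MessageDigest/getInstance "MD5") data))

(defn bytes->hex
  "Lowercase hex spelling of a byte array, i.e. what a hex-returning
  digest helper hands back."
  [^bytes bs]
  (apply str (map #(format "%02x" (bit-and % 0xff)) bs)))

(defn base64
  "Standard (not URL-safe) base64 of a byte array."
  [^bytes bs]
  (.encodeToString (Base64/getEncoder) bs))

(defn content-md5
  "Base64 of the raw digest bytes: 24 characters for MD5."
  [^bytes data]
  (base64 (md5-bytes data)))

(defn hex-then-base64
  "The mistake: base64 of the 32 hex characters, which gives 44 characters."
  [^bytes data]
  (base64 (.getBytes ^String (bytes->hex (md5-bytes data)) "UTF-8")))

;; MD5 of empty input is the well-known d41d8cd98f00b204e9800998ecf8427e
(content-md5 (byte-array 0)) ;; => "1B2M2Y8AsgTpgAmY7PhCfg=="
```

A quick sanity check is the length: a correct value is always 24 characters ending in "==", while the hex-then-base64 form is 44.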
@hiskennyness I've been able to calculate matching MD5s using https://github.com/tebeka/clj-digest, comparing my local files with what comes back from https://github.com/cognitect-labs/aws-api/
OMG.
(:import [com.amazonaws.util Md5Utils])
.....
(Md5Utils/md5AsBase64 data-gzipped)
After a day of googling "Clojure S3 content MD5" and not doing very well it occurred to me to try other languages. Including Java. 🙂 https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/util/Md5Utils.html Bingo.
Too easy. 🙂 Thanks for pitching in, all!
Oh, one more thing. I was wondering if I should compute the MD5 off the gzip I was uploading or the data I gzipped. Turned out it was the gzip.
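To illustrate the gzip point: the digest has to be taken over the exact bytes that go over the wire, so the order is gzip first, then MD5. A rough sketch with plain JDK classes follows. The helper names and the :body / :content-md5 map keys are hypothetical placeholders, not an Amazonica call.

```clojure
(import '(java.io ByteArrayOutputStream)
        '(java.security MessageDigest)
        '(java.util Base64)
        '(java.util.zip GZIPOutputStream))

(defn gzip-bytes
  "Gzip a byte array in memory and return the compressed bytes."
  ^bytes [^bytes data]
  (let [out (ByteArrayOutputStream.)]
    ;; closing the GZIPOutputStream writes the gzip trailer
    (with-open [gz (GZIPOutputStream. out)]
      (.write gz data 0 (alength data)))
    (.toByteArray out)))

(defn content-md5
  "Base64 of the raw 16-byte MD5 digest."
  [^bytes data]
  (.encodeToString (Base64/getEncoder)
                   (.digest (MessageDigest/getInstance "MD5") data)))

(defn gzipped-upload
  "Returns the bytes to send plus the MD5 computed from those same bytes.
  The map keys are illustrative only."
  [^bytes raw]
  (let [body (gzip-bytes raw)]
    {:body        body
     :content-md5 (content-md5 body)}))

(def upload (gzipped-upload (.getBytes "helloworld" "UTF-8")))
```

The digest of the uncompressed data is a different value, which is why hashing the ungzipped content gets rejected.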
Here's my comparison code
(defn duplicate-file? [s3-file local-file]
  (when (some? s3-file)
    (let [s3-md5 (str/replace (:ETag s3-file) "\"" "")
          local-md5 (digest/md5 local-file)]
      (= local-md5 s3-md5))))
Interesting. I will explore that. But S3 wants a base64 encoding passed to it if I want it to validate my upload, and I think that is where I am stuck. Thx, tho! I will learn something playing with that.
I think you should be able to use digest to calculate the correct md5, at least assuming that AWS wants it in the same format as they pass back to you. I struggled with this quite a bit and it was all related to how I was reading and writing my files. When I brought the data into Clojure to calculate the md5, I was screwing something up and calculating the hash on the wrong data. I had to get it pulling out a byte-array properly, and for most of my troubleshooting I thought I was performing an incorrect hash on the correct data.
@shaun-mahood do you have to strip those quotes in other libraries, or just aws-api?
I've only used aws-api to get that data, so I'm not sure how the etag data is returned in other libraries
I recall s3 adding extra quotes around the etag header you get back
https://github.com/hiredman/propS3t/blob/master/src/propS3t/core.clj#L189 is where I had to slice them off for multipart uploads when doing the s3 thing via the rest api (but that is super old code, I haven't looked at the s3 rest api in a long time)
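For anyone bridging the two formats in this thread: the quoted ETag is the hex spelling of the digest, while the upload check wants base64 of the raw bytes. A small sketch of going from one to the other, with hypothetical helper names. It assumes a plain single-part upload; multipart ETags carry a "-N" suffix and are not a simple MD5 of the object, so the sketch returns nil for anything that is not exactly 32 hex characters.

```clojure
(require '[clojure.string :as str])
(import '(java.util Base64))

(defn strip-etag-quotes
  "Remove the literal double quotes S3 wraps around the ETag value."
  [etag]
  (str/replace etag "\"" ""))

(defn hex->bytes
  "Decode a hex string into the bytes it spells out."
  ^bytes [^String hex]
  (byte-array
   (map (fn [[a b]] (unchecked-byte (Integer/parseInt (str a b) 16)))
        (partition 2 hex))))

(defn etag->content-md5
  "Base64 form of a single-part ETag, or nil when the ETag is not a
  plain 32-character hex MD5 (for example a multipart ETag)."
  [etag]
  (let [hex (strip-etag-quotes etag)]
    (when (re-matches #"[0-9a-fA-F]{32}" hex)
      (.encodeToString (Base64/getEncoder) (hex->bytes hex)))))

;; the hex digest behind the values quoted earlier in the thread
(etag->content-md5 "\"8300fe0707df17bd49f76fa64f3d8b3c\"")
;; => "gwD+BwffF71J92+mTz2LPA=="
```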
ok cool... I know aws-api handles all string datatypes in the same way...
so it probably wasn't the lib