and if someone has 10 min to help me out with my regex hell I would be a happy man
🖐️
lol
give me 1min to create a snippet
So I am working through the AWS test suite for Signature V4 signing, as I found out that some cases were not handled. Anyway, they give a raw text file representing a request and I need to do a req-text->req-map
and to capture elements from the file
some of which are optional
this is my (wrong) regex =>
(defn req-text->req-map
  "Given a request from AWS test*.req, returns a clj-http request
  map."
  [input]
  (let [[_ verb uri host date]
        (re-find #"([A-Z]+)\s(\S+).+\nHost:(\S+)\nX-Amz-Date:(\S+)" input)]
    {:request-method verb
     :uri uri
     :host host
     :date date}))
here are the results =>
you’ll find nil because I don’t handle My-Header and params yet
I tried this one without success
(def input "GET / HTTP/1.1\nHost:example.amazonaws.com\nMy-Header1:value2\nMy-Header1:value2\nMy-Header1:value1\nX-Amz-Date:20150830T123600Z")
(let [[_ & a]
      (re-find #"([A-Z]+)\s(\S+).+\n(My-Header\d:value\d\n)/" input)]
  a)
and I can’t figure out how to capture the optional, repeated My-Header lines
hint : I really suxx at regex
That’s the last missing case or are there more tests with more headers? I’m not sure regexes are the answer
Is GET / HTTP/1.1\nHost:example.amazonaws.com\nMy-Header1:value1\n value2\n value3\nX-Amz-Date:20150830T123600Z
even valid?
>>> Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT.
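To make the folding rule quoted above concrete, here is a minimal sketch that joins continuation lines back onto the header they extend. `unfold-headers` is a hypothetical helper name, and joining with a single space is an assumption; the AWS test files may expect a different separator.

```clojure
(require '[clojure.string :as str])

(defn unfold-headers
  "Joins continuation lines (lines starting with a space or tab) onto the
  header line they extend, separating the pieces with a single space.
  The single-space separator is an assumption."
  [header-block]
  (str/replace header-block #"\r?\n[ \t]+" " "))

(unfold-headers "My-Header1:value1\n value2\n value3\nX-Amz-Date:20150830T123600Z")
;; => "My-Header1:value1 value2 value3\nX-Amz-Date:20150830T123600Z"
```

Running this before any per-line parsing means the later stages only ever see one header per line.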
@cgrand I don’t capture Param1=value1 either
how else than regex
?
manual parsing or several regex stages
#"([A-Z]+)\s(\S+).+\nHost:(\S+)\n((?:My-Header\d:.*\n(?:[ \t].*\n)*)*)X-Amz-Date:(\S+)"
But seriously, don’t do that
A HTTP parsing lib?
lol
Ok, I’ll look at some HTTP Parsing lib
But just for information, If you really had to do regex, it’s possible right ?
I would read the file as lines, parse the 1st line as method path protocol
hmmm no
so you consume the 1st line and then re-seq on headers
(let [req "GET / HTTP/1.1\nHost:example.amazonaws.com\nMy-Header1:value1\n value2\n value3\nX-Amz-Date:20150830T123600Z"
      [_ method path headers] (re-matches #"(?s)([A-Z]+)\s+(\S+).*?\n(.*)" req)
      headers (for [[_ header value] (re-seq #"(?s)(\S+):(.*?\n(?:[\t ].*?\n)*)" (str headers "\n"))]
                [header value])]
  [method path headers])
yields
["GET"
 "/"
 (["Host" "example.amazonaws.com\n"]
  ["My-Header1" "value1\n value2\n value3\n"]
  ["X-Amz-Date" "20150830T123600Z\n"])]
I can’t find a lib that does the job. I tried org.apache.httpclient, but I can’t get the request body for a POST request
And my snippet above?
let me try, I was focusing on parsing raw HTTP with apache httpclient ^^
headers are not supposed to be unique ?
key => unique ;
No. Some may be multi valued and it’s a way to encode that.
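A small sketch of what that means in practice: repeated header names can be folded into one comma-separated value. `merge-headers` is a hypothetical name, the input is assumed to be a seq of `[name value]` pairs, and names are compared case-sensitively here for simplicity.

```clojure
(defn merge-headers
  "Takes a seq of [name value] pairs and returns a map in which repeated
  names are combined into one comma-separated value, in order of appearance."
  [pairs]
  (reduce (fn [m [k v]]
            ;; append to the existing value if the name was already seen
            (update m k #(if % (str % "," v) v)))
          {}
          pairs))

(merge-headers [["Host" "example.amazonaws.com"]
                ["My-Header1" "value2"]
                ["My-Header1" "value2"]
                ["My-Header1" "value1"]])
;; => {"Host" "example.amazonaws.com", "My-Header1" "value2,value2,value1"}
```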
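a first draft that works well for headers but not for POST param=value
;; Imports assume Apache HttpCore 4.x and cgrand's xforms library.
(import '(java.io ByteArrayInputStream)
        '(java.nio.charset StandardCharsets)
        '(org.apache.http.impl.io DefaultHttpRequestParser
                                  HttpTransportMetricsImpl
                                  SessionInputBufferImpl))
(require '[net.cgrand.xforms :as x])

(defn req-text->req-map-revisited [req-text]
  (let [is (ByteArrayInputStream. (.getBytes req-text StandardCharsets/UTF_8))
        session-input-buffer (doto (SessionInputBufferImpl. (HttpTransportMetricsImpl.) (* 8 2048))
                               (.bind is))
        basic-http-request (.parse (DefaultHttpRequestParser. session-input-buffer))
        ;; seq of [name value] pairs, repeated names included
        headers (for [h (.getAllHeaders basic-http-request)]
                  [(.getName h) (.getValue h)])
        ;; group by header name, joining repeated values with ","
        headers (into {}
                      (x/by-key (comp (interpose ",")
                                      x/str))
                      headers)
        request-line (.getRequestLine basic-http-request)]
    (cond->
      {:uri (.getUri request-line)
       :request-method (.getMethod request-line)}
      (seq headers) (assoc :headers headers))))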
which returns
{:uri "/", :request-method "GET", :headers {"Host" "example.amazonaws.com", "My-Header1" "value2,value2,value1", "X-Amz-Date" "20150830T123600Z"}}
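For the missing POST param=value part, one possible approach is to split the raw text at the first blank line and parse the rest separately. This is only a sketch: it assumes the body follows the headers after a blank line and is a simple `k=v&k=v` string, and `split-head-body` and `parse-form-params` are hypothetical helper names. No URL-decoding is attempted.

```clojure
(require '[clojure.string :as str])

(defn split-head-body
  "Splits raw request text at the first blank line.
  Returns [head body]; body is absent when there is no blank line."
  [req-text]
  (str/split req-text #"\r?\n\r?\n" 2))

(defn parse-form-params
  "Parses \"a=1&b=2\" into {\"a\" \"1\", \"b\" \"2\"}.
  A nil or empty body yields an empty map. Values are not URL-decoded."
  [body]
  (into {}
        (for [pair (str/split (or body "") #"&")
              :when (seq pair)]
          (let [[k v] (str/split pair #"=" 2)]
            [k (or v "")]))))

(let [[_head body] (split-head-body
                     "POST / HTTP/1.1\nHost:example.amazonaws.com\n\nParam1=value1")]
  (parse-form-params body))
;; => {"Param1" "value1"}
```

The head part can still go through whichever header parser you prefer; only the body handling is new here.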
FYI, I found this in the tests =>
A note about signing requests to Amazon S3:
In exception to this, you do not normalize URI paths for requests to Amazon S3. For example, if you have a bucket with an object named my-object//example//photo.user, use that path. Normalizing the path to my-object/example/photo.user will cause the request to fail. For more information, see Task 1: Create a Canonical Request in the Amazon Simple Storage Service API Reference: <http://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html#canonical-request>
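A rough sketch of how that exception could look in code, assuming the signer knows the service name. `canonical-path` is a hypothetical helper, and it only collapses repeated slashes; real path normalization also has to deal with dot segments and percent-encoding, which are left out here.

```clojure
(require '[clojure.string :as str])

(defn canonical-path
  "Collapses runs of slashes in path, except when service is \"s3\",
  in which case the path is returned untouched."
  [service path]
  (if (= "s3" service)
    path
    (str/replace path #"/{2,}" "/")))

(canonical-path "s3" "/my-object//example//photo.user")
;; => "/my-object//example//photo.user"
(canonical-path "ec2" "/my-object//example//photo.user")
;; => "/my-object/example/photo.user"
```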