instaparse

If you're not trampolining your parser, why bother getting up in the morning?
frank 2017-02-17T17:54:26.000169Z

I'm having trouble creating a parser using the grammar specified here: https://developers.google.com/protocol-buffers/docs/reference/proto3-spec

frank 2017-02-17T17:54:52.000170Z

I'm getting the feeling that there are syntax differences

frank 2017-02-17T17:57:26.000171Z

I'm slurping the grammar out of a separate file, but I feel like escaped quotes still aren't being handled as I intend (e.g. quote = "'" | '"')

frank 2017-02-17T18:03:49.000172Z

does anyone know how quotes ought to be escaped in instaparse ebnf strings?

2017-02-17T18:13:21.000173Z

the way you have it looks likely to work to me

frank 2017-02-17T18:17:49.000174Z

maybe there's unmatched quotes somewhere in the grammar that I copied and pasted 😕

2017-02-17T18:19:38.000175Z

try making a trivial grammar that only matches a quote to make sure it works the way you expect

seylerius 2017-02-17T18:20:58.000177Z

^ This. So much this. When I'm making grammars, I often make little phrases to match a character I haven't tested before.

frank 2017-02-17T18:21:44.000178Z

I'll try that, thanks

aengelberg 2017-02-17T18:29:20.000180Z

"'" | '"' looks right, but there are sometimes additional layers of escaping you have to deal with.

aengelberg 2017-02-17T18:29:53.000181Z

e.g. if you wrote your grammar as a string in a Clojure file, it would probably have to look like

(def parser (insta/parser "quote = \"'\" | '\"'"))
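
A minimal sketch of the trivial-grammar check suggested earlier, combined with the extra escaping layer for grammars written as Clojure strings. It assumes instaparse is on the classpath and aliased as `insta`; the names `quote-grammar` and `quote-parser` are made up for illustration.

```clojure
(require '[instaparse.core :as insta])

;; The grammar text instaparse should see is:  quote = "'" | '"'
;; Inside a Clojure string literal, each double quote that belongs to the
;; grammar needs a backslash so the Clojure reader does not end the string.
(def quote-grammar "quote = \"'\" | '\"'")

;; Hypothetical name, for illustration only.
(def quote-parser (insta/parser quote-grammar))

(println quote-grammar)        ; the grammar text after reader-level unescaping
(println (quote-parser "'"))   ; parse a lone single quote
(println (quote-parser "\""))  ; parse a lone double quote
```

A grammar slurped from a file skips the reader layer, so the file should contain the grammar text exactly as `println` shows it here.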

aengelberg 2017-02-17T18:31:11.000182Z

I see this in the protobuf spec

hexEscape = '\'
that will probably throw off instaparse, since it thinks you are escaping the second '

aengelberg 2017-02-17T18:31:25.000183Z

so it should really be

hexEscape = '\\'
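
A rough sketch of the backslash fix, assuming instaparse is available as `insta`. The rule is adapted from the protobuf spec's `hexEscape`; `hex-escape-parser` is a made-up name, and the `hexDigit` regex is an assumption about what the spec's digit range means. Because the grammar sits in a Clojure string here, each backslash in the grammar text is doubled once more for the reader.

```clojure
(require '[instaparse.core :as insta])

;; Grammar text seen by instaparse:
;;   hexEscape = '\\' ('x' | 'X') hexDigit hexDigit
;;   hexDigit = #'[0-9a-fA-F]'
;; '\\' in the grammar is one literal backslash; in a Clojure string
;; literal that becomes four backslashes.
(def hex-escape-parser
  (insta/parser
    "hexEscape = '\\\\' ('x' | 'X') hexDigit hexDigit
     hexDigit = #'[0-9a-fA-F]'"))

;; The input is the four characters \ x 4 F.
(println (hex-escape-parser "\\x4F"))
```

In a slurped grammar file only the first layer applies, so `'\\'` is enough there.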

aengelberg 2017-02-17T18:32:01.000185Z

@frank ^

aengelberg 2017-02-17T18:32:20.000186Z

also, /[^\0\n\\]/ is not valid EBNF in instaparse (should be #"[^\0\n\\]")

frank 2017-02-17T18:42:38.000189Z

ah, that's probably it!

frank 2017-02-17T18:43:41.000190Z

strangely enough, #"[^\0\n\\]" isn't valid clojure regex syntax, so I stole the same regex syntax from https://github.com/arpagaus/clj-protobuf/blob/master/resources/proto.ebnf

frank 2017-02-17T18:44:04.000192Z

they've got a few extra backslashes: #"[^\\0\\n]"
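
A small sketch, using only `clojure.core` and `java.util.regex`, of why the first regex is rejected and what the doubled-backslash version matches if its text reaches the regex engine unchanged (an assumption about how the grammar is loaded). The helper `compiles?` is hypothetical, and `\x00` is one possible way to spell NUL, not something taken from the thread.

```clojure
(defn compiles?
  "True when s is accepted by java.util.regex (hypothetical helper)."
  [s]
  (try
    (re-pattern s)
    true
    (catch java.util.regex.PatternSyntaxException _
      false)))

;; Regex text [^\0\n\\] : in Java, \0 must be followed by an octal digit.
(println (compiles? "[^\\0\\n\\\\]"))

;; Regex text [^\x00\n\\] : \x00 names the NUL character explicitly.
(println (compiles? "[^\\x00\\n\\\\]"))

;; Regex text [^\\0\\n] compiles, but read literally it excludes the
;; characters backslash, 0 and n rather than NUL and newline.
(println (re-matches #"[^\\0\\n]" "n"))
(println (re-matches #"[^\x00\n\\]" "n"))
```

It may be worth testing the borrowed regex against inputs containing `0` or `n` before relying on it.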

frank 2017-02-17T18:46:25.000193Z

@aengelberg what's the equivalent of the … that they've got littered all over their grammar?

aengelberg 2017-02-17T18:49:37.000194Z

I think they meant that as a shorthand for alternating between all the digits. Sadly instaparse can't infer the intermediate values, so you would have to write "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

frank 2017-02-17T18:50:15.000195Z

ah, gotcha

frank 2017-02-17T18:55:57.000196Z

alternatively, #"[0-9]" should work too, right?

aengelberg 2017-02-17T18:56:14.000197Z

correct
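
A quick sketch comparing the two spellings of a digit rule, assuming instaparse is available as `insta`. The rule name `decimalDigit` follows the protobuf spec; `digit-alt` and `digit-re` are made-up names.

```clojure
(require '[instaparse.core :as insta])

;; Explicit alternation over every digit.
(def digit-alt
  (insta/parser
    "decimalDigit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'"))

;; The same rule written as a regex character class.
(def digit-re
  (insta/parser "decimalDigit = #'[0-9]'"))

;; Both parsers should produce the same tree for every single digit.
(println (every? #(= (digit-alt (str %)) (digit-re (str %))) (range 10)))
(println (digit-re "7"))
```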
