I'm having trouble creating a parser using the grammar specified here: https://developers.google.com/protocol-buffers/docs/reference/proto3-spec
I'm getting the feeling that there are syntax differences between the spec's notation and what instaparse accepts
I'm slurping the grammar out of a separate file, but I feel like escaped quotes still aren't being handled as I intend (e.g. quote = "'" | '"')
does anyone know how quotes ought to be escaped in instaparse ebnf strings?
the way you have it looks likely to work to me
maybe there's unmatched quotes somewhere in the grammar that I copied and pasted 😕
try making a trivial grammar that only matches a quote to make sure it works the way you expect
^ This. So much this. When I'm making grammars, I often make little phrases to match a character I haven't tested before.
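A minimal sketch of that kind of sanity check, assuming instaparse is on the classpath and required as `insta`. The names `quote-parser` and `parses?` are made up for illustration. The grammar is written inline as a Clojure string here, so the double quotes are escaped for the Clojure reader; in a slurped file the same rule would be written without those backslashes.

```clojure
(require '[instaparse.core :as insta])

;; Trivial grammar: a single rule that accepts one single quote or one double quote.
;; Text instaparse receives:  quote = "'" | '"'
(def quote-parser
  (insta/parser "quote = \"'\" | '\"'"))

;; Hypothetical helper: true when the parser accepts the whole input.
(defn parses? [parser input]
  (not (insta/failure? (parser input))))

(println (parses? quote-parser "'"))   ; single quote
(println (parses? quote-parser "\""))  ; double quote
(println (parses? quote-parser "x"))   ; anything else
```

If the two quote inputs are accepted and `x` is rejected, the quote rule itself is fine and the problem is elsewhere in the grammar.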
I'll try that, thanks
"'" | '"' looks right, but there are sometimes additional layers of escaping you have to deal with.
e.g. if you wrote your grammar as a string in a Clojure file, it would probably have to look like
(def parser (insta/parser "quote = \"'\" | '\"'"))
I see this in the protobuf spec
hexEscape = '\'
that will probably throw off instaparse, since it thinks you are escaping the second '
so it should really be
hexEscape = '\\'
@frank ^
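A small sketch of the backslash case on its own, assuming instaparse is required as `insta`. The rule name `backslash` and the var names are hypothetical, and the rule is cut down from the spec's hexEscape to just the literal. Because the grammar is an inline Clojure string here, each backslash is doubled once more for the Clojure reader; in a grammar file it would be just `'\\'`.

```clojure
(require '[instaparse.core :as insta])

;; Text instaparse receives:  backslash = '\\'
;; Inside the quoted literal, \\ is an escaped backslash, so the rule
;; matches exactly one backslash character.
(def backslash-parser
  (insta/parser "backslash = '\\\\'"))

;; "\\" is a one-character Clojure string holding a single backslash.
(def backslash-ok? (not (insta/failure? (backslash-parser "\\"))))
(def letter-ok?    (not (insta/failure? (backslash-parser "x"))))

(println backslash-ok? letter-ok?)
```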
also, /[^\0\n\\]/ is not valid EBNF in instaparse (should be #"[^\0\n\\]")
ah, that's probably it!
strangely enough, #"[^\0\n\\]" isn't valid clojure regex syntax, so I stole the same regex syntax from https://github.com/arpagaus/clj-protobuf/blob/master/resources/proto.ebnf
they've got a few extra backslashes: #"[^\\0\\n]"
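A likely explanation, sketched with plain Clojure and no instaparse: Java's regex engine reads `\0` as the start of an octal escape and wants an octal digit after it. Writing NUL as `\x00` is my own substitution, not something from the spec or the borrowed grammar. The last pattern shows what the doubled-backslash text means if it reaches the regex engine unchanged, which is an assumption about how the grammar file is processed and worth checking.

```clojure
;; Regex text: [^\0\n\\]   (\0 is not followed by an octal digit)
(def original-compiles?
  (try
    (re-pattern "[^\\0\\n\\\\]")
    true
    (catch java.util.regex.PatternSyntaxException _ false)))

;; Regex text: [^\x00\n\\]   (NUL as a hex escape; assumed replacement)
(def nul-safe (re-pattern "[^\\x00\\n\\\\]"))

;; Regex text: [^\\0\\n]   (excludes backslash and the letters 0 and n)
(def doubled (re-pattern "[^\\\\0\\\\n]"))

(println original-compiles?)
(println (re-matches nul-safe "0") (re-matches doubled "0"))
```

Under that assumption the doubled-backslash class would reject a literal `0` or `n` inside a string, so it compiles but may not mean what the spec intends.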
@aengelberg what's the equivalent of the … that they've got littered all over their grammar?
I think they meant that as a shorthand for alternating between all the digits. Sadly instaparse can't infer the intermediate values, so you would have to write "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
ah, gotcha
alternatively, #"[0-9]" should work too, right?
correct
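A quick sketch comparing the two spellings, assuming instaparse is required as `insta`. The names `digit-alt`, `digit-re`, and `parses?` are hypothetical. Single-quoted literals are used inside the grammar so the inline Clojure string needs no escaping.

```clojure
(require '[instaparse.core :as insta])

;; Spelled-out alternation, one literal per digit.
(def digit-alt
  (insta/parser
   "digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'"))

;; The same rule as a regex terminal.
(def digit-re
  (insta/parser "digit = #'[0-9]'"))

;; Hypothetical helper: true when the parser accepts the whole input.
(defn parses? [parser input]
  (not (insta/failure? (parser input))))

;; Both parsers should accept every single digit and reject a letter.
(def all-digits-ok?
  (every? #(and (parses? digit-alt (str %))
                (parses? digit-re (str %)))
          "0123456789"))

(println all-digits-ok? (parses? digit-re "a"))
```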