Hey, someone here? :)
I was trying to make this ebnf grammar work with instaparse: https://github.com/cbeust/kash/blob/master/src/main/resources/bash.ebnf But so far it didn't work out
Here's what I got: https://gist.github.com/borkdude/98c5d9e2bf598b227e8e643e4271e61e
user=> (def parser (insta/parser "/Users/borkdude/Downloads/bash.ebnf"))
#'user/parser
user=> (parser "foo")
Parse error at line 1, column 1:
foo
^
Expected:
#"[0-9]"
Hi, when you do not specify a starting rule for the grammar, instaparse selects the top rule as the starting point. In your case that is the number rule. https://github.com/engelberg/instaparse#parsing-from-another-start-rule This should work:
(parser "foo" :start :command)
(NB: surrounding a rule with angle brackets makes it hidden. Since all commands are hidden, you will probably only get an empty list on a successful parse)
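A minimal sketch of the start-rule and hiding behaviour described above. The grammars here are hypothetical toy examples, not the bash grammar from the thread, and they assume instaparse is on the classpath:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical toy grammar: the first rule (number) is the default start rule.
(def toy
  (insta/parser
    "number = #'[0-9]+'
     word = letter+
     letter = #'[a-z]'"))

;; Fails, because "foo" is matched against the number rule.
(insta/failure? (toy "foo"))

;; Parses once a different start rule is requested.
(toy "foo" :start :word)
;; => [:word [:letter "f"] [:letter "o"] [:letter "o"]]

;; Same rules with angle brackets around the names: the tags are hidden,
;; so only the raw strings are left in the output.
(def hidden
  (insta/parser
    "<word> = letter+
     <letter> = #'[a-z]'"))

(hidden "foo")
;; => ("f" "o" "o")
```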
aaah
user=> (parser "foo" :start :word)
("f" "o" "o")
btw, it wasn't my choice to use angle brackets, I just copied that from the original ebnf
oh I see, hidden means you don't get it back in the structure, but directly?
why does this succeed if I have set :partial to false:
user=> (parser "foo" :start :word :partial false)
[:word]
That was my hunch, which is why I thought a heads-up was in order :) Yes, there is a good example of hiding here: https://github.com/engelberg/instaparse#hiding-content As mentioned, it is usually used for hiding whitespace and other tokens you do not care about in the final output, but if you hide the top rule, everything disappears
ah I see, it was because of the hiding again:
user=> (parser "foo" :start :word :partial false)
[:word [:word [:word [:letter "f"]] [:letter "o"]] [:letter "o"]]
:partial allows a parse that consumes only an initial portion of the input to succeed. Embedding a failure node in the AST at the point where parsing failed is what :total does
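To illustrate partial parsing, here is a small sketch with a hypothetical one-rule grammar (not from the thread). By default the whole input must match; with :partial true, parses that cover only a prefix of the input are accepted:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical grammar: one or more 'a' characters.
(def repeat-a (insta/parser "S = 'a'+"))

;; Default behaviour: the trailing "b" makes the parse fail.
(insta/failure? (repeat-a "aab"))

;; insta/parses returns every parse; with :partial true these are
;; parses of the prefixes "a" and "aa".
(insta/parses repeat-a "aab" :partial true)
```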
ah:)
It seems the original ebnf works a bit differently than instaparse. e.g.:
<for_command> ::= 'for' <word> <newline_list> 'do' <compound_list> 'done'
| 'for' <word> <newline_list> '{' <compound_list> '}'
| 'for' <word> ';' <newline_list> 'do' <compound_list> 'done'
| 'for' <word> ';' <newline_list> '{' <compound_list> '}'
| 'for' <word> <newline_list> 'in' <word_list> <list_terminator>
<newline_list> 'do' <compound_list> 'done'
| 'for' <word> <newline_list> 'in' <word_list> <list_terminator>
<newline_list> '{' <compound_list> '}'
seems to assume that the tokens are automatically separated by whitespace
if I have to rewrite the grammar anyway I'm more inclined to hand-roll my own parser
I think that for most yacc/bison parsers tokens are separated by whitespace by default, yes. Instaparse supports this through its auto-whitespace feature, which has worked well for me: https://github.com/Engelberg/instaparse/blob/master/docs/ExperimentalFeatures.md#auto-whitespace
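A rough sketch of what auto-whitespace does, using a heavily simplified, hypothetical stand-in for the for_command rule (the real bash grammar is much larger):

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical, simplified grammar text shared by both parsers below.
(def for-grammar
  "for_command = 'for' word 'do' word 'done'
   word = #'[a-z]+'")

;; Without auto-whitespace, the spaces between tokens are not accepted.
(def strict (insta/parser for-grammar))
(insta/failure? (strict "for x do y done"))

;; With :auto-whitespace :standard, optional whitespace is allowed
;; before each token and hidden from the output.
(def lenient (insta/parser for-grammar :auto-whitespace :standard))
(lenient "for x do y done")
;; => [:for_command "for" [:word "x"] "do" [:word "y"] "done"]
```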
I don't know what you are using this parser for, but in my experience using a proper grammar-based parser is more maintainable and flexible in the long run. Of course, for small use cases it can be a lot to get into and learn
this parser should parse bash syntax
but bash is not such a big language
This is the original: https://github.com/cbeust/kash/blob/master/src/main/resources/bash.ebnf
I just have some problems getting this to work with instaparse so far
it's not very important, just a fun project
Then I guess it comes down to which approach you find most fun :) I think instaparse is quite amazing once you grok it, but again, I understand it can be a hassle to get into. On the other hand, hand-written parsers can also be painful to get correct
There isn’t really a single EBNF syntax specification or RFC, so every “EBNF grammar” you’ll find in the wild will have a slightly varied flavor of the syntax. Sometimes because a certain parser library chose a unique metasyntax, or sometimes because the grammar is meant to serve as documentation rather than compiled and executed.
Instaparse attempts to support most of the different flavors, which is why you can use either x? or [x] syntax for example
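For instance, these two hypothetical one-rule grammars spell the same optional element in the two styles and should behave identically:

```clojure
(require '[instaparse.core :as insta])

;; Optional 'b' written with the postfix question mark.
(def with-question-mark (insta/parser "S = 'a' 'b'?"))

;; The same optional 'b' written with square brackets.
(def with-brackets (insta/parser "S = 'a' ['b']"))

(with-question-mark "ab") ; => [:S "a" "b"]
(with-brackets "ab")      ; => [:S "a" "b"]
(with-question-mark "a")  ; => [:S "a"]
(with-brackets "a")       ; => [:S "a"]
```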
But sometimes a grammar or a different parser library will make a particularly unusual syntax choice, like using angle brackets in rule names
Or a grammar will make an implicit logical assumption that Instaparse has no way to act upon, like whitespace being parsed between tokens
The angle brackets are particularly unfortunate since Instaparse chose to use angle brackets for an instaparse-specific feature (hiding data from the output parse tree)
ABNF, on the other hand, seems to be a much more regulated metasyntax, so copying and pasting ABNF grammars into instaparse (using :input-format :abnf) tends to be safer
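As a small sketch of the ABNF mode, here is a hypothetical two-rule grammar (made up for illustration) that relies on the ABNF core rules SP and ALPHA:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical ABNF grammar: the word "hello", one or more spaces, then a name.
(def greeting
  (insta/parser
    (str "greeting = \"hello\" 1*SP name\n"
         "name = 1*ALPHA\n")
    :input-format :abnf))

;; Succeeds on matching input and fails when the name contains digits.
(insta/failure? (greeting "hello world"))
(insta/failure? (greeting "hello 123"))
```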