instaparse

If you're not trampolining your parser, why bother getting up in the morning?
borkdude 2021-06-29T08:56:48.001300Z

Hey, someone here? :)

borkdude 2021-06-29T08:57:09.001800Z

I was trying to make this EBNF grammar work with instaparse: https://github.com/cbeust/kash/blob/master/src/main/resources/bash.ebnf but so far it hasn't worked out

borkdude 2021-06-29T08:58:31.002300Z

Here's what I got: https://gist.github.com/borkdude/98c5d9e2bf598b227e8e643e4271e61e

user=> (def parser (insta/parser "/Users/borkdude/Downloads/bash.ebnf"))
#'user/parser
user=> (parser "foo")
Parse error at line 1, column 1:
foo
^
Expected:
#"[0-9]"

Sigve 2021-06-29T09:27:28.003700Z

Hi, when you do not specify a starting rule for the grammar, instaparse selects the top rule as the starting point. In your case that is the number rule. https://github.com/engelberg/instaparse#parsing-from-another-start-rule This should work:

(parser "foo" :start :command)
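
A minimal sketch of the start-rule behavior described above. The two-rule grammar and the names `toy` and `word-parser` are hypothetical stand-ins, not the bash grammar from the gist; `number` is listed first so that it becomes the default start rule, as in the failing REPL session.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical two-rule grammar; `number` is listed first, so it is the
;; default start rule.
(def toy
  (insta/parser
    "number = #'[0-9]+'
     word = #'[a-z]+'"))

(insta/failure? (toy "foo"))   ; true: the default start rule is `number`
(toy "foo" :start :word)       ; => [:word "foo"]

;; The start rule can also be fixed once, when the parser is built.
(def word-parser
  (insta/parser
    "number = #'[0-9]+'
     word = #'[a-z]+'"
    :start :word))

(word-parser "foo")            ; => [:word "foo"]
```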

Sigve 2021-06-29T09:30:35.005500Z

(NB: surrounding a rule with angle brackets makes it hidden. Since all commands are hidden, you will probably only get an empty list on a successful parse.)

borkdude 2021-06-29T10:03:48.006Z

aaah

borkdude 2021-06-29T10:05:19.006400Z

user=> (parser "foo" :start :word)
("f" "o" "o")

borkdude 2021-06-29T10:06:22.006900Z

btw, it wasn't my choice to use angle brackets, I just copied that from the original ebnf

borkdude 2021-06-29T10:09:06.007300Z

oh I see, hidden means you don't get it back in the structure, but directly?

borkdude 2021-06-29T10:13:27.009200Z

why does this succeed if I have set :partial to false:

user=> (parser "foo" :start :word :partial false)
[:word]

Sigve 2021-06-29T10:13:36.009300Z

That was my hunch, which is why I thought a heads-up was in order :) Yes, there is a good example of hiding here: https://github.com/engelberg/instaparse#hiding-content As mentioned, it is usually used for hiding whitespace and other tokens you do not care about in the final output, but if you hide the top rule, everything disappears
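
A small sketch of the two kinds of hiding, assuming toy grammars rather than the bash grammar from the gist (the names `tagged`, `untagged`, and `parens` are made up for illustration). Angle brackets on the left-hand side of a rule hide its tag, while angle brackets on the right-hand side hide the matched content.

```clojure
(require '[instaparse.core :as insta])

;; No hiding: every rule shows up as a tagged node.
(def tagged
  (insta/parser
    "word = letter+
     letter = #'[a-z]'"))

;; Angle brackets on the left-hand side hide the rule's tag.
(def untagged
  (insta/parser
    "<word> = letter+
     <letter> = #'[a-z]'"))

;; Angle brackets on the right-hand side hide the matched content.
(def parens
  (insta/parser
    "group = <'('> word <')'>
     word = #'[a-z]+'"))

(tagged "foo")    ; => [:word [:letter "f"] [:letter "o"] [:letter "o"]]
(untagged "foo")  ; a flat seq of strings, since both tags are hidden
(parens "(foo)")  ; => [:group [:word "foo"]]

;; :unhide reveals hidden tags and content again, which helps when debugging.
(insta/parse untagged "foo" :unhide :all)
```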

borkdude 2021-06-29T10:14:55.010600Z

ah I see, it was because of the hiding again:

user=> (parser "foo" :start :word :partial false)
[:word [:word [:word [:letter "f"]] [:letter "o"]] [:letter "o"]]

Sigve 2021-06-29T10:16:00.011200Z

:partial allows a parse to succeed after consuming only a prefix of the input (embedding a failure node in the tree at the point where parsing breaks down is what :total does)
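
For reference, instaparse has two separate options in this area, `:partial` and `:total`. A rough sketch of how they differ, assuming a hypothetical one-rule grammar named `as`; exact result trees are left out on purpose.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical grammar: one or more "a" characters.
(def as (insta/parser "S = 'a'+"))

(as "aab")                 ; a failure object: the trailing "b" cannot be consumed
(as "aab" :partial true)   ; succeeds after parsing only a prefix of the input
(as "aab" :total true)     ; always yields a tree, with a failure node embedded

;; In total mode the failure details can still be retrieved from the result.
(insta/get-failure (as "aab" :total true))
```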

Sigve 2021-06-29T10:16:04.011400Z

ah:)

borkdude 2021-06-29T10:19:17.012Z

It seems the original ebnf works a bit differently than instaparse. e.g.:

<for_command> ::=  'for' <word> <newline_list> 'do' <compound_list> 'done'
            |  'for' <word> <newline_list> '{' <compound_list> '}'
            |  'for' <word> ';' <newline_list> 'do' <compound_list> 'done'
            |  'for' <word> ';' <newline_list> '{' <compound_list> '}'
            |  'for' <word> <newline_list> 'in' <word_list> <list_terminator>
                   <newline_list> 'do' <compound_list> 'done'
            |  'for' <word> <newline_list> 'in' <word_list> <list_terminator>
                   <newline_list> '{' <compound_list> '}'

borkdude 2021-06-29T10:19:33.012400Z

seems to assume that the tokens are automatically separated by whitespace

borkdude 2021-06-29T10:20:42.012800Z

if I have to rewrite the grammar anyway I'm more inclined to hand-roll my own parser

Sigve 2021-06-29T10:23:01.014200Z

I think most yacc/bison-style grammars assume tokens are separated by whitespace by default, yes. Instaparse supports adding this via the auto-whitespace feature, which has worked well for me: https://github.com/Engelberg/instaparse/blob/master/docs/ExperimentalFeatures.md#auto-whitespace
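
A sketch of what auto-whitespace changes, assuming a heavily simplified, hypothetical for-loop rule (`forloop`) rather than the real bash grammar. The same grammar is built twice, once as written and once with `:auto-whitespace :standard`.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical, heavily simplified rule; not taken from bash.ebnf.
(def grammar
  "forloop = 'for' word 'in' word+ ';'
   word = #'[a-z]+'")

;; As written, the grammar has no notion of whitespace between tokens.
(def strict (insta/parser grammar))

;; With auto-whitespace, optional whitespace is allowed around tokens.
(def lenient (insta/parser grammar :auto-whitespace :standard))

(insta/failure? (strict "for x in a b c ;"))    ; true
(insta/failure? (lenient "for x in a b c ;"))   ; false
```

Note that the inserted whitespace is optional, so the lenient parser would also accept input such as "forx in a;". A grammar that needs mandatory separators still has to spell them out.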

Sigve 2021-06-29T10:24:16.015300Z

I don't know what you are using this parser for, but in my experience a proper grammar-based parser is more maintainable and flexible in the long run. Of course, for small use cases it can be a lot to get into and learn

borkdude 2021-06-29T10:26:34.015600Z

this parser should parse bash syntax

borkdude 2021-06-29T10:26:42.015800Z

but bash is not such a big language

borkdude 2021-06-29T10:27:06.016Z

This is the original: https://github.com/cbeust/kash/blob/master/src/main/resources/bash.ebnf

borkdude 2021-06-29T10:27:15.016400Z

I just have some problems getting this to work with instaparse so far

borkdude 2021-06-29T10:27:47.016800Z

it's not very important, just a fun project

Sigve 2021-06-29T10:32:31.018400Z

Then I guess it comes down to which approach you find most fun :) I think instaparse is quite amazing once you grok it, but again, I understand it can be a hassle to get into. On the other hand, hand-written parsers can also be painful to get right

aengelberg 2021-06-29T19:50:09.020700Z

There isn’t really a single EBNF syntax specification or RFC, so every “EBNF grammar” you’ll find in the wild will have a slightly different flavor of the syntax. Sometimes that's because a certain parser library chose a unique metasyntax, and sometimes because the grammar is meant to serve as documentation rather than to be compiled and executed.

aengelberg 2021-06-29T19:54:38.024300Z

Instaparse attempts to support most of the different flavors, which is why you can use either x? or [x] syntax for example
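
As a quick illustration of the two optional syntaxes mentioned above, here is a sketch with a hypothetical one-rule grammar written both ways (`postfix` and `brackets` are made-up names).

```clojure
(require '[instaparse.core :as insta])

;; The same hypothetical rule written in two flavors of "optional".
(def postfix  (insta/parser "S = 'a' 'b'?"))
(def brackets (insta/parser "S = 'a' ['b']"))

(postfix "ab")                        ; => [:S "a" "b"]
(= (postfix "ab") (brackets "ab"))    ; true
(= (postfix "a") (brackets "a"))      ; true
```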

aengelberg 2021-06-29T19:55:03.024900Z

But sometimes a grammar or a different parser library will make a particularly unusual syntax choice, like using angle brackets in rule names

aengelberg 2021-06-29T19:56:52.026400Z

Or a grammar will make an implicit logical assumption that Instaparse has no way to act upon, like whitespace being parsed between tokens

aengelberg 2021-06-29T19:58:52.027800Z

The angle brackets are particularly unfortunate since Instaparse chose to use angle brackets for an instaparse-specific feature (hiding data from the output parse tree)

aengelberg 2021-06-29T20:03:36.028800Z

ABNF, on the other hand, seems to be a much more regulated metasyntax, so copying and pasting ABNF grammars into instaparse (using :input-format :abnf) tends to be safer
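
A minimal sketch of the `:input-format :abnf` option, assuming a hypothetical one-rule ABNF grammar (`abnf-as` is a made-up name). The result tree is left out, since ABNF has its own conventions for rule names and string matching.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical ABNF grammar: `1*"a"` means one or more repetitions of "a".
(def abnf-as
  (insta/parser "S = 1*\"a\"" :input-format :abnf))

(insta/failure? (abnf-as "aaa"))   ; false
(insta/failure? (abnf-as "b"))     ; true
```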