instaparse

If you're not trampolining your parser, why bother getting up in the morning?
mbjarland 2018-01-08T08:51:43.000113Z

I have a question about greedy parsing and ambiguous grammars, I have the following grammar:

(insta/parser
    "layout   = (align | delim | repeat)+
     repeat   = <'{'> (align | delim)+ <'}'>
     delim    = (fill | padding)+
     fill     = 'F'
     padding  = #'[^\\[\\]{}fF]*'
     align    = <'['> ('L' | 'C' | 'R' | 'V') <']'>"
    :string-ci true))
where the only relevant pieces are the layout = (align | delim | repeat)+ and delim = (fill | padding)+ pieces. Assume we get a ‘fill’ followed by a ‘padding’, the parser can here choose between [:delim [:fill "F"]] [:delim [:padding "xxx"]] and [:delim [:fill "F"] [:padding "xxx"]]. This is not really instaparse specific, but rather to do with bnf’s and disambiguating repeating elements. I would like to force the second interpretation where the delim = (fill | padding)+ is greedy and collects all contiguous delim elements into a list before proceeding. Any ideas much appreciated

aengelberg 2018-01-08T09:01:27.000210Z

@mbjarland you could change (align | delim)+ to delim? (align delim?)*, that would force the parser to alternate between delim and align, effectively making the delim rule greedy

mbjarland 2018-01-08T09:03:25.000310Z

and this is why I love the clojure community, thank you for the fast reply! : ) are we talking about the first rule layout = ...? In that case I would have to cook up a repeating pattern with 3 elements

mbjarland 2018-01-08T09:04:09.000327Z

so what I’m doing here is defining a column layout

aengelberg 2018-01-08T09:04:28.000326Z

Actually I was talking about the repeat rule. But you made a good point that we'd also have to apply a similar solution to the layout rule to get a similar effect

mbjarland 2018-01-08T09:04:36.000164Z

where the repeat says “if the user comes in with more columns than we have defined, use the repeating group to fill out the layout”

aengelberg 2018-01-08T09:05:09.000138Z

Column layout? Not sure what you mean

mbjarland 2018-01-08T09:05:27.000109Z

never mind, that is really application related and not related to the grammar

mbjarland 2018-01-08T09:06:07.000158Z

thought actually explaining what the thing does might clarify, but I think it just muddles the waters even more : )

mbjarland 2018-01-08T09:06:41.000257Z

I also thought about the / ordered parsing syntax, but I can not really see how to apply that to these repeating patterns

aengelberg 2018-01-08T09:07:47.000169Z

layout = delim? ((align | repeat) delim?)*
repeat = <'{'> delim? (align delim?)* <'}'>

mbjarland 2018-01-08T09:08:50.000052Z

: )

aengelberg 2018-01-08T09:09:11.000254Z

Since align and repeat both require nonzero characters to be present in order for the rule to succeed, wedging them in the repeat rule enforces some regularity in the repetition and thus unambiguates the parsing

mbjarland 2018-01-08T09:09:59.000175Z

that works, I tried it with insta/parses and my standard examples and it comes out with a single interpretation

mbjarland 2018-01-08T09:10:10.000087Z

I will have to meditate on the exact mechanics

aengelberg 2018-01-08T09:10:28.000196Z

Yeah, ordered choice doesn't really help here unless delim was competing with some other rule and you wanted to establish the priority. But in this case delim is just competing with itself, in a way.

mbjarland 2018-01-08T09:10:58.000032Z

exactly, just a question on what level in the bnf the repetition happens

aengelberg 2018-01-08T09:11:06.000081Z

Yeah

aengelberg 2018-01-08T09:11:27.000211Z

Fortunately that's easily solved by restructuring the combinators, no fancy PEG combinators necessary

mbjarland 2018-01-08T09:11:33.000048Z

this is actually a higher level pattern, I will make sure to grok this properly for the next time I run into an ambiguous repetition

mbjarland 2018-01-08T09:11:50.000159Z

thanks a ton!

aengelberg 2018-01-08T09:12:05.000211Z

No problem

aengelberg 2018-01-08T09:18:08.000172Z

@mbjarland I just thought of another way you could have solved it: ignore the above changes I proposed and instead change delim rule to:

delim = (fill | padding)+ !delim

aengelberg 2018-01-08T09:18:41.000229Z

Pretty sure my first solution would be more performant, albeit more complex

mbjarland 2018-01-08T11:33:02.000082Z

what does the !delim do?