I have a question about greedy parsing and ambiguous grammars, I have the following grammar:
(insta/parser
"layout = (align | delim | repeat)+
repeat = <'{'> (align | delim)+ <'}'>
delim = (fill | padding)+
fill = 'F'
padding = #'[^\\[\\]{}fF]*'
align = <'['> ('L' | 'C' | 'R' | 'V') <']'>"
:string-ci true))
where the only relevant pieces are the layout = (align | delim | repeat)+
and delim = (fill | padding)+
pieces. Assume we get a ‘fill’ followed by a ‘padding’, the parser can here choose between [:delim [:fill "F"]] [:delim [:padding "xxx"]]
and [:delim [:fill "F"] [:padding "xxx"]]
. This is not really instaparse specific, but rather to do with bnf’s and disambiguating repeating elements. I would like to force the second interpretation where the delim = (fill | padding)+
is greedy and collects all contiguous delim elements into a list before proceeding. Any ideas much appreciated@mbjarland you could change (align | delim)+
to delim? (align delim?)*
, that would force the parser to alternate between delim and align, effectively making the delim rule greedy
and this is why I love the clojure community, thank you for the fast reply! : )
are we talking about the first rule layout = ...
? In that case I would have to cook up a repeating pattern with 3 elements
so what I’m doing here is defining a column layout
Actually I was talking about the repeat rule. But you made a good point that we'd also have to apply a similar solution to the layout rule to get a similar effect
where the repeat says “if the user comes in with more columns than we have defined, use the repeating group to fill out the layout”
Column layout? Not sure what you mean
never mind, that is really application related and not related to the grammar
thought actually explaining what the thing does might clarify, but I think it just muddles the waters even more : )
I also thought about the /
ordered parsing syntax, but I can not really see how to apply that to these repeating patterns
layout = delim? ((align | repeat) delim?)*
repeat = <'{'> delim? (align delim?)* <'}'>
: )
Since align
and repeat
both require nonzero characters to be present in order for the rule to succeed, wedging them in the repeat rule enforces some regularity in the repetition and thus unambiguates the parsing
that works, I tried it with insta/parses
and my standard examples and it comes out with a single interpretation
I will have to meditate on the exact mechanics
Yeah, ordered choice doesn't really help here unless delim was competing with some other rule and you wanted to establish the priority. But in this case delim is just competing with itself, in a way.
exactly, just a question on what level in the bnf the repetition happens
Yeah
Fortunately that's easily solved by restructuring the combinators, no fancy PEG combinators necessary
this is actually a higher level pattern, I will make sure to grok this properly for the next time I run into an ambiguous repetition
thanks a ton!
No problem
@mbjarland I just thought of another way you could have solved it: ignore the above changes I proposed and instead change delim
rule to:
delim = (fill | padding)+ !delim
Pretty sure my first solution would be more performant, albeit more complex
what does the !delim
do?