Hello all.
I’m starting to learn parsing and EBNF and I am struggling to remove the ambiguity from my parser. I have constructed a simple example to demonstrate the problem I am having.
The following parser tags text marked as emphasis, like `**emphasis**`:
(require '[instaparse.core :as insta])
(def remove-ambiguity
  (insta/parser
    "S = (em / char)+ | epsilon
     em = <'*' '*'> char* <'*' '*'>
     <char> = #'.'"))
However, with an input such as `**em** **em**` there are many possible parse results:
([:S [:em "e" "m" "*" "*" " " "*" "*" "e" "m"]]
 [:S "*" "*" "e" "m" "*" "*" " " "*" "*" "e" "m" "*" "*"]
 [:S [:em "e" "m" "*" "*" " "] "e" "m" "*" "*"]
 [:S "*" "*" "e" "m" "*" "*" " " [:em "e" "m"]]
 [:S "*" "*" "e" "m" [:em " " "*" "*" "e" "m"]]
 [:S "*" "*" "e" "m" [:em " "] "e" "m" "*" "*"]
 [:S [:em "e" "m"] " " "*" "*" "e" "m" "*" "*"]
 [:S [:em "e" "m"] " " [:em "e" "m"]])  ; <-- this is the one I want
This makes sense, since there are several ways to pair up the asterisks so that they match the rule. However, I only ever want to allow results like `[:S [:em "e" "m"] " " [:em "e" "m"]]`.
It’s almost like I want it to greedily take the first match possible and then ignore all the others, but I have no idea how to express this. Any help would be greatly appreciated 😄.
Your grammar says `**` is both the start of an em sequence and two chars, and that is the ambiguity.
Is there a way I can force the correct behavior? I always want it to match the first pair found.
How do I specify that a char is any character or sequence of characters except `**`?
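One way to say "any character, unless `**` starts here" is instaparse's negative lookahead operator `!`. The following is only a minimal sketch, not a definitive fix. It assumes `instaparse.core` is required as `insta`, and the name `unambiguous-em` is made up for illustration. The only change from the grammar in the question is the `char` rule.

```clojure
(require '[instaparse.core :as insta])

;; Same grammar as in the question, except that `char` refuses to match
;; at any position where the input continues with "**".
(def unambiguous-em
  (insta/parser
    "S = (em / char)+ | epsilon
     em = <'**'> char* <'**'>
     <char> = !'**' #'.'"))

;; insta/parses returns every possible parse, so it is a handy way to
;; check whether any ambiguity is left.
(insta/parses unambiguous-em "**em** **em**")
;; expected: ([:S [:em "e" "m"] " " [:em "e" "m"]])
```

Since `char` can no longer consume the first `*` of a `**`, each `**` can only open or close an `em`, so the asterisks pair up left to right. The trade-off in this sketch is that an unmatched `**` (for example `**em`) fails to parse instead of falling back to plain characters. A single `*` is still accepted as an ordinary char.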