instaparse

If you're not trampolining your parser, why bother getting up in the morning?
Zaymon 2020-12-08T04:31:42.048800Z

Hello all. I'm starting to learn parsing and EBNF, and I am struggling to remove the ambiguity from my parser. I have constructed a simple example to demonstrate the problem. The following parser tags text marked as emphasis, like **emphasis**:

(require '[instaparse.core :as insta])

(def remove-ambiguity
  (insta/parser
   "S = (em / char)+ | epsilon
    em = <'*' '*'> char* <'*' '*'>
    <char> = #'.'"))
However, with an input such as **em** **em** there are many possible parse results:
([:S [:em "e" "m" "*" "*" " " "*" "*" "e" "m"]]
 [:S "*" "*" "e" "m" "*" "*" " " "*" "*" "e" "m" "*" "*"]
 [:S [:em "e" "m" "*" "*" " "] "e" "m" "*" "*"]
 [:S "*" "*" "e" "m" "*" "*" " " [:em "e" "m"]]
 [:S "*" "*" "e" "m" [:em " " "*" "*" "e" "m"]]
 [:S "*" "*" "e" "m" [:em " "] "e" "m" "*" "*"]
 [:S [:em "e" "m"] " " "*" "*" "e" "m" "*" "*"]
 [:S [:em "e" "m"] " " [:em "e" "m"]]) ;; <-- This is the one I want

This makes sense, since there are several ways to pair up the asterisks to match the rule. However, I only ever want to allow results like this: `[:S [:em "e" "m"] " " [:em "e" "m"]]`
It’s almost like I want it to greedily take the first match possible and then ignore all others. But I have no idea how to express this. Any help would be greatly appreciated 😄.

2020-12-08T17:48:03.050100Z

your grammar says '**' is both the start of an em sequence, and two chars, and that is the ambiguity

Zaymon 2020-12-08T23:16:35.051200Z

Is there a way I can force the correct behavior? I always want it to be the first found pair

Zaymon 2020-12-08T23:53:05.051400Z

How do I specify that a char is any character or sequence of characters except **?
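
One possible way to express "any character except the start of **" is instaparse's negative lookahead operator `!`. The sketch below is not from the thread: the name `emphasis-parser` is made up for illustration, and it assumes that a `**` should never be treated as two plain chars. Under that assumption the ambiguity pointed out above goes away, because `char` can no longer match at a position where `**` begins.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical variant of the grammar from the question.
;; `!'**'` is a negative lookahead: char only matches when the
;; input at this position does NOT start with two asterisks.
(def emphasis-parser
  (insta/parser
   "S = (em / char)+ | epsilon
    em = <'**'> char* <'**'>
    <char> = !'**' #'.'"))

;; insta/parses returns every possible parse; here only one remains.
(insta/parses emphasis-parser "**em** **em**")
;; => ([:S [:em "e" "m"] " " [:em "e" "m"]])

;; A single asterisk is still an ordinary char.
(emphasis-parser "a*b")
;; => [:S "a" "*" "b"]
```

A trade-off of this approach: since `**` can only ever open or close an `em`, an input with an unmatched `**` (for example "**em") fails to parse instead of falling back to plain chars. Whether that is acceptable depends on what the parser is for.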