New to Instaparse and wrapping my head around grammars: how would I write a grammar that handles nested tag pairs like in XML, e.g. "<p><span>text</span></p>" => [:p [:span "text"]]?
It’s possible to parse XML hierarchies into Clojure data, but I don’t think you can enforce that the opening and closing tags match.
You can enforce that manually with your own custom logic after the fact, just not as part of the parser.
So I would just trust that the tags are properly matched, paired, and nested, and take each closing tag as the one expected next.
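For what it's worth, here is a rough sketch of the "check it after the fact" idea. The generic grammar and the `xml-parser` and `matched?` names are made up for illustration, and it assumes Instaparse is on the classpath. The grammar accepts any tag name and does not compare the opening and closing names; the `matched?` function walks the resulting tree and does that comparison itself.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical generic grammar: any tag name is accepted, and the grammar
;; itself does not check that the opening and closing names agree.
(def xml-parser
  (insta/parser
    "element   = open content* close
     <content> = element | text
     open      = <'<'> name <'>'>
     close     = <'</'> name <'>'>
     <name>    = #'[A-Za-z][A-Za-z0-9]*'
     <text>    = #'[^<]+'"))

(defn matched?
  "True when every :element node in the parse tree closes with the same
  name it opened with. Text nodes (plain strings) always pass."
  [node]
  (if (string? node)
    true
    (let [[_ [_ open-name] & more] node   ; [:element [:open n] ... [:close n]]
          [_ close-name] (last more)
          children (butlast more)]
      (and (= open-name close-name)
           (every? matched? children)))))

(matched? (xml-parser "<p><span>text</span></p>"))  ;; => true
(matched? (xml-parser "<p><span>text</p></span>"))  ;; => false
```

Note that this sketch assumes the input parsed at all; in real use you would probably check the result with `insta/failure?` before walking it.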
Caveat: I haven’t had my coffee yet, but the basic idea is that you say something like “a BLOCK element is a P or a DIV or a TABLE (etc.), an INLINE element is a SPAN or a B or TEXT (etc.),” and then say “a P element is the literal string ‘<P>’ or ‘<p>’, followed by zero or more INLINE elements, followed by the literal string ‘</P>’ or ‘</p>’.” And similarly with SPAN.
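A minimal sketch of that description as an Instaparse grammar, cut down to just P, SPAN, and TEXT. The rule names and the `html-parser` name are my own choices for illustration, and it assumes Instaparse is on the classpath. Angle brackets hide the literal tags from the output, and the hidden `<inline>` and `<text>` rules splice their contents straight into the parent node.

```clojure
(require '[instaparse.core :as insta])

;; Sketch only: covers just <p> and <span>. A fuller grammar would add
;; more block and inline elements as further alternatives.
(def html-parser
  (insta/parser
    "p        = <('<p>' | '<P>')> inline* <('</p>' | '</P>')>
     <inline> = span | text
     span     = <('<span>' | '<SPAN>')> inline* <('</span>' | '</SPAN>')>
     <text>   = #'[^<]+'"))

(html-parser "<p><span>text</span></p>")
;; => [:p [:span "text"]]
```

With a fixed list of tags like this, a badly nested input such as "<p><span>text</p></span>" simply fails to parse, since each rule names its own closing tag.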
ah! thanks ... that should start it
The other caveat is that Instaparse is incredibly fun to work with and may be addictive. 😉
Confirmed! 😁