Hi, a very basic question, probably not specific to instaparse. Given this basic example: "S = N | (N ('+' N)+); N = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';" if I want to enforce that not all N are 0s, should this be done in the grammar definition or by adding some logic when processing the parsed result? I suspect the latter, but in case anyone knows a way to enforce this restriction directly in the grammar, I'd like to know. TIA
Hi @wongiseng, I saw your question on gitter as well. Instaparse's job is to turn strings into meaningful data; any validation you want to do on that data probably should happen after the parse.
The only real way to have more sophisticated validation on an input is to use lookahead and negative lookahead.
Well, those are the only ways to do sophisticated validation within instaparse.
In this particular example you could use negative lookahead, e.g. S = !('0'*) (N | (N ('+' N)+));
this works, but it's ambiguous:
(require '[instaparse.core :as insta])

(def minimum-one-not-zero
  (insta/parser
    "EXP = N | S;
     S = (ZN '+')* N ('+' ZN)*;
     N = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';
     ZN = '0' | N;"))
^ That would work as well
(tested)
The advantage of writing your own validation after the parse is that when the input is wrong, you can write your own error message to say whatever you want, instead of relying on instaparse's failure message, which might not be as readable.
^definitely
Parse error at line 1, column 4:
0+0
^
Expected:
"+"
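The validate-after-the-parse suggestion above can be sketched roughly as follows. This is only an illustration: `digits` and `validate` are hypothetical helper names, the error text is made up, and the trees are written by hand in the hiccup shape instaparse would return for the original grammar (S = N | (N ('+' N)+)), so the snippet runs without instaparse on the classpath.

```clojure
;; Sketch only: `digits` and `validate` are hypothetical helpers, not part
;; of instaparse. Input is assumed to be a hiccup-style parse tree such as
;; [:S [:N "0"] "+" [:N "1"]].

(defn digits
  "Returns the digit string held by every [:N \"d\"] node in the tree."
  [tree]
  (->> (tree-seq vector? rest tree)                  ; walk all nodes
       (filter #(and (vector? %) (= :N (first %))))  ; keep only :N nodes
       (map second)))                                ; pull out the digit

(defn validate
  "Rejects trees whose N terms are all \"0\", with a custom error message."
  [tree]
  (if (every? #{"0"} (digits tree))
    {:ok? false :error "At least one term must be non-zero."}
    {:ok? true :tree tree}))

;; Example calls on hand-written trees:
(validate [:S [:N "0"] "+" [:N "0"]])
(validate [:S [:N "0"] "+" [:N "1"]])
```

In real use the tree would come from calling the parser on the input string, and the `:error` value could be whatever message suits the application.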
oops, my negative lookahead approach definitely wouldn't work because I totally didn't see the pluses in the input
Maybe
S = &(#".*[1-9]") (N | (N ('+' N)+));
i.e. "make sure there's some nonzero number somewhere, then parse as usual"
that's lookahead not negative lookahead
if errors aren't important, and the fact that you might get the "wrong" evaluation (e.g. "1+0+1" could be [:EXP [:S [:N "1"] "+" [:ZN "0"] "+" [:ZN [:N "1"]]]]
or [:EXP [:S [:ZN [:N "1"]] "+" [:ZN "0"] "+" [:N "1"]]]
) is also unimportant (e.g. you eval N and ZN the same), then you should be fine with the ambiguous grammar
(instaparse gives you the former)
I'm just writing these off the top of my head, not evaluating them to be sure. I think that would be unambiguous though
hmm, that's definitely wrong :simple_smile:
not sure where that came from
Using lookahead would likely be the easiest path, since the grammar would be unambiguous and easy to understand
Cool, thanks for the explanations. I'll play a bit with lookahead, but eventually I guess I'll validate after the parse
Negative lookaheads make the grammar hard to digest for me
For now I use @socksy's approach :simple_smile: https://github.com/wibisono/gnip-rule-validator-clj/blob/master/gnip-rule.bnf thanks a lot!
My actual problem was requiring an OR to have at least one positive term