instaparse

If you're not trampolining your parser, why bother getting up in the morning?
2016-02-08T20:42:02.000004Z

Hi, a very basic question, probably not specific to instaparse. Given this basic example: "S = N | (N ('+' N)+); N = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';", if I want to enforce that not all N are 0s, should this be done in the grammar definition or by adding logic that processes the parsed result? I suspect the latter, but if anyone knows a way to enforce this restriction directly in the grammar, I'd like to know. TIA

aengelberg 2016-02-08T21:32:12.000006Z

Hi @wongiseng, I saw your question on Gitter as well. Instaparse's job is to turn strings into meaningful data; any validation you want to do on that data should probably happen after the parse.

aengelberg 2016-02-08T21:33:11.000007Z

The only real way to have more sophisticated validation on an input is to use lookahead and negative lookahead.

aengelberg 2016-02-08T21:33:25.000008Z

Well, those are the only ways to do sophisticated validation within instaparse.

aengelberg 2016-02-08T21:34:17.000009Z

In this particular example you could use negative lookahead, e.g. S = !('0'*) (N | (N ('+' N)+));

socksy 2016-02-08T21:34:20.000010Z

this works, but it's ambiguous:

(require '[instaparse.core :as insta])
(def minimum-one-not-zero
  (insta/parser
    "EXP = N | S;
    S = (ZN '+')* N ('+' ZN)*;
    N =  '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';
    ZN = '0' | N;"))

aengelberg 2016-02-08T21:34:35.000011Z

^ That would work as well

socksy 2016-02-08T21:35:02.000013Z

(tested)

aengelberg 2016-02-08T21:36:02.000014Z

The advantage to writing your own validation after the parse is that when the input is wrong, you can write your own error message to say whatever you want instead of instaparse's failure message which might not be as readable.

socksy 2016-02-08T21:36:34.000015Z

^definitely

Parse error at line 1, column 4:
0+0
   ^
Expected:
"+"
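A minimal sketch of the validate-after-the-parse approach, in Clojure to match the snippets above. It assumes the default hiccup tree instaparse returns for the original grammar (for example, "0+1" parsing to [:S [:N "0"] "+" [:N "1"]]); `n-values` and `validate-sum` are hypothetical helper names, not part of instaparse. In real use you would call the parser on the input, check `insta/failure?` first, and only then pass the tree in. The point is that the function returns your own error message rather than instaparse's failure output.

```clojure
;; Sketch: validate the parse tree after parsing, with a custom error message.
;; Assumes instaparse's default hiccup output for the original grammar
;;   S = N | (N ('+' N)+); N = '0' | ... | '9';
;; n-values and validate-sum are hypothetical helpers, not instaparse API.

(defn n-values
  "Collects the digit string of every [:N ...] node in a hiccup tree."
  [tree]
  (->> (tree-seq vector? rest tree)                   ; walk every node
       (filter #(and (vector? %) (= :N (first %))))   ; keep only :N nodes
       (map second)))                                 ; their digit strings

(defn validate-sum
  "Returns nil when the tree is acceptable, or a human-readable error
  message when every N in the expression is \"0\"."
  [tree]
  (let [digits (n-values tree)]
    (when (every? #(= "0" %) digits)
      (str "Expected at least one non-zero digit, got: "
           (apply str (interpose "+" digits))))))

(validate-sum [:S [:N "0"] "+" [:N "0"]])
;; => "Expected at least one non-zero digit, got: 0+0"
(validate-sum [:S [:N "0"] "+" [:N "1"]])
;; => nil
```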

aengelberg 2016-02-08T21:36:57.000016Z

oops, my negative lookahead approach definitely wouldn't work because I totally didn't see the pluses in the input

aengelberg 2016-02-08T21:38:07.000017Z

Maybe

S = &(#".*[1-9]") (N | (N ('+' N)+));

aengelberg 2016-02-08T21:38:25.000018Z

i.e. "make sure there's some nonzero number somewhere, then parse as usual"

aengelberg 2016-02-08T21:39:09.000019Z

that's lookahead not negative lookahead
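For single-line input, the condition that the lookahead &(#".*[1-9]") tests at the start of the input amounts to "a digit from 1 to 9 appears somewhere". The sketch below expresses that same check as a plain regex test run before parsing, outside the grammar; it is an alternative to the lookahead, not instaparse's own mechanism. `has-nonzero-digit?` and `parse-if-nonzero` are hypothetical names, and `parse-fn` stands in for a parser (instaparse parsers can be called as functions, so one could be passed here).

```clojure
;; Sketch: a regex pre-check equivalent (for single-line input) to the
;; lookahead &(#".*[1-9]"), run before parsing instead of inside the grammar.
;; has-nonzero-digit? and parse-if-nonzero are hypothetical helper names.

(defn has-nonzero-digit?
  "True when some character of input is a digit from 1 to 9."
  [input]
  (boolean (re-find #"[1-9]" input)))

(defn parse-if-nonzero
  "Calls parse-fn on input only when the pre-check passes; otherwise
  returns an error map. parse-fn stands in for an instaparse parser."
  [parse-fn input]
  (if (has-nonzero-digit? input)
    (parse-fn input)
    {:error (str "No non-zero digit in: " input)}))

(has-nonzero-digit? "0+0+3")      ;; => true
(has-nonzero-digit? "0+0")        ;; => false
(parse-if-nonzero identity "0+0") ;; => {:error "No non-zero digit in: 0+0"}
```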

socksy 2016-02-08T21:40:53.000020Z

if errors aren't important, and it's also unimportant that you might get the "wrong" parse tree (e.g. "1+0+1" could be [:EXP [:S [:N "1"] "+" [:ZN "0"] "+" [:ZN [:N "1"]]]] or [:EXP [:S [:ZN [:N "1"]] "+" [:ZN "0"] "+" [:N "1"]]]), for example because you eval N and ZN the same, then you should be fine with the ambiguous grammar

socksy 2016-02-08T21:41:32.000021Z

(instaparse gives you the former)
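To illustrate the "eval N and ZN the same" case: the sketch below evaluates both parse trees of "1+0+1" from socksy's ambiguous grammar and gets the same sum, because :N and :ZN are handled identically. It uses plain recursion so it runs without instaparse; instaparse's `insta/transform` is the more usual tool for this kind of tree rewriting. `eval-node` is a hypothetical name.

```clojure
;; Sketch: evaluate either parse of "1+0+1" from the EXP/S/N/ZN grammar.
;; :N and :ZN are treated the same, so the ambiguity doesn't change the result.
;; eval-node is a hypothetical helper, not instaparse API.

(defn eval-node
  "Evaluates a hiccup node from the EXP/S/N/ZN grammar; returns nil for the
  literal \"+\" separators, which carry no value."
  [node]
  (when-not (string? node)
    (let [[tag & children] node]
      (case tag
        ;; a digit node holds either the digit string or a nested [:N ...]
        (:N :ZN) (let [child (first children)]
                   (if (string? child)
                     (Long/parseLong child)
                     (eval-node child)))
        ;; sum the values of all children, skipping the "+" separators
        (:EXP :S) (reduce + (keep eval-node children))))))

(eval-node [:EXP [:S [:N "1"] "+" [:ZN "0"] "+" [:ZN [:N "1"]]]]) ;; => 2
(eval-node [:EXP [:S [:ZN [:N "1"]] "+" [:ZN "0"] "+" [:N "1"]]]) ;; => 2
```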

aengelberg 2016-02-08T21:45:04.000022Z

@socksy how about

S = N ('+' N)* | (N '+')* '0' ('+' ZN)*;

aengelberg 2016-02-08T21:45:27.000023Z

I'm just writing these off the top of my head, not evaluating them to be sure. I think that would be unambiguous though

aengelberg 2016-02-08T21:46:43.000024Z

hmm, that's definitely wrong :simple_smile:

aengelberg 2016-02-08T21:46:54.000025Z

not sure where that came from

aengelberg 2016-02-08T21:47:59.000026Z

Using lookahead would likely be the easiest path, since the grammar would be unambiguous and easy to understand

2016-02-08T21:57:54.000027Z

Cool, thanks for the explanations. I'll play a bit with lookahead, but eventually I guess I'll validate after the parse

2016-02-08T21:58:40.000028Z

The negative lookaheads make the grammar hard for me to digest

2016-02-08T23:21:51.000029Z

For now I'm using @socksy's approach :simple_smile: https://github.com/wibisono/gnip-rule-validator-clj/blob/master/gnip-rule.bnf Thanks a lot!

2016-02-08T23:26:16.000031Z

My actual problem was making sure an OR has at least one positive term