instaparse

If you're not trampolining your parser, why bother getting up in the morning?
be9 2016-11-04T15:58:11.000009Z

Hi, I need to parse strings like “some text with spaces XXX 12345678 98765 43 222 11”. There are three parts here: “some text with spaces”, “XXX 12345678”, and “98765 43 222 11”. The last part is required, but the “XXX 12345678” part is optional, and a naive greedy regex will treat it as part of the text. How can I prevent this with Instaparse?

seylerius 2016-11-04T15:59:47.000010Z

@be9 Can you describe the requirements your text needs to meet?

seylerius 2016-11-04T16:00:04.000011Z

Or give a few more specific examples?

be9 2016-11-04T16:08:11.000012Z

@seylerius Ok, let’s simplify even more. Two examples: “John Doe AGE 50” and “Dohn Joe”. An input string contains a name and might contain this age thing. I want to eventually parse those to {:name "John Doe" :age 50} and {:name "Dohn Joe"}.

be9 2016-11-04T16:08:55.000014Z

The first one should not be {:name "John Doe AGE 50"} 🙂

seylerius 2016-11-04T16:09:18.000015Z

Okay. This is a problem I've run into before.

be9 2016-11-04T16:09:32.000016Z

Names can be long and contain digits too

be9 2016-11-04T16:11:52.000017Z

“John Doe AGE 50 AGE 50” would preferably be parsed as {:name "John Doe AGE 50" :age 50}

seylerius 2016-11-04T16:12:07.000019Z

Basically what you need is a name token and then a name+age token. You then parse for this: "name-age / name"

seylerius 2016-11-04T16:12:30.000020Z

The slash allows you to express a preference for one over the other.

seylerius 2016-11-04T16:13:04.000021Z

Basically, you're saying "if this string can match an age too, do that, otherwise it's just a name"
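
A minimal sketch of what that "name-age / name" preference could look like as an Instaparse grammar. This is not code from the conversation: the rule names (person, word), the literal ' AGE ' separator, the choice to treat a name as space-separated words, and the parse-person helper are all assumptions made for illustration.

```clojure
(require '[instaparse.core :as insta]
         '[clojure.string :as str])

;; Hypothetical grammar: prefer name-age, fall back to a bare name.
;; A name is written as a sequence of words rather than one greedy
;; regex, so the parser can try every split point before ' AGE '.
(def person-parser
  (insta/parser
    "person   = name-age / name
     name-age = name <' AGE '> age
     name     = word (<' '> word)*
     <word>   = #'\\S+'
     age      = #'[0-9]+'"))

;; Hypothetical helper: turn the parse tree into a map.
(defn parse-person [s]
  (insta/transform
    {:name     (fn [& words] (str/join " " words))
     :age      (fn [digits] (Long/parseLong digits))
     :name-age (fn [n a] {:name n :age a})
     ;; A bare name arrives here as a string, name-age as a map.
     :person   (fn [x] (if (map? x) x {:name x}))}
    (person-parser s)))

(parse-person "John Doe AGE 50")        ;; => {:name "John Doe", :age 50}
(parse-person "Dohn Joe")               ;; => {:name "Dohn Joe"}
(parse-person "John Doe AGE 50 AGE 50") ;; => {:name "John Doe AGE 50", :age 50}
```

The angle brackets hide the separators and the word tag from the output, so the transform functions only see the words and the digits. A single regex such as #'.+' for name would probably not work here, since a regex terminal consumes as much as it can and would swallow the trailing AGE part before name-age gets a chance to match it.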

seylerius 2016-11-04T16:13:33.000022Z

I do this a lot in my rebuild of organum, if you want to take a look at the repo.

be9 2016-11-04T16:13:54.000023Z

oh, the slash. I see, thanks!

seylerius 2016-11-04T16:14:26.000024Z

Yep. The slash is for preferential parsing.

be9 2016-11-04T16:14:52.000025Z

👍 @seylerius, I guess that’s it 🙂