Hi, I need to parse strings like “some text with spaces XXX 12345678 98765 43 222 11”. There are 3 parts: “some text with spaces”, “XXX 12345678”, and “98765 43 222 11”. The last part is required, but the “XXX 12345678” part is optional, and a naive greedy regex will swallow it into the text part. How could I prevent this with Instaparse?
@be9 Can you describe the requirements your text needs to meet?
Or give a few more specific examples?
@seylerius Ok, let’s simplify even more. Two examples: “John Doe AGE 50” and “Dohn Joe”. An input string contains a name and might contain this age thing. I want to eventually parse those to {:name "John Doe" :age 50} and {:name "Dohn Joe"}. The first one should not be {:name "John Doe AGE 50"} 🙂
Okay. This is a problem I've run into before.
Names can be long and contain digits too
“John Doe AGE 50 AGE 50” would preferably be parsed as {:name "John Doe AGE 50" :age 50}
Basically what you need is a name token, an age token, and then a name+age token. You then parse for this: "name-age / name"
The slash allows you to express a preference for one over the other.
Basically, you're saying "if this string can match an age too, do that, otherwise it's just a name"
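A rough sketch of how the "name-age / name" idea might look as an Instaparse grammar. The rule names (person, name-age, name, word, ws, age) and the helper parse-person are made up for illustration, and the literal AGE marker comes from the simplified example above. One assumption worth calling out: the name is built from whitespace-separated words rather than a single greedy regex like #'.+', because a regex terminal takes its longest match and would eat the AGE part before the parser gets a chance to try other splits.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical grammar: prefer name-age, fall back to a bare name.
(def person-parser
  (insta/parser
    "person   = name-age / name
     name-age = name <ws> <'AGE'> <ws> age
     name     = word (ws word)*
     <word>   = #'\\S+'
     <ws>     = #'\\s+'
     age      = #'[0-9]+'"))

(defn parse-person
  "Parses s into a map with :name and, when present, :age.
   Returns the Instaparse failure object if s does not parse."
  [s]
  (let [tree (person-parser s)]
    (if (insta/failure? tree)
      tree
      (insta/transform
        {:name     (fn [& parts] {:name (apply str parts)}) ; rejoin words and spaces
         :age      (fn [digits] {:age (Long/parseLong digits)})
         :name-age merge                                    ; combine the two maps
         :person   identity}
        tree))))

(parse-person "John Doe AGE 50")        ; => {:name "John Doe", :age 50}
(parse-person "Dohn Joe")               ; => {:name "Dohn Joe"}
(parse-person "John Doe AGE 50 AGE 50") ; => {:name "John Doe AGE 50", :age 50}
```

The last case works because the parser has to consume the whole input: the only way name-age can cover the full string is for the name to absorb the first “AGE 50”.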
I do this a lot in my rebuild of organum, if you want to take a look at the repo.
oh, the slash. I see, thanks!
Yep. The slash is for preferential parsing.
👍 @seylerius, I guess that’s it 🙂