I'm playing around with instaparse, and for kicks and giggles I wrote a parser for some log files I have lying around
is there a way to define a fixed-width "anything goes" string in instaparse?
i.e. if I just want to gobble up a few characters into a tree node and don't care about the content there, is that possible?
Fixed width? Maybe #'.{N}'?
right, yes regex does the job but is probably not very performant for just "take substring of 10 from where you are"
ok, so regex is the way to go for this in instaparse?
I think regex is the most performant way to grab a not-static set of characters
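A minimal sketch of the regex approach suggested above, assuming a hypothetical log layout where the first 10 characters are a field and everything after is the remainder of the line. The rule names (line, field, rest) and the sample input are made up for illustration:

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical layout: a 10-character "anything goes" field,
;; followed by the rest of the line.
(def fixed-width-p
  (insta/parser
    "line  = field rest
     field = #'.{10}'
     rest  = #'.+'"))

(fixed-width-p "2024-01-01 INFO started")
;; => [:line [:field "2024-01-01"] [:rest " INFO started"]]
```

The field node captures exactly 10 characters regardless of what they contain (apart from line breaks, which . does not match by default).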
: ) well I should probably mention that I think instaparse is excellent and by far the best parser lib I've run across....so my intent was not to come here and critique it
Thanks! And no worries, I was just answering your question from the perspective of what instaparse actually supports
that being said...if I parse 2G of log files (without instaparse) and compare the simplest regex match with (subs line 10 20), regex performance doesn't exactly shine
But I see your point that if it theoretically supported a dedicated "substring" combinator, that would be faster
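For anyone who wants to reproduce that comparison outside instaparse, here is a rough sketch in plain Clojure. The sample line, the function names, and the iteration count are assumptions for illustration, and time gives only a crude measurement (a benchmarking library would be more trustworthy):

```clojure
;; Hypothetical sample line; real log lines will differ.
(def line "2024-01-01 12:00:00 INFO started")

;; Regex version: skip 10 characters, capture the next 10.
(def field-re #"^.{10}(.{10})")

(defn field-via-regex [s]
  (second (re-find field-re s)))

;; Substring version: same characters, taken by index.
(defn field-via-subs [s]
  (subs s 10 20))

;; Both functions should extract the same 10 characters.
(= (field-via-regex line) (field-via-subs line))

;; Crude timing; the numbers depend on JVM warm-up and hardware.
(time (dotimes [_ 100000] (field-via-regex line)))
(time (dotimes [_ 100000] (field-via-subs line)))
```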
anyway, figured I would ask, but regex does indeed do the job and perhaps what I'm doing with this parser is a bit of an edge case
Maybe we should support "custom combinators" so people like you with special use cases can write their own more performant specialized versions
that would be awesome
you would have to add some kind of extension point to the instaparse bnf syntax I guess
Maybe, or we don't allow extensions to the EBNF syntax and just let people make custom combinators for the combinator syntax
ah, ok, hadn't grokked the combinators syntax until now
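For reference, a small sketch of the combinator syntax mentioned above, using functions from instaparse.combinators. It builds a grammar as a map of combinators instead of a string. The rule names and the sample input are hypothetical, and this only shows the combinators that already exist; the "custom combinator" extension point floated in the discussion is an idea, not a current feature:

```clojure
(require '[instaparse.core :as insta]
         '[instaparse.combinators :as c])

;; Grammar as a map: a 10-character field followed by the rest of the line.
(def combinator-p
  (insta/parser
    {:line  (c/cat (c/nt :field) (c/nt :rest))
     :field (c/regexp #".{10}")
     :rest  (c/regexp #".+")}
    :start :line))

(combinator-p "2024-01-01 INFO started")
;; => [:line [:field "2024-01-01"] [:rest " INFO started"]]
```

Because the grammar is ordinary Clojure data here, regexes are written as regex literals and the string-grammar escaping rules do not apply.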
right now I'm considering writing my own mini language for this log parsing, I could use instaparse to parse that language and then do custom, optimized parsing based on the format specification tree coming out from instaparse...so still useful
hmm, how come I need to double escape the not-inclusive rule in the following grammar:
(def my-p
  (instaparse.core/parser
    "spec = (field-spec <' '?>)+
     field-spec = <'['> name ' '* <':'> ' '* (width | not-inclusive | not-exclusive | rest) <']'>
     name = #'[^:]+'
     width = <'{'> #'\\d+' <'}'>
     not-inclusive = <'\\\\'> #'.'
     not-exclusive = <'/'> #'.'
     rest = '*'
    "))
you mean the '\\\\'?
yeah
shouldn't two have been enough?
because there are two layers of escaping: 1) Clojure's string reader turns each \\ into a single \, so '\\\\' reaches instaparse as '\\'; 2) instaparse's string literal syntax also treats \ as an escape character, so '\\' is what finally means one literal backslash
ok, missed point 2 there
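The first layer can be checked in plain Clojure by counting the characters the reader leaves behind. The var names below are made up for illustration:

```clojure
;; What Clojure's reader produces from each string literal.
(def two-in-source  "\\")    ; source has 2 backslashes
(def four-in-source "\\\\")  ; source has 4 backslashes

(count two-in-source)   ; => 1, a single backslash
(count four-in-source)  ; => 2, two backslashes

;; Printing shows what instaparse actually receives in the grammar text.
(println four-in-source) ; prints \\
```

So with four backslashes in the source, instaparse sees '\\' in the grammar and reads it as one literal backslash. With only two, it would presumably see '\', where the backslash escapes the closing quote instead.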