instaparse

If you're not trampolining your parser, why bother getting up in the morning?
mbjarland 2017-12-14T16:31:28.000597Z

I'm playing around with instaparse and, for kicks and giggles, I wrote a parser to parse some log files I have lying around

mbjarland 2017-12-14T16:31:50.000334Z

is there a way to define a fixed-width "anything goes" string in instaparse?

mbjarland 2017-12-14T16:32:33.000731Z

i.e. if I just want to gobble up a few characters into a tree node and don't care about the content there, is that possible?

aengelberg 2017-12-14T16:32:36.000605Z

Fixed width? Maybe #'.{N}'?

mbjarland 2017-12-14T16:33:09.000338Z

right, yes regex does the job but is probably not very performant for just "take substring of 10 from where you are"

mbjarland 2017-12-14T16:34:30.000018Z

ok, so regex is the way to go for this in instaparse?

aengelberg 2017-12-14T16:35:01.000532Z

I think regex is the most performant way to grab a not-static set of characters

mbjarland 2017-12-14T16:37:08.000382Z

: ) well I should probably mention that I think instaparse is excellent and by far the best parser lib I've run across....so my intent was not to come here and critique it

aengelberg 2017-12-14T16:38:00.000011Z

Thanks! And no worries, I was just answering your question from the perspective of what instaparse actually supports

mbjarland 2017-12-14T16:38:23.000194Z

that being said...if I parse 2G of log files (without instaparse) and compare the simplest regex match with (subs line 10 20), regex performance doesn't exactly shine
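
For reference, a minimal sketch of the two approaches being compared here, in plain Clojure with no instaparse involved. The sample `line` and the names `via-regex` and `via-subs` are made up for illustration; only `(subs line 10 20)` comes from the message above.

```clojure
;; Hypothetical log line, invented for this example.
(def line "2017-12-14 16:31:28 INFO  starting up")

;; Regex approach: skip the first 10 characters, capture the next 10.
(def via-regex (second (re-find #"^.{10}(.{10})" line)))

;; Direct approach: take characters 10 (inclusive) to 20 (exclusive).
(def via-subs (subs line 10 20))

;; Both approaches pick out the same slice of the line.
(assert (= via-regex via-subs))
```

Wrapping each form in `(time (dotimes [_ 1000000] ...))` would give a rough feel for the difference, though a proper benchmark would need a dedicated tool and real data.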

aengelberg 2017-12-14T16:38:32.000419Z

But I see your point that if it theoretically supported a dedicated "substring" combinator, that would be faster

mbjarland 2017-12-14T16:39:59.000897Z

anyway, figured I would ask, but regex does indeed do the job and perhaps what I'm doing with this parser is a bit of an edge case

aengelberg 2017-12-14T16:40:25.000280Z

Maybe we should support "custom combinators" so people like you with special use cases can write their own more performant specialized versions

mbjarland 2017-12-14T16:40:42.000280Z

that would be awesome

mbjarland 2017-12-14T16:42:47.000926Z

you would have to add some kind of extension point to the instaparse bnf syntax I guess

aengelberg 2017-12-14T16:47:05.000180Z

Maybe, or we don't allow extensions to the EBNF syntax and just let people make custom combinators for the combinator syntax

mbjarland 2017-12-14T16:50:49.000614Z

ah, ok, hadn't grokked the combinators syntax until now

mbjarland 2017-12-14T16:56:23.000245Z

right now I'm considering writing my own mini language for this log parsing, I could use instaparse to parse that language and then do custom, optimized parsing based on the format specification tree coming out from instaparse...so still useful

mbjarland 2017-12-14T17:21:35.000419Z

hmm, how come I need to double escape the not-inclusive rule in the following grammar:

(def my-p
  (instaparse.core/parser
    "spec = (field-spec <' '?>)+
     field-spec = <'['> name ' '* <':'> ' '* (width | not-inclusive | not-exclusive | rest) <']'>
     name = #'[^:]+'
     width = <'{'> #'\\d+' <'}'>
     not-inclusive = <'\\\\'> #'.'
     not-exclusive = <'/'> #'.'
     rest = '*'
    "))

aengelberg 2017-12-14T17:22:25.000423Z

you mean the '\\\\'?

mbjarland 2017-12-14T17:22:27.000648Z

yeah

mbjarland 2017-12-14T17:22:41.000042Z

shouldn't two have been enough?

aengelberg 2017-12-14T17:23:09.000479Z

because 1) you need to tell Clojure that the backslash isn't escaping a character within the string, and 2) you need to tell instaparse that the backslash isn't escaping a character within the string combinator
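
The first of those two layers can be seen without instaparse at all. The sketch below just prints two of the rules from the grammar above to show the text that is left after the Clojure reader has processed the string; the names `literal-rule`, `regex-rule`, and `backslashes-seen` are invented for illustration.

```clojure
;; Layer 1: the Clojure reader turns each \\ in a string literal into one backslash.
(def literal-rule "not-inclusive = <'\\\\'> #'.'")
(def regex-rule   "width = <'{'> #'\\d+' <'}'>")

;; Printing shows the grammar text that instaparse actually receives.
(println literal-rule) ; not-inclusive = <'\\'> #'.'
(println regex-rule)   ; width = <'{'> #'\d+' <'}'>

;; Four backslashes in the source become two characters in the grammar text.
(def backslashes-seen (count (filter #{\\} literal-rule)))
(assert (= 2 backslashes-seen))
```

Layer 2 then happens inside instaparse: a quoted string in the grammar has its own escape handling, so the remaining `'\\'` is read as one literal backslash. The regex rule gets by with two backslashes in the source because the single `\d` that survives layer 1 is exactly what the regex needs.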

mbjarland 2017-12-14T17:23:34.000533Z

ok, missed point 2 there