instaparse

If you're not trampolining your parser, why bother getting up in the morning?
ska 2016-04-12T09:06:36.000015Z

What kind of combination are you thinking about, @conaw ?

ska 2016-04-12T09:10:04.000016Z

Oh, and regarding your string question, I did something similar when finding regexps in a query language: they were enclosed in slashes and allowed backslash-escaped slashes inside. The regexp for this was so weird that I completely forgot how it worked, but here it is:

REGEXP = <'/'> #'(?:.(?!(?<![\\\\])/))+.?' <'/'>
(the grammar is defined in a Clojure string, thus the massive escaping)
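
For anyone puzzling over that pattern, here is a minimal sketch of the same regexp written as a plain Clojure regex literal, outside of instaparse, so only one level of escaping applies. The name `regexp-body` and the sample input are made up for illustration; this only shows how `java.util.regex` treats the pattern, not how the original query language used it.

```clojure
;; The body pattern from the grammar above as a Clojure regex literal.
;; Inside #"...", [\\] is a character class holding a single backslash.
(def regexp-body #"(?:.(?!(?<![\\])/))+.?")

;; Hypothetical input: what follows the opening slash of /a\/b/,
;; plus some trailing text that must not be consumed.
(def sample "a\\/b/ and more")

;; Each repetition consumes one character that is NOT directly followed
;; by an unescaped slash; the trailing .? then picks up the last
;; character in front of the closing slash.
(def matched (re-find regexp-body sample))

(println matched)
```

The escaped slash stays inside the match, and matching stops right before the first slash that has no backslash in front of it.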

conaw 2016-04-12T09:10:55.000017Z

not sure yet to be honest — I’d like to be doing POS tagging, and tokenizing, but really enjoying instaparse and curious if anyone has used it in conjunction with something like opennlp

ska 2016-04-12T09:13:00.000018Z

I once did a workshop on Clojure with very basic NLP examples (it was at a faculty for computational linguistics), but I did not combine it with any existing NLP libraries. Here at work, the NLP stuff is mostly self-written as much of it predates the open source libs. And we do not (yet?) use Clojure in that area.

ska 2016-04-12T09:15:19.000019Z

Hm, looks like I never polished that workshop to put it online somewhere. Sorry.

ska 2016-04-12T09:16:35.000020Z

But you may be interested in the instaparse talk here: https://github.com/ska2342/clojure-talks/blob/master/instaparse/de.skamphausen.instaparse/src/de/skamphausen/instaparse.clj

ska 2016-04-12T09:16:56.000022Z

(enough boasting now; please excuse the self-plugging)

conaw 2016-04-12T09:25:51.000023Z

Not boasting at all, I appreciate the link.

conaw 2016-04-12T09:39:24.000024Z

Another thing: is there an idiomatic way to get the matched portion of a string for a given portion of a parse into the final transformed Clojure data?

conaw 2016-04-12T09:39:56.000025Z

I’m trying to parse the same text multiple times iteratively, passing the result to a different, more granular parser based on the first pass

conaw 2016-04-12T09:40:16.000026Z

basically I’m trying to split the text up using a parse

conaw 2016-04-12T09:44:52.000027Z

spans looks like

ska 2016-04-12T11:14:02.000028Z

There is a :partial option but it only returns the parse tree as far as it could be parsed. Maybe the total mode would help? Can't say. Sorry.

ska 2016-04-12T14:34:07.000029Z

@conaw, I just found the span function, which takes a parse tree (the result of parsing) and returns the start and end index into the string. So you could first parse partially and then ask your input string for the covered substring.

ska 2016-04-12T14:37:38.000030Z

Like this:

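;; assumes (require '[instaparse.core :as i])
(let [s "abcd"
      g "Q='a' 'b'"
      p (i/parser g)
      t (p s :partial true)]
  ;; (i/span t) is [start end], so this is (subs s start end)
  (apply subs (into [s] (i/span t))))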
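
Building on that, here is a rough sketch of the two-pass idea described earlier in the thread: a coarse parser splits the text, `span` recovers the substring behind each node, and a finer parser handles each piece. It assumes instaparse is on the classpath; the grammars `coarse` and `fine`, the sample text, and the var names are invented for illustration and are not from the conversation.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical first pass: chunks separated by semicolons.
(def coarse
  (insta/parser
   "TEXT = CHUNK (<';'> CHUNK)*
    CHUNK = #'[^;]+'"))

;; Hypothetical second pass: lowercase words separated by single spaces.
(def fine
  (insta/parser
   "WORDS = WORD (<' '> WORD)*
    WORD = #'[a-z]+'"))

(def text "a b;c d")

;; Every child node of the coarse tree carries span metadata, so the
;; matched substring can be cut out of the original text.
(def chunks
  (for [node (rest (coarse text))]   ; rest skips the :TEXT tag
    (apply subs text (insta/span node))))

;; Feed each recovered substring to the more granular parser.
(def result (mapv fine chunks))

(println chunks)
(println result)
```

In this toy grammar the substring is also available as the node's only child, so `span` mainly pays off when a node has nested children or hidden tokens and the original text is still wanted.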

ska 2016-04-12T14:39:32.000034Z

(sorry for the broken indentation)