I managed to replicate pretty much all of the Clojure regex functions while wrapping Stanford’s Semgrex DSL, since the underlying Java classes themselves mimic the Java regex classes. Pretty fun exercise! https://github.com/simongray/datalinguist/blob/master/src/dk/simongray/datalinguist/dependency.clj#L295-L349
Why? I mean what’s the reason to use a different regex engine?
Oh it’s not regex at all. I got confused. Sorry!
Yup, it's a DSL for matching against dependency grammar.
While regex matches characters in strings, this matches grammar and other kinds of language data.
Within a directed graph of nodes (words) related by grammatical relations.
That sounds pretty cool!
Are there examples of interesting usages?
Not really interesting usages, but there are a few examples in the rich comment block https://github.com/simongray/datalinguist/blob/master/src/dk/simongray/datalinguist/dependency.clj#L351-L376
I am thinking about making another DSL on top of it since I actually kinda dislike using text-based DSLs in Clojure 😆
Since it is matching against nodes in a directed graph, it should be possible to represent it using Datomic-style triples.
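Just to make the triples idea concrete, here is a rough sketch in plain Python (not the datalinguist or CoreNLP API). It assumes a parse is stored as (governor, relation, dependent) triples and that a pattern is a list of clauses where `?`-prefixed strings are variables. The names `EDGES`, `is_var`, `match_clause` and `query` are made up for illustration, and a real version would use node ids rather than raw words.

```python
# Hypothetical sketch: a dependency parse of "She eats fish" stored as
# (governor, relation, dependent) triples. Words stand in for node ids,
# which only works while no word is repeated in the sentence.
EDGES = [
    ("eats", "nsubj", "She"),
    ("eats", "obj", "fish"),
]


def is_var(term):
    """Terms starting with '?' are logic variables, Datomic-style."""
    return isinstance(term, str) and term.startswith("?")


def match_clause(clause, edge, bindings):
    """Try to unify one pattern clause with one edge.

    Returns the extended bindings, or None if they conflict.
    """
    new = dict(bindings)
    for term, value in zip(clause, edge):
        if is_var(term):
            # Bind the variable, or check it against an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(clauses, edges, bindings=None):
    """Yield every set of bindings that satisfies all clauses."""
    bindings = {} if bindings is None else bindings
    if not clauses:
        yield bindings
        return
    first, rest = clauses[0], clauses[1:]
    for edge in edges:
        extended = match_clause(first, edge, bindings)
        if extended is not None:
            yield from query(rest, edges, extended)


# Find a verb that has both a subject and an object.
pattern = [("?verb", "nsubj", "?subject"), ("?verb", "obj", "?object")]
results = list(query(pattern, EDGES))
```

Sharing `?verb` across both clauses is what joins the two edges on the same governor, which is roughly what chaining relations off one node does in a text-based pattern.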
I want to use it for building patterns to detect various Chinese sentence patterns
I actually made my own Java API for doing the same stuff years ago, not knowing CoreNLP included such a feature already… https://github.com/simongray/StatementAnnotator/tree/master/src/main/java/statements/patterns
💪
(and good morning)
Morning