Will anyone on http://lobste.rs consider giving me an invite?
Does anyone have a recommended Clojure library for simple text analysis in English?
What do you mean by simple text analysis? Part-of-speech tagging? You can try Stanford NLP. Here is a demo: https://corenlp.run/
@simongray has made a little wrapper lib for this
yup, it’s available at https://github.com/simongray/datalinguist, but currently requires you to use deps.edn since I have not packaged it as a JAR yet. Nevertheless, it’s probably still the most full-featured CoreNLP experience you will get in Clojure right now.
Another option is to use CoreNLP directly through interop, but I don’t recommend that… there’s a reason I’m trying to wrap it.
We use Stanford NLP at work: https://covid-search.doctorevidence.com/
I was looking for that!
Google and GitHub did not cooperate with me
Oh wow, the models are heavy. Does not seem suitable for a small script.
Yeah, they're usually a couple hundred MBs apiece AFAIK. I think most language models produced through machine learning tend to be quite heavy and the memory requirements are usually pretty substantial too for most of the interesting things you wanna do.
How do you use NLP? Like, as a better full-text search or doing more interesting stuff like trying to extract information from texts?
Ouch, Stanford and CoreNLP are GPL -- probably a no-go for us then 😞
Yup - it sucks
I used https://github.com/facebookarchive/duckling_old on a personal project a couple of years ago when I was starting my Clojure journey. I enjoyed the experience. Its successor is at https://github.com/facebook/duckling.
I just wanted a simple way to lint commit messages
Ah @ben.sless, maybe just roll your own then? Depending on how sophisticated your linting is… that may be easier than figuring out some NLP thingy.
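For what it's worth, here is a minimal sketch of what rolling your own might look like, using only clojure.string and no models. The rules (50-character subject, no trailing period, capitalized first letter, blank second line) and the names `lint-commit-message` and `max-subject-length` are assumptions made up for illustration, not taken from any existing library:

```clojure
(ns commit-lint
  (:require [clojure.string :as str]))

;; Hypothetical limit, chosen only for illustration.
(def max-subject-length 50)

(defn lint-commit-message
  "Returns a vector of problem descriptions for `message`.
  An empty vector means the message passed every check."
  [message]
  (let [[subject second-line] (str/split-lines message)
        subject (or subject "")]
    (cond-> []
      (str/blank? subject)
      (conj "subject line is empty")

      (> (count subject) max-subject-length)
      (conj (str "subject line is longer than " max-subject-length " characters"))

      (str/ends-with? subject ".")
      (conj "subject line ends with a period")

      ;; Only flags ASCII lowercase starts; good enough for a sketch.
      (re-find #"^[a-z]" subject)
      (conj "subject line does not start with a capital letter")

      (and second-line (not (str/blank? second-line)))
      (conj "second line is not blank"))))
```

Something like `(lint-commit-message "fix bug.")` would then come back with two problems (trailing period, lowercase start), while `(lint-commit-message "Add parser")` returns an empty vector. Anything that needs real grammar, such as checking for imperative mood, is where this approach stops being enough.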