data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
val_waeselynck 2020-05-07T12:31:24.191900Z

Not sure if this has been pointed out already, but it occurs to me that ClojureScript might be a killer feature for ML in Clojure. I'm constructing a classification pipeline for Reddit comments, and to do so I need to build a large dataset by labelling sample comments. I quickly made a specialized UI in ClojureScript for that, and now I'm able to label a few thousand examples per day. I doubt that generic annotation tools would have made such a high throughput possible, and I have a hard time imagining building such a UI in so little time in Python.

👍 2
vlaaad 2020-05-07T12:33:38.192200Z

can you show an example?

val_waeselynck 2020-05-07T12:58:21.192500Z

@vlaaad https://i.imgur.com/4VSazww.gif

val_waeselynck 2020-05-07T12:59:25.193Z

(Sorry the content's in French, that's the data I'm working on).

val_waeselynck 2020-05-07T13:00:17.193900Z

I've got keyboard shortcuts for labelling actions. I can usually label at that speed because there's not much reading required.
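For illustration only (this is not the code behind the GIF): one way to structure keyboard-driven labelling is a pure state transition that maps a pressed key to a label, records it for the comment at the head of a queue, and advances. The key bindings, the label names, and the `handle-key` function below are hypothetical assumptions; the code is plain Clojure that should also run as ClojureScript.

```clojure
;; Hypothetical key bindings: pressed key -> label keyword.
;; These labels are illustrative assumptions, not the ones used in the GIF.
(def key->label
  {"1" :relevant
   "2" :irrelevant
   "3" :unsure})

(defn handle-key
  "Pure state transition for a labelling session (hypothetical helper).
  state is {:queue [comment-id ...] :labels {comment-id label}}.
  If key is bound and a comment remains, labels the comment at the head
  of the queue and advances to the next one; otherwise returns state unchanged."
  [state key]
  (let [label   (get key->label key)
        current (first (:queue state))]
    (if (and label current)
      (-> state
          (assoc-in [:labels current] label)
          (update :queue #(vec (rest %))))
      state)))

;; Example: two key presses label the first two comments.
(reduce handle-key
        {:queue [:c1 :c2 :c3] :labels {}}
        ["1" "2"])
;; => {:queue [:c3], :labels {:c1 :relevant, :c2 :irrelevant}}
```

Keeping the transition pure makes it easy to test on the JVM; in the browser it could be wired to a `keydown` listener that does `(swap! app-state handle-key (.-key e))`, and the accumulated `:labels` map could then be sent to the server that does the data processing.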

vlaaad 2020-05-07T13:27:04.194600Z

very interesting!

vlaaad 2020-05-07T13:29:30.195100Z

but your main data processing is still in clojure?

vlaaad 2020-05-07T13:30:10.195900Z

so you need to have a “server” as a “feedback receiver”, and this is a “client”?

val_waeselynck 2020-05-07T14:35:11.197500Z

Yes exactly

val_waeselynck 2020-05-07T14:36:20.198500Z

And in my case, it's not like I lose much by requiring a client-server communication, because the data processing often occurs on a remote machine anyway.

vlaaad 2020-05-07T14:48:15.198700Z

Ah, I see

vlaaad 2020-05-07T14:49:41.200100Z

Just wanted to point out that cljfx exists 🙂: it provides a declarative UI in the Java process, so no client-server communication is necessary if data processing happens on your machine

val_waeselynck 2020-05-07T15:09:02.201400Z

I'm well aware 🙂. In my use case, another argument in favour of a browser-based UI is that Reddit content is designed to be viewed in the browser, with hyperlinks etc.