data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions
chrisn 2020-07-02T13:30:56.358800Z

I was honored to be a part of a panel discussion comparing machine learning across languages. Got some nice opportunities to plug the techascent stack, Neanderthal, Smile, and of course Clojure in general 🙂. https://twimlai.com/the-great-ml-language-un-debate/?utm_source=tw_linkedin&utm_medium=social&utm_campaign=advancing_data_sci

👍 8
💯 8
3
2020-07-04T07:28:02.386Z

Great discussion @chris441!

chrisn 2020-07-04T15:25:26.386400Z

I really tried to, without being annoying, let people know there was a lot more here (in Clojure) than they thought. I was very, very happy to be on a panel with Chris Lattner and would have enjoyed talking to just him about some of my theories/ideas, but the panel wasn't the place for that. I definitely had a chance to talk about clojisr, though, and this was the perfect forum 🙂.

2020-07-02T14:07:17.359100Z

🆒

Santiago 2020-07-02T20:03:33.364600Z

I’ve been looking at feature stores and feature engineering systems lately. Big tech has shared some solutions (some here: http://featurestore.org/), but these sound very big and complex because they handle large amounts of data, streams, etc. My use case doesn’t need something so high-throughput, but having a service that creates and curates features is still useful, and it does sound like a use case where Clojure would shine. Is there actually such a project in clj that I’m not aware of?

respatialized 2020-07-02T20:58:40.369700Z

I've successfully sold my boss on Datomic for this exact purpose (+ also doing model metrics tracking), but we're not yet at the implementation stage for it. I think Datomic (or Datahike if you prefer open source) is particularly suited to use as a feature store because of its vastly more flexible data model and more powerful query engine. You're not constrained by the Hadoop-like architecture that so many of the off-the-shelf projects seem to have, so you can build the schema and data model that actually suits your workflow. Time travel on schemas (supported by Datomic) allows you to grow your feature schema organically rather than locking you in to a single data model.
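To make the time-travel point concrete, here is a rough, self-contained sketch of the idea in plain Python. It is not Datomic's or Datahike's API; the `FeatureLog` class, its method names, and the sample entities and features are all hypothetical. It only illustrates why an append-only log of timestamped facts lets you add new features without a migration and read features "as of" a past moment.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


class FeatureLog:
    """Hypothetical append-only log of (entity, feature, value, tx_time) facts."""

    def __init__(self):
        # (entity, feature) -> parallel lists of transaction times and values
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def assert_fact(self, entity, feature, value, tx_time):
        """Append a new value; earlier values are kept, never overwritten."""
        key = (entity, feature)
        times = self._times[key]
        if times and tx_time < times[-1]:
            raise ValueError("facts must be appended in time order")
        times.append(tx_time)
        self._values[key].append(value)

    def as_of(self, entity, feature, when):
        """Return the latest value recorded at or before `when`, else None."""
        key = (entity, feature)
        times = self._times.get(key, [])
        idx = bisect_right(times, when)
        return self._values[key][idx - 1] if idx else None


log = FeatureLog()
log.assert_fact("user-1", "orders_30d", 3, datetime(2020, 6, 1))
log.assert_fact("user-1", "orders_30d", 5, datetime(2020, 7, 1))
# A feature introduced later is just a new attribute; nothing is migrated.
log.assert_fact("user-1", "avg_basket", 42.5, datetime(2020, 7, 1))

mid_june = datetime(2020, 6, 15)
print(log.as_of("user-1", "orders_30d", mid_june))
print(log.as_of("user-1", "avg_basket", mid_june))
```

Reading features as of a training cutoff is what keeps later values from leaking into a model; a real database with history would add indexing, querying, and persistence on top of this.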

respatialized 2020-07-02T21:04:19.371200Z

https://www.logicalclocks.com/blog/how-to-build-your-own-feature-store Go through this flowchart and look at how many of these decisions you don't have to make if you use something like Datahike or Crux.

Santiago 2020-07-02T21:10:51.373900Z

A feature store is one side of the equation; what about feature engineering? I haven’t used Datomic so far, but last time I checked it was a bit Clojure-focused, right? As in, there’s no client for PHP (what my company uses). Or was there a REST interface?

respatialized 2020-07-02T21:15:55.377300Z

https://docs.datomic.com/cloud/analytics/analytics-concepts.html Datomic's REST API is deprecated, it looks like. If all you need is to pull the data you could use the PrestoDB connector and SQL. Getting it back in would be a little more tricky - you'd probably need to engineer something that passes between your front-end and the Datomic transactor.