data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions
Santiago 2020-11-14T06:55:56.094400Z

@daslu ☝️ the type of thing I felt clj would be the perfect candidate for

👍 1
zane 2020-11-15T00:59:16.095Z

I’d be interested in hearing more about what you mean. 🙂

Santiago 2020-11-16T07:20:01.096800Z

Most of my time as a data scientist these days is not spent thinking about which model to use, or which packages or which language, etc. It’s mostly “how can I turn this into a DAG to make everything reproducible”. I think immutable data and a functional style of programming are ideal for creating not only the actual DAG pipeline but also the individual steps, because reproducibility is a first-class concept 🙂 Clojure AFAIK doesn’t have something like this, and this dagli that you posted @zane seems to go in a nice direction of being a single library to build and contain every step of an ML pipeline

Santiago 2020-11-16T09:29:32.097Z

we use http://dvc.org at work and I’m personally in love because it’s language agnostic. I have a DAG getting data, cleaning, transforming, splitting, training models, and saving artefacts (including plots and metrics), with everything stored in S3. we have steps written in babashka and R, and will probably add a Python deployment script, all in one workflow
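
The pipeline shape described above (get data, clean, transform, split, train) can be sketched as a small DAG of pure functions. This is only an illustration in plain Python, not DVC's or dagli's API: the stage functions, the toy data, and the `PIPELINE` and `run` names are all hypothetical. It assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages. Each one is a pure function: it reads the outputs of
# its upstream stages and returns a new value without mutating anything.
def get_data(_inputs):
    return [3.0, 1.0, None, 2.0, 4.0]  # toy data standing in for a real source

def clean(inputs):
    return [x for x in inputs["get_data"] if x is not None]

def transform(inputs):
    return [x * 10 for x in inputs["clean"]]

def split(inputs):
    rows = sorted(inputs["transform"])
    cut = len(rows) // 2
    return {"train": rows[:cut], "test": rows[cut:]}

def train(inputs):
    rows = inputs["split"]["train"]
    return {"mean": sum(rows) / len(rows)}  # stand-in for a fitted model

# The DAG: stage name -> (function, names of the stages it depends on).
PIPELINE = {
    "get_data": (get_data, []),
    "clean": (clean, ["get_data"]),
    "transform": (transform, ["clean"]),
    "split": (split, ["transform"]),
    "train": (train, ["split"]),
}

def run(pipeline):
    """Run every stage once, in dependency order, and return all outputs."""
    graph = {name: deps for name, (_, deps) in pipeline.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = pipeline[name]
        results[name] = fn({dep: results[dep] for dep in deps})
    return results

results = run(PIPELINE)
```

Because every stage is a pure function of its upstream outputs, running the pipeline twice gives the same results. A tool like the one mentioned above additionally tracks files and caches stage outputs, which this sketch does not attempt.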

zane 2020-11-16T17:52:42.100500Z

Thanks for the response, @slack.jcpsantiago! http://dvc.org was new to me, and believe it or not I’ve been looking around for something like it!

zane 2020-11-16T17:53:59.100700Z

Other candidates:
• https://github.com/Factual/drake (written in Clojure(!), deprecated)
• https://www.digdag.io/
• https://airflow.apache.org/
• make

Santiago 2020-11-16T17:54:44.101400Z

give it a try, you won’t regret it 🙂 the team behind it is also super approachable. they also have another tool called https://cml.dev which you use as a github action. I’m not sponsored by them btw, I wish haha

zane 2020-11-16T17:55:09.101700Z

Based on the README it’s more or less exactly what I’ve been looking for.

zane 2020-11-16T17:55:25.101900Z

Will do, and thanks for the pointer to https://cml.dev!

Santiago 2020-11-16T18:40:45.102100Z

👍 anytime ;D