@daslu ☝️ the type of thing I felt clj would be the perfect candidate for
I’d be interested in hearing more about what you mean. 🙂
Most of my time as a data scientist these days is not spent thinking about which model to use, or which packages, or which language, etc. It’s mostly “how can I turn this into a DAG to make everything reproducible?” I think immutable data and a functional style of programming are ideal for creating not only the actual DAG pipeline but also the individual steps, because reproducibility is a first-class concept 🙂 Clojure AFAIK doesn’t have something like this, and the dagli library you posted, @zane, seems to go in a nice direction of being a single library to build and contain every step of an ML pipeline.
We use http://dvc.org at work and I’m personally in love because it’s language agnostic. I have a DAG that gets data, cleans, transforms, splits, trains models, and saves artefacts (including plots and metrics), with everything stored in S3. We have steps written in babashka and R, and will probably add a Python deployment script, all in one workflow.
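To make the "pure steps in a DAG" idea concrete, here is a rough sketch in plain Python. It is not how DVC works internally and it does not use DVC's API; the step names, the toy data, and the `STEPS` / `run` helpers are all made up for illustration. Each step is a pure function over immutable tuples, and the standard library's `graphlib` works out the execution order from the declared dependencies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical steps: each one is a pure function that returns a new value
# and never mutates its input.
def get_data():
    return (3, 1, 2, 2)

def clean(raw):
    # Drop duplicates and sort, returning a new tuple.
    return tuple(sorted(set(raw)))

def transform(cleaned):
    return tuple(x * 10 for x in cleaned)

def split(rows):
    # Hold out the last row as a toy "test" set.
    cut = len(rows) - 1
    return {"train": rows[:cut], "test": rows[cut:]}

def train(parts):
    # Stand-in "model": the mean of the training rows.
    return sum(parts["train"]) / len(parts["train"])

# Each step name maps to (function, names of the steps it depends on).
STEPS = {
    "get_data": (get_data, ()),
    "clean": (clean, ("get_data",)),
    "transform": (transform, ("clean",)),
    "split": (split, ("transform",)),
    "train": (train, ("split",)),
}

def run(steps):
    """Run every step once, in dependency order, and return all results."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = steps[name]
        results[name] = fn(*(results[d] for d in deps))
    return results

results = run(STEPS)
print(results["train"])
```

Because every step depends only on its declared inputs, rerunning the pipeline on the same data gives the same results, which is the property a tool like DVC tracks across languages at the file level.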
Thanks for the response, @slack.jcpsantiago! http://dvc.org was new to me, and believe it or not I’ve been looking around for something like it!
Other candidates:
• https://github.com/Factual/drake (written in Clojure(!), deprecated)
• https://www.digdag.io/
• https://airflow.apache.org/
• make
Give it a try, you won’t regret it 🙂 The team behind it is also super approachable. They also have another tool called https://cml.dev, which you use as a GitHub Action. I’m not sponsored by them btw, I wish haha
Based on the README, it’s more or less exactly what I’ve been looking for.
Will do, and thanks for the pointer to https://cml.dev!
:thumbsup: anytime ;D