data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
2020-06-08T18:41:08.299100Z

Yup; For the record, I mentioned Darkstar above 🙂

2020-06-08T18:41:53.299300Z

And doesn't tech.ml.dataset have some basic vega utilities?

2020-06-08T18:45:58.301300Z

@nick.romer You can also take a look at semantic-csv, which gives you some nice utilities for working with large csv files lazily, and composes with clojure.data.csv and friends. https://github.com/metasoarous/semantic-csv

2020-06-08T18:46:30.302Z

I agree though that if you can fit in memory, using tech.ml.dataset is likely the way to go

2020-06-08T18:47:10.302100Z

Dude; That memory-meter shit is dope! Going to stow that one away in my toolbox.

chrisn 2020-06-08T19:27:49.302400Z

Haha, yeah totally. I really wish I had found that earlier as tracking down which object graph in a program is hogging ram is a serious problem sometimes 🙂.

niveauverleih 2020-06-08T20:42:19.306500Z

@vlaaad @chris441 @metasoarous Thank you all! For the time being I was able to load some interesting columns like this:

    (defn reducer [ac row]
      (conj ac (map #(nth (first (csv/read-csv row)) %) [16 3 17 18])))

    (def master
      (with-open [rdr (io/reader data-local)]
        (reduce reducer [] (line-seq rdr))))

I will have a look at tech-ml-dataset.
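A minimal sketch of the same column selection, for comparison. It hands the reader to csv/read-csv once instead of parsing each line on its own, so quoted fields that contain newlines stay in one row. The name select-columns is made up for illustration, and it assumes clojure.data.csv is on the classpath and that every row has at least as many columns as the largest index requested.

```clojure
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

(defn select-columns
  "Hypothetical helper: reads CSV from `source` (anything io/reader
  accepts, e.g. a file path or a Reader) and returns a vector of rows,
  keeping only the columns at the zero-based indices in `idxs`."
  [source idxs]
  (with-open [rdr (io/reader source)]
    ;; csv/read-csv is lazy, so `into` is used to realize every row
    ;; before with-open closes the reader.
    (into []
          (map (fn [row] (mapv #(nth row %) idxs)))
          (csv/read-csv rdr))))

;; Possible usage, assuming data-local is bound to a CSV file path:
;; (def master (select-columns data-local [16 3 17 18]))
```

The result is still fully in memory, like the reduce version; only the parsing step changes.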

chrisn 2020-06-08T22:47:32.306900Z

Sorry, yes I see that now. I just quickly scanned. Not any more, everything like that was moved to tech.viz. That is why I now have the simpledata repository that people can just and then use immediately; the tech system now requires 3 separate dependencies if you want to train a classifier and see a plot of the output.