data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
2020-04-20T19:22:29.120800Z

The tech.ml.dataset library is becoming an amazing thing. In some respects, it already does better than what one usually expects from dataframe libraries in the R/Python worlds. We have been discussing some of the new surprises, with @chris441 and joinr providing some beautiful explanations of the internals. One thing that I'm excited about is that, alongside the typical dataframe-like API, we can also keep doing the usual sequence-of-maps processing that we love, but now with a much more efficient underlying data structure. See here: https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/tech.2Eml.2Edataset/near/194609499 and here: https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/datasets.20of.20strange.20shapes

2020-04-20T19:51:24.124700Z

Yeah. One of the things I love about it is that it embraces the sequence-of-maps model over the map-of-vectors model that pandas uses. The latter has always irked me, and while teaching intro bioinformatics courses I noticed that it has warped a generation of Pythonistas into thinking it is the "natural" way to represent data. It's fine to store data that way for performance when your interface protects you from messing up the ordering of your vectors. But when your interfaces steer you towards storing data this way outside of the dataframe, you get into trouble really quickly. Fundamentally, the notions of identity and value become complected with order, which is... not good. 😬
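
A minimal sketch of that ordering problem, in plain Python without any dataframe library. The gene names and expression values are made up purely for illustration:

```python
# Map of vectors: a dict of parallel lists.
# A "row" exists only as a shared index position across the lists.
columns = {
    "gene": ["BRCA1", "TP53", "EGFR"],       # hypothetical sample data
    "expression": [2.5, 9.1, 4.7],           # hypothetical sample data
}

# Sorting one list on its own silently breaks the pairing:
# position 0 now holds BRCA1 next to TP53's value.
columns_sorted = dict(
    columns,
    expression=sorted(columns["expression"], reverse=True),
)

# Sequence of maps: each record carries its own values,
# so reordering the sequence cannot separate a gene from its value.
rows = [
    {"gene": "BRCA1", "expression": 2.5},
    {"gene": "TP53", "expression": 9.1},
    {"gene": "EGFR", "expression": 4.7},
]
rows_sorted = sorted(rows, key=lambda r: r["expression"], reverse=True)

# Lookup of the true pairing, built from the untouched columns.
true_expression = dict(zip(columns["gene"], columns["expression"]))

# True when the first "row" of the column-wise sort no longer matches reality.
mismatch = (
    columns_sorted["expression"][0]
    != true_expression[columns_sorted["gene"][0]]
)
```

In the column-wise version nothing raises an error; the data is just quietly wrong. In the row-wise version, identity and value travel together regardless of order.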

👍 4
2020-04-20T20:05:07.125100Z

@daslu is there a lot of ML discussion on Zulip?

2020-04-20T20:14:13.129500Z

@albaker some of the data-science discussions take place on Zulip too. It offers a different tempo of discussion, somewhat complementary to the quick Slack chat. People can create discussions under topics and come back to them at their own pace. Some library authors in the data-science field use the Zulip platform to have ongoing discussions with their users. In the last few weeks, attention has mostly been on data visualization, interop and data processing, and not so much on ML itself. I imagine it will be about ML again in the near future, when development attention returns to that topic. More about the relevant Zulip chat streams here: https://scicloj.github.io/pages/chat_streams/

🚀 1