data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
Ronny Li 2020-11-05T17:58:39.075Z

Hi everyone, what's your favorite library for working with tabular data (column-/row-slicing, joining, grouping, etc)? For context I am working with a lot of time-series data where I would like to join, group, and filter on dates.

jsa-aerial 2020-11-05T18:18:32.075700Z

Additionally, if you like a dplyr-style API, there is [tablecloth](https://github.com/scicloj/tablecloth), which is very cool.

jsa-aerial 2020-11-05T18:23:58.076Z

@ronny463 this is a great walkthrough of TC (tablecloth) with many examples from data.table and dplyr: https://scicloj.github.io/tablecloth/index.html#Introduction

Ronny Li 2020-11-05T18:28:52.076200Z

Thank you @jsa-aerial! Yeah, I checked out tech.ml.dataset and tablecloth and thought they looked interesting. I found it strange that most of the tablecloth features weren't already in dataset, so it kind of turned me away from those libraries. How have you found your experience with dataset so far? I'll check out Zulip, thank you for the recommendation!

jsa-aerial 2020-11-05T18:34:23.076400Z

@ronny463 It is great - nothing else really compares. Most of the tablecloth features are already in TMD (tech.ml.dataset). TC is really mostly a thin layer that abstracts TMD into a dplyr-like API. TMD is extremely fast and scalable: https://github.com/zero-one-group/geni/blob/develop/docs/simple_performance_benchmark.md#results

2020-11-05T18:43:46.076700Z

@ronny463 @jsa-aerial it seems like great timing for bringing the time-series aspect into the story. tech.ml.dataset already has good support for time-typed columns, but additional layers for time-series indexing, processing, and analysis are still missing there, afaik. It would be great to learn from this use case and use it to push the stack forward and add some of the missing pieces (but as @jsa-aerial suggested, it may be better to bring that discussion to Zulip).

Ronny Li 2020-11-05T18:55:00.076900Z

great, thanks for the feedback everyone! I'll move the convo to Zulip 🙂

zane 2020-11-05T19:07:57.077100Z

@ronny463 You could also consider just using xsv for stuff like this. https://github.com/BurntSushi/xsv