data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions
2020-04-30T21:46:00.167400Z

maybe a github repo that folks could commit/push links to current ds projects might be in order? or any other mechanism. A long, long, long time ago clojure toolbox was just this kind of thing. I'm not sure if it still is updated (https://www.clojure-toolbox.com/)

vlaaad 2020-05-01T12:45:27.181700Z

Oh hey, I'm building such a thing I guess, it's work in progress

vlaaad 2020-05-01T12:48:38.181900Z

I sometimes have very wide data structures that are clunky to look at if you just pprint them. This is sort of the next thing I'm working on: adding support for custom visualizations, such as tables with limited height, that make it much more convenient to look at the current level, for example

👍 1
phronmophobic 2020-05-01T16:09:58.182100Z

for what I'm interested in, there are two aspects:
1. exploration - given some data, can I navigate through it and look at all its parts
2. summary - given some data, what can I know about it without looking at every single detail it has to offer
I think exploration is important, but more straightforward. The problem of summarizing medium size, heterogeneous data is much more interesting to me. We have lots of ways to summarize large sequences of numbers (mean, median, mode, histograms, etc). We have fewer tools for summarizing something like a json blob that comes from an API call.

👍 1
1
teodorlu 2020-05-02T14:30:40.183200Z

@smith.adriane I've been using malli.provider/provide for exploring JSON structure. See #malli

teodorlu 2020-05-02T14:31:56.183500Z

Caveat: you'll have to figure out where your sequences are, then you can run provide on those.
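
To make the idea of inferring structure from samples concrete, here is a minimal sketch in Python. It is not malli and does not reproduce what `malli.provider/provide` returns; it only illustrates the general technique of walking several sample records and collecting, per key path, the value types seen. The `summarize` function and the sample `responses` are hypothetical, invented for illustration.

```python
from collections import defaultdict


def summarize(samples):
    """Map each key path to the set of value type names seen across samples."""
    seen = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, path + (key,))
        elif isinstance(value, list):
            # All elements of a sequence share one path segment, "[]".
            for child in value:
                walk(child, path + ("[]",))
        else:
            seen[path].add(type(value).__name__)

    for sample in samples:
        walk(sample, ())
    return dict(seen)


# Hypothetical API responses, invented for illustration.
responses = [
    {"id": 1, "name": "a", "tags": ["x", "y"]},
    {"id": 2, "name": None, "tags": []},
]

for path, types in sorted(summarize(responses).items()):
    print(".".join(path), sorted(types))
```

Passing the list of records, rather than one record at a time, is what lets the summary show that `name` is sometimes missing a value. Empty sequences and empty maps contribute nothing in this sketch.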

teodorlu 2020-05-02T15:09:19.183700Z

@nickstares0 - perhaps you're interested in this 🙂

2020-05-04T15:47:06.184900Z

@smith.adriane your points (1) and (2) sound like automl to me, e.g. https://arxiv.org/pdf/1810.13306.pdf

phronmophobic 2020-05-04T17:07:43.185100Z

@aaelony, I'm not sure I follow. It looks like automl is meant to take people out of the loop rather than a tool for people to quickly form an intuition about some data

2020-05-04T19:42:55.185400Z

depends on the intuition, I suppose.

phronmophobic 2020-05-04T19:46:00.185600Z

using something like automl sounds cool. I just have zero experience with machine learning so I'm not sure how I would use it

2020-05-04T21:37:13.185800Z

it's an active topic with aspirational goals riffing off of exploration and summary as you had mentioned (esp for predictive models)

phronmophobic 2020-04-30T21:47:24.167600Z

it's still being updated. I had a repo added within the last month

👍 2
2020-04-30T21:48:03.167900Z

I don't see DS categories there though. Perhaps that's one idea

2020-04-30T21:49:08.168100Z

e.g. Python Integration lists pickler but no libpython-clj

phronmophobic 2020-04-30T21:49:37.168300Z

those seem like good additions

2020-04-30T21:50:17.168500Z

I don't see tech.ml.dataset either

phronmophobic 2020-04-30T21:50:25.168700Z

I was able to get my library added by making a pull request, https://github.com/weavejester/clojure-toolbox.com/pulls

2020-04-30T21:50:56.168900Z

perhaps the word can get out to the lib creators to register there if desired

phronmophobic 2020-04-30T21:51:05.169100Z

I think anyone can make a pull request

2020-04-30T21:51:06.169300Z

Oz, Saite, etc..

phronmophobic 2020-04-30T21:51:28.169500Z

Oz is under data exploration

2020-04-30T21:52:05.169700Z

I did put this together a long time ago; the idea was supposed to be the same as far as PRing libraries: https://github.com/metasoarous/clojure-datascience

2020-04-30T21:52:07.170Z

well, it used to be the case that a few libs fell under >1 category

2020-04-30T21:52:44.170200Z

Yeah, it would be nice to have a little database of these things, which can be submitted and approved dynamically

2020-04-30T21:52:54.170400Z

maybe that's cleaned up, but still possible to have multiple uses

2020-04-30T21:52:56.170600Z

Searchable UI; blahblah

2020-04-30T21:53:14.170800Z

yep, exactly that, metasoarous

2020-04-30T21:53:50.171Z

BERT enabled search would be awesome 😉

🙂 1
2020-04-30T22:32:58.171300Z

@aaelony @metasoarous @smith.adriane Hi. :) We maintain some relevant lists at the scicloj website. https://github.com/scicloj/scicloj/blob/master/resources/templates/md/pages/libraries.md https://github.com/scicloj/scicloj/blob/master/resources/templates/md/pages/reading.md https://github.com/scicloj/scicloj/blob/master/resources/templates/md/pages/chat_streams.md If anyone wants to have push permissions -- please tell. Our current method is that people can push changes to a draft branch, and one person (the "editor") merges them to master and tidies up. Any thoughts?

🦜 1
2020-04-30T22:37:11.171600Z

@aaelony @metasoarous Searchable UI is a great idea. We have been thinking for some time about migrating the website from Cryogen to something hiccup-driven such as Oz. Then (I think) it will be more fun to create some interactive views.

2020-04-30T22:37:59.171800Z

Yes! That would be awesome. I've wanted to do that; Build viz tools for library discovery/evaluation.

👍 1
2020-04-30T22:39:20.172300Z

On a related thread, @teodorlu is exploring some ideas of knowledge organization using Roam, that should be exportable to some comfortable data format.

2020-04-30T22:40:02.172500Z

🙂

2020-04-30T22:40:13.172700Z

Roam seems to be everywhere these days doesn't it...

2020-04-30T22:40:34.172900Z

Your tweet about it was inspiring.

2020-04-30T22:41:50.173100Z

Regarding the scicloj website -- it gets too little attention these days. If anyone wants to take over and make it fresh and beautiful, that would be more than welcome of course. 🙃

phronmophobic 2020-04-30T22:41:56.173300Z

fyi, the links to vega and vega lite in libraries.md are broken

2020-04-30T22:42:19.173500Z

thanks!

phronmophobic 2020-04-30T22:42:59.173700Z

is there a library for exploring a giant edn blob? like if you want to just open up a visualizer for your app state or a response from an API?

2020-04-30T22:43:37.173900Z

curious too!

phronmophobic 2020-04-30T22:44:56.174100Z

i guess giant is the wrong adjective. anyway, there's lots of tools for wide data sets (eg. a db table with a million rows), but I'm interested in a generic data viewer for data sets that are 5-20 levels deep and maybe up to 100s of levels wide under certain branches
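
One way to picture a viewer for data that is both deep and wide is an outline that cuts off at a depth limit and a width limit. The following is a minimal sketch in Python, under the assumption that the data is made of maps, sequences, and scalars. It is not how REBL or any other tool mentioned here works; `outline`, its parameters, and the `state` value are hypothetical names chosen for illustration.

```python
def outline(value, max_depth=3, max_width=5, depth=0):
    """Return indented lines describing `value`, truncating deep or wide branches."""
    pad = "  " * depth
    if isinstance(value, dict):
        items = list(value.items())
    elif isinstance(value, (list, tuple)):
        items = list(enumerate(value))
    else:
        # Scalars are shown as-is.
        return [pad + repr(value)]
    if depth >= max_depth:
        # Too deep: report the size of the branch instead of its contents.
        return [f"{pad}... ({len(items)} entries, depth limit reached)"]
    lines = []
    for key, child in items[:max_width]:
        lines.append(f"{pad}{key!r}:")
        lines.extend(outline(child, max_depth, max_width, depth + 1))
    if len(items) > max_width:
        # Too wide: report how many siblings were skipped.
        lines.append(f"{pad}... ({len(items) - max_width} more)")
    return lines


# Hypothetical app state, invented for illustration.
state = {"user": {"id": 7, "roles": ["admin", "dev"]},
         "events": list(range(100))}

print("\n".join(outline(state, max_depth=2, max_width=3)))
```

The two limits are independent, so a branch with hundreds of siblings stays readable even when the depth limit is generous. An interactive viewer would presumably let you expand a truncated branch on demand rather than fixing the limits up front.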

phronmophobic 2020-04-30T22:45:20.174300Z

REBL is in the same concept space, but I'm wondering if there are others

2020-04-30T22:49:49.174600Z

Maybe @vlaaad can comment about Reveal, a REBL-related UI. https://github.com/vlaaad/reveal

phronmophobic 2020-04-30T22:50:09.175100Z

oh yea, i've been meaning to look into that