data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
joelkuiper 2020-03-20T11:06:44.048200Z

Not sure how useful it will be, but it's at least a nice demonstration of a full Clojure+ClojureScript data science tool: we've loaded the CORD19 database into our medical search platform, DOC Search, and made it freely available at http://covid19.doctorevidence.com (or https://search.doctorevidence.com/ with user/pass: covid19 / covid19). Feel free to play around or provide much-needed feedback 😛

💯 1
joelkuiper 2020-03-20T11:07:41.048700Z

I gave a talk about the platform at the Dutch Clojure Days last year https://www.youtube.com/watch?v=EM61rn9Gxl4 for a little bit more background on what the platform tries to achieve 🙂

jumar 2020-03-20T11:53:51.049900Z

Thanks for sharing. I guess no code is public though

joelkuiper 2020-03-20T11:55:00.050100Z

Unfortunately not, we're looking into open sourcing some libraries from it, but the whole stack itself is maybe a bit much (and that takes a bit more corporate convincing probably 😉)

2020-03-20T15:38:21.050500Z

This is awesome! What did you use for the visualizations?

joelkuiper 2020-03-20T15:45:32.051Z

Thanks! Just vanilla d3 😊

zane 2020-03-20T17:42:39.051500Z

Anyone have experience with Apache Commons Math? https://commons.apache.org/proper/commons-math/

zane 2020-03-20T17:43:16.052Z

Or JavaScript's "stdlib"? https://github.com/stdlib-js/stdlib

2020-03-20T18:23:12.053Z

When will the scicloj data-science meetups be?

genmeblog 2020-03-20T18:41:12.053100Z

I wrapped certain parts into fastmath library.

genmeblog 2020-03-20T18:41:51.053300Z

Hackathon tomorrow, for example.

genmeblog 2020-03-20T18:42:41.053500Z

See this link: https://scicloj.github.io/posts/2020-03-17-covid-19-hackathon-planning/

2020-03-20T18:45:08.053700Z

😃 thanks!

zane 2020-03-20T19:13:21.053900Z

How'd that go? Any issues?

kenny 2020-03-20T19:41:19.054100Z

We recently removed our commons-math wrapped functions from our math library. A lot of the code in commons-math3 is pretty gross and doesn't handle edge cases very well.

zane 2020-03-20T20:46:20.054600Z

I see! That's helpful, thanks. I'd love to hear more about that if you feel like sharing.

kenny 2020-03-20T20:54:11.054800Z

Most of our code uses generative testing. We found that many of Apache's functions exhibit unwanted and inconsistent behavior when numbers get very large or very small, or when they are passed Infinity or NaN. As it turned out, we weren't really using much of commons-math because we'd often prefer to write the function in Clojure (both for speed and known, consistent behavior).
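
For readers unfamiliar with the idea, here is a minimal hand-rolled sketch of a generative test in plain Java. It is only an illustration, not Provisdom's test suite and not a commons-math API: `naiveMean`, `genDouble`, `holds`, and `findCounterexample` are hypothetical names, and the function under test is a deliberately naive mean. The generator mixes ordinary values with extreme finite magnitudes, which is the sort of input that tends to expose overflow.

```java
import java.util.Arrays;
import java.util.Random;

final class GenTestSketch {
    // Hypothetical function under test: a naive arithmetic mean.
    static double naiveMean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Generator (assumption): half ordinary values, half extreme finite magnitudes.
    static double genDouble(Random rnd) {
        switch (rnd.nextInt(4)) {
            case 0:  return rnd.nextBoolean() ? Double.MAX_VALUE : -Double.MAX_VALUE;
            case 1:  return Double.MIN_VALUE;
            default: return rnd.nextGaussian() * 1e3;
        }
    }

    // Property: for finite inputs, the mean is finite and lies within [min, max].
    static boolean holds(double[] xs) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double m = naiveMean(xs);
        return Double.isFinite(m) && m >= min && m <= max;
    }

    // Tries `trials` random inputs; returns the first failing input, or null if none fails.
    static double[] findCounterexample(long seed, int trials) {
        Random rnd = new Random(seed);
        for (int t = 0; t < trials; t++) {
            double[] xs = new double[1 + rnd.nextInt(5)];
            for (int i = 0; i < xs.length; i++) xs[i] = genDouble(rnd);
            if (!holds(xs)) return xs;
        }
        return null;
    }

    public static void main(String[] args) {
        double[] bad = findCounterexample(42L, 1000);
        System.out.println(bad == null
                ? "no counterexample found"
                : "property failed for " + Arrays.toString(bad));
    }
}
```

In Clojure, a library such as test.check would normally supply the generators and shrink failing inputs; the loop above only shows the shape of the approach.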

genmeblog 2020-03-20T21:12:36.055Z

I actually didn't find any corner cases, maybe because I don't use that much of it. I mostly rely on optimization and randomness, and partly on statistics and distributions.

kenny 2020-03-20T21:13:42.056Z

Do you gen test? It finds just about everything.

genmeblog 2020-03-20T21:14:28.056200Z

I had issues with empirical and enumerated distributions only.

genmeblog 2020-03-20T21:15:10.056400Z

No, I don't. I rely on tests done by lib authors.

genmeblog 2020-03-20T21:15:38.056600Z

Curious what you've found.

kenny 2020-03-20T21:35:03.059400Z

I don’t recall the specifics. https://github.com/Provisdom/math/commit/b59140d76501b7e5e56ca425ad444be900b304b5 was the main commit of us removing it. Not sure if that’d give any insight though. It’s been more of a death by a thousand cuts for us with apache-math. I’d wager if you start adding in gen tests to your existing code, you’d quickly hit these sort of issues.