data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
daveliepmann 2020-03-27T06:54:01.002200Z

Hey folks, over at Applied Science we're playing around with accessing :r-project: RData files from Clojure :clj: with a minimum of interop. We built a single-purpose library: https://github.com/appliedsciencestudio/rdata/ Before we polish it up for release, we'd like to get your feedback. :male-detective::skin-tone-3: 👷 We'd appreciate it if you try it out and let us know what you think. :clojure-berlin:

👏 5
2020-03-27T20:55:47.022100Z

For those interested in R (especially from Clojure), there is a free remote conference tomorrow: http://dc2020.netlify.com/

2020-03-27T09:16:49.002800Z

Our friend Danjela is interested in Clojure and data science and is looking for a teammate for the Rails Girls Summer of Code project. https://www.reddit.com/r/Clojure/comments/fpkz98/rails_girls_summer_of_code_teammate/ Do you happen to know anyone who may like to join Danjela?

hindol 2020-03-27T14:41:53.005100Z

Hi, do you have any Clojure data science talk suggestions? I have seen the two listed in Oz's README (one on Oz, one on Vega Lite). I know Clojure. Want to learn data science.

🎉 1
practicalli-john 2020-03-29T13:56:46.028200Z

@hindol.adhya If you just want some basic intros, I have done one on Dragan's baby steps in data science and another on the basics of Oz https://www.youtube.com/playlist?list=PLpr9V-R8ZxiDUXIR2z8Y8wvhpoPyl0t_D

😎 1
2020-03-27T17:00:01.006700Z

Do you have any math background @hindol.adhya?

hindol 2020-03-27T17:02:14.009400Z

How much math? If you mean probability and statistics, I know the basics like mean, mode and median. A little shaky on k-means and SVMs, and totally green on neural networks etc.

2020-03-27T17:07:35.012Z

OK, that's a good start. IMHO (not that my math background biases me or anything), math/prob/stats skills are really the bedrock of data science. So the more you can learn on that front the better.

2020-03-27T17:10:49.015300Z

I studied probability in school, and tutored basic stats (among other things), but this was my first introduction to higher-level statistics and machine learning. What I really like about this book is that it endeavors to teach them together, seeing them as two sides of the same coin, which is 100% right in my book: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf

👍 2
hindol 2020-03-27T17:10:52.015500Z

I should mention, I am not trying to change careers or anything. I am interested, and will spend my own time learning.

👍 1
hindol 2020-03-27T17:11:40.016300Z

One part I love is visualization. This I enjoy much more than exploratory data analysis.

2020-03-27T17:12:01.016600Z

I wouldn't say this book is the easiest to get, but take a look and see how you take to it. If you are having a hard time getting through, you can pull the bits that are giving you trouble from other resources.

2020-03-27T17:12:50.017300Z

Visualization is huge; a picture is worth a thousand words, right?

2020-03-27T17:13:05.017700Z

Good luck! Interested to see what other recommendations folks have!

hindol 2020-03-27T17:14:31.017800Z

One book I studied as part of coursework is https://nlp.stanford.edu/IR-book/information-retrieval-book.html But this does not go too much in depth. I have a CS major.

hindol 2020-03-27T17:15:59.018700Z

This book only touched upon various clustering, supervised/unsupervised learning techniques.

hindol 2020-03-27T17:16:28.018900Z

Nothing related to modern machine learning like neural networks, deep learning etc.

val_waeselynck 2020-03-27T17:56:20.019900Z

The IR book does a surprisingly good job at introducing and motivating ML techniques, more so than many ML-specific books!

val_waeselynck 2020-03-27T17:59:57.020200Z

(Also, despite the hype, there is more to modern ML than just deep learning/NNs, e.g. graphical models, Gaussian processes, TDA, to name just a few, and non-modern ML often works quite well 😉 )

➕ 1
2020-03-27T18:01:34.020500Z

^ 100% this! NNs can do certain things really well, but it's often difficult to figure out what they're doing or why they're doing it. Good advice is to choose a model and approach based on the details of the situation, and not just grab the latest fad.

2020-03-27T18:02:41.020700Z

Using a reasonable and principled probabilistic model, tailored to the situation at hand, that can be interpreted, etc., rather than an NN is always the way to go if you have a choice.

hindol 2020-03-27T18:07:54.020900Z

This is great advice. Thank you so much.

2020-03-27T18:55:51.021100Z

I recommend:

1. The Book of Why for an introduction to causal inference (http://bayes.cs.ucla.edu/WHY/)

2. Richard McElreath's lectures (https://www.youtube.com/watch?v=4WVelCswXo4) and his Statistical Rethinking book

3. Anything TensorFlow (books, documentation, examples) http://tensorflow.org

4. Dragan Djuric's books and software https://dragan.rocks/

hindol 2020-03-27T19:13:55.021500Z

Great list. I have seen some of Dragan's materials. They are great.

😎 1