Hey folks, over at Applied Science we're playing around with accessing :r-project: RData files from Clojure :clj: with a minimum of interop. We built a single-purpose library: https://github.com/appliedsciencestudio/rdata/ Before we polish it up for release, we'd like to get your feedback. :male-detective::skin-tone-3: 👷 We'd appreciate it if you try it out and let us know what you think. :clojure-berlin:
For those interested in R (especially from Clojure), there is a free remote conference tomorrow: http://dc2020.netlify.com/
Our friend Danjela is interested in Clojure and data science and is looking for a teammate for the Rails Girls Summer of Code project. https://www.reddit.com/r/Clojure/comments/fpkz98/rails_girls_summer_of_code_teammate/ Do you happen to know anyone who may like to join Danjela?
Hi, do you have any Clojure data science talk suggestions? I have seen the two listed in Oz's README (one on Oz, one on Vega Lite). I know Clojure. Want to learn data science.
@hindol.adhya If you just want some basic intros, I have done one on Dragan's baby steps in data science and another on the basics of Oz https://www.youtube.com/playlist?list=PLpr9V-R8ZxiDUXIR2z8Y8wvhpoPyl0t_D
Do you have any math background @hindol.adhya?
How much math? If you mean probability and statistics, I know the basics like mean, mode and median. A little shaky on k-means and SVMs, and totally green on neural networks etc.
OK, that's a good start. IMHO (not that my math background biases me or anything), math/prob/stats skills are really the bedrock of data science. So the more you can learn on that front the better.
I studied probability in school, and tutored basic stats (among other things), but this was my first introduction to higher level statistics and machine learning. What I really like about this book is that it endeavors to teach them together, seeing them as opposite sides of the same coin, which is 100% in my book: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf
I should mention, I am not trying to change careers or anything. I am interested, and will spend my own time learning.
One part I love is visualization. This I enjoy much more than exploratory data analysis.
I wouldn't say this book is the easiest to get, but take a look and see how you take to it. If you are having a hard time getting through, you can pull at the bits that are giving you trouble from other resources.
Visualization is huge; a picture is worth a thousand words, right?
Good luck! Interested to see what other recommendations folks have!
One book I studied as part of coursework is https://nlp.stanford.edu/IR-book/information-retrieval-book.html But this does not go too much in depth. I have a CS major.
For probability, it is pretty hard to beat "The Probability Tutoring Book" https://smile.amazon.com/Probability-Tutoring-Book-Revised-Printing/dp/0780310519/ref=smi_www_rco2_go_smi_g8217842112?_encoding=UTF8&%2AVersion%2A=1&%2Aentries%2A=0&ie=UTF8
This book only touched upon various clustering, supervised/unsupervised learning techniques.
Nothing related to modern machine learning like neural networks, deep learning etc.
The IR book does a surprisingly good job at introducing and motivating ML techniques, more so than many ML-specific books!
(Also, despite the hype, there is more to modern ML than just deep learning/NNs, e.g. graphical models / Gaussian Processes / TDA to name just a few, and non-modern ML often works quite well 😉 )
^ 100% this! NNs can do certain things really well, but it's often difficult to figure out what they're doing or why they're doing it. Good advice is to choose a model and approach based on the details of the situation, and not just grab the latest fad.
Rather than using a NN, going with a reasonable and principled probabilistic model, tailored to the situation at hand, that can be interpreted, etc., is always the way to go if you have a choice.
This is great advice. Thank you so much.
I recommend:
1. The Book of Why for an introduction to causal inference (<http://bayes.cs.ucla.edu/WHY/>)
2. Richard McElreath lectures (<https://www.youtube.com/watch?v=4WVelCswXo4>) and his Statistical Rethinking book
3. Anything Tensorflow (books, documentation, examples) <http://tensorflow.org|tensorflow.org>
4. Dragan Djuric's books and software <https://dragan.rocks/>
Great list. I have seen some of Dragan's materials. They are great.