tech.ml.dataset (https://github.com/techascent/tech.ml.dataset) has some great new features and capabilities. How often have you seen a Clojure system that soundly beats C, Julia, Python, Spark, and R systems in a benchmark (https://github.com/zero-one-group/geni-performance-benchmark)? We have gone further down the reductions path (https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.reductions.html) by adding statistical operations, colloquially called data sketches (https://datasketches.apache.org/), that give you memory-efficient and accurate probabilistic estimates, including algorithms such as prob-set-cardinality (https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.reductions.apache-data-sketch.html#var-prob-set-cardinality) and prob-quantile (https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.reductions.html#var-prob-quantile). So, if you want to process GBs of data in minutes or seconds on commodity hardware, check us out: https://github.com/techascent/tech.ml.dataset
Played with tech.ml.dataset last week and I was really impressed. I showed it to the Python people in my org, they didn’t expect that sort of perf 😉
I've been really happy with the balance between speed and API sweetness for the stuff I've been doing. Elapsed time and memory use are great compared to my transducer work.
This feedback is really great and very encouraging. The API sweetness is mostly all @tsulej.
Thank you. We've put a lot of effort into making it happen.
This is great work. I used it on nextjournal to get some graphs of the covid-19 pandemic for my local counties. I had to fiddle around a little, but I think tech.ml.dataset, nextjournal, and vega are an awesome toolset.
https://nextjournal.com/ordnungswidrig/covid-19-in-neu-ulm-experimental
I had no idea! Thanks for letting me know 🙂
Wow