Good morning!
Good morning!
Morning!
morning, morning!
@borkdude I should probably get over shelling out feeling like cheating
In babashka it's natural, on the JVM it feels like cheating :)
shelling out is the whole point of babashka 😄
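A minimal sketch of shelling out, using `clojure.java.shell` (which also works from babashka; `babashka.process` is the more bb-idiomatic option):

```clojure
(require '[clojure.java.shell :refer [sh]])

;; run an external command and capture its exit code and output
(let [{:keys [exit out err]} (sh "echo" "hello")]
  (if (zero? exit)
    (print out)
    (binding [*out* *err*] (print err))))
```

This is where the platform coupling comes from: `sh` only works if the executable actually exists on the host.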
morning, morning!
does it then tie you to running the JVM on a specific platform?
Yes, of course: it then depends on that executable being available in your external environment.
I'm writing a package manager that should help solve this problem: https://github.com/borkdude/glam
morning
mornin'
@borkdude this seems to have worked reasonably well
;; assumes these requires: nippy is com.taoensso/nippy, x is net.cgrand.xforms
(require '[clojure.java.io :as io]
         '[taoensso.nippy :as nippy]
         '[net.cgrand.xforms :as x])

(defn ->nippy [dirname data]
  (run!
   (fn [batch]
     ;; one file per block of 100 simulations, named after the first index in the block
     (let [idx (-> batch first :simulation)]
       (nippy/freeze-to-file (str dirname "simulated-transitions-" idx ".npy") batch)))
   (partition-by (fn [{:keys [simulation]}] (quot simulation 100)) data)))

(defn nippy->data [dirname]
  (into []
        (comp
         (filter (fn [f] (re-find #"npy$" (.getName f))))
         ;; NB: sorts lexically by filename, so index 1000 would order before 200
         (x/sort-by (fn [f] (.getName f)))
         (mapcat nippy/thaw-from-file))
        (.listFiles (io/file dirname))))
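A hypothetical round trip with the two functions above (the directory, map shape, and counts are made up for illustration; note that `dirname` is used as a raw string prefix, so it needs its trailing slash):

```clojure
;; assumes the ->nippy / nippy->data definitions above and nippy on the classpath
(require '[clojure.java.io :as io])

(.mkdirs (io/file "/tmp/sims"))
(->nippy "/tmp/sims/" (for [i (range 250)] {:simulation i :state (rand)}))
;; 250 simulations -> three files: blocks 0-99, 100-199, 200-249
(count (nippy->data "/tmp/sims/"))
```

The `(quot simulation 100)` partitioning is what keeps each file to a bounded slice of the data, so no single freeze or thaw has to hold everything at once.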
the data is at least pretty easy to partition
nice!
I ran out of memory (heap space) when trying to do it all as one vector
reading in takes 78 seconds from nippy, compared to 394 seconds converting from CSV (with no compression)
and what about zip or gzip?
I guess nippy is nice to use since it can deserialize to EDN directly
I've not had a go with zip or gzip, tbh. I'm pretty happy I can dump my code that was doing the type conversions from CSV
and nippy uses LZ4 for compression, which is pretty fast and compact
@borkdude any reason why you think I should not use nippy? Other than compatibility with other languages (which would probably drive me to arrow and http://tech.ml.dataset really)
Don't know. Btw, there's also a CLI for nippy (which can also be used as a pod from babashka: https://github.com/justone/brisk)
So then you could use it from other languages as well, by shelling out, or from bb scripts
cool
arrow is really good for going to R (which we use) or Python (which we sometimes use but not often)
the benefit of using zip or gzip is that they're natively supported in many stdlibs (Java's included)
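For example, gzipped EDN needs nothing beyond the JDK's `java.util.zip` streams (the helper names here are made up; this trades nippy's speed for zero dependencies):

```clojure
(require '[clojure.java.io :as io])
(import '(java.util.zip GZIPOutputStream GZIPInputStream))

;; write a Clojure value as gzip-compressed EDN
(defn spit-gz [path x]
  (with-open [w (io/writer (GZIPOutputStream. (io/output-stream path)))]
    (.write w (pr-str x))))

;; read it back
(defn slurp-gz [path]
  (with-open [r (io/reader (GZIPInputStream. (io/input-stream path)))]
    (read-string (slurp r))))
```

Any language with a gzip implementation can then read the files, at the cost of doing its own EDN (or CSV/JSON) parsing.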
that is true
I'm sort of enjoying the faster save and load times and not having more conversion code to maintain 🙂
I will probably regret this one day
but then babashka will save me right??!?!?! right?!?!?! 😉
I hope! You could also write your own company branded GraalVM-based CLI tool around your data and then call this from Python, R, whatever.
If I did that then I'd have to use even more interrobangs :interrobanghugs:
or if it's seconds anyway, just use a Clojure-on-the-JVM script. That has better perf than bb
true enough
it isn't like I'm going to notice that latency
it is quite nice working with an annoying size of data again
so many of my datasets have been small lately
Morning