clojure-europe

For people in Europe... or elsewhere... UGT https://indieweb.org/Universal_Greeting_Time
javahippie 2020-10-13T07:11:58.401700Z

Guten Morgen!

ordnungswidrig 2020-10-13T07:21:16.403800Z

Guten Morgen!

borkdude 2020-10-13T07:39:33.404Z

Morgen!

2020-10-13T07:52:51.404200Z

moin moin

2020-10-13T07:53:14.404600Z

@borkdude I should probably get over the feeling that shelling out is cheating

borkdude 2020-10-13T07:58:31.405300Z

In babashka it's natural, on the JVM it feels like cheating :)

2020-10-13T08:00:36.405600Z

shelling out is the whole point of babashka 😄
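
(For context, a minimal sketch of what shelling out looks like — clojure.java.shell/sh behaves the same on the JVM and in babashka; the ls invocation is just a placeholder.)

(require '[clojure.java.shell :refer [sh]])

;; Run an external program and capture its exit code, stdout and stderr.
(let [{:keys [exit out err]} (sh "ls" "-l")]
  (if (zero? exit)
    (println out)
    (println "failed:" err)))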

plexus 2020-10-13T08:13:52.405800Z

moin moin

2020-10-13T09:02:10.405900Z

does it then tie you to running JVM on a specific platform?

borkdude 2020-10-13T09:03:27.406100Z

Yes, of course: it then depends on that executable being available in your external environment.

borkdude 2020-10-13T09:03:37.406300Z

I'm writing a package manager that should help solve this problem: https://github.com/borkdude/glam

1👍
raymcdermott 2020-10-13T09:14:18.406800Z

morning

thomas 2020-10-13T09:38:15.407Z

mogge

2020-10-13T10:14:38.407400Z

@borkdude this seems to have worked reasonably well

;; Assuming the usual aliases: io = clojure.java.io, nippy = taoensso.nippy,
;; x = net.cgrand.xforms (for its sort-by transducer).
(require '[clojure.java.io :as io]
         '[taoensso.nippy :as nippy]
         '[net.cgrand.xforms :as x])

;; Freeze the data in chunks of 100 simulations per file, naming each file
;; after the first simulation index in the chunk.
(defn ->nippy [dirname data]
  (run!
   (fn [chunk]
     (let [idx (-> chunk first :simulation)]
       (nippy/freeze-to-file (str dirname "simulated-transitions-" idx ".npy") chunk)))
   (partition-by (fn [{:keys [simulation]}] (quot simulation 100)) data)))

;; Thaw every .npy file in the directory, in filename order, back into one vector.
(defn nippy->data [dirname]
  (into []
        (comp
         (filter (fn [f] (re-find #"npy$" (.getName f))))
         (x/sort-by (fn [f] (.getName f)))
         (mapcat nippy/thaw-from-file))
        (.listFiles (io/file dirname))))
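
(For illustration, a hypothetical round trip with the functions above — the data/ directory and the toy dataset are made up; the directory must already exist, and dirname is plain string concatenation, so keep the trailing "/".)

;; Toy dataset: 250 simulations, one map per transition.
(def data (vec (for [sim (range 250)] {:simulation sim :state (rand-int 10)})))

(->nippy "data/" data)   ;; writes data/simulated-transitions-0.npy, -100.npy, -200.npy
(nippy->data "data/")    ;; reads them back as one vector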

2020-10-13T10:14:48.407600Z

the data is at least pretty easy to partition

borkdude 2020-10-13T10:15:09.408100Z

nice!

2020-10-13T10:15:15.408300Z

I ran out of memory (heap space) when trying to do it all as one vector

2020-10-13T10:15:41.408900Z

reading in takes 78 seconds from nippy compared to 394 seconds converting from csv (with no compression)

borkdude 2020-10-13T10:16:02.409200Z

and what about zip or gzip?

borkdude 2020-10-13T10:16:27.409500Z

I guess nippy is nice to use since it can deserialize to EDN directly
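
(A minimal illustration of that point: nippy round-trips plain Clojure data with no hand-written conversion code.)

(require '[taoensso.nippy :as nippy])

;; Freeze Clojure data to bytes and thaw it straight back to the same value.
(= {:simulation 1 :transition [:s0 :s1]}
   (nippy/thaw (nippy/freeze {:simulation 1 :transition [:s0 :s1]})))
;; => true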

2020-10-13T10:20:24.410100Z

I've not had a go with zip or gzip, tbh, I'm pretty happy I can dump my code that was doing the type conversions from csv

2020-10-13T10:20:54.410500Z

and nippy uses LZ4 for compression which is pretty fast and compact

2020-10-13T10:21:59.411300Z

@borkdude any reason why you think I should not use nippy? Other than compatibility with other languages (which would probably drive me to arrow and tech.ml.dataset really)

borkdude 2020-10-13T10:22:56.411800Z

Don't know. Btw, there's also a CLI for nippy (which can also be used as a pod from babashka: https://github.com/justone/brisk)

borkdude 2020-10-13T10:23:49.412400Z

So then you could use it from other languages as well, by shelling out, or from bb scripts

2020-10-13T10:23:58.412700Z

cool

2020-10-13T10:24:16.413300Z

arrow is really good for going to R (which we use) or Python (which we sometimes use but not often)

borkdude 2020-10-13T10:25:47.414200Z

the benefit of using zip or gzip is that it's natively supported in many stdlibs (of Java as well)
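
(A rough sketch of that route, assuming plain EDN text pushed through the JDK's built-in gzip streams; the helper names here are made up.)

(require '[clojure.java.io :as io]
         '[clojure.edn :as edn])
(import '(java.util.zip GZIPOutputStream GZIPInputStream))

;; Hypothetical helpers: EDN text through java.util.zip, no extra dependencies.
(defn spit-gzip-edn [path data]
  (with-open [w (io/writer (GZIPOutputStream. (io/output-stream path)))]
    (.write w (pr-str data))))

(defn slurp-gzip-edn [path]
  (with-open [r (io/reader (GZIPInputStream. (io/input-stream path)))]
    (edn/read-string (slurp r))))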

2020-10-13T10:27:03.414500Z

that is true

2020-10-13T10:27:25.415Z

I'm sort of enjoying the faster save and load times and not having more conversion code to maintain 🙂

2020-10-13T10:27:32.415200Z

I will probably regret this one day

2020-10-13T10:27:49.415800Z

but then babashka will save me right??!?!?! right?!?!?! 😉

borkdude 2020-10-13T10:28:22.416400Z

I hope! You could also write your own company branded GraalVM-based CLI tool around your data and then call this from Python, R, whatever.

2020-10-13T10:29:03.416900Z

If I did that then I'd have to use even more interrobangs :interrobanghugs:

borkdude 2020-10-13T10:29:38.417200Z

or if it's seconds anyway, just use a JVM Clojure script. That has better perf than bb

2020-10-13T10:29:59.417400Z

true enough

2020-10-13T10:30:07.417800Z

it isn't like I'm going to notice that latency

2020-10-13T10:30:30.418100Z

it is quite nice working with an annoying size of data again

1😂
2020-10-13T10:30:37.418400Z

so many of my datasets have been small lately

2020-10-13T10:35:32.418900Z

👋 @misurovic.milos

1👋
jasonbell 2020-10-13T16:14:10.419300Z

Morning

2020-10-13T16:18:35.419700Z

👋 @jasonbell

1👋