Good Morning!
Good morning
Morning
moin
morning
Morning
good AM me
morning.
@ordnungswidrig cool. Glad it worked. What stack did you use in the end?
http://Tech.ml and vega
Anyone here got a good suggestion for something like nippy but that writes out records in a file rather than just a big take it or leave it data structure?
file per record?
I've done something before with baldr and record separators, but that felt a bit janky
file per record would overwhelm the OS file handles I think. There are about 2-10 million records
I'd really be into seeing how you did that if you can share the repo
I like the speed of nippy, and the compression is pretty good too, but I lose a lot of compression by needing to split things up and I lose a lot of file efficiency by having each file be a single vector of records that gets read in
probably not the performance you are looking for, but this is the main reason for ednl https://github.com/lambdaisland/edn-lines
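For anyone following along, the one-record-per-line idea can be sketched with nothing but the standard library. This is only an illustration and not the edn-lines API: `write-edn-lines!` and `edn-lines-reducible` are hypothetical names, and it assumes each record prints to a single line with `pr-str`.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

(defn write-edn-lines!
  "Hypothetical helper: writes each record as one EDN form per line."
  [file records]
  (with-open [^java.io.Writer w (io/writer file)]
    (doseq [r records]
      (.write w (pr-str r))
      (.write w "\n"))))

(defn edn-lines-reducible
  "Hypothetical helper: returns a reducible that opens `file` only when it is
  reduced, parses one record per line, and closes the file afterwards, also
  when the reduction stops early."
  [file]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (with-open [^java.io.BufferedReader rdr (io/reader file)]
        (loop [acc init]
          (if (reduced? acc)
            @acc                       ; unwrap and stop reading
            (if-let [line (.readLine rdr)]
              (recur (f acc (edn/read-string line)))
              acc)))))))
```

Something like `(into [] (take 2) (edn-lines-reducible some-file))` should then read only the first two lines before closing the file.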
thx 🙂
let me collect this into a gist
😄
https://gist.github.com/ordnungswidrig/a28ad3939c71b8a554e2307b84ebc530
it might run out of the box
😛
as this is often the eduction channel, I've been looking at @ben.hammond's blog post here: https://juxt.pro/blog/ontheflycollections-with-reducible and thinking that you don't need a reducible for the directory of files, you just need a reducible for each file type. You can then have a vector of eductions over those reducibles, which would give you all your short-circuiting / reduced? functionality if you did something like
(eduction ;; changed from sequence thanks to Ben Hammond's advice
  cat
  [(eduction mappify-record (reducible-type-1 file-1))
   (eduction mappify-record (reducible-type-1 file-2))])
whether you use sequence or eduction depends on whether you want the results kept in memory or recalculated each time (from what I understand)
(errors of misunderstanding of the blog post are mine)
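A small sketch of that in-memory versus recalculated difference, using a plain vector instead of a file reducible. The names `count-transform-calls`, `calls`, and `xf` are made up for the illustration.

```clojure
(defn count-transform-calls
  "Reduces a `sequence` twice and an `eduction` twice over the same input,
  counting how many times the mapping function has run after each pair."
  []
  (let [calls (atom 0)
        xf    (map (fn [x] (swap! calls inc) (* 2 x)))
        s     (sequence xf [1 2 3])
        e     (eduction xf [1 2 3])]
    (reduce + 0 s)
    (reduce + 0 s)                 ; second pass reuses the cached seq
    (let [after-sequence @calls]
      (reduce + 0 e)
      (reduce + 0 e)               ; each pass re-runs the transformation
      {:after-sequence after-sequence
       :after-eduction @calls})))

(count-transform-calls)
;; => {:after-sequence 3, :after-eduction 9}
```

The sequence runs the function once per element and caches the results, while every reduce of the eduction runs it again.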
I think this simplifies the chaining-reducible bit. I think
I'm sure it is as bug free as all code is
the real magic happening in cat
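To see what cat is doing, here is a sketch with in-memory stand-ins for the per-file reducibles. `logging-reducible` is a hypothetical helper that records every item it hands out, so you can check that the second "file" is never touched once the reduction is done.

```clojure
(defn logging-reducible
  "Hypothetical stand-in for a per-file reducible: appends each item to the
  atom `log` just before passing it to the reducing function."
  [log items]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (loop [acc init, xs (seq items)]
        (cond
          (reduced? acc) @acc          ; honour early termination
          (nil? xs)      acc
          :else (do (swap! log conj (first xs))
                    (recur (f acc (first xs)) (next xs))))))))

(def log (atom []))

(def all-records
  (eduction cat
            [(eduction (map inc) (logging-reducible log [1 2 3]))
             (eduction (map inc) (logging-reducible log [10 20 30]))]))

(def first-two (into [] (take 2) all-records))
;; first-two => [2 3]
;; @log      => [1 2]
```

cat reduces each inner eduction in turn and passes a `reduced` result back up, so `take` stops the whole chain after two items.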
looks like transit, based on fressian, might be the sweet spot? Looks like you can read and write individual objects from a stream. https://cognitect.github.io/transit-clj/#cognitect.transit/read
and there is a reducible friendly wrapper already https://gitlab.com/pjstadig/reducibles
An eduction of a reducible might not implement ISeq, at which point things start breaking
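A sketch of how that shows up, assuming a source that implements IReduceInit and nothing else. `numbers` and `seq-fails?` are hypothetical names for the illustration.

```clojure
(def numbers
  ;; Hypothetical reducible over 0..4 that is NOT seqable or iterable.
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (loop [acc init, n 0]
        (cond
          (reduced? acc) @acc
          (= n 5)        acc
          :else (recur (f acc n) (inc n)))))))

(def incremented (eduction (map inc) numbers))

(defn seq-fails?
  "Returns true when trying to realise `coll` as a seq throws."
  [coll]
  (try (doall (seq coll)) false
       (catch Exception _ true)))

(reduce + 0 incremented)                    ; reducing works: 1+2+3+4+5
(instance? clojure.lang.ISeq incremented)   ; false
(seq-fails? incremented)                    ; true, the source cannot be seq'd
```

So reduce, transduce, and into are fine, but anything that needs a seq of the eduction (first, rest, lazy map, and so on) fails with this kind of source.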
Ah, TIL.