Morning, fwiw, I've always thought about seagull management as drive-by management.
moin
Morning
morning
@slipset it is, but messier and noisier (unless your drive-by management includes shooting)
morningus
good october morning
morning!
moin
morn
first time I've used eduction and thought "ah, that feels like a good use"
(defn csv->nippy [in-file out-dir]
  (with-open [reader (io/reader in-file)]
    (run!
     (fn [data]
       (let [idx (-> data first :simulation)]
         (nippy/freeze-to-file (str out-dir "simulated-transitions-" idx ".npy") data)))
     (eduction
      (drop 1)                  ; skip the header row
      (map #(zipmap header %))  ; row vector -> map keyed by header
      (map scrub-transition-count)
      (partition-by (fn [{:keys [simulation]}] (quot simulation 100))) ; group rows by blocks of 100 simulation ids
      (csv/read-csv reader)))))
so the question is... Is this a good use?
and I think all the data in there will get GC'd
@otfrom I think there's no real benefit of using eduction vs transduce here probably
@otfrom an eduction is basically just an xform + a source, which delays running that xform over that coll, and offers the ability to compose with more xforms.
(deftype Eduction [xform coll]
...)
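e.g. you can hand an eduction around and layer more xforms on it later, a quick REPL sketch (odds is just an illustrative name):
user=> (def odds (eduction (filter odd?) (range 10)))
#'user/odds
user=> (into [] (map inc) odds)
[2 4 6 8 10]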
it's probably one of those things that you will need when you know you need it, in other cases yagni
eduction (and sequence) are essentially lazy tho IIUC
eduction calculates each time
sequence will cache the results of the calculation
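quick REPL sketch of that difference (the prn makes the recomputation visible; ed and sq are just illustrative names):
user=> (def ed (eduction (map (fn [x] (prn :compute x) x)) [1 2]))
#'user/ed
user=> (vec ed)
:compute 1
:compute 2
[1 2]
user=> (vec ed)
:compute 1
:compute 2
[1 2]
user=> (def sq (sequence (map (fn [x] (prn :compute x) x)) [1 2]))
#'user/sq
user=> (vec sq)
:compute 1
:compute 2
[1 2]
user=> (vec sq)
[1 2]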
or are you thinking I should replace run! with transduce?
no, replace the eduction with transduce
(transduce (comp ...) rf (csv/read-csv ...))
an eduction is only useful when you want to pass it around, in this function you have everything you need already, there's no need to create a wrapper around that
you could also make csv/read-csv an IReducible thing, so you create even less garbage
ok... I had thought that eduction would only realise a portion of what was coming in whereas transduce would realise the whole collection which would then go to run!
an IReducible read-csv would be great 🙂
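fwiw, a minimal sketch of that idea (csv-reducible is a made-up name, and it assumes the usual io/csv aliases). The reducible owns the reader, so each reduce opens and closes the file itself and no lazy seq escapes the with-open:
(defn csv-reducible [in-file]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (with-open [reader (io/reader in-file)]
        (reduce f init (csv/read-csv reader))))))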
an eduction will also run transduce when reduced
@otfrom that's not very different from that blog about processing lines of text, maybe?
yeah, I just need to get my head around it again
and understand why people keep saying that it isn't the right way to do it
(finding good examples everyone agrees on for doing that feels hard, unless that ETL blog is the right way)
your example above is right, it's just not necessary to use eduction, since that boils down to just transduce. It's like writing (+ (identity 1) (identity 2))
while you could also write (+ 1 2)
doing what I did above at least had the advantage of working, whereas before, reading all 500MB of CSV into a vector of maps and then passing that to nippy ran out of memory
@borkdude ok, from my reading around it felt like the difference between transduce/`into` & sequence/`eduction` was similar to the difference between `[]` and a seq
and the difference between eduction and sequence was that sequence would hold the results in memory while eduction would recalculate each time
and it feels like I've got the wrong end of the stick on some of those differences
user=> (into [] (comp (drop 2) (take 1)) (range))
[2]
This doesn't realize the entire range, does it? Which is basically:
(transduce (comp (drop 2) (take 1)) conj (range))
anyway, if what you're doing works, keep doing it :)
it's not wrong
user=> (transduce (comp (drop 10) (take 1)) (fn ([]) ([x]) ([x y] (prn y))) (range))
10
Note that this skips over 10 numbers, then takes 1 number, prints it and then quits. So you could do your side effect in the transducing function maybe, instead of first realizing it into a lazy seq
anyway, maybe not important
user=> (defn run!! [f xform coll] (transduce xform (fn ([]) ([x]) ([x y] (f y))) coll))
#'user/run!!
user=> (run!! prn (comp (drop 10) (take 1)) (range))
10
nil
I want to grok transduce. It doesn't "click" yet for me. Anyone seen a tutorial about it that can be recommended?
@pez Have you seen https://clojure.org/reference/transducers?
morning
the problem with some docs is that you need to understand the thing before you can actually understand the documentation. Transducers might well fall into that category.
re: https://clojurians.slack.com/archives/CBJ5CGE0G/p1602760474478900 I was more thinking about how much of it was realised at any one time w/o the possibility of garbage collection. My understanding was that transduce would put all of the transduced things in memory whereas eduction would only have (1?) some things in memory at any one time
I don't think that's true
@pez The basic idea: What would be a more performant way of writing:
(->> [1 -10 11 -2] (filter pos?) (map inc))
You could squash filter and map into one function that runs over the seq:
(defn f [x] (when (pos? x) (inc x)))
and then do:
user=> (keep f [1 -10 11 -2])
(2 12)
Transducers basically give you the implementation of that idea for free
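i.e. the filter and the map fuse into a single pass:
user=> (into [] (comp (filter pos?) (map inc)) [1 -10 11 -2])
[2 12]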
Why do you assume transduce holds everything in memory at once?
It's more or less like reduce
Eduction is built on top of transduce
maybe he means eager cos that's how transduce is advertised
yes, reduce is also eager. but that doesn't mean it will realize the entire input or hold everything in memory at once. Transducers know when to stop, similar to how reduce knows to stop when it sees a reduced value
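e.g. take's transducer wraps its result in reduced when it's done, so even an infinite input stops early:
user=> (into [] (comp (filter even?) (take 3)) (range))
[0 2 4]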
I think reading the source might make more sense than speculating.
I mean that the result of the reduce will be held in memory all at once, whereas the eduction will only realise as much as has been asked for
so (take 10 (eduction (map identity) (range 100000)))
would only realise the first 10 things, whereas (take 10 (transduce (map identity) my-conj (range 1000000)))
would realise the whole result of processing the range and then the take would take from the fully realised thing.
correct.
but in your example you use run!
over the entire result, so the eduction is not relevant there?
perhaps it is run! I'm not understanding. I thought run! would only have the one element in memory at a time that it was trying to process (unless the collection or collection producing function realised more than one)
run! is effectively just reduce, but you're reducing your entire eduction right. you're not lazily doing anything with your eduction. so in this case transduce or eduction boil down to the same thing
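fwiw run! is roughly just this in clojure.core:
(defn run! [proc coll]
  (reduce #(proc %2) nil coll)
  nil)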
Scary to share it here, but I've given a talk on them https://youtu.be/_4sgTq4_OjM
Still don't use nor understand them :)
but reducing into a hash is going to take less memory than reducing into a seq of all the data. My understanding is that run! would not hold all of the seq in memory to do its work
but that if it was working on the result of transducing something into a vector then the whole vector would be in memory
yeah, you cannot lazily create a vector result.
but that's not an eduction/transducer problem?
not sure if I still follow :)
it's a fine use of eduction, you don't need the return value of transduce so eduction+run! is ok
I mean you can juggle around not returning anything with transduce, but it's more work
using eduction just to get a reducible for input somewhere else is ok
it's not just for "partial application" of xforms imho
I'm not sure if I'm explaining myself badly or if my massive gap in knowledge is tripping me up
Or both 🙂
about "laziness" (not the right term here imho), the eduction will be pulled in value by value, then if the input is realized? or not is another matter
(run! prn
      (eduction (map (fn [x]
                       (prn :x x)
                       (Thread/sleep 1000)
                       x))
                (range 10)))
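(running that prints :x n followed by n, one pair per second: each value is pulled through the eduction only when run! asks for it)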
The problem I'm trying to solve is that I need to transform data from a CSV and write it out as partitioned nippy files without blowing up memory
I agree that laziness isn't quite right
it's a pull based thing
(tm)
But eduction is only going to realise values as they are pulled
yes
an eduction is just partial application of xforms over something
And run! isn't going to hold them in memory
no
afaik
Unless I put it in an atom or something
it's like going over an iterator, item by item
it's the same as if you're just running over a lazy seq, no difference there
value by value (sounds better)
kinda sorta, without the cost of a lazy seq
I mean wrt to holding in memory
could be db rows, rs.next
But transduce would produce the whole vector which then run! would operate on
transduce is a bit like reduce, you could throw away the accumulation every time, but then why use transduce in the first place (if you mean using transduce instead of run!)
@otfrom My point with using transduce was: you're running over the data with a side effect. You could do the side effect in transduce instead, saving you the realisation of an eduction. But as I also pointed out, it may not be so important. mpenet has made the same point.
you never really realize an eduction, it never materializes, it's really just (sort of) an iterator
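right, an eduction is literally just Iterable + reducible:
user=> (instance? java.lang.Iterable (eduction (map inc) [1 2 3]))
true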
ok, that's a good point yes. no garbage from the eduction.
the docstring probably gives a better description than me
I posted that above example to show the run! equivalent for transducers
but run! + eduction works equally well
eductions are awesome 🙂
it's the new juxt
but actually useful
I should create a company with that name maybe
Are you disputing the usefulness of juxt....?
;)
it's muscle flexing in most cases imho!
I usually use it with keywords (map (juxt :field-a :field-b) [{:field-a 1 :field-b 2}])
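(which gives ([1 2]))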
yeah I prefer select-keys, but I get your point
not really actually, different use
but yes, there are good uses for it. It's just quite rare
I use juxt all the time, but then I need to create a lot of vectors from maps of data to go into excel or csv files, so select-keys doesn't work for me
most of my work is in and out of csv or excel
@borkdude I see what you are getting at with using transduce there now
yeah, but as mpenet has pointed out, the overhead from the eduction might be small enough not to make this an issue
esp as run! is just a reduce with a proc according to the docs
yes it will be very efficient
same here: vectors from maps, for some reason I do this fairly regularly
I not only use juxt, I use (apply juxt vec-of-keys) b/c I'm a monster who does (into [] (map #(friendly-key-lookup %) vec-of-keys)) as well
thx for having the patience to go through this with me. I feel I understand a lot more of what is going on. 🙂
I remember a Clojure meetup in Amsterdam with the author of Midje doing a talk and somehow he needed matrix transposition. I just yelled: apply map vector. It's one of these things you just know ;)
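for the record:
user=> (apply map vector [[1 2 3] [4 5 6]])
([1 4] [2 5] [3 6])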
it is indeed
and rarely need! I got to use it once, on a job interview 🙂
got an offer for that one
I've used it a few times, but then transposing a matrix isn't entirely odd for me
especially not in the case of CSVs where you want to have a column instead of a row
Awesome talk about transducers, there, @slipset.
Indeed