clojure-europe

For people in Europe... or elsewhere... UGT https://indieweb.org/Universal_Greeting_Time
slipset 2020-10-15T05:11:34.461400Z

Morning. FWIW, I’ve always thought of seagull management as drive-by management.

ordnungswidrig 2020-10-15T07:09:24.461600Z

moin

dominicm 2020-10-15T07:32:47.461800Z

Morning

2020-10-15T07:47:08.462Z

morning

2020-10-15T07:47:26.462500Z

@slipset it is, but messier and noisier (unless your drive-by management includes shooting)

agigao 2020-10-15T08:16:07.462800Z

morningus

raymcdermott 2020-10-15T09:30:44.463400Z

good october morning

synthomat 2020-10-15T09:35:28.463800Z

morning!

plexus 2020-10-15T10:00:31.464Z

moin

borkdude 2020-10-15T10:02:10.464200Z

morn

2020-10-15T10:46:01.464700Z

first time I've used eduction and thought "ah, that feels like a good use"

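;; assumes (:require [clojure.java.io :as io] [clojure.data.csv :as csv] [taoensso.nippy :as nippy])
;; and that header and scrub-transition-count are defined elsewhere
(defn csv->nippy [in-file out-dir]
  (with-open [reader (io/reader in-file)]
    (run!
     (fn [data]
       (let [idx (-> data first :simulation)]
         (nippy/freeze-to-file (str out-dir "simulated-transitions-" idx ".npy") data)))
     (eduction
      (drop 1)                                   ; skip the CSV header row
      (map #(zipmap header %))
      (map scrub-transition-count)
      (partition-by (fn [{:keys [simulation]}] (quot simulation 100)))
      (csv/read-csv reader)))))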

2020-10-15T10:46:06.464900Z

so the question is... Is this a good use?

2020-10-15T10:47:44.465200Z

and I think all the data in there will get GC'd

borkdude 2020-10-15T10:56:43.466400Z

@otfrom I think there's probably no real benefit to using eduction over transduce here

borkdude 2020-10-15T10:59:15.467200Z

@otfrom an eduction is basically just an xform + a source, which delays running that xform over that coll, and offers the ability to compose with more xforms.

(deftype Eduction [xform coll]
  ...)
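A minimal REPL sketch of that description, using only clojure.core (the name `evens-inc` is just for illustration): reducing an eduction runs its xforms over the source, which gives the same result as `transduce` with the composed xform, and the eduction can be handed on and wrapped in more xforms.

```clojure
;; Nothing runs yet: the eduction just pairs the xforms with the source.
(def evens-inc (eduction (filter even?) (map inc) (range 10)))

;; Reducing it runs the xforms over (range 10): evens 0 2 4 6 8 -> 1 3 5 7 9
(reduce + 0 evens-inc)
;=> 25

;; ...which is the same as transduce with the composed xform
(transduce (comp (filter even?) (map inc)) + 0 (range 10))
;=> 25

;; An eduction can be handed on and composed with more xforms
(into [] (eduction (take 2) evens-inc))
;=> [1 3]
```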

borkdude 2020-10-15T11:03:29.467600Z

it's probably one of those things that you will need when you know you need it, in other cases yagni

2020-10-15T11:03:54.468100Z

eduction (and sequence) are essentially lazy tho IIUC

2020-10-15T11:04:00.468300Z

eduction calculates each time

2020-10-15T11:04:11.468600Z

sequence will cache the results of the calculation
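The eduction/sequence difference can be seen with a counting transducer. A minimal sketch; `xform-runs` is a hypothetical helper that builds the collection, reads it twice, and reports how many items the xform processed in total:

```clojure
(defn xform-runs
  "Builds a collection with (make-coll xf (range 3)), where xf counts every
  item it processes, reads the collection twice, and returns the count."
  [make-coll]
  (let [calls (atom 0)
        xf    (map (fn [x] (swap! calls inc) x))
        coll  (make-coll xf (range 3))]
    (into [] coll)   ; first pass
    (into [] coll)   ; second pass
    @calls))

(xform-runs eduction)  ;=> 6, the xform ran again on the second pass
(xform-runs sequence)  ;=> 3, sequence cached the results of the first pass
```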

2020-10-15T11:04:44.469200Z

or are you thinking I should replace run! with transduce?

borkdude 2020-10-15T11:04:50.469400Z

no, replace eduction with transduce

borkdude 2020-10-15T11:05:14.469800Z

(transduce (comp ...) rf (csv/read-csv ...)) ; rf: the reducing function that does the work

borkdude 2020-10-15T11:06:06.470400Z

an eduction is only useful when you want to pass it around; in this function you already have everything you need, so there's no need to create a wrapper around it

borkdude 2020-10-15T11:07:20.471300Z

you could also make csv/read-csv an IReducible thing, so you create even less garbage
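A sketch of what a reducible input could look like, assuming the input can be read line by line. `lines-reducible` and `naive-split` are hypothetical helpers, not part of clojure.data.csv, and splitting on commas ignores quoted fields, so this is not a real CSV parser:

```clojure
(ns example.reducible-lines
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

(defn lines-reducible
  "Hypothetical helper (not part of clojure.data.csv): returns an IReduceInit
  that reads one line at a time from rdr. No seq is built, and the reduction
  stops as soon as a step returns a `reduced` value."
  [^java.io.BufferedReader rdr]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (loop [acc init]
        (if (reduced? acc)
          @acc                                 ; early termination, e.g. (take n)
          (if-let [line (.readLine rdr)]
            (recur (f acc line))
            acc))))))                          ; end of input

(defn naive-split
  "Splits a line on commas. Assumption: no quoted fields, so this is NOT a
  real CSV parser."
  [line]
  (str/split line #","))

;; Usage: skip the header row and parse the rest, pulling lines on demand.
(with-open [rdr (io/reader (java.io.StringReader. "a,b\n1,2\n3,4\n"))]
  (into [] (comp (drop 1) (map naive-split)) (lines-reducible rdr)))
;=> [["1" "2"] ["3" "4"]]
```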

2020-10-15T11:07:42.471700Z

ok... I had thought that eduction would only realise a portion of what was coming in whereas transduce would realise the whole collection which would then go to run!

2020-10-15T11:08:14.472300Z

an IReducible read-csv would be great 🙂

borkdude 2020-10-15T11:08:25.472600Z

an eduction will also run transduce when reduced

borkdude 2020-10-15T11:09:21.473300Z

@otfrom that's not very much different from that blog for processing lines of text maybe?

2020-10-15T11:09:40.473700Z

yeah, I just need to get my head around it again

2020-10-15T11:09:57.474200Z

and understand why people keep saying that it isn't the right way to do it

2020-10-15T11:10:14.474600Z

(finding good examples everyone agrees on for doing that feels hard, unless that ETL blog is the right way)

borkdude 2020-10-15T11:11:14.476100Z

your example above is right, it's just not necessary to use eduction, since that boils down to just transduce. It's like writing (+ (identity 1) (identity 2)) while you could also write (+ 1 2)

2020-10-15T11:11:28.476400Z

doing what I did above at least had the advantage of working, whereas before, reading all 500MB of CSV into a vector of maps and then passing that to nippy ran out of memory

2020-10-15T11:13:11.477600Z

@borkdude ok, from my reading around it felt like the difference between transduce/`into` & sequence/`eduction` was similar to the difference between [] and a seq

2020-10-15T11:13:38.478200Z

and the difference between eduction and sequence was that sequence would hold the results in memory while eduction would recalculate each time

2020-10-15T11:13:51.478600Z

and it feels like I've got the wrong end of the stick on some of those differences

borkdude 2020-10-15T11:14:34.478900Z

user=> (into [] (comp (drop 2) (take 1)) (range))
[2]
This doesn't realize the entire range, does it?

borkdude 2020-10-15T11:15:34.479200Z

which is basically:

(transduce (comp (drop 2) (take 1)) conj (range))

borkdude 2020-10-15T11:16:18.479500Z

anyway, if it works what you're doing, keep doing it :)

borkdude 2020-10-15T11:16:25.479700Z

it's not wrong

borkdude 2020-10-15T11:18:42.480600Z

user=> (transduce (comp (drop 10) (take 1)) (fn ([]) ([x]) ([x y] (prn y))) (range))
10
Note that this skips over 10 numbers, then takes 1 number, prints it and then quits.

borkdude 2020-10-15T11:19:43.481100Z

so you could maybe do your side effect in the reducing function instead of first realizing it into a lazy seq

borkdude 2020-10-15T11:19:56.481500Z

anyway, maybe not important

borkdude 2020-10-15T11:22:47.481700Z

user=> (defn run!! [f xform coll] (transduce xform (fn ([]) ([x]) ([x y] (f y))) coll))
#'user/run!!
user=> (run!! prn (comp (drop 10) (take 1)) (range))
10
nil

pez 2020-10-15T12:00:06.482800Z

I want to grok transduce. It doesn't “click” for me yet. Has anyone seen a tutorial about it that they can recommend?

borkdude 2020-10-15T12:10:35.483Z

@pez Have you seen https://clojure.org/reference/transducers?

thomas 2020-10-15T12:15:14.483400Z

morning

thomas 2020-10-15T12:16:05.484400Z

the problem with some docs is that you need to understand the thing before you can actually understand the documentation. Transducers might well fall into that category.

2020-10-15T12:26:16.486Z

re: https://clojurians.slack.com/archives/CBJ5CGE0G/p1602760474478900 I was thinking more about how much of it was realised at any one time without the possibility of garbage collection. My understanding was that transduce would put all of the transduced things in memory, whereas eduction would only have a few things (one?) in memory at any one time

borkdude 2020-10-15T12:27:32.486800Z

I don't think that's true

borkdude 2020-10-15T12:28:14.487200Z

@pez The basic idea: What would be a more performant way of writing:

(->> [1 -10 11 -2] (filter pos?) (map inc))

borkdude 2020-10-15T12:28:57.487800Z

You could squash filter and map into one function that runs over the seq:

(defn f [x] (when (pos? x) (inc x)))
and then do:
user=> (keep f [1 -10 11 -2])
(2 12)

borkdude 2020-10-15T12:29:24.488600Z

Transducers basically give you the implementation of that idea for free
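The same squashing expressed with a transducer, as a small sketch (`pos-inc` is an illustrative name). One composed xform does the filtering and the increment in a single pass, and it can be reused with different consumers:

```clojure
;; filter and map fused into one transformation, no intermediate lazy seq
(def pos-inc (comp (filter pos?) (map inc)))

(into [] pos-inc [1 -10 11 -2])       ;=> [2 12]
(sequence pos-inc [1 -10 11 -2])      ;=> (2 12)
(transduce pos-inc + 0 [1 -10 11 -2]) ;=> 14
```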

borkdude 2020-10-15T12:30:40.488700Z

Why do you assume transduce holds everything in memory at once?

borkdude 2020-10-15T12:30:59.488900Z

It's more or less like reduce

borkdude 2020-10-15T12:31:19.489100Z

Eduction is built on top of transduce

raymcdermott 2020-10-15T12:31:47.489300Z

maybe he means eager cos that's how transduce is advertised

borkdude 2020-10-15T12:33:01.489500Z

yes, reduce is also eager. but that doesn't mean it will realize the entire input or hold everything in memory at once. Transducers know when to stop similar to how reduce knows to stop using a reduced value
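A small REPL illustration of stopping early, using only clojure.core:

```clojure
;; reduce is eager, but it stops at the first `reduced` value,
;; so this terminates even though (range) is infinite: 0+1+2+3
(reduce (fn [acc x] (if (> x 3) (reduced acc) (+ acc x))) 0 (range))
;=> 6

;; The take transducer returns `reduced` once it has seen enough items,
;; so transduce stops early in the same way: 0+1+2
(transduce (take 3) + 0 (range))
;=> 3
```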

borkdude 2020-10-15T12:33:46.489800Z

I think reading the source might make more sense than speculating.

2020-10-15T12:43:25.490100Z

I mean that the result of the reduce will be held in memory all at once, whereas the eduction will only realise as much as has been asked for

2020-10-15T12:45:24.490300Z

so (take 10 (eduction (map identity) (range 100000))) would only realise the first 10 things, whereas (take 10 (transduce (map identity) my-conj (range 1000000))) would realise the whole result of processing the range, and then the take would take from the fully realised thing.
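One way to check this at the REPL is to count how many items the xform actually processes. A sketch; `items-processed` is a hypothetical helper, and the exact count for the eduction case depends on how Clojure chunks seqs over iterables, so it is only expected to be small, not exactly 10:

```clojure
(defn items-processed
  "Calls (f xf), where xf counts every item it processes, fully realizes the
  result, and returns that count."
  [f]
  (let [calls (atom 0)
        xf    (map (fn [x] (swap! calls inc) x))]
    (doall (f xf))
    @calls))

;; eduction: take pulls items on demand; a seq over an eduction may realize
;; a small chunk, but nowhere near the whole range.
(items-processed #(take 10 (eduction % (range 100000))))

;; transduce: the whole range is processed before take sees anything.
(items-processed #(take 10 (transduce % conj (range 100000))))
;=> 100000
```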

borkdude 2020-10-15T12:47:03.490500Z

correct.

borkdude 2020-10-15T12:48:02.490700Z

but in your example you use run! over the entire result, so the eduction is not relevant there?

2020-10-15T13:02:28.491Z

perhaps it's run! that I'm not understanding. I thought run! would only have the one element it was processing in memory at a time (unless the collection or collection-producing function realised more than one)

borkdude 2020-10-15T13:05:04.491200Z

run! is effectively just reduce, but you're reducing your entire eduction, right? you're not lazily doing anything with your eduction, so in this case transduce or eduction boil down to the same thing

slipset 2020-10-15T13:32:47.492200Z

Scary to share it here, but I’ve given a talk on them https://youtu.be/_4sgTq4_OjM

πŸ‘ 1
slipset 2020-10-15T13:34:01.492800Z

Still don’t use nor understand them :)

2020-10-15T13:46:30.492900Z

but reducing into a hash is going to take less memory than reducing into a seq of all the data. My understanding is that run! would not hold all of the seq in memory to do its work

2020-10-15T13:47:01.493100Z

but that if it was working on the result of transducing something into a vector then the whole vector would be in memory

borkdude 2020-10-15T13:52:38.493300Z

yeah, you cannot lazily create a vector result.

borkdude 2020-10-15T13:52:47.493500Z

but that's not an eduction/transducer problem?

borkdude 2020-10-15T13:52:55.493700Z

not sure if I still follow :)

mpenet 2020-10-15T13:57:34.494600Z

it's a fine use of eduction, you don't need the return value of transduce so eduction+run! is ok

mpenet 2020-10-15T13:57:55.495Z

I mean you can juggle around not returning anything with transduce, but it's more work

mpenet 2020-10-15T13:58:13.495300Z

using eduction just to get a reducible for input somewhere else is ok

πŸ‘ 1
mpenet 2020-10-15T13:58:36.495800Z

it's not just for "partial application" of xforms imho

2020-10-15T14:03:43.496Z

I'm not sure if I'm explaining myself badly or if my massive gap in knowledge is tripping me up

2020-10-15T14:03:50.496200Z

Or both 😉

mpenet 2020-10-15T14:04:50.496400Z

about "laziness" (not the right term here imho): the eduction will be pulled value by value; whether the input is realized or not is another matter

mpenet 2020-10-15T14:05:06.496600Z

(run! prn
      (eduction (map (fn [x]
                       (prn :x x)
                       (Thread/sleep 1000)
                       x))
                (range 10)))

2020-10-15T14:05:08.496800Z

The problem I'm trying to solve is that I need to transform data from a CSV and write it out as partitioned nippy files without blowing up memory

2020-10-15T14:05:44.497Z

I agree that laziness isn't quite right

mpenet 2020-10-15T14:06:11.497200Z

it's a pull-based thing

mpenet 2020-10-15T14:06:32.497400Z

(tm)

2020-10-15T14:06:49.497600Z

But eduction is only going to realise values as they are pulled

mpenet 2020-10-15T14:06:54.497800Z

yes

mpenet 2020-10-15T14:07:12.498Z

an eduction is just partial application of xforms over something

2020-10-15T14:07:25.498200Z

And run! isn't going to hold them in memory

mpenet 2020-10-15T14:07:38.498400Z

no

mpenet 2020-10-15T14:07:44.498600Z

afaik

2020-10-15T14:07:45.498800Z

Unless I put it in an atom or something

mpenet 2020-10-15T14:07:55.499Z

it's like going over an iterator, item by item
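A small sketch of the iterator view: an eduction is `Iterable`, so its iterator computes values only as `.next` is called, which works even over an infinite `(range)`. `first-two` is a hypothetical name:

```clojure
(defn first-two
  "Pulls two values from an eduction through its java.util.Iterator.
  Only the items actually requested are computed."
  []
  (let [it (.iterator ^Iterable (eduction (map inc) (range)))]
    [(.next it) (.next it)]))

(first-two)
;=> [1 2]
```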

borkdude 2020-10-15T14:08:02.499200Z

it's the same as if you're just running over a lazy seq, no difference there

mpenet 2020-10-15T14:08:05.499400Z

value by value (sounds better)

mpenet 2020-10-15T14:08:28.499700Z

kinda sorta, without the cost of a lazy seq

borkdude 2020-10-15T14:08:40.499900Z

I mean wrt holding in memory

mpenet 2020-10-15T14:08:43Z

could be db rows, rs.next

2020-10-15T14:08:47.000200Z

But transduce would produce the whole vector which then run! would operate on

mpenet 2020-10-15T14:09:37.000400Z

transduce is a bit like reduce: you could throw away the accumulation every time, but then why use transduce in the first place (if you mean using transduce instead of run!)

borkdude 2020-10-15T14:09:44.000600Z

@otfrom My point with using transduce was: you're running over with a side effect. you could do the side effect in transduce instead, saving you the realisation of an eduction. But as I also pointed out, it may not be so important. mpenet has repeated this.

mpenet 2020-10-15T14:10:37.000800Z

you never really realize an eduction, it never materializes, it's really just (sort of) an iterator

borkdude 2020-10-15T14:11:07.001Z

ok, that's a good point yes. no garbage from the eduction.

mpenet 2020-10-15T14:11:12.001200Z

the docstring probably gives a better description than me

borkdude 2020-10-15T14:11:29.001400Z

I posted that above example to show the run! equivalent for transducers

borkdude 2020-10-15T14:12:11.001600Z

but run! + eduction works equally well
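A sketch comparing the two approaches side by side. `run!!` is repeated from earlier in the thread so the block stands alone; `collected` is a hypothetical helper that records the side effects in an atom so they can be compared:

```clojure
;; run!! as posted earlier in the thread: the side effect happens in the
;; reducing function, and the accumulator is simply ignored.
(defn run!! [f xform coll]
  (transduce xform (fn ([]) ([x]) ([x y] (f y))) coll))

(defn collected
  "Calls runner with a side-effecting fn that records each item; returns the items."
  [runner]
  (let [seen (atom [])]
    (runner #(swap! seen conj %))
    @seen))

;; run! over an eduction (stops early thanks to take, even on an infinite range)
(collected #(run! % (eduction (comp (drop 10) (take 3)) (range))))
;=> [10 11 12]

;; the transduce-based run!!
(collected #(run!! % (comp (drop 10) (take 3)) (range)))
;=> [10 11 12]
```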

mpenet 2020-10-15T14:13:17.001900Z

eductions are awesome 🙂

mpenet 2020-10-15T14:13:26.002100Z

it's the new juxt

mpenet 2020-10-15T14:13:33.002300Z

but actually useful

mpenet 2020-10-15T14:13:41.002500Z

I should create a company with that name maybe

borkdude 2020-10-15T14:13:55.002700Z

Are you disputing the usefulness of juxt....?

borkdude 2020-10-15T14:14:08.002900Z

;)

mpenet 2020-10-15T14:16:03.003100Z

it's muscle flexing in most cases imho!

borkdude 2020-10-15T14:16:58.003300Z

I usually use it with keywords (map (juxt :field-a :field-b) [{:field-a 1 :field-b 2}])

mpenet 2020-10-15T14:20:46.003500Z

yeah I prefer select-keys, but I get your point

mpenet 2020-10-15T14:21:11.003700Z

not really actually, different use

mpenet 2020-10-15T14:21:23.003900Z

but yes, there are good uses for it. It's just quite rare

2020-10-15T14:33:41.004500Z

I use juxt all the time, but then I need to create a lot of vectors from maps of data to go into Excel or CSV files, so select-keys doesn't work for me

2020-10-15T14:33:59.004700Z

most of my work is in and out of csv or excel

2020-10-15T14:34:34.004900Z

@borkdude I see what you are getting at with using transduce there now

borkdude 2020-10-15T14:35:26.005100Z

yeah, but as mpenet has pointed out, the overhead from the eduction might be small enough not to make this an issue

2020-10-15T14:35:31.005300Z

esp as run! is just a reduce with a proc according to the docs

mpenet 2020-10-15T14:36:06.005500Z

yes it will be very efficient

borkdude 2020-10-15T14:36:12.005700Z

same here: vectors from maps, for some reason I do this fairly regularly

2020-10-15T14:48:02.005900Z

I not only use juxt, I use (apply juxt vec-of-keys) b/c I'm a monster who does (into [] (map #(friendly-key-lookup %) vec-of-keys)) as well
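A small sketch of the juxt-for-rows pattern, with made-up data and column names. `(apply juxt columns)` builds one function that returns a vector in column order, whereas select-keys keeps a map:

```clojure
(def rows [{:name "a" :qty 1 :price 2.5}
           {:name "b" :qty 3 :price 1.0}])

(def columns [:name :qty :price])

;; one vector per map, in column order: the shape a CSV/Excel row wants
(mapv (apply juxt columns) rows)
;=> [["a" 1 2.5] ["b" 3 1.0]]

;; select-keys still returns maps, which would need turning into rows
(mapv #(select-keys % columns) rows)
;=> [{:name "a", :qty 1, :price 2.5} {:name "b", :qty 3, :price 1.0}]
```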

2020-10-15T14:48:55.006100Z

thx for having the patience to go through this with me. I feel I understand a lot more of what is going on. 🙂

borkdude 2020-10-15T14:53:12.006300Z

I remember a Clojure meetup in Amsterdam with the author of Midje doing a talk and somehow he needed matrix transposition. I just yelled: apply map vector. It's one of these things you just know ;)

2020-10-15T15:03:34.006500Z

it is indeed

mpenet 2020-10-15T15:15:19.006700Z

and rarely needed! I got to use it once, in a job interview 😛

mpenet 2020-10-15T15:15:54.006900Z

got an offer for that one

2020-10-15T19:56:39.007100Z

I've used it a few times, but then transposing a matrix isn't entirely odd for me

borkdude 2020-10-15T19:58:34.007400Z

especially not in the case of CSVs where you want to have a column instead of a row
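The `apply map vector` idiom, sketched on made-up CSV-style rows. `map` with several collections walks them in lockstep, so `vector` gathers the i-th cell of every row and rows become columns:

```clojure
;; a header row followed by data rows
(def table [["name" "qty"]
            ["a"    1]
            ["b"    3]])

;; transpose: each result vector is one column
(apply map vector table)
;=> (["name" "a" "b"] ["qty" 1 3])
```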

pez 2020-10-15T21:14:56.008Z

Awesome talk about transducers there, @slipset.

2020-10-15T21:35:10.008200Z

Indeed