clojure-dev

Issues: https://clojure.atlassian.net/browse/CLJ | Guide: https://insideclojure.org/2015/05/01/contributing-clojure/
Ben Sless 2020-11-13T18:09:42.217100Z

Out of curiosity I poked around clojure.walk/walk and it looks like its performance can be improved, with one change requiring almost no care, and another plenty of care: The quick win is changing the coll? case to use a transducer:

(coll? form) (outer (into (empty form) (map inner form)))
   (coll? form) (outer (into (empty form) (map inner) form))
Another more elaborate change which will require lots of care is replacing the entire cond with a protocol then extending that protocol to all of Clojure's collections. I ran a few benchmarks and it seems like most of the gain is from switching to a transducer. Should I post it on ask.clojure?

2020-11-13T18:10:18.217300Z

that would reverse lists

2020-11-13T18:10:29.217600Z

which I don't think any consumer of walk actually wants

Ben Sless 2020-11-13T18:10:37.217900Z

lists are covered by

(list? form) (outer (apply list (map inner form)))

2020-11-13T18:10:46.218100Z

oh, OK

2020-11-13T18:11:11.218300Z

I see now, it was already using into / empty

alexmiller 2020-11-13T18:16:45.219600Z

you can post it on ask, please make sure to include the tests you're using

👍 1
Ben Sless 2020-11-13T18:17:10.220200Z

The performance difference is very noticeable with maps and vectors. The protocol saves about 5% from that (give or take)

alexmiller 2020-11-13T18:18:40.221100Z

in general, a single transducer is not necessarily faster and can even be slower vs chunked seqs for small seqs.

alexmiller 2020-11-13T18:19:32.221800Z

the next step of analysis is to get a sense of what people use walk for and a rough frequency of that

alexmiller 2020-11-13T18:20:04.222200Z

basically toward answering the question of whether this change is usually a win

Ben Sless 2020-11-13T18:21:03.222800Z

Alright. I'll organize the results and post everything on ask

alexmiller 2020-11-13T18:22:05.223Z

thx

Ben Sless 2020-11-13T18:22:25.223300Z

Won't chunked seqs be caught before the coll? case in

(seq? form) (outer (doall (map inner form)))

2020-11-13T18:23:05.224Z

(map f [...]) operates on a chunked seq of the vector

alexmiller 2020-11-13T18:23:29.224300Z

if you're interested in actually providing a patch, feel free to become a https://clojure.org/dev/contributor_agreement and https://clojure.atlassian.net/servicedesk/customer/portal/1

👍 1
alexmiller 2020-11-13T18:24:17.224400Z

sorry, not looking at the code, just talking generally of things to consider

Ben Sless 2020-11-13T18:24:39.224600Z

alright

Ben Sless 2020-11-13T20:16:46.227700Z

There, found the mistake. Protocols do offer a speedup after all https://ask.clojure.org/index.php/9801/clojure-walk-walk-can-use-transducers-and-protocols

borkdude 2020-11-13T20:23:21.228200Z

@ben.sless How do you explain speedups with protocols? How can they be faster than instance checks?

borkdude 2020-11-13T20:23:51.228500Z

(btw, I haven't read through the benchmarks)

borkdude 2020-11-13T20:25:34.228900Z

A summary of the benchmark would help, it's a bit difficult to read imo. I don't see significant differences?

Ben Sless 2020-11-13T20:28:28.230700Z

1. The speedup with protocols isn't dramatic 2. They may be faster in a REPL environment but won't be in a direct-linked AOTed jar 3. It isn't just one instance check. To get to coll? you need 5 instance checks and non-taken branches. In a heterogeneous collection does that mean branch prediction goes out the window?

alexmiller 2020-11-13T20:28:44.231300Z

protocols cache at the call-site

alexmiller 2020-11-13T20:29:38.232400Z

I think there is actually an existing ticket about clojure.walk and protocols btw

Ben Sless 2020-11-13T20:30:05.232800Z

A summary of the benchmarks: tested a few use cases: • (1 2 [3 4 5] {:a 6 7 8} [9 [10]] #{:b 7}) a heterogeneous collection • (defn walk* ,,,) a big collection with lots of lists and symbols • {1 {2 {3 {4 {5 {6 {7 {8 {9 {10 11}}}}}}}}}} a deeply nested map • [1 [2 [3 [4 [5 [6 [7 [8 [9 [10 11]]]]]]]]]] a deeply nested vector

souenzzo 2020-11-15T01:13:27.242300Z

@ben.sless if you need a "walk but fast and impl via protocols" take a look at #specter

alexmiller 2020-11-13T20:30:30.233Z

https://clojure.atlassian.net/browse/CLJ-1239

👀 1
alexmiller 2020-11-20T14:12:07.247900Z

Upload and comment is great

Ben Sless 2020-11-20T15:21:30.250100Z

thanks, will do

Ben Sless 2020-11-21T16:42:09.250700Z

Done, thanks Alex 🙂

alexmiller 2020-11-13T20:31:00.233200Z

https://github.com/stuartsierra/clojure.walk2 is an old impl of that

alexmiller 2020-11-13T20:31:15.233500Z

prob predates transducers though

Ben Sless 2020-11-13T20:35:31.235200Z

speaking of transducers, core.set is full into which predates the transducers API.