Out of curiosity I poked around clojure.walk/walk
and it looks like its performance can be improved, with one change requiring almost no care, and another plenty of care:
The quick win is changing the coll?
case to use a transducer:
(coll? form) (outer (into (empty form) (map inner form)))
(coll? form) (outer (into (empty form) (map inner) form))
Another more elaborate change which will require lots of care is replacing the entire cond
with a protocol then extending that protocol to all of Clojure's collections.
I ran a few benchmarks and it seems like most of the gain is from switching to a transducer.
Should I post it on ask.clojure?that would reverse lists
which I don't think any consumer of walk actually wants
lists are covered by
(list? form) (outer (apply list (map inner form)))
oh, OK
I see now, it was already using into / empty
you can post it on ask, please make sure to include the tests you're using
The performance difference is very noticeable with maps and vectors. The protocol saves about 5% from that (give or take)
in general, a single transducer is not necessarily faster and can even be slower vs chunked seqs for small seqs.
the next step of analysis is to get a sense of what people use walk for and a rough frequency of that
basically toward answering the question of whether this change is usually a win
Alright. I'll organize the results and post everything on ask
thx
Won't chunked seqs be caught before the coll?
case in
(seq? form) (outer (doall (map inner form)))
(map f [...])
operates on a chunked seq of the vector
if you're interested in actually providing a patch, feel free to become a https://clojure.org/dev/contributor_agreement and https://clojure.atlassian.net/servicedesk/customer/portal/1
sorry, not looking at the code, just talking generally of things to consider
alright
There, found the mistake. Protocols do offer a speedup after all https://ask.clojure.org/index.php/9801/clojure-walk-walk-can-use-transducers-and-protocols
@ben.sless How do you explain speedups with protocols? How can they be faster than instance checks?
(btw, I haven't read through the benchmarks)
A summary of the benchmark would help, it's a bit difficult to read imo. I don't see significant differences?
1. The speedup with protocols isn't dramatic
2. They may be faster in a REPL environment but won't be in a direct-linked AOTed jar
3. It isn't just one instance check. To get to coll?
you need 5 instance checks and non-taken branches. In a heterogeneous collection does that mean branch prediction goes out the window?
protocols cache at the call-site
I think there is actually an existing ticket about clojure.walk and protocols btw
A summary of the benchmarks:
tested a few use cases:
• (1 2 [3 4 5] {:a 6 7 8} [9 [10]] #{:b 7})
a heterogeneous collection
• (defn walk* ,,,)
a big collection with lots of lists and symbols
• {1 {2 {3 {4 {5 {6 {7 {8 {9 {10 11}}}}}}}}}}
a deeply nested map
• [1 [2 [3 [4 [5 [6 [7 [8 [9 [10 11]]]]]]]]]]
a deeply nested vector
@ben.sless if you need a "walk but fast and impl via protocols" take a look at #specter
Upload and comment is great
thanks, will do
Done, thanks Alex 🙂
https://github.com/stuartsierra/clojure.walk2 is an old impl of that
prob predates transducers though
speaking of transducers, core.set
is full into
which predates the transducers API.