@val_waeselynck read-string is your secret weapon https://github.com/bhauman/advent-of-clojure-2016/blob/master/src/advent_of_clojure_2016/day23.clj#L7
wow I love how everyone is getting into it this year
@grzm I'm digging how you generated the direction list for day3
sent a PR for readme!
being brand new to Clojure and trying to figure out AoC is incredibly frustrating haha
Hmmmmm. Day 5 Part B is taking an awful long time to run with my soln
Probably should've gone mutable or something
... oooooh
not too tricky, 2nd part took a while to execute
Mine's not finishing, even giving it a fair chunk of time...
don't forget you can go out both ends of the list
switching to a transient for the "instruction tape" sped it up a bit
..actually a lot .. 2s vs 11s
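For the curious, a transient-based loop can look something like this (a sketch using the part-two jump rule; `solve2-transient` is a made-up name, not the exact code from the thread):

```clojure
;; Sketch: the "instruction tape" as a transient vector, mutated in
;; place with assoc!. Part-two rule: offsets >= 3 decrease, others
;; increase.
(defn solve2-transient [input]
  (let [len (count input)]
    (loop [jumps (transient (vec input)), i 0, steps 0]
      (if (or (neg? i) (>= i len))
        steps
        (let [v (get jumps i)]
          (recur (assoc! jumps i (if (>= v 3) (dec v) (inc v)))
                 (+ i v)
                 (inc steps)))))))
```

On the puzzle's example input [0 3 0 1 -3] this exits after 10 steps.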
I've got an infinite loop or something in mine
Ostensibly it should always exit, all instructions gravitate towards +3
Checking for both bounds
In the end, I made a mistake by using http://repl.it
-_-
Lightning fast on local
that was fun. did we all just use maps and update?
I started my solution by also updating the new location, which made the problem a lot harder and incorrect 🙂 Part B took some time indeed
(time (solve2)) "Elapsed time: 22630.602584 msecs" for map with update
Heading to bed, but I used a vector and update, no transients, and part 2 took about 10s to run for me.
I used iterate - easy to think about a state -> state function and an exit condition on that state.
I want to start on day 5 but suddenly I've got CIDER problems…
Still managed with inf-clojure
🙂
@dpsutton I used an int-array, took about 120ms to solve 2 🙂
mine is taking forever 🙂
So yeah, Clojure is fast 🙂
wooo, it's finally finished! Probably took around 10 minutes, though I was running it through a REPL instance in Visual Studio Code.
Using update-in and a vector - Part 2 took 36 seconds on my little computer.
https://github.com/bloat/aoc2017/blob/master/src/net/slothrop/aoc2017/day5.clj
This is my slow solution: https://pastebin.com/Cj4VZVR3 I'm new to Clojure, if anyone can see something I'm doing that's really stupid (and maybe explains the slowness) that'd be very helpful
@andofryn you should inline inc-accordingly, when you define it as a function it will convert primitive ints to objects
@andofryn and maybe more importantly, you should probably type-hint nrs with ^ints nrs
evaluate (set! *warn-on-reflection* true) before running your code to see if that's the case
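Concretely, the two flags look like this at the top of a file (a minimal sketch; `first-jump` is a made-up example):

```clojure
;; With these set, the compiler prints a warning for every reflective
;; interop call and every boxed-math operation it compiles.
(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)

;; Without the ^ints hint, this aget would go through reflection
;; (clojure.lang.RT) and trigger a warning; with it, it compiles to a
;; direct primitive array access.
(defn first-jump [^ints nrs]
  (aget nrs 0))
```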
thanks for the suggestions, very interesting! I'm missing the understanding of why those things make a difference though, I'll do some googling to find that out 🙂
I am indeed getting some warnings after setting warn-on-reflection
null:12 recur arg for primitive local: current_index is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: current-index
Reflection warning, projects/clojure/advent-of-code/day5/solution.clj:8:37 - call to static method alength on clojure.lang.RT can't be resolved (argument types: unknown).
Reflection warning, projects/clojure/advent-of-code/day5/solution.clj:10:21 - call to static method aget on clojure.lang.RT can't be resolved (argument types: int, int).
How can I get rid of this warning:
Boxed math warning, ... - call: public static boolean clojure.lang.Numbers.lt(java.lang.Object,long).
Trying mutable arrays for speedup
I used vectors and assoc, "Elapsed time: 23169.810878 msecs" for part 2.
Mine takes about 5 seconds
I have one variant with a vector, a transient and an array: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj I'm still wondering why the array version is so slow
@val_waeselynck Do you have your solution somewhere?
yeah, transient seems fastest for me too:
advent.2017.day5> (c/quick-bench (solve2 input))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 6.091951 sec
Execution time std-deviation : 254.233063 ms
Execution time lower quantile : 5.895512 sec ( 2.5%)
Execution time upper quantile : 6.518767 sec (97.5%)
Overhead used : 1.887221 ns
Found 1 outliers in 6 samples (16.6667 %)
low-severe 1 (16.6667 %)
Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
advent.2017.day5> (c/quick-bench (solve2-int-array input))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 1.403268 sec
Execution time std-deviation : 18.226029 ms
Execution time lower quantile : 1.381179 sec ( 2.5%)
Execution time upper quantile : 1.419574 sec (97.5%)
Overhead used : 1.887221 ns
advent.2017.day5> (c/quick-bench (solve2-transient input))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 1.306124 sec
Execution time std-deviation : 15.056565 ms
Execution time lower quantile : 1.291127 sec ( 2.5%)
Execution time upper quantile : 1.320360 sec (97.5%)
Overhead used : 1.887221 ns
@orestis https://github.com/vvvvalvalval/advent-of-code-2017/blob/master/src/aoc2017/day05.clj
@andofryn you have a bug actually, you want to do (alength nrs)
oh maybe it was not casting to int in the comparison
doing so gives:
advent.2017.day5> (c/quick-bench (solve2-int-array input))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 1.393645 sec
Execution time std-deviation : 10.345934 ms
Execution time lower quantile : 1.381836 sec ( 2.5%)
Execution time upper quantile : 1.404963 sec (97.5%)
Overhead used : 1.887221 ns
ah, alength instead of count
Some performance tips:
- Var lookups are slower and harder to optimize than locals lookups
- Make sure to type hint primitives and primitive arrays when necessary
- Primitive values (ints, bytes, booleans etc.) cannot travel across functions (they get boxed into objects)
- Protocols are way faster than multimethods
- Looking up a key in a record type is 5 to 10x faster than looking it up in a plain map
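The record-vs-map point in concrete form (a made-up example type, not from the thread; exact speedups vary by JVM):

```clojure
;; Keyword lookup on a record vs. a plain map.
(defrecord Point [x y])

;; On the record, the :x lookup compiles to a near-direct field
;; access; on the map, it's a hash lookup.
(defn x-of-record [] (:x (->Point 1 2)))
(defn x-of-map    [] (:x {:x 1 :y 2}))
```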
length doesn't change so I set that in a let before the loop.. for me it was the comparison not being (int 3) which was slowing things down
well not a bug actually, but probably not what you intended either
ok, fixed: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj#L54
240ms
@val_waeselynck I modified your version a bit, almost halved the execution time (650ms -> 380ms) - just moved the alength call outside.
I thought I would be able to typehint a single ^int but I get this: https://stackoverflow.com/questions/15230061/cant-type-hint-a-local-with-a-primitive-initializer
Adding the (int i) in comparisons actually makes things slower.
wow, borkdude's version smokes mine
advent.2017.day5> (c/quick-bench (bork-int-arr input))
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 195.021813 ms
Execution time std-deviation : 516.193459 µs
Execution time lower quantile : 194.408773 ms ( 2.5%)
Execution time upper quantile : 195.624313 ms (97.5%)
Overhead used : 1.887221 ns
(defn solve2-int-array [input]
(let [i (into-array Integer/TYPE input)
len (count input)]
(loop [idx (int 0) c (int 0)]
(if (or (> 0 idx) (<= len idx)) c
(let [v (aget ^ints i idx)]
(if (<= (int 3) v)
(aset-int i idx (dec v))
(aset-int i idx (inc v)))
(recur (+ idx v)
(inc c)))))))
Any ideas of where to type hint that?
@borkdude's version takes 1000ms on my machine… Is running this inside a CIDER repl a bad idea?
@minikomi put this in your code:
(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)
I have both, but neither is complaining
@orestis Did you pull the most recent?
@minikomi Maybe add ^ints before i in the let
@minikomi also don't use count, but alength
@minikomi also don't use aset-int, I discovered that's slower as well
OK making those changes
sped up some
@borkdude Just did, got it to 600ms.
@orestis what kind of machine are you on?
woah
I am using int-array instead of into-array, would that make a difference?
sped up a ton
haha
@borkdude 2014 Macbook Pro, 2.8GHz i5
@orestis Are you running exactly my code or an adaptation?
Tiny adaptation; I create the array with int-array, and don't have the reducible-resource stuff; but I do this for all the adaptations.
This is from 2015 btw: https://stackoverflow.com/questions/34153369/why-is-this-clojure-program-working-on-a-mutable-array-so-slow
the ^ints in the let by itself took me from 1.3s -> 98ms
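Putting the thread's findings together (the ^ints hint, alength hoisted outside the loop, plain aset instead of aset-int), a sketch of a part-two solver; `solve2-array` is a made-up name and this is not anyone's exact code:

```clojure
(defn solve2-array [input]
  (let [^ints jumps (int-array input)   ; the ^ints hint is the big win
        len (alength jumps)]            ; alength, not count
    (loop [i 0, steps 0]
      (if (or (neg? i) (>= i len))
        steps
        (let [v (aget jumps i)]
          ;; aset, not aset-int; the (int ...) keeps the stored value
          ;; on the primitive fast path
          (aset jumps i (int (if (>= v 3) (dec v) (inc v))))
          (recur (+ i v) (inc steps)))))))
```

On the puzzle's example input [0 3 0 1 -3] this returns 10.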
@borkdude Have a look at https://github.com/orestis/adventofcode/blob/6537d8dda0825871fdd9acad0daf95570d82ef87/clojure/aoc/src/aoc/2017_day5.clj#L95 ; it's the fastest in this file.
weird.. it's like, make an array of type integer (by the way, it's an integer array!)
@orestis that's effectively the same as mine?
oh yeah int-array
I wonder if try/catch OutOfBounds exception would work.
btw I pushed a new version, but the performance is the same
BTW, everyone has different inputs so we need a baseline if we are going to optimize this silly thing 🙂
good point 🙂
haha oh yeah
my input file is on github as well
tfw when you get out of bed somewhat earlier for Advent of Code, but CIDER won't connect to your nrepl anymore and you have to fall back on inf-clojure …
Is there any project that could easily take those different implementations and run proper benchmarks etc? Not sure how warm/cold JIT makes any difference here.
after more poking around, aset vs aset-int does indeed seem to be the culprit.
aset:
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 102.748740 ms
aset-int:
Evaluation count : 6 in 6 samples of 1 calls.
Execution time mean : 1.391871 sec
At this point you might want to write it in Java and just call it from Clojure 🙂
true. still fun to overthink 🙂
@borkdude Well, the "naive" Clojure version that uses a vector can look really similar to the one that uses int arrays. Still a 10 line function anyway 🙂
true, but dealing with things like "should I use aset or aset-int" isn't fun imho
why so slow, aset-int
That's because y'all know too many core functions for your own good 🙂
Googling aset-int slow, this book seems to concur: https://books.google.co.jp/books?id=4QacPa1vwMUC&pg=PA148&lpg=PA148&dq=%22aset-int%22+slow&source=bl&ots=2AACgvj4qk&sig=ysLjEPVkInOBPUdMj9Wgdgyqrz0&hl=en&sa=X&ved=0ahUKEwitpbCt0_LXAhUDNJQKHYiRDo8Q6AEIQzAE#v=onepage&q=%22aset-int%22%20slow&f=false Seems like something to stay away from in the future then!
good catch!
@orestis Do you mean that moving the alength call outside improved the perf of my version by 2x?
Is there a usecase for aset-int if it is slower than aset? Or is it just a legacy function?
@chrisblom very good question
It's looking like loop / recur is the most popular for day 5, followed by iterate
I usually start with reduce, but when I need multiple things as the argument of the reducing function, I give up and fall back on loop
if i macro expand the definition and clean it up a bit there are only 2 differences: aget has :inline metadata and uses .set instead of .setInt
I actually used reduce (on (range) as the step counter)
@mfikes Yeah, I think loop can mostly be rewritten to a reduce with destructuring
@mfikes Neat that you're doing it with cljc now
The nice trick I missed is relying on get to return nil, as opposed to pre-checking the index
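The get trick, sketched on a plain vector (`solve1-vec` is a hypothetical name): get returns nil once the index walks off either end, so the loop needs no explicit bounds check.

```clojure
(defn solve1-vec [input]
  (loop [jumps (vec input), i 0, steps 0]
    (if-some [v (get jumps i)]      ; nil when i is out of bounds
      (recur (update jumps i inc)   ; part-one rule: always increment
             (+ i v)
             (inc steps))
      steps)))
```

This works for negative indices too, since (get [1 2 3] -1) is also nil.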
@mfikes That trick worked for me, until I got to primitive arrays
(You can learn so much by studying the range of solutions. 🙂)
Yeah, that's the story. You can write clean Clojure, and then when you want it to go faster, you delve into messiness. (I suppose that's true in any language.)
At least in Clojure we can really push hard on the "clean" end of the spectrum.
Well, Scala really is better (not the right word, I must say, friendlier/easier) at this kind of stuff, while still allowing you to do functional. I think the type system helps to generate more primitive stuff if needed.
But I only need this mutability during Advent of Code really, and then only for extra performance, not for getting the answer, so 🙂
Ahh, right. Does Scala essentially generate code like C++ does when instantiating templates?
(The closest I think I've seen Clojure come to this is how ClojureScript has macro variants of functions that can try to generate optimal code based on what is available at compile time.)
@mfikes Don't know, I just didn't have as much trouble with it as in Clojure: https://gist.github.com/borkdude/b37563939639b40013f483361a7c5d8f
(I was just learning Scala back then for fun)
An example of how (I suppose essentially anything) can be done with macros is how str in ClojureScript passes the literal strings in something like (str "a" 1 "b") untouched. (I think in Clojure it may still end up invoking (.toString "a") and (.toString "b").)
That may not be the best example, but it illustrates that anything can be done at compile time with macros if you are trying to eke out perf for the generated code.
@mfikes How does this relate to primitive array performance?
I'm just saying that if Scala can do nice things with types, Clojure may also be capable of the same if you go down the road of using macros (and if you have access to inferred types).
With the macro approach you can get rid of some calls like (str "foo" "bar"), but will it get you very far? I don't know.
I suppose Scala has the benefit of much richer type inference.
There was also a talk about this topic at EuroClojure 2017: http://2017.euroclojure.org/nada-amin/
where the type system was used to generate fast low level code: https://www.youtube.com/watch?v=QuJ-cEvH_oI
I haven't thought about this too much, being on the JavaScript side of things 🙂 In that world, the engine watches the values fly by and infers their types, dynamically generating optimal machine code, and de-opting if it sees a value that violates assumptions, etc.
The end effect is that where in Clojure you can manually hint, in ClojureScript, you need to let the lower layers do it for you.
Yeah, I kind of wondered why you don't need to add type hints in ClojureScript, so that's how it works
PureScript solution: https://github.com/krisajenkins/AdventOfCode/blob/master/src/Year2017/Day5.purs
The primary type hints in ClojureScript are ones that help avoid checked if: If the compiler knows it is dealing with a Boolean value, for example, it can just use JavaScript's notion of truthiness.
@mfikes Do you perhaps have a clojurescript version of day 5 which runs on Node, to compare it with the performance of PureScript day 5?
I'll convert it to Lumo real quick
Or are you interested in Closure-optimized code?
I could write that fairly easily.
does it help for performance? and a mutable array version is fine
I can take your fastest mutable array version, and produce an optimized Node-compatible JavaScript file for you to use to compare with PureScript.
What is your recommended mutable array version?
I don't know about the absolute fastest, but this is my fastest: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj#L54
Cool. I'm putting together a GitHub repo so it is possible to revise the code if needed and build and run it in Node
@mfikes I only asked if you had this lying around, not if you wanted to put time into it, but if you like to do it, still thanks 🙂
Nah, it is close to trivial
@borkdude For me, ClojureScript / Node.js is running that array-based solution 25% faster than Clojure. Here it is, set up so it is trivial for anyone to reproduce perf results or change the code: https://github.com/mfikes/advent-day-5-cljs-node
@mfikes Awesome, I'll try
Interestingly Planck runs it at about the same speed as the Closure optimized JavaScript under Node.js. (JavaScriptCore is wicked fast.)
I get 240 ms on the JVM, 400 on Node
hmm, weird enough, in lein I get 800 ms.. it's slower than from my REPL it seems
Maybe the JVM didn't have enough time to warm up. V8 is probably more optimized than the JVM for quick warmup
@mfikes boot: 240 ms, lein 800 ms?
I think I need to add the -server JVM option to the lein setup in that project
@val_waeselynck I specifically have the perf test running over and over again to avoid warm up issues. (Good point, though 🙂)
I think that might be it, yes. I'm still a noob so I'm not really sure what I'm doing there; type hinting is a bit opaque, but I don't get any reflection warnings so…
@mfikes yeah I noticed but I'm wondering if that's enough, Criterium typically seems to do much more sophisticated things to take both warmup and GC into account :thinking_face:
Ooh even faster with your input (I forgot the inputs differ)
@val_waeselynck I agree. But we have a huge perf diff right now to sort out 🙂
Yeah, it needs -server
really weird, I run both tests from the repl. Exact same code now.
Patching the repo
does boot run with server by default maybe?
or maybe that's the default and lein adds client
Yeah, lein run evidently wants to minimize startup latency
yeah, now it works 🙂 180ms
@mfikes Thanks, that was a nice adventure
Yeah, with any perf tests you really really need to provide the code so others can try it as well 🙂
@borkdude so what are your numbers now?
@val_waeselynck same as Mike's repo, ~180 ms on my machine on the JVM
My numbers are: Clojure: 167 ms, ClojureScript (Closure / Node.js): 384 ms, Planck: 570 ms
Alright, thanks!
Some guy on Reddit had 75 ms using C++… we're not done yet 🙂
Maybe we can write a Clojure program which generates C++ and run that one
Didn't Rich write C++ like that as well in Common Lisp? 🙂
Yes, but was it the same input? 🙂
Assembly would probably be not so difficult for this problem
or maybe straight JVM bytecode?
https://www.reddit.com/r/adventofcode/comments/7hngbn/2017_day_5_solutions/dqsdslv/
well, twice the speed of C++ ain't bad
especially when you haven't compiled it with heavy optimizations
JIT is probably the best strategy for low latency (including the whole write + debug + compile + run process)
so why does the array implementation take so long???
checking it out with ints instead of longs
@mfikes AFAIK, Scala's generics aren't like C++ templates. Scala's generics are mostly the same as Java's generics - type erasure and boxing. There are conveniences, like implicit parameters to capture class tokens. But they're fundamentally not a code generation mechanism, as C++ templates are.
@bytecrafter Cool. I vaguely (perhaps mis-) recall something in Scala generating code covering the possible permutations of primitives, but that was perhaps 5 years ago for me.
Ahh, "specializing" is what I recall. For example http://www.scala-notes.org/2011/04/specializing-for-primitive-types/
@mfikes are you guys using the same set of inputs for benchmarking?
No, we realized that this causes a problem @bhauman
At least, we caught one instance recently where we forgot the data changes. 🙂
the final solution should determine the complexity
mine is in the 28 million range
I haven't been too involved in the perf game, until some fun stuff this morning
well my fastest is 5 seconds! whereas yours is in the ms
My day 5 result was 24,315,397
ok so same complexity
darn it son
My reduce-based solution using persistent data structures takes about 5 seconds
That benchmark repo is banging on primitive arrays
It is Java in disguise 🙂
my array implementation performs much worse, so I'm going to take this opportunity to learn some things ....
I optimized for readability, not worrying about perf: https://github.com/mfikes/advent-of-code/blob/master/src/advent_2017/day_05.cljc#L14-L19
Came up with another perf optimization, but it didn't help: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj#L127 Still 232 ms
@bhauman Didn't see your solution, but aset-int is not so good for performance, we found out today
@bhauman Also:
(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)
helps
oh dems some tricks
cool I just posted mine
@bhauman According to the docs, I think you probably want (int ...) instead of ^int ... https://clojure.org/reference/java_interop#optimization
@borkdude mine is similar, but i'm not passing the mutable array via recur, runs in 130ms on my machine: https://github.com/ChrisBlom/advent-of-code/blob/master/src/adventofcode/2017/day05.clj#L16-L33
yeah thanks I'm still working on the scratch area
just added that thanks!
Maybe try the same input for comparison?
@borkdude What kind of black magic is involved in annotating a local as a primitive ^int but treating it as a Boolean?
I never have to optimize stuff like this so this is kind of a blast
I tried without passing the array in the loop, but it did not matter I think
Hmm. Something to learn there...
it's just a reference right
I'm referring to the annotation on this line https://github.com/mfikes/advent-day-5/blob/master/src/advent_day_5/core.cljc#L17
@mfikes The bound value is an int, because and returns the int. But if it's non-true, it won't be bound at all
Ahh. Thank you. I need to macroexpand to see that.
Hah!
(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)
(if-let [^int x (and true 1)] (inc x) :else) ;; good
(if-let [x (and true 1)] (inc x) :else) ;; bad, boxed
(macroexpand '(if-let [^int x (and true 1)] (inc x) :else))
(let* [temp__4655__auto__ (and true 1)] (if temp__4655__auto__ (clojure.core/let [x temp__4655__auto__] (inc x)) :else))
it seems macroexpand doesn't show the type hints?
@bhauman I'm currently writing a home-made GraphQL engine, so I'm totally in the performance business right now 🙂
oh darn, sounds fun?????
it will if you (set! *print-meta* true)
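i.e. something along these lines:

```clojure
;; With *print-meta* bound to true, printed symbols carry their
;; metadata, so the ^int hint becomes visible in the expansion.
(binding [*print-meta* true]
  (prn (macroexpand '(if-let [^int x (and true 1)] (inc x) :else))))
```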
it is 🙂 actually not exactly a GraphQL engine, since it's an alternative to GraphQL with some added features.
@mfikes Interesting, I didn't know about that in Scala. Thanks!
However, I guess the core server-side algorithm could be reused to implement both GraphQL engines and Om Next routers
Is [1 -3] a legitimate input? (In other words, can it exit by going negative?)
@mfikes I thought about that, but since I got the answer, I didn't worry about it anymore 🙂
OK. Cool.
This is just awesome. Advent of Code in pure Postgres… https://github.com/xocolatl/advent-of-code-2017/blob/master/Day05/script.sql
@borkdude I got yours to run in 80 ms. I'll put the interesting change in a branch
wowie
that is very close to the C++ version
I was trying to eliminate a checked if in what ClojureScript produces, and got it to run faster, but it also made the Clojure faster (if everything pans out)
if this works, we have world domination
Here is the change https://github.com/mfikes/advent-day-5/commit/45aa111e8df606884f5af4ffd886aa2a8af393eb?w=1
It is actually around 87 ms
veeery nice, let's see if I understand
Wow. Same speed in Node
still the correct answer?
24315397
https://clojurians.slack.com/archives/C0GLTDB2T/p1512487050000252
@mfikes I don't understand why you need the if-some at all there
(aget maze cur-pos) should always return something when (< cur-pos length)
OK. Let me address that 🙂
OK I got it in the ball park by switching to aget
@mfikes turned it into a let, seems to help a bit
so, if-let made it slower.. hmm
Perhaps crappy expansion?
Hmm
It is surprising that ternary < in Clojure botches the perf by an order of magnitude, but not so in ClojureScript. Specifically, trying to do the right thing and testing (< -1 cur-pos length) instead of just (< cur-pos length) makes it run in 900 ms instead of 87 ms.
Anyway, I put the 87 ms-version on master https://github.com/mfikes/advent-day-5/blob/master/src/advent_day_5/core.cljc#L8-L27
Well, I could in fact shave off a couple of ms with this trick: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj#L115
Oh, so check for zero and then optimize for that case?
@mfikes with your data input 78 ms
yes
Wow. I'll apply that one! We must dominate.
Or, if you want I can add you as a committer and you can push your change in.
ok, add me 🙂
you could check for negative as well right?
I'm not seeing why it replaces 0 with 2. Should it be 1 instead?
nope
but maybe this is no good, I don't see a real change when applied to your benchmark with dotimes
ok
Run 58 of 60.
"Elapsed time: 102.893639 msecs"
24315397
Run 59 of 60.
"Elapsed time: 90.53835 msecs"
24315397
Run 60 of 60.
"Elapsed time: 100.259889 msecs"
24315397
---
Run 1 of 60.
"Elapsed time: 129.65845 msecs"
24315397
Run 2 of 60.
"Elapsed time: 104.405081 msecs"
24315397
Run 3 of 60.
"Elapsed time: 96.121724 msecs"
Oh, I see why 2. You are trying to skip a loop cycle?
Clever
Hence the (+ steps 2)
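The shortcut, sketched (`solve2-shortcut` is a hypothetical name): a cell holding 0 first becomes 1 while the cursor stays put, then becomes 2 as the cursor moves one to the right, so both iterations can be collapsed into one.

```clojure
(defn solve2-shortcut [input]
  (let [^ints jumps (int-array input)
        len (alength jumps)]
    (loop [i 0, steps 0]
      (if (or (neg? i) (>= i len))
        steps
        (let [v (aget jumps i)]
          (if (zero? v)
            ;; collapse the 0 -> 1 -> 2 sequence into a single iteration
            (do (aset jumps i (int 2))
                (recur (inc i) (+ steps 2)))
            (do (aset jumps i (int (if (>= v 3) (dec v) (inc v))))
                (recur (+ i v) (inc steps)))))))))
```

It gives the same step count as the plain loop; on the puzzle's example input [0 3 0 1 -3] that's 10.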
with criterium I do see a difference of about 4-5 ms, which may not be significant
@bhauman negatives?
@mfikes we have world domination now… congrats.
Posted it under the C++ solution: https://www.reddit.com/r/adventofcode/comments/7hngbn/2017_day_5_solutions/dqt0s5v/ 🙂
In Node, my reduce-based "normal" solution takes 8 seconds, while the optimized version takes 80 ms in Node.
I should post this as an 80-ms JavaScript based solution: https://gist.github.com/mfikes/4ef3d2c3efc8f72a848e9149e1229e84
is that the output of the compiler?
Yeah! 🙂
cool!
74ms now
@chrisblom Removed the maze from the loop arguments, another 4 ms 🙂
@mfikes Could you try this commit? It seems to be 30 ms faster on Node on my machine than the original: https://github.com/mfikes/advent-day-5/commit/3fdda4ba34d144fb795674f51ba81927c9125d1c
has anyone tried unrolling the loop yet?
@mfikes also pushed a cljc fix, but it worked nonetheless
to unroll a loop you need to know in advance how many times you will loop, no? and that isn't known until you get there
unless you mean make the loop include several iterations with escape hatches in each?
but i thought loop unrolling was a strategy to evade the loop test which we cannot escape so we always pay the branch penalty
@borkdude Yes, Node is faster for me with your commit
How fast now?
Oh wait. It actually went from an original speed of 86 ms to slower, at around 93 ms
On my machine, branch master:
24315397
Run 57 of 60.
"Elapsed time: 122.000000 msecs"
24315397
Run 58 of 60.
"Elapsed time: 136.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 125.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 128.000000 msecs"
24315397
Branch faster:
"Elapsed time: 114.000000 msecs"
24315397
Run 58 of 60.
"Elapsed time: 109.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 106.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 109.000000 msecs"
24315397
(Node)
Right, something was messed up with the squashed commit I put on master. It was causing an inexplicable regression.
I've since removed it.
I noticed that compiling some code in the background severely impacted the performance, so these times also depend on what your machine is doing. For some real benchmarking, criterium or something is needed.
In my repo I used criterium, which gives a better time for this commit
Maybe node also has something like this
FWIW, I fixed the master branch so it now has the better perf number
(At least better under Node)
@mfikes do you mean this commit is the newest now? https://github.com/mfikes/advent-day-5/commit/45aa111e8df606884f5af4ffd886aa2a8af393eb
Yeah, on my instance of Node that one runs faster
that's weird, because the if-some is not needed… 🙂
on my machine it's faster too
Right... the if-some should be cleaned up. It is unnecessary, but AFAICT it doesn't hurt perf.
when I change the if-some to let it becomes slower
WAT
I was referring to perf on Node. Let me make the change as well to be sure.
@mfikes I was also talking about that
With let:
"Elapsed time: 132.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 129.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 138.000000 msecs"
24315397
With if-some:
Run 57 of 60.
"Elapsed time: 111.000000 msecs"
24315397
Run 58 of 60.
"Elapsed time: 106.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 111.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 106.000000 msecs"
24315397
(Node)
Ahh... indeed, converting it to a let and dropping the else branch makes it slower for me on node than having an if-some
... going to look at the generated JavaScript
So far we have learned: if-let slows down, but if-some speeds up 🙂
@bhauman thanks 🙂 I'm amazed at all of the different ways people approach this, and I'm sure what seems obvious to one seems clever to someone else. I'm learning a lot by reading. Trying to figure out a way to make it all stick as efficiently as possible.
So AoC leads to the discovery of obscure perf regressions… great!
@spfeiffer We discovered that Node gets a speedup from adding redundant code…
@borkdude I think that is a very nice metaphor for the Node ecosystem ☺️
@borkdude I did a bunch of tests. It seems to boil down to the fact that an extra nil-check, while not free, pays off because it lets the Node optimizer know that it is doing arithmetic on non-`null` values. (We as humans know that the values pulled out of the array are not nil, but the optimizer doesn't.) In fact, I saw one test run with Lumo where it started off slow, and then "figured it out" on its own, and started running at the faster speed, while in a loop, without me doing anything. I also found that you can get a lesser speedup by changing (aget maze cur-pos) to (int (aget maze cur-pos)). This gets compiled down to JavaScript that involves | 0, which is an ostensibly useless appendage added to JavaScript arithmetic code to "hint" to the optimizer that it is doing integer arithmetic. So, when-some or even a more straightforward (when-not (nil? ...) ...), while not changing the functionality, ends up acting like a hint, or an assertion that the optimizer can use to its advantage.
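In Clojure syntax the coercion in question is just this (a sketch; `jump-at` is a made-up name):

```clojure
;; The (int ...) wrapper is a no-op for correctness here, but acts as
;; a hint: per the discussion above, in ClojureScript it compiles to
;; `(... | 0)`, telling the JS engine the value is an integer.
(defn jump-at [^ints maze cur-pos]
  (int (aget maze cur-pos)))
```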
Interesting and at the same time frightening
One interesting question in my mind: In the early days, types were used for performance. Will we reach a point where types are inferred at runtime sufficiently well so that the performance of such code behaves just as well as statically typed code. Or, to put it another way, with enough money being dumped into JavaScript VMs, will ClojureScript end up running faster than Clojure one day. It can already do this at times for unhinted Clojure, but I'm wondering if it will all get to be so good where you can just write your dynamic unhinted Lispy code and it will run extremely fast.
That would be nice, but nothing beats full type inference for this I think?
I mean, if you have all type info, what else is there to guess?
PureScript probably won't benefit much from typing for performance because of its target?
An analogy: I used to find it odd that you didn't pass flags like -O3 to the javac compiler. It was difficult to just "let go" of the idea of optimizing at compile time, and defer it all to runtime. One argument is that if you let the runtime do the optimization, it can leverage the call and memory access patterns in your program, and perhaps do a better job than a static analyzer could do. If static type info is available, then, sure, it can be useful. I'm just wondering if we as human developers will be in the low-level business of type hinting a decade or two from now.
Maybe there will be a more fuzzy type of static type inference based on ML?
That could also assist in IDEs and for performance, without getting in the way
I think Steve Yegge once wrote about this
http://steve-yegge.blogspot.nl/2008/05/dynamic-languages-strike-back.html
That's 10 years ago… he's now fond of Kotlin 🙂
@mfikes I'd argue we're already there with pypy
i don't follow why types hints allow for anything more than asserted types
I've seen entire transducer pipelines compile down to half a dozen assembly instructions.
if a jit finally believes you that they are ints, wouldn't it believe you if you said they were ints in the first place?
@mfikes what you describe is pretty much the premise of tracing JITs. https://en.wikipedia.org/wiki/Tracing_just-in-time_compilation
@dpsutton hints can also be used for storage optimization. In that case they're a bit stronger than assertions.
But yeah, hints are a way of papering over the inability of a JIT to deoptimize.
JS and PyPy do deoptimization. Put only ints in a list and the JIT will create a typed array of ints. Once you try to put an object in the array it will convert the list into an list of objects and then add the object to it.
So the question becomes, why should you ever have to hint at all? In Clojure it's because the JVM doesn't allow user-level deoptimization.
And the JVM is a method JIT, which doesn't help either
i'm not sure i follow. deoptimization sounds like recognizing a bad assumption?
and that wouldn't happen with types? or does it
I thought these were ints but there's a map so now they're all objects
is what i'm imagining the JIT is doing
Well with types and something like C# or C++ deoptimization can never occur, since all the types are known at compile time.
On the JVM it's a bit strange, but it will deoptimize a callsite. So as you implement IFoo on more and more types the JVM may recompile every method that calls IFoo.foo.
That's why adding stuff like tuples to Clojure doesn't work since the perf improvement is a wash compared to the de-optimized callsites.
interesting. thanks for explaining. my jvm knowledge is very slim
What tracing jits like PyPy do, is look at the context for a loop. Within a single loop they inline all methods and reduce all types to their lowest level parts. So as long as you never call a method with a map within that single loop, the code for handling that map will never exist in the loop.
Hmm. So, transducers are a little more friendly to tracing JITs, it sounds…
Right, on the JVM transducers are always boxed. On JS the JIT is a bit more tracing-like, so you get primitive math loops.
Do a transduce over an array in CLJ and CLJS and the CLJS version will be much faster if all you're doing is math.
so that wasn't a good idea? you starting to get hungry?
@bhauman did you comment on my deleted blob?
yeah š
ah - yeah, it wasn't a feasible idea
For the day 5 problem, it seems that it reads and writes rapidly to lots of parts of a big array. I wonder if memory bandwidth is affecting things more than any of the math / logic.
@mfikes in what language? and what access pattern?
I've seen 4x perf improvement from making sure I scanned an array from 0 to X instead of walking it randomly.
Pretty satisfied that we got it down to 75 ms, the same number the C++ and Rust guys are mentioning on Reddit 🙂
and thats for CLJS? right?
in my case Clojure JVM
oh dang
I can't reliably benchmark node, I use criterium in Clojure
I wonder how much using unchecked math would improve that?
oh thats been tried
or has it?
@tbaldridge https://github.com/borkdude/aoc2017/blob/master/src/day5.clj ?
ah, cool, I missed that post
using :warn-on-boxed
@tbaldridge Mike discovered that if-let generates less performant code than just if and let…
this and a couple of other tweaks shaved off some ms here and there
The array is only about 1000 integers. We read from a spot which tells us where to jump to next, and we write to the spot we just read from with a new offset derived from the old one. Repeat until we hop out of the array.
Oh, I'd summarize it this way: by guarding a block of code with a nil check, that code runs faster under Node, even though we know (as humans) that none of the values are nil.
yeah, but on JVM Clojure the code ran also faster without if-let
Oh. OK.
yeah, if and then let is faster than if-let
we went from if-let to if-some and then to let… if-some turned out to be beneficial on Node… it's hard to optimize for both worlds
wow it's a lot faster
yeah, I doubt memory perf is the issue then. As 1000 integers at 8 bytes that should easily fit in the cache.
Yeah. I mistakenly assumed it was a much larger array.
using if and then let cut my time in half
while the if-some was a redundant check, the Node runtime still benefited from it, crazy
Don't we already have a solution that is around the same as one of the C++ ones out there?
@mfikes Yes, I posted it 5 minutes ago: (quick-bench (part-2-array-faster?)) ;; 74 ms, or did you mean something else?
(on a MacBook Pro 2015)
We are running through 20 million steps in 70 ms, so around 3.5 ns per iteration?
If that's around 10 machine cycles, it might be hard to improve upon :thinking_face:
The assembly is probably straightforward, but I only did that in university…
hey @mfikes you had a twitter thing about emitting machine code from JS right?
got the machine code for this?
Yes, you can enable JavaScriptCore's logging of compilation. But it is very hard to actually catch the code you'd like. (It spews stuff faster than you can consume it.) I'll try and see 🙂
maybe writing a C program is easier
Yeah, this is probably a few lines of C
Here is the tweet @tbaldridge referred to https://twitter.com/mfikes/status/935527618274873344
In short, if you run
JSC_dumpDisassembly=true planck src/advent_day_5/core.cljc
in https://github.com/mfikes/advent-day-5, you will see lots of disassembly.
Perhaps this is some of the assembly JavaScriptCore is creating for our loop, but it is hard for me to tell: https://gist.github.com/mfikes/5c68510b6ab2207b85cc92697a005a3e
I'm trying a C program now, about 195 ms on my machine
@borkdude Did you compile it with optimizations enabled?
No, I just did gcc -o day05 day05.c && ./day05. What optimizations do you suggest?
ah 57 ms with -O3
🙂
that's probably all the speed you're gonna get
It's remarkable we are able to produce pretty much the same result with some carefully crafted Clojure.
FWIW I tried this one: https://github.com/vesche/adventofcode-2017/blob/master/day05.c
this has got to be the best channel on slack right now
I can reproduce the 57 ms. I also tried inlining the input data array (instead of having the program read it from disk), but it still runs at the same speed; it's dominated by the loop.
I also applied the zero-checking trick to the C program, but it didn't help there
is an extra branch (a simple boolean/int check in C) more costly at that low level than just doing the same loop body every time?
Talking about this one:
while (current <= 1036 && current >= 0) {
move = numbers[current];
/* if (move == 0) { */
/* numbers[current] = 2; */
/* steps += 2; */
/* current += 1; */
/* } */
/* else */
{
if (move >= 3)
numbers[current] -= 1;
else
numbers[current] += 1;
current += move;
steps += 1;
}
}
With the code uncommented, it becomes slower, not faster. I also tried moving the >= 3 branch to the first place, but that becomes even slower
Calling it a night. Tomorrow is another puzzle…
branch prediction at work most likely
easier to calculate something than to do a conditional