adventofcode

Happy Advent 2020! Please put answers in the pinned threads or create one if it does not exist yet. | https://github.com/adventofcode-clojurians/adventofcode-clojurians | Join the private leaderboard with code 217019-4a55b8eb
bhauman 2017-12-05T01:27:05.000019Z

wow I love how everyone is getting into it this year

bhauman 2017-12-05T01:46:26.000152Z

@grzm I'm digging how you generated the direction list for day3

2017-12-05T02:38:07.000075Z

sent a PR for readme!

2017-12-05T04:16:56.000124Z

being brand new to Clojure and trying to figure out AoC is incredibly frustrating haha

fellshard 2017-12-05T05:12:20.000028Z

Hmmmmm. Day 5 Part B is taking an awful long time to run with my soln

fellshard 2017-12-05T05:13:20.000117Z

Probably should've gone mutable or something

fellshard 2017-12-05T05:14:08.000099Z

... oooooh

2017-12-05T05:22:46.000086Z

not too tricky, 2nd part took a while to execute

fellshard 2017-12-05T05:24:32.000086Z

Mine's not finishing, even giving it a fair chunk of time...

2017-12-05T05:24:45.000062Z

don't forget you can go out both ends of the list

2017-12-05T05:31:48.000113Z

switching to a transient for the "instruction tape" sped it up a bit

2017-12-05T05:33:03.000080Z

..actually a lot .. 2s vs 11s

fellshard 2017-12-05T05:39:12.000128Z

I've got an infinite loop or something in mine

fellshard 2017-12-05T05:39:43.000159Z

Ostensibly it should always exit, all instructions gravitate towards +3

fellshard 2017-12-05T05:39:55.000017Z

Checking for both bounds

fellshard 2017-12-05T05:46:19.000175Z

In the end, I made a mistake by using http://repl.it

fellshard 2017-12-05T05:46:26.000001Z

-_-

fellshard 2017-12-05T05:47:32.000065Z

Lightning fast on local

🦜 1
dpsutton 2017-12-05T06:41:53.000079Z

that was fun. did we all just use maps and update?

erwin 2017-12-05T06:45:23.000155Z

I started my solution by also updating the new location, which made the problem a lot harder and incorrect 😕 Part B took some time indeed

dpsutton 2017-12-05T06:46:48.000108Z

(time (solve2)) "Elapsed time: 22630.602584 msecs" for map with update

dyankowsky 2017-12-05T06:56:11.000057Z

Heading to bed, but I used a vector and update, no transients, and part 2 took about 10s to run for me.

cjmurphy 2017-12-05T07:09:39.000003Z

I used iterate - easy to think about a state -> state function and an exit condition on that state.
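A minimal sketch of that iterate shape, assuming the usual day 5 part 1 rules (the names and state layout here are illustrative, not cjmurphy's actual code):

(defn step [[jumps pos steps]]
  (let [offset (jumps pos)]
    [(update jumps pos inc)   ; part 1 rule: bump the offset just used
     (+ pos offset)
     (inc steps)]))

(defn solve [input]
  (->> (iterate step [(vec input) 0 0])
       (drop-while (fn [[jumps pos _]] (< -1 pos (count jumps))))
       first
       peek))   ; number of steps taken when we first land outside

;; (solve [0 3 0 1 -3]) ;=> 5, matching the puzzle's example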

borkdude 2017-12-05T07:10:22.000012Z

I want to start on day 5 but suddenly I've got CIDER problems…

borkdude 2017-12-05T07:15:30.000044Z

Still managed with inf-clojure 🙂

val_waeselynck 2017-12-05T08:12:45.000161Z

@dpsutton I used an int-array, took about 120ms to solve 2 🙂

2017-12-05T08:13:55.000354Z

mine is taking forever 😄

val_waeselynck 2017-12-05T08:16:14.000149Z

So yeah, Clojure is fast 😇

2017-12-05T08:41:39.000416Z

wooo, it's finally finished! Probably took around 10 minutes, though I was running it through a REPL instance in Visual Studio Code.

2017-12-05T08:45:32.000257Z

Using update-in and a vector - Part 2 took 36 seconds on my little computer.

2017-12-05T08:54:12.000002Z

This is my slow solution: https://pastebin.com/Cj4VZVR3 I'm new to Clojure, if anyone can see something I'm doing that's really stupid (and maybe explains the slowness) that'd be very helpful

val_waeselynck 2017-12-05T08:56:44.000028Z

@andofryn you should inline inc-accordingly, when you define it as a function it will convert primitive ints to objects

val_waeselynck 2017-12-05T08:59:03.000252Z

@andofryn and maybe more importantly, you should probably type-hint nrs with ^ints nrs

val_waeselynck 2017-12-05T08:59:46.000021Z

evaluate (set! *warn-on-reflection* true) before running your code to see if that's the case
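A hedged sketch of what those three suggestions look like together for a part 2 style loop (the names here are made up; the real code is in the pastebin above):

(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)

(defn solve [^ints nrs]                  ; ^ints lets aget/aset/alength compile
  (loop [i 0, steps 0]                   ; to direct array ops, no reflection
    (if (or (neg? i) (>= i (alength nrs)))
      steps
      (let [v  (aget nrs i)
            ;; the "inc accordingly" logic inlined, so the primitive int
            ;; never crosses a function boundary and gets boxed
            v' (if (>= v 3) (dec v) (inc v))]
        (aset nrs i (int v'))
        (recur (+ i v) (inc steps))))))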

2017-12-05T09:12:22.000144Z

thanks for the suggestions, very interesting! I don't yet understand why those things make a difference though, I'll do some googling to find that out 🙂

2017-12-05T09:12:57.000021Z

I am indeed getting some warnings after setting warn-on-reflection

2017-12-05T09:12:59.000001Z

null:12 recur arg for primitive local: current_index is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: current-index
Reflection warning, projects/clojure/advent-of-code/day5/solution.clj:8:37 - call to static method alength on clojure.lang.RT can't be resolved (argument types: unknown).
Reflection warning, projects/clojure/advent-of-code/day5/solution.clj:10:21 - call to static method aget on clojure.lang.RT can't be resolved (argument types: int, int).

borkdude 2017-12-05T09:16:53.000328Z

How can I get rid of this warning:

Boxed math warning, ... - call: public static boolean clojure.lang.Numbers.lt(java.lang.Object,long).
Trying mutable arrays for speedup

orestis 2017-12-05T09:24:29.000073Z

I used vectors and assoc, "Elapsed time: 23169.810878 msecs" for part 2.

borkdude 2017-12-05T09:25:32.000027Z

Mine takes about 5 seconds

borkdude 2017-12-05T09:28:25.000053Z

I have one variant with a vector, a transient and an array: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj I'm still wondering why the array version is so slow

orestis 2017-12-05T09:30:36.000235Z

@val_waeselynck Do you have your solution somewhere?

2017-12-05T09:33:06.000033Z

yeah, transient seems fastest for me too:

advent.2017.day5> (c/quick-bench (solve2 input))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 6.091951 sec
    Execution time std-deviation : 254.233063 ms
   Execution time lower quantile : 5.895512 sec ( 2.5%)
   Execution time upper quantile : 6.518767 sec (97.5%)
                   Overhead used : 1.887221 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
advent.2017.day5> (c/quick-bench (solve2-int-array input))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 1.403268 sec
    Execution time std-deviation : 18.226029 ms
   Execution time lower quantile : 1.381179 sec ( 2.5%)
   Execution time upper quantile : 1.419574 sec (97.5%)
                   Overhead used : 1.887221 ns

advent.2017.day5> (c/quick-bench (solve2-transient input)))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 1.306124 sec
    Execution time std-deviation : 15.056565 ms
   Execution time lower quantile : 1.291127 sec ( 2.5%)
   Execution time upper quantile : 1.320360 sec (97.5%)
                   Overhead used : 1.887221 ns

val_waeselynck 2017-12-05T09:34:19.000178Z

@andofryn you have a bug actually, you want to do (alength nrs)

2017-12-05T09:35:46.000143Z

oh maybe it was not casting to int in the comparison

2017-12-05T09:36:02.000326Z

doing so gives:

advent.2017.day5> (c/quick-bench (solve2-int-array input))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 1.393645 sec
    Execution time std-deviation : 10.345934 ms
   Execution time lower quantile : 1.381836 sec ( 2.5%)
   Execution time upper quantile : 1.404963 sec (97.5%)
                   Overhead used : 1.887221 ns

borkdude 2017-12-05T09:36:34.000283Z

ah, alength instead of count

val_waeselynck 2017-12-05T09:39:04.000252Z

Some performance tips:
- Vars lookups are slower and harder to optimize than locals lookups
- Make sure to type hint primitives and primitive arrays when necessary
- Primitive values (ints, bytes, booleans etc.) cannot travel across functions (they get boxed into objects)
- Protocols are way faster than multimethods
- Looking up a key in a record type is 5 to 10x faster than looking it up in a plain map
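Two of those tips in miniature (hypothetical names, just to make the point concrete):

;; Primitives don't survive an ordinary function boundary...
(defn plain-add [a b] (+ a b))                   ; a and b arrive boxed
;; ...unless the signature is hinted (only long/double args are supported):
(defn prim-add ^long [^long a ^long b] (+ a b))

;; Keyword lookup on a record compiles down to a field access, unlike a map:
(defrecord Point [x y])
(:x (->Point 1 2))  ;=> 1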

2017-12-05T09:39:07.000166Z

length doesn't change so i set that in a let before the loop.. for me it was the comparison not being (int 3) which was slowing things down

val_waeselynck 2017-12-05T09:41:00.000017Z

well not a bug actually, but probably not what you intended either

👍 1
borkdude 2017-12-05T09:47:29.000199Z

ok, fixed: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj#L54

borkdude 2017-12-05T09:47:31.000245Z

240ms

orestis 2017-12-05T09:52:52.000303Z

@val_waeselynck I modified your version a bit, almost halved the execution time (650ms -> 380ms) - just moved the alength call outside.

orestis 2017-12-05T09:53:59.000481Z

I thought I would be able to typehint a single ^int but I get this: https://stackoverflow.com/questions/15230061/cant-type-hint-a-local-with-a-primitive-initializer

orestis 2017-12-05T09:54:17.000125Z

Adding the (int i) in comparisons actually makes things slower.

2017-12-05T09:55:17.000314Z

wow, borkdude's version smokes mine

2017-12-05T09:55:57.000461Z

advent.2017.day5> (c/quick-bench (bork-int-arr input))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 195.021813 ms
    Execution time std-deviation : 516.193459 Āµs
   Execution time lower quantile : 194.408773 ms ( 2.5%)
   Execution time upper quantile : 195.624313 ms (97.5%)
                   Overhead used : 1.887221 ns

2017-12-05T09:57:29.000465Z

(defn solve2-int-array [input]
  (let [i (into-array Integer/TYPE input)
        len (count input)]
    (loop [idx (int 0) c (int 0)]
      (if (or (> 0 idx) (<= len idx)) c
          (let [v (aget ^ints i idx)]
            (if (<= (int 3) v)
              (aset-int i idx (dec v))
              (aset-int i idx (inc v)))
            (recur (+ idx v)
                   (inc c)))))))

2017-12-05T09:57:43.000032Z

Any ideas of where to type hint that?

orestis 2017-12-05T09:58:35.000486Z

@borkdude's version takes 1000ms on my machine… Is running this inside a CIDER repl a bad idea?

borkdude 2017-12-05T09:58:53.000082Z

@minikomi put this in your code:

(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)

2017-12-05T09:59:17.000523Z

I have both, but neither is complaining

borkdude 2017-12-05T09:59:18.000056Z

@orestis Did you pull the most recent?

borkdude 2017-12-05T09:59:43.000507Z

@minikomi Maybe add ^ints before i in the let

borkdude 2017-12-05T10:00:03.000404Z

@minikomi also don't use count, but alength

borkdude 2017-12-05T10:00:27.000275Z

@minikomi also don't use aset-int, I discovered that's slower as well
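Putting those three suggestions together, a sketch of what the function above might become (not necessarily the exact code that ended up being used):

(defn solve2-int-array [input]
  (let [^ints arr (into-array Integer/TYPE input)   ; hint the local
        len (alength arr)]                          ; alength, not count
    (loop [idx 0 c 0]
      (if (or (neg? idx) (<= len idx))
        c
        (let [v (aget arr idx)]
          (aset arr idx (int (if (<= 3 v) (dec v) (inc v))))  ; aset, not aset-int
          (recur (+ idx v) (inc c)))))))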

2017-12-05T10:01:17.000426Z

OK making those changes

2017-12-05T10:01:20.000030Z

sped up some

orestis 2017-12-05T10:01:26.000406Z

@borkdude Just did, got it to 600ms.

borkdude 2017-12-05T10:01:41.000546Z

@orestis what kind of machine are you on?

2017-12-05T10:01:44.000335Z

woah

orestis 2017-12-05T10:01:46.000280Z

I am using int-array instead of into-array, would that make a difference?

2017-12-05T10:01:47.000138Z

sped up a ton

2017-12-05T10:01:47.000397Z

haha

orestis 2017-12-05T10:02:15.000574Z

@borkdude 2014 Macbook Pro, 2.8GHz i5

borkdude 2017-12-05T10:02:34.000021Z

@orestis Are you running exactly my code or an adaptation ?

orestis 2017-12-05T10:03:28.000308Z

Tiny adaptation; I create the array with int-array, and don't have the reducible-resource stuff; but I do this for all the adaptations.

2017-12-05T10:04:05.000193Z

the ^ints in the let by itself took me from 1.3s -> 98ms

orestis 2017-12-05T10:04:23.000335Z

@borkdude Have a look at https://github.com/orestis/adventofcode/blob/6537d8dda0825871fdd9acad0daf95570d82ef87/clojure/aoc/src/aoc/2017_day5.clj#L95 ; it's the fastest in this file.

2017-12-05T10:04:38.000293Z

weird.. it's like, make an array of type integer (by the way, it's an integer array!)

borkdude 2017-12-05T10:05:15.000218Z

@orestis that's effectively the same as mine?

borkdude 2017-12-05T10:05:34.000514Z

oh yeah int-array

orestis 2017-12-05T10:06:30.000191Z

I wonder if try/catch OutOfBounds exception would work.

borkdude 2017-12-05T10:06:30.000307Z

btw I pushed a new version, but the performance is the same

orestis 2017-12-05T10:07:01.000238Z

BTW, everyone has different inputs so we need a baseline if we are going to optimize this silly thing 🙂

2017-12-05T10:07:26.000218Z

good point 😆

borkdude 2017-12-05T10:07:41.000008Z

haha oh yeah

borkdude 2017-12-05T10:07:48.000432Z

my input file is on github as well

borkdude 2017-12-05T10:08:35.000089Z

tfw when you get out of bed somewhat earlier for Advent of Code, but CIDER won't connect to your nrepl anymore and you have to fall back on inf-clojure …

orestis 2017-12-05T10:09:34.000373Z

Is there any project that could easily take those different implementations and run proper benchmarks etc? Not sure how warm/cold JIT makes any difference here.

2017-12-05T10:10:35.000178Z

after more poking around, aset vs aset-int does indeed seem to be the culprit.

2017-12-05T10:11:07.000032Z

aset:

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 102.748740 ms
aset-int:
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 1.391871 sec

borkdude 2017-12-05T10:11:11.000015Z

At this point you might want to write it in Java and just call from Clojure 😉

2017-12-05T10:11:48.000342Z

true. still fun to overthink 🙂

orestis 2017-12-05T10:13:58.000172Z

@borkdude Well, the "naive" Clojure version that uses a vector can look really similar to the one that uses int arrays. Still a 10 line function anyway 🙂

borkdude 2017-12-05T10:15:28.000033Z

true, but dealing with things like "should I use aset or aset-int" isn't fun imho

borkdude 2017-12-05T10:16:01.000039Z

why so slow, aset-int

orestis 2017-12-05T10:18:35.000159Z

That's because y'all know too many core functions for your own good 🙂

2017-12-05T10:43:55.000155Z

good catch!

val_waeselynck 2017-12-05T11:32:47.000095Z

@orestis Do you mean that moving the alength call outside improved the perf of my version by 2x?

2017-12-05T12:23:04.000060Z

Is there a usecase for aset-int if it is slower than aset? Or is it just a legacy function?

borkdude 2017-12-05T12:23:54.000008Z

@chrisblom very good question

mfikes 2017-12-05T12:24:42.000210Z

It's looking like loop / recur is the most popular for day 5, followed by iterate

borkdude 2017-12-05T12:26:05.000255Z

I usually start with reduce, but when I need multiple things as the argument of the reducing function, I give up and fall back on loop

2017-12-05T12:26:16.000484Z

if i macro expand the definition and clean it up a bit there are only 2 differences: aget has :inline metadata and uses .set instead of .setInt

mfikes 2017-12-05T12:26:17.000247Z

I actually used reduce (on (range) as the step counter)
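Roughly, the reduce-over-(range) shape being described (a sketch with an explicit bounds check, not mfikes' actual code):

(defn solve [input]
  (reduce (fn [[jumps pos] step]
            (if (< -1 pos (count jumps))
              [(update jumps pos inc) (+ pos (jumps pos))]
              (reduced step)))            ; escape: the step count is the answer
          [(vec input) 0]
          (range)))                       ; (range) is just an infinite step counter

;; (solve [0 3 0 1 -3]) ;=> 5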

borkdude 2017-12-05T12:27:48.000179Z

@mfikes Yeah, I think loop can mostly be rewritten to a reduce with destructuring

borkdude 2017-12-05T12:28:30.000051Z

@mfikes Neat that you're doing it with cljc now

mfikes 2017-12-05T12:29:12.000203Z

The nice trick I missed is relying on get to return nil, as opposed to pre-checking the index
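With that trick, the reducing function can lean on get returning nil for an out-of-range index instead of the bounds check (a sketch, continuing the reduce-over-(range) shape above):

(fn [[jumps pos] step]
  (if-some [offset (get jumps pos)]       ; nil once pos escapes the vector
    [(update jumps pos inc) (+ pos offset)]
    (reduced step)))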

borkdude 2017-12-05T12:29:35.000090Z

@mfikes That trick worked for me, until I got to primitive arrays

mfikes 2017-12-05T12:29:50.000177Z

(You can learn so much by studying the range of solutions. 🙂 )

mfikes 2017-12-05T12:30:48.000222Z

Yeah, that's the story. You can write clean Clojure, and then when you want it to go faster, you delve into messiness. (I suppose that's true in any language.)

mfikes 2017-12-05T12:31:28.000421Z

At least in Clojure we can really push hard on the "clean" end of the spectrum.

borkdude 2017-12-05T12:32:47.000006Z

Well, Scala really is better (not the right word, I must say, friendlier/easier) at this kind of stuff, while still allowing you to do functional. I think the type system helps to generate more primitive stuff if needed.

borkdude 2017-12-05T12:33:19.000382Z

But I only need mutability like this during Advent of Code really, and then only for extra performance, not for getting the answer, so 🙂

mfikes 2017-12-05T12:35:47.000395Z

Ahh, right. Does Scala essentially generate code like C++ does when instantiating templates?

mfikes 2017-12-05T12:36:58.000084Z

(The closest I think I've seen Clojure come to this is how ClojureScript has macro variants of functions that can try to generate optimal code based on what is available at compile time.)

borkdude 2017-12-05T12:46:18.000181Z

@mfikes Don't know, I just didn't have as much trouble with it as in Clojure: https://gist.github.com/borkdude/b37563939639b40013f483361a7c5d8f

borkdude 2017-12-05T12:47:09.000068Z

(I was just learning Scala back then for fun)

borkdude 2017-12-05T12:50:28.000161Z

(@mfikes http://www.drmaciver.com/2008/06/scala-arrays/)

mfikes 2017-12-05T12:59:02.000184Z

An example of how (I suppose essentially anything) can be done with macros is how str in ClojureScript passes the literal strings in something like (str "a" 1 "b") untouched. (I think in Clojure it may still end up invoking (.toString "a") and (.toString "b").)

mfikes 2017-12-05T13:00:23.000151Z

That may not be the best example, but it illustrates that anything can be done at compile time with macros if you are trying to eke out perf for the generated code.

borkdude 2017-12-05T13:02:00.000060Z

@mfikes How does this relate to primitive array performance?

mfikes 2017-12-05T13:03:46.000013Z

I'm just saying that if Scala can do nice things with types, Clojure may also be capable of the same if you go down the road of using macros (and if you have access to inferred types).

borkdude 2017-12-05T13:04:14.000122Z

With the macro approach you can get rid of some calls like (str "foo" "bar"), but will it get you very far? I don't know.

mfikes 2017-12-05T13:05:21.000535Z

I suppose Scala has the benefit of much richer type inference.

borkdude 2017-12-05T13:05:31.000261Z

There was also a talk about this topic at EuroClojure 2017: http://2017.euroclojure.org/nada-amin/

borkdude 2017-12-05T13:06:50.000097Z

where the type system was used to generate fast low level code: https://www.youtube.com/watch?v=QuJ-cEvH_oI

mfikes 2017-12-05T13:08:01.000381Z

I haven't thought about this too much, being on the JavaScript side of things 🙂 In that world, the engine watches the values fly by and infers their types, dynamically generating optimal machine code, and de-opting if it sees a value that violates assumptions, etc.

mfikes 2017-12-05T13:08:52.000254Z

The end effect is that where in Clojure you can manually hint, in ClojureScript, you need to let the lower layers do it for you.

borkdude 2017-12-05T13:11:38.000143Z

Yeah, I kind of wondered why you don't need to add type hints in ClojureScript, so that's how it works

borkdude 2017-12-05T13:19:14.000338Z

PureScript solution: https://github.com/krisajenkins/AdventOfCode/blob/master/src/Year2017/Day5.purs

mfikes 2017-12-05T13:20:01.000066Z

The primary type hints in ClojureScript are ones that help avoid checked if: If the compiler knows it is dealing with a Boolean value for example, it can just use JavaScript's notion of truthiness.
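For example, a hedged sketch of that kind of hint in ClojureScript (the predicate name is hypothetical):

(defn ^boolean in-maze? [maze pos]
  (and (<= 0 pos) (< pos (alength maze))))

;; With the ^boolean hint, (if (in-maze? m p) ...) can emit a bare JS `if`
;; instead of routing the test through cljs.core/truth_.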

borkdude 2017-12-05T13:25:50.000048Z

@mfikes Do you perhaps have a clojurescript version of day 5 which runs on Node, to compare it with the performance of PureScript day 5?

mfikes 2017-12-05T13:26:28.000362Z

I'll convert it to Lumo real quick

mfikes 2017-12-05T13:26:42.000298Z

Or are you interested in Closure-optimized code?

mfikes 2017-12-05T13:26:58.000003Z

I could write that fairly easily.

borkdude 2017-12-05T13:27:21.000398Z

does it help for performance? and mutable array version is fine

mfikes 2017-12-05T13:28:07.000488Z

I can take your fastest mutable array version, and produce an optimized Node-compatible JavaScript file for you to use to compare with PureScript.

mfikes 2017-12-05T13:28:18.000191Z

What is your recommended mutable array version?

borkdude 2017-12-05T13:32:44.000008Z

I don't know about the absolute fastest, but this is my fastest: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj#L54

mfikes 2017-12-05T13:43:24.000119Z

Cool. I'm putting together a GitHub repo so it is possible to revise the code if needed and build and run it in Node

borkdude 2017-12-05T13:44:12.000237Z

@mfikes I only asked if you had this lying around, not if you wanted to put time into it, but if you like to do it, still thanks 🙂

mfikes 2017-12-05T13:44:22.000546Z

Nah, it is close to trivial

mfikes 2017-12-05T14:16:44.000355Z

@borkdude For me, ClojureScript / Node.js is running that array-based solution 25% faster than Clojure. Here it is, set up so it is trivial for anyone to reproduce perf results or change the code: https://github.com/mfikes/advent-day-5-cljs-node

borkdude 2017-12-05T14:17:25.000087Z

@mfikes Awesome, I'll try

mfikes 2017-12-05T14:22:24.000469Z

Interestingly Planck runs it at about the same speed as the Closure optimized JavaScript under Node.js. (JavaScriptCore is wicked fast.)

borkdude 2017-12-05T14:26:23.000366Z

I get 240 ms on the JVM, 400 on Node

borkdude 2017-12-05T14:28:24.000245Z

hmm, weirdly enough, in lein I get 800 ms.. it's slower than from my REPL it seems

val_waeselynck 2017-12-05T14:33:30.000388Z

Maybe the JVM didn't have enough time to warm up. V8 is probably more optimized than the JVM for quick warmup

borkdude 2017-12-05T14:42:50.000350Z

@mfikes boot: 240 ms, lein 800 ms?

mfikes 2017-12-05T14:43:23.000810Z

I think I need to add the -server JVM option to the lein setup in that project

mfikes 2017-12-05T14:44:02.000484Z

@val_waeselynck I specifically have the perf test running over and over again to avoid warm up issues. (Good point, though 🙂 )

orestis 2017-12-05T14:45:50.000080Z

I think that might be it, yes. I'm still a noob so I'm not really sure what I'm doing there; type hinting is a bit opaque, but I don't get any reflection warnings so…

val_waeselynck 2017-12-05T14:45:50.000502Z

@mfikes yeah I noticed but I'm wondering if that's enough, Criterium typically seems to do much more sophisticated things to take both warmup and GC into account :thinking_face:

borkdude 2017-12-05T14:45:56.000197Z

Ooh even faster with your input (I forgot the inputs differ)

mfikes 2017-12-05T14:46:27.000072Z

@val_waeselynck I agree. But we have a huge perf diff right now to sort out 🙂

mfikes 2017-12-05T14:47:46.000237Z

Yeah, it needs -server

borkdude 2017-12-05T14:47:49.000527Z

really weird, I run both tests from the repl. Exact same code now.

mfikes 2017-12-05T14:47:50.000304Z

Patching the repo

borkdude 2017-12-05T14:47:59.000636Z

does boot run with server by default maybe?

borkdude 2017-12-05T14:48:48.000444Z

or maybe that's the default and lein adds client

mfikes 2017-12-05T14:49:26.000194Z

Yeah, lein run evidently wants to minimize startup latency

borkdude 2017-12-05T14:50:41.000247Z

yeah, now it works 🙂 180ms

borkdude 2017-12-05T14:52:41.000443Z

@mfikes Thanks, that was a nice adventure

mfikes 2017-12-05T14:53:41.000287Z

Yeah, with any perf tests you really really need to provide the code so others can try it as well 🙂

val_waeselynck 2017-12-05T14:54:01.000713Z

@borkdude so what are your numbers now?

borkdude 2017-12-05T14:54:23.000372Z

@val_waeselynck same as mike's repo, ~ 180 ms on my machine in the JVM

mfikes 2017-12-05T14:55:53.000283Z

My numbers are Clojure: 167 ms, ClojureScript, Closure / Node.js: 384 ms Planck: 570 ms

val_waeselynck 2017-12-05T14:56:10.000449Z

Alright, thanks!

borkdude 2017-12-05T14:57:46.000661Z

Some guy on Reddit had 75 ms using C++… we're not done yet 😉

borkdude 2017-12-05T14:58:00.000655Z

Maybe we can write a Clojure program which generates C++ and run that one

borkdude 2017-12-05T14:58:18.000580Z

Didn't Rich write C++ like that as well in Common Lisp? 😉

val_waeselynck 2017-12-05T14:59:05.000242Z

Yes, but was it the same input? 🙂

borkdude 2017-12-05T14:59:14.000592Z

Assembly would probably be not so difficult for this problem

borkdude 2017-12-05T14:59:21.000269Z

or maybe straight JVM bytecode?

borkdude 2017-12-05T15:00:41.000720Z

well, twice the speed of C++ ain't bad

val_waeselynck 2017-12-05T15:01:23.000709Z

especially when you haven't compiled it with heavy optimizations

val_waeselynck 2017-12-05T15:02:03.000676Z

JIT is probably the best strategy for low latency (including the whole write + debug + compile + run process)

bhauman 2017-12-05T15:06:55.000395Z

so why does the array implementation take so long???

bhauman 2017-12-05T15:10:14.000112Z

checking it out with ints instead of longs

dyankowsky 2017-12-05T15:10:45.000700Z

@mfikes AFAIK, Scala's generics aren't like C++ templates. Scala's generics are mostly the same as Java's generics - type erasure and boxing. There are conveniences, like implicit parameters to capture class tokens. But they're fundamentally not a code generation mechanism, as C++ templates are.

mfikes 2017-12-05T15:13:25.000229Z

@bytecrafter Cool. I vaguely (perhaps mis-) recall something in Scala generating code covering the possible permutations of primitives, but that was perhaps 5 years ago for me.

mfikes 2017-12-05T15:14:29.000221Z

Ahh, "specializing" is what I recall. For example http://www.scala-notes.org/2011/04/specializing-for-primitive-types/

bhauman 2017-12-05T15:15:36.000216Z

@mfikes are you guys using the same set of inputs for benchmarking?

mfikes 2017-12-05T15:15:52.000571Z

No, we realized that this causes a problem @bhauman

mfikes 2017-12-05T15:16:11.000472Z

At least, we caught one instance recently where we forgot that the data changes. 🙂

bhauman 2017-12-05T15:16:20.000555Z

the final solution should determine the complexity

bhauman 2017-12-05T15:16:42.000285Z

mine is in the 28 million

bhauman 2017-12-05T15:16:44.000164Z

range

mfikes 2017-12-05T15:16:58.000336Z

I haven't been too involved in the perf game, until some fun stuff this morning

bhauman 2017-12-05T15:17:26.000566Z

well my fastest is 5 seconds! where as yours is in the ms

mfikes 2017-12-05T15:17:30.000252Z

My day 5 result was 24,315,397

bhauman 2017-12-05T15:17:44.000461Z

ok so same complexity

bhauman 2017-12-05T15:17:53.000125Z

darn it son

mfikes 2017-12-05T15:18:00.000376Z

My reduced-based solution using persistent data structures takes about 5 seconds

mfikes 2017-12-05T15:18:19.000665Z

That benchmark repo is banging on primitive arrays

mfikes 2017-12-05T15:18:31.000365Z

It is Java in disguise 🙂

bhauman 2017-12-05T15:19:17.000506Z

my array implementation performs much worse, so I'm going to take this opportunity to learn some things ....

mfikes 2017-12-05T15:20:00.000279Z

I optimized for readability, not worrying about perf: https://github.com/mfikes/advent-of-code/blob/master/src/advent_2017/day_05.cljc#L14-L19

borkdude 2017-12-05T15:20:06.000357Z

Came up with another perf optimization, but didn't help: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj#L127 Still 232 ms

borkdude 2017-12-05T15:23:22.000100Z

@bhauman Didn't see your solution, but aset-int is not so good for performance, we found out today

borkdude 2017-12-05T15:23:50.000303Z

@bhauman Also:

(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)
helps

bhauman 2017-12-05T15:25:37.000028Z

oh dems some tricks

bhauman 2017-12-05T15:25:42.000579Z

cool I just posted mine

val_waeselynck 2017-12-05T15:34:02.000690Z

@bhauman According to the docs, I think you probably want (int ...) instead of ^int ... https://clojure.org/reference/java_interop#optimization

2017-12-05T15:34:07.000381Z

@borkdude mine is similar, but i'm not passing the mutable array via recur, runs in 130ms on my machine: https://github.com/ChrisBlom/advent-of-code/blob/master/src/adventofcode/2017/day05.clj#L16-L33

bhauman 2017-12-05T15:34:30.000148Z

yeah thanks I'm still working on the scratch area

bhauman 2017-12-05T15:34:38.000618Z

just added that thanks!

borkdude 2017-12-05T15:34:43.000523Z

Maybe try the same input for comparison?

mfikes 2017-12-05T15:35:00.000730Z

@borkdude What kind of black magic is involved in annotating a local as a primitive ^int but treating it as a Boolean?

bhauman 2017-12-05T15:35:00.000738Z

I never have to optimize stuff like this so this is kind of a blast

borkdude 2017-12-05T15:35:16.000200Z

I tried without passing the array in the loop, but it did not matter I think

mfikes 2017-12-05T15:35:21.000590Z

Hmm. Something to learn there...

borkdude 2017-12-05T15:35:23.000161Z

it's just a reference right

mfikes 2017-12-05T15:36:02.000673Z

I'm referring to the annotation on this line https://github.com/mfikes/advent-day-5/blob/master/src/advent_day_5/core.cljc#L17

borkdude 2017-12-05T15:36:29.000313Z

@mfikes The bound value is an int, because and returns the int. But if it's non-true, it won't be bound at all

mfikes 2017-12-05T15:36:46.000024Z

Ahh. Thank you. I need to macroexpand to see that.

mfikes 2017-12-05T15:36:58.000026Z

Hah!

borkdude 2017-12-05T15:40:58.000899Z

(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)
(if-let [^int x (and true 1)] (inc x) :else) ;; good
(if-let [x (and true 1)] (inc x) :else) ;; bad, boxed
(macroexpand '(if-let [^int x (and true 1)] (inc x) :else))
(let* [temp__4655__auto__ (and true 1)] (if temp__4655__auto__ (clojure.core/let [x temp__4655__auto__] (inc x)) :else))

borkdude 2017-12-05T15:41:32.000445Z

it seems macroexpand doesn't show the type hints?

val_waeselynck 2017-12-05T15:44:23.000229Z

@bhauman I'm currently writing a home-made GraphQL engine, so I'm totally in the performance business right now 🙂

bhauman 2017-12-05T15:45:15.000241Z

oh darn, sounds fun?????

borkdude 2017-12-05T15:47:47.000673Z

it will if you (set! *print-meta* true)

val_waeselynck 2017-12-05T15:48:29.000542Z

it is 🙂 actually not exactly a GraphQL engine, since it's an alternative to GraphQL with some added features.

dyankowsky 2017-12-05T15:49:08.000261Z

@mfikes Interesting, I didn't know about that in Scala. Thanks!

val_waeselynck 2017-12-05T15:49:15.000945Z

However, I guess the core server-side algorithm could be reused to implement both GraphQL engines and Om Next routers

mfikes 2017-12-05T15:49:17.000199Z

Is [1 -3] a legitimate input? (In other words, can it exit by going negative?)

borkdude 2017-12-05T15:49:49.000913Z

@mfikes I thought about that, but since I got the answer, I didn't worry about it anymore 😉

mfikes 2017-12-05T15:50:04.000605Z

OK. Cool.

borkdude 2017-12-05T15:52:54.000663Z

This is just awesome. Advent of Code in pure Postgres… https://github.com/xocolatl/advent-of-code-2017/blob/master/Day05/script.sql

mfikes 2017-12-05T15:55:07.000098Z

@borkdude I got yours to run in 80 ms. I'll put the interesting change in a branch

borkdude 2017-12-05T15:55:18.000901Z

wowie

borkdude 2017-12-05T15:55:35.000313Z

that is very close to the C++ version

mfikes 2017-12-05T15:56:36.000168Z

I was trying to eliminate a checked if in what ClojureScript produces, and got it to run faster, but it also made the Clojure faster (if everything pans out)

borkdude 2017-12-05T15:57:37.000162Z

if this works, we have world domination

mfikes 2017-12-05T16:01:50.000504Z

It is actually around 87 ms

borkdude 2017-12-05T16:02:08.000200Z

veeery nice, let's see if I understand

mfikes 2017-12-05T16:02:12.000495Z

Wow. Same speed in Node

borkdude 2017-12-05T16:02:21.000605Z

still the correct answer?

mfikes 2017-12-05T16:02:27.000706Z

24315397

borkdude 2017-12-05T16:04:41.000029Z

@mfikes I don't understand why you need the if-some at all there

borkdude 2017-12-05T16:05:12.000176Z

(aget maze cur-pos) should always return something when (< cur-pos length)

mfikes 2017-12-05T16:05:23.000578Z

OK. Let me address that 🙂

bhauman 2017-12-05T16:08:05.000061Z

OK I got it in the ball park

bhauman 2017-12-05T16:08:13.000619Z

by switching to aget

borkdude 2017-12-05T16:08:31.000359Z

@mfikes turned it to a let, seems to help a bit

borkdude 2017-12-05T16:08:46.000434Z

so, if-let made it slower.. hmm

mfikes 2017-12-05T16:08:58.000290Z

Perhaps crappy expansion?

mfikes 2017-12-05T16:09:00.000267Z

Hmm

mfikes 2017-12-05T16:13:30.000061Z

It is surprising that ternary < in Clojure botches the perf by an order of magnitude, but not so in ClojureScript. Specifically, trying to do the right thing and testing (< -1 cur-pos length) instead of just (< cur-pos length) makes it run in 900 ms instead of 87 ms.

mfikes 2017-12-05T16:15:23.000814Z

Anyway, I put the 87 ms-version on master https://github.com/mfikes/advent-day-5/blob/master/src/advent_day_5/core.cljc#L8-L27

borkdude 2017-12-05T16:15:35.000483Z

Well, I could in fact shave off a couple of ms with this trick: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj#L115

mfikes 2017-12-05T16:16:24.000566Z

Oh, so check for zero and then optimize for that case?

borkdude 2017-12-05T16:17:25.000340Z

@mfikes with your data input 78 ms

borkdude 2017-12-05T16:17:29.000156Z

yes

mfikes 2017-12-05T16:17:40.000195Z

Wow. I'll apply that one! We must dominate.

mfikes 2017-12-05T16:18:04.000597Z

Or, if you want I can add you as a committer and you can push your change in.

borkdude 2017-12-05T16:18:24.000346Z

ok, add me 🙂

bhauman 2017-12-05T16:19:46.000478Z

you could check for negative as well right?

mfikes 2017-12-05T16:21:23.000513Z

I'm not seeing why it replaces 0 with 2. Should it be 1 instead?

borkdude 2017-12-05T16:21:34.000125Z

nope

borkdude 2017-12-05T16:21:59.000200Z

but maybe this is no good, I don't see a real change when applied to your benchmark with dotimes

mfikes 2017-12-05T16:22:08.000531Z

ok

borkdude 2017-12-05T16:22:18.000012Z

Run 58 of 60.
"Elapsed time: 102.893639 msecs"
24315397
Run 59 of 60.
"Elapsed time: 90.53835 msecs"
24315397
Run 60 of 60.
"Elapsed time: 100.259889 msecs"
24315397
---
Run 1 of 60.
"Elapsed time: 129.65845 msecs"
24315397
Run 2 of 60.
"Elapsed time: 104.405081 msecs"
24315397
Run 3 of 60.
"Elapsed time: 96.121724 msecs"

mfikes 2017-12-05T16:22:58.000361Z

Oh, I see why 2. You are trying to skip a loop cycle?

mfikes 2017-12-05T16:23:14.000426Z

Clever

mfikes 2017-12-05T16:23:50.000344Z

Hence the (+ steps 2)
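In other words: a zero offset leaves the pointer in place and turns the cell into 1, and the very next iteration turns that 1 into 2 and moves one step right, so both iterations can be merged into one. A sketch of that special case in context (variable names assumed, not the exact committed code):

(defn part-2 [^ints maze]
  (let [len (alength maze)]
    (loop [pos 0 steps 0]
      (if (or (neg? pos) (>= pos len))
        steps
        (let [v (aget maze pos)]
          (if (zero? v)
            (do (aset maze pos (int 2))                        ; 0 -> 1 -> 2 collapsed
                (recur (inc pos) (+ steps 2)))                 ; net move +1, two steps
            (do (aset maze pos (int (if (>= v 3) (dec v) (inc v))))
                (recur (+ pos v) (inc steps)))))))))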

borkdude 2017-12-05T16:24:11.000837Z

with criterium I do see a difference of about 4-5 ms

borkdude 2017-12-05T16:24:16.000513Z

which may not be significant

borkdude 2017-12-05T16:24:37.000174Z

@bhauman negatives?

borkdude 2017-12-05T16:33:19.000407Z

@mfikes we have world domination now… congrats.

borkdude 2017-12-05T16:33:41.000069Z

Posted it under the C++ solution: https://www.reddit.com/r/adventofcode/comments/7hngbn/2017_day_5_solutions/dqt0s5v/ 😎

mfikes 2017-12-05T16:34:16.000334Z

In Node, my reduce-based "normal" solution takes 8 seconds, while the optimized version takes 80 ms in Node.

mfikes 2017-12-05T16:38:10.000322Z

I should post this as an 80-ms JavaScript based solution: https://gist.github.com/mfikes/4ef3d2c3efc8f72a848e9149e1229e84

borkdude 2017-12-05T16:46:31.000305Z

is that the output of the compiler?

mfikes 2017-12-05T16:47:33.000569Z

Yeah! 🙂

borkdude 2017-12-05T16:49:57.000830Z

cool!

borkdude 2017-12-05T16:50:00.000540Z

74ms now

👍 1
borkdude 2017-12-05T16:50:25.000234Z

@chrisblom Removed the maze from the loop arguments, another 4 ms 🙂

borkdude 2017-12-05T17:05:14.000776Z

@mfikes Could you try this commit? It seems to be 30 ms faster on Node on my machine than the original: https://github.com/mfikes/advent-day-5/commit/3fdda4ba34d144fb795674f51ba81927c9125d1c

2017-12-05T17:06:31.000396Z

has anyone tried unrolling the loop yet?

borkdude 2017-12-05T17:08:29.000448Z

@mfikes also pushed a cljc fix, but it worked nonetheless

dpsutton 2017-12-05T17:09:11.000690Z

to unroll a loop you need to know in advance how many times you will loop, no? and that isn't known until you get there

dpsutton 2017-12-05T17:09:36.000465Z

unless you mean make the loop include several iterations with escape hatches in each?

dpsutton 2017-12-05T17:10:34.000771Z

but i thought loop unrolling was a strategy to evade the loop test which we cannot escape so we always pay the branch penalty

mfikes 2017-12-05T17:18:06.000086Z

@borkdude Yes, Node is faster for me with your commit

borkdude 2017-12-05T17:18:28.000543Z

How fast now?

mfikes 2017-12-05T17:20:15.000253Z

Oh wait. It actually went from an original speed of 86 ms to slower, at around 93 ms

borkdude 2017-12-05T17:22:12.000407Z

On my machine, branch master:

24315397
Run 57 of 60.
"Elapsed time: 122.000000 msecs"
24315397
Run 58 of 60.
"Elapsed time: 136.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 125.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 128.000000 msecs"
24315397

borkdude 2017-12-05T17:22:27.000074Z

Branch faster:

"Elapsed time: 114.000000 msecs"
24315397
Run 58 of 60.
"Elapsed time: 109.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 106.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 109.000000 msecs"
24315397

borkdude 2017-12-05T17:22:55.000712Z

(Node)

mfikes 2017-12-05T17:22:56.000143Z

Right, something was messed up with the squashed commit I put on master. It was causing an inexplicable regression.

mfikes 2017-12-05T17:23:05.000092Z

I've since removed it.

borkdude 2017-12-05T17:25:30.000245Z

I noticed that compiling some code in the background severely impacted the performance, so these times also depend on what your machine is doing. For some real benchmarking, criterium or something like it is needed.

borkdude 2017-12-05T17:25:48.000169Z

In my repo I used criterium, which gives a better time for this commit
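For reference, the criterium call in question is standard library usage along these lines (the function name is borrowed from a later message, not quoted from the repo):

(require '[criterium.core :as c])
(c/quick-bench (part-2-array-faster?))   ; handles JIT warmup and GC, reports mean and quantiles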

borkdude 2017-12-05T17:27:11.000008Z

Maybe node also has something like this

mfikes 2017-12-05T17:28:14.000378Z

FWIW, I fixed the master branch so it now has the better perf number

mfikes 2017-12-05T17:29:05.000244Z

(At least better under Node)

borkdude 2017-12-05T17:29:48.000716Z

@mfikes do you mean this commit is the newest now? https://github.com/mfikes/advent-day-5/commit/45aa111e8df606884f5af4ffd886aa2a8af393eb

mfikes 2017-12-05T17:30:20.000779Z

Yeah, on my instance of Node that one runs faster

borkdude 2017-12-05T17:31:40.000362Z

that's weird, because the if-some is not needed… 🙂

borkdude 2017-12-05T17:32:28.000703Z

on my machine it's faster too

mfikes 2017-12-05T17:33:26.000535Z

Right... the if-some should be cleaned up. It is unnecessary, but AFAICT it doesn't hurt perf.

borkdude 2017-12-05T17:33:46.000873Z

when I change the if-some to let it becomes slower

borkdude 2017-12-05T17:33:55.000327Z

WAT

mfikes 2017-12-05T17:34:19.000432Z

I was referring to perf on Node. Let me make the change as well to be sure.

borkdude 2017-12-05T17:34:31.000520Z

@mfikes I was also talking about that

borkdude 2017-12-05T17:36:01.000635Z

With let:

"Elapsed time: 132.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 129.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 138.000000 msecs"
24315397
With if-some:
Run 57 of 60.
"Elapsed time: 111.000000 msecs"
24315397
Run 58 of 60.
"Elapsed time: 106.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 111.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 106.000000 msecs"
24315397
(Node)

mfikes 2017-12-05T17:36:36.000141Z

Ahh... indeed, converting it to a let and dropping the else branch makes it slower for me on node than having an if-some... going to look at the generated JavaScript

borkdude 2017-12-05T17:37:01.000261Z

So far we have learned: if-let slows down, but if-some speeds up 😛

grzm 2017-12-05T17:38:21.000147Z

@bhauman thanks 🙂 I'm amazed at all of the different ways people approach this, and I'm sure what seems obvious to one seems clever to someone else. I'm learning a lot by reading. Trying to figure out a way to make it all stick as efficiently as possible.

👍 1
spfeiffer 2017-12-05T18:51:01.000529Z

So AoC leads to the discovery of obscure perf regressions… great!

borkdude 2017-12-05T18:52:50.000165Z

@spfeiffer We discovered that Node gets a speedup from adding obsolete code…

spfeiffer 2017-12-05T18:55:10.000205Z

@borkdude I think that is a very nice metaphor for the Node ecosystem ☺️

mfikes 2017-12-05T19:13:30.000653Z

@borkdude I did a bunch of tests. It seems to boil down to the fact that an extra nil-check, while not free, pays off because it lets the Node optimizer know that it is doing arithmetic on non-`null` values. (We as humans know that the values pulled out of the array are not nil, but the optimizer doesn't.) In fact, I saw one test run with Lumo where it started off slow, and then "figured it out" on its own, and started running at the faster speed, while in a loop, without me doing anything. I also found that you can get a lesser speedup by changing (aget maze cur-pos) to (int (aget maze cur-pos)). This gets compiled down to JavaScript that involves | 0, which is evidently an ostensibly useless appendage added to JavaScript arithmetic code to "hint" to the optimizer that it is doing integer arithmetic. So, when-some or even a more straightforward (when-not (nil? ...)) ..., while not changing the functionality, ends up acting like a hint, or an assertion that the optimizer can use to its advantage.
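In ClojureScript terms, the second of those hints might look like this (a sketch; names follow the snippet discussed earlier, not the repo verbatim):

(defn read-offset [maze cur-pos]
  (int (aget maze cur-pos)))   ; as described above, compiles to roughly (maze[cur-pos] | 0)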

borkdude 2017-12-05T19:14:24.000017Z

Interesting and at the same time frightening

mfikes 2017-12-05T19:18:09.000518Z

One interesting question in my mind: In the early days, types were used for performance. Will we reach a point where types are inferred at runtime sufficiently well so that the performance of such code behaves just as well as statically typed code. Or, to put it another way, with enough money being dumped into JavaScript VMs, will ClojureScript end up running faster than Clojure one day. It can already do this at times for unhinted Clojure, but I'm wondering if it will all get to be so good where you can just write your dynamic unhinted Lispy code and it will run extremely fast.

borkdude 2017-12-05T19:21:43.000103Z

That would be nice, but nothing beats full type inference for this I think?

borkdude 2017-12-05T19:22:21.000735Z

I mean, if you have all type info, what else is there to guess?

borkdude 2017-12-05T19:28:06.000575Z

PureScript probably won't get much performance benefit from typing because of its target?

mfikes 2017-12-05T19:35:49.000211Z

An analogy: I used to find it odd that you didn't pass flags like -O3 to the javac compiler. It was difficult to just "let go" of the idea optimizing at compile time, and defer it all to runtime. One argument is that if you let the runtime do the optimization, it can leverage the call and memory access patterns in your program, and perhaps do a better job than a static analyzer could do. If static type info is available, then, sure, it can be useful. I'm just wondering if we as human developers will be in the low-level business of type hinting a decade or two from now.

borkdude 2017-12-05T19:38:28.000692Z

Maybe there will be a more fuzzy type of static type inference based on ML?

borkdude 2017-12-05T19:38:51.000291Z

That could also assist in IDEs and for performance, without getting in the way

borkdude 2017-12-05T19:43:07.000762Z

I think Steve Yegge once wrote about this

borkdude 2017-12-05T19:43:47.000397Z

That's 10 years ago… he's now fond of Kotlin 😉

2017-12-05T20:11:36.000022Z

@mfikes I'd argue we're already there with pypy

dpsutton 2017-12-05T20:13:21.000085Z

i don't follow why type hints allow for anything more than asserted types

2017-12-05T20:13:22.000229Z

I've seen entire transducer pipelines compile down to half a dozen assembly instructions.

dpsutton 2017-12-05T20:14:04.000319Z

if a jit finally believes you that they are ints, wouldn't it believe you if you said they were ints in the first place?

2017-12-05T20:14:52.000609Z

@mfikes what you describe is pretty much the premise of tracing JITs. https://en.wikipedia.org/wiki/Tracing_just-in-time_compilation

2017-12-05T20:15:33.000423Z

@dpsutton hints can also be used for storage optimization. In that case they're a bit stronger than assertions.

2017-12-05T20:15:52.000008Z

But yeah, hints are a way of papering over the inability of a JIT to deoptimize.

2017-12-05T20:16:50.000048Z

JS and PyPy do deoptimization. Put only ints in a list and the JIT will create a typed array of ints. Once you try to put an object in the array it will convert the list into an list of objects and then add the object to it.

2017-12-05T20:18:01.000331Z

So the question becomes, why should you ever have to hint at all? In Clojure it's because the JVM doesn't allow user-level deoptimization.

2017-12-05T20:18:18.000518Z

And it's a method jit, and that doesn't help either

dpsutton 2017-12-05T20:18:27.000268Z

i'm not sure i follow. deoptimization sounds like recognizing a bad assumption?

dpsutton 2017-12-05T20:18:35.000464Z

and that wouldn't happen with types? or does it

dpsutton 2017-12-05T20:18:52.000118Z

I thought these were ints but there's a map so now they're all objects

dpsutton 2017-12-05T20:18:58.000181Z

is what i'm imagining the JIT is doing

2017-12-05T20:20:22.000340Z

Well with types and something like C# or C++ deoptimization can never occur, since all the types are known at compile time.

2017-12-05T20:20:58.000641Z

On the JVM it's a bit strange, but it will deoptimize a callsite. So as you implement IFoo on more and more types the JVM may recompile every method that calls IFoo.foo.

2017-12-05T20:21:29.000443Z

That's why adding stuff like tuples to Clojure doesn't work since the perf improvement is a wash compared to the de-optimized callsites.

dpsutton 2017-12-05T20:21:47.000286Z

interesting. thanks for explaining. my jvm knowledge is very slim

2017-12-05T20:22:23.000411Z

What tracing jits like PyPy do, is look at the context for a loop. Within a single loop they inline all methods and reduce all types to their lowest level parts. So as long as you never call a method with a map within that single loop, the code for handling that map will never exist in the loop.

mfikes 2017-12-05T20:33:19.000346Z

Hmm. So, transducers are a little more friendly to tracing JITs, it sounds…

2017-12-05T20:40:36.000112Z

Right, on the JVM transducers are always boxed. On JS the JIT is a bit more tracing-like, so you get primitive math loops.

2017-12-05T20:40:53.000001Z

Do a transduce over an array in CLJ and CLJS and the CLJS version will be much faster if all you're doing is math.
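The sort of comparison being described, runnable as-is in either CLJ or CLJS (a sketch, not anyone's actual benchmark):

(def xs (int-array (range 1000000)))

;; pure-math transducer pipeline over the array
(transduce (comp (map inc) (filter odd?)) + 0 xs)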

bhauman 2017-12-05T20:58:50.000346Z

so that wasn't a good idea? you starting to get hungry?

borkdude 2017-12-05T20:59:43.000188Z

@bhauman did you comment on my deleted blob?

bhauman 2017-12-05T20:59:57.000462Z

yeah 🙂

borkdude 2017-12-05T21:00:08.000541Z

ah - yeah, it wasn't a feasible idea

mfikes 2017-12-05T21:01:49.000720Z

For the day 5 problem, it seems that it reads and writes rapidly to lots of parts of a big array. I wonder if memory bandwidth is affecting things more than any of the math / logic.

2017-12-05T21:02:36.000337Z

@mfikes in what language? and what access pattern?

2017-12-05T21:03:01.000763Z

I've seen 4x perf improvement from making sure I scanned an array from 0 to X instead of walking it randomly.

borkdude 2017-12-05T21:03:07.000580Z

Pretty satisfied that we got it down to 75 ms, the same number the C++ and Rust guys are mentioning on Reddit 🙂

bhauman 2017-12-05T21:03:55.000597Z

and thats for CLJS? right?

borkdude 2017-12-05T21:04:05.000176Z

in my case Clojure JVM

bhauman 2017-12-05T21:04:11.000045Z

oh dang

borkdude 2017-12-05T21:04:35.000427Z

I can't reliably benchmark node, I use criterium in Clojure

2017-12-05T21:05:14.000338Z

I wonder how much using unchecked math would improve that?

bhauman 2017-12-05T21:05:28.000166Z

oh thats been tried

bhauman 2017-12-05T21:05:54.000472Z

or has it?

2017-12-05T21:06:27.000199Z

ah, cool, I missed that post

bhauman 2017-12-05T21:06:31.000568Z

using :warn-on-boxed

borkdude 2017-12-05T21:07:08.000220Z

@tbaldridge Mike discovered that if-let generates less performant code than just if and let…

borkdude 2017-12-05T21:07:30.000200Z

this and a couple of other tweaks shaved off some ms here and there

mfikes 2017-12-05T21:07:45.000624Z

The array is only about 1000 integers. We read from a spot which tells us where to jump to next, and we write to the spot we just read from with a new offset derived from the old one. Repeat until we hop out of the array.

mfikes 2017-12-05T21:08:47.000071Z

Oh, I'd summarize it this way: By guarding a block of code with a nil check, that code runs faster under Node, even though we know (as humans) that none of the values are nil.

borkdude 2017-12-05T21:09:08.000273Z

yeah, but on JVM Clojure the code ran also faster without if-let

mfikes 2017-12-05T21:09:35.000244Z

Oh. OK.

bhauman 2017-12-05T21:09:49.000655Z

yeah if and then let is faster than if let

borkdude 2017-12-05T21:09:54.000358Z

we went from if-let to if-some and then to let… if-some turned out to be beneficial on Node… it's hard to optimize for both worlds

bhauman 2017-12-05T21:10:19.000676Z

wow it's a lot faster

2017-12-05T21:10:31.000109Z

yeah, I doubt memory perf is the issue then. As 1000 integers at 8 bytes that should easily fit in the cache.

mfikes 2017-12-05T21:10:45.000248Z

Yeah. I mistakenly assumed it was a much larger array.

bhauman 2017-12-05T21:11:22.000343Z

using if and then let cut my time in half

borkdude 2017-12-05T21:11:24.000015Z

while the if-some was an obsolete check, the Node runtime still benefited from that, crazy

mfikes 2017-12-05T21:11:35.000439Z

Don't we already have a solution that is around the same as one of the C++ ones out there?

borkdude 2017-12-05T21:12:08.000539Z

@mfikes Yes, I posted it 5 minutes ago: (quick-bench (part-2-array-faster?)) ;; 74 ms, or did you mean something else?

borkdude 2017-12-05T21:12:48.000523Z

(on a Macbook Pro 2015)

mfikes 2017-12-05T21:13:32.000400Z

We are running through 20 million steps in 70 ms. Around a few nanoseconds per loop?

mfikes 2017-12-05T21:15:40.000425Z

If that's around 10 machine cycles, it might be hard to improve upon :thinking_face:

borkdude 2017-12-05T21:16:11.000100Z

The assembly is probably straightforward, but I only did that in university…

2017-12-05T21:16:40.000616Z

hey @mfikes you had a twitter thing about emitting machine code from JS right?

2017-12-05T21:16:44.000583Z

got the machine code for this?

mfikes 2017-12-05T21:17:30.000119Z

Yes, you can enable JavaScriptCore's logging of compilation. But it is very hard to actually catch the code you'd like. (It spews stuff faster than you can consume it.) I'll try and see 🙂

borkdude 2017-12-05T21:17:56.000698Z

maybe writing a C program is easier

mfikes 2017-12-05T21:18:10.000448Z

Yeah, this is probably a few lines of C

mfikes 2017-12-05T21:19:50.000276Z

Here is the tweet @tbaldridge referred to https://twitter.com/mfikes/status/935527618274873344

mfikes 2017-12-05T21:21:49.000444Z

In short if you

JSC_dumpDisassembly=true planck src/advent_day_5/core.cljc
in https://github.com/mfikes/advent-day-5, you will see lots of disassembly

mfikes 2017-12-05T21:48:36.000349Z

Perhaps this is some of the assembly JavaScriptCore is creating for our loop, but it is hard for me to tell: https://gist.github.com/mfikes/5c68510b6ab2207b85cc92697a005a3e

borkdude 2017-12-05T21:56:04.000226Z

I'm trying a C program now, about 195 ms on my machine

mfikes 2017-12-05T21:57:19.000323Z

@borkdude Did you compile it with optimizations enabled?

borkdude 2017-12-05T21:59:19.000393Z

No, I just did gcc -o day05 day05.c && ./day05, what optimizations do you suggest?

borkdude 2017-12-05T22:00:17.000646Z

ah 57 ms with -O3

mfikes 2017-12-05T22:00:24.000446Z

🙂

borkdude 2017-12-05T22:00:29.000593Z

that's probably all the speed you're gonna get

mfikes 2017-12-05T22:01:09.000021Z

It's remarkable we are able to produce pretty much the same result with some carefully crafted Clojure.

borkdude 2017-12-05T22:01:22.000497Z

FWIW I tried this one: https://github.com/vesche/adventofcode-2017/blob/master/day05.c

dpsutton 2017-12-05T22:08:47.000634Z

this has got to be the best channel on slack right now

💯 6
mfikes 2017-12-05T22:17:40.000534Z

I can reproduce the 57 ms. I also tried inlining the input data array (instead of having the program read it from disk), but it still runs at the same speed, dominated by the loop.

borkdude 2017-12-05T22:29:55.000344Z

I also applied the zero checking trick to the C program, but it didn't help there

borkdude 2017-12-05T22:36:45.000124Z

an extra branch (simple boolean/int check in C) is more costly on the lower level than just doing the same loop all the time?

borkdude 2017-12-05T22:38:50.000164Z

Talking about this one:

while (current <= 1036 && current >= 0) {
    move = numbers[current];
    /* if (move == 0) { */
    /*   numbers[current] = 2; */
    /*   steps += 2; */
    /*   current += 1; */
    /* }   */
    /* else */
    {
      if (move >= 3)
        numbers[current] -= 1;
      else
        numbers[current] += 1;
      current += move;
      steps += 1;
    }
  }
With code uncommented, it becomes slower, not faster.

borkdude 2017-12-05T22:39:55.000322Z

I also tried moving the >= 3 branch to the first position, but that becomes even slower

borkdude 2017-12-05T22:42:27.000029Z

Calling it a night. Tomorrow is another puzzle…

2017-12-05T22:44:34.000324Z

branch prediction at work most likely

2017-12-05T22:44:52.000137Z

easier to calculate something than do a conditional