off-topic

https://github.com/clojurians/community-development/blob/master/Code-of-Conduct.md Clojurians Slack Community Code of Conduct. Searchable message archives are at https://clojurians-log.clojureverse.org/
2020-11-30T16:11:52.244800Z

When performance benchmarking, libraries like criterium often use Java APIs to get the current time in nanoseconds. I was doing some benchmarking recently using criterium, and got some results that surprised me at first, so I decided to run the whole batch again, and then a third time (it was only 15 minutes each batch, but far from instant gratification). The 2nd and 3rd batches were fairly close to each other, but made the first look like the measurements were probably off in pretty significant ways. My laptop might have gone to sleep somewhere during that run, but I am not sure if (a) it actually did, or (b) if it did, whether that would have caused so many of the results to be off (I could understand one outlier pretty easily, but this was multiple 30-second runs that had off results).

borkdude 2020-12-01T22:13:07.303Z

Could it be that your processor was throttling the third time?

borkdude 2020-12-01T22:14:26.303900Z

This is sometimes an issue when I've compiled something heavy and do benchmarks while my fans are still spinning ;)

2020-12-02T00:12:04.392200Z

I doubt it, given that I saw that only 1 of 8 cores was busy during the times I was watching. I am nearly certain it did go to sleep for 30 to 60 minutes somewhere in the middle, though, based upon some modification times on different output files.

2020-11-30T16:12:48.244900Z

I know that the most straightforward advice is: Make sure the machine did not go to sleep during benchmarking, and I've taken steps to avoid that now, but it makes me wonder if it would be easy to add something to criterium, or benchmarking libs in general, that can detect this and preemptively warn you that the results are probably wrong.

2020-11-30T16:13:46.245100Z

simplest thing I can imagine would be to start up a parallel thread that does nothing but periodically ask for the elapsed real world time via some API that would have a noticeable "gap" if the machine went to sleep.

Wolfram Schroers 2020-11-30T16:45:58.245600Z

Hi @andy.fingerhut, it depends on what you consider “simple”. If you are able to run the task at the command line (and assuming you have macOS or Linux) you can prepend the task with time and get different time stats for the runtime of the entire process. This can tell you how much time the process had been running and how long it had actually been “active”.

seancorfield 2020-11-30T16:54:37.245800Z

The advice I've heard, even with criterium, is to do one run to warm up the JVM and ignore the results, and then do at least two runs for actual timing purposes.

seancorfield 2020-11-30T16:55:48.246Z

So I always do at least 3 criterium runs and just ignore the first one.

alexmiller 2020-11-30T16:56:16.246200Z

really you should probably do that across multiple jvm starts as well before you really believe it

alexmiller 2020-11-30T16:57:06.246400Z

generally I don't put much stock in any number reported in nanoseconds - better to make bigger chunks to compare

alexmiller 2020-11-30T16:58:23.246600Z

not sure if you were using bench or quick-bench but it's not unusual to see different results between the two

2020-11-30T16:59:15.246800Z

Definitely true on the bigger runs. criterium already tries to do that by rerunning your expression as many times as it takes to fill up 60 seconds for the full one, which I usually change to 30 since I'm doing a dozen or so expressions and they are all pretty quick.

2020-11-30T16:59:38.247Z

The big warning flag I got on one run was negative numbers, which I may dig into and figure out when/why that can occur. I definitely don't believe that one.

alexmiller 2020-11-30T16:59:59.247200Z

now that's fast

2020-11-30T17:00:08.247400Z

The more worrying case was the numbers that were low, but didn't look impossibly low, but were not matched by later runs of the same expressions.

alexmiller 2020-11-30T17:00:10.247600Z

the answer is 37 - what is your question?

2020-11-30T17:00:29.247800Z

So yeah, a general rule is to measure more than once, or in more than one way if possible, and I will do that here, too.

2020-11-30T17:02:19.248Z

FYI, Alex, I am doing little benchmarks on some potential optimizations to some operations on Clojure's built-in vectors. There are multiple operations where they use for (i=0; i < count(); i++) do_something(nth(i)); which is fine for small vectors, but for about 33 and up the nth(i) is redoing a lot of tree walking work on each element. Trying out a small patch that uses Java iterators on the vector elements instead.

alexmiller 2020-11-30T17:20:33.248200Z

operations like?

2020-11-30T17:32:08.248400Z

Mostly things in APersistentVector.java e.g. doEquiv, doEquals, hashCode, hasheq, indexOf, compareTo, maybe one or two others. Anything that linearly scans forward through most or all of the elements.

2020-11-30T17:34:32.248600Z

Early measurements indicate possible 20% to 40% reduction in time for such operations on vectors with at least 33 elements, which I know is not the common case in most Clojure applications, so nothing to get extremely excited about.

2020-11-30T17:36:53.248800Z

Unfortunately the current change I'm testing makes = (the only one I've measured so far) slightly slower for small vectors (<= 32 elements), so next I will see about a change that uses existing code for small ones, and iterators for large ones.

2020-11-30T17:37:44.249Z

Anyway, mainly tinkering at the fringes because of things I've noticed from reading the code while working on core.rrb-vector. Nothing I expect to get into the master release any time soon, if ever.

alexmiller 2020-11-30T17:50:56.249200Z

cool, seems worth trying

dominicm 2020-11-30T18:00:10.249400Z

Sounds like clock drift/ntp might also be effecting you if you're seeing negative numbers.

2020-11-30T18:05:28.249600Z

System/nanoTime should be monotonic if that is what is being used, so ntp could cause 0 elapsed time to be reported, but not negative

2020-11-30T18:07:29.249800Z

criterium does some calculations that I am not intimately familiar with before showing the times, obviously, if it can in some cases show negative numbers. My current suspicion is that it has ways of estimating and subtracting out the "measurement loop overhead" that criterium itself introduces, and that somehow turned out to be way too large of a number that was subtracted from the actual (elapsed time / number of iterations), which was positive.

2020-11-30T18:09:27.250Z

Negative time is a clear sign the results cannot be trusted. I know that one benchmark run to another can vary easily by plus or minus 10%, even in Criterium runs that execute the same code as many times as possible in 30 seconds, even for compute-bound code with no I/O. It is the larger variations that shake my trust.

2020-11-30T18:11:59.250200Z

have you looked at https://openjdk.java.net/projects/code-tools/jmh/ at all, it seems like the tool dejour used for benchmarking java, so it might have more man hours behind it than criterium. I haven't looked at it at all, so I am not sure what it would take to use it with/from clojure

alexmiller 2020-11-30T18:12:19.250400Z

there is a clojure harness for it

alexmiller 2020-11-30T18:12:37.250600Z

https://github.com/jgpc42/jmh-clojure

2020-11-30T18:13:23.250900Z

interesting

alexmiller 2020-11-30T18:13:41.251100Z

jmh is somewhat cumbersome to use from clojure iirc b/c it relies on subclassing and/or annotations

alexmiller 2020-11-30T18:15:16.251300Z

http://clojure-goes-fast.com/blog/using-jmh-with-clojure-part1/ maybe also of interest (doesn't use that lib)

alexmiller 2020-11-30T18:17:23.251600Z

but JMH itself is truly high-quality stuff created and used by people working on java itself, so highly recommended

2020-11-30T18:17:51.251800Z

In the past, when criterium has shaken my trust, I have often dug into it to find out what it does in more detail. It may be time for another round of that for me 🙂