iirc we looked at sip back when we made the last hash change in 1.6
well, my notes are at https://docs.google.com/document/d/1DT2uXlAwH5NstgYSeqbOXb_8K6Gwnw99QksCaUKpCjU/edit, don't see it there, so I guess not
it's funny how I have no memory now of doing any of that work :)
good thing I wrote it down
SipHash is slower than Murmur3 (what we use) but Murmur3 is susceptible to hash-flooding
hah
I also remember SipHash being set aside back then
I do recall at least coming across it, and city, and a few others
I don't remember now why I didn't include them
Apparently CityHash is worse than Murmur3 for collision flooding attacks (source: djb)
Breaking down and installing YourKit. Strange I haven't used it before. Perf debugging really not very fun without a decent profiler.
don't believe everything you see with yourkit, particularly around microbenching
it uses safepoints, and also seems to end up with inflated #s for things called more often, particularly if using tracing not sampling
I find it useful for memory debugging and for getting leads on things to look at with perf (or things that are unexpected/surprising) but exercise caution in drawing conclusions only from yk (or any profiler)
Thanks for the advice. Right now just trying to narrow down where the code is spending most of its time.
hoping it can give hints there.
In a function that executes for 5 minutes
if you want the short version, just take thread dumps every 10s or so. if there's a bottleneck, they'll all be the same and the function at the top is it.
this seems dumb, but is remarkably effective at telling you the exact same thing that a sampling profiler will tell you
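A rough in-process version of the thread-dump trick, as a minimal sketch: normally you'd just run `jstack <pid>` a few times, but the same idea can be approximated with `Thread.getStackTrace()`. The class name `PoorMansSampler` and its `sample` method are made up for illustration, and since `getStackTrace()` also goes through safepoints, this carries the same bias discussed above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of any library: repeatedly grabs the stack of
// one thread and counts which method is on top each time.
public class PoorMansSampler {

    /** Returns "Class.method" -> number of samples in which it was the top frame. */
    public static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> topFrames = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                // Index 0 is the most recent call, i.e. where the thread is right now.
                String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                topFrames.merge(frame, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the flag and stop sampling early.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return topFrames;
    }
}
```

If one entry dominates the returned map, that's the same signal as "all the thread dumps look the same".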
tracing profilers often give misleading results (but are super useful for examining counts if you control the test). for example, if you're doing something 10k times, and you see a function called 50k times when you expect it to be 10k times, that's a big tell
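If you only care about the counts, you don't strictly need a tracing profiler for that check. A minimal sketch, assuming the suspect function can be wrapped (the `CallCounter` class is hypothetical, just for illustration):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical wrapper: delegates to the real function and counts invocations.
public class CallCounter<A, B> implements Function<A, B> {
    private final Function<A, B> delegate;
    private final AtomicLong calls = new AtomicLong();

    public CallCounter(Function<A, B> delegate) {
        this.delegate = delegate;
    }

    @Override
    public B apply(A arg) {
        calls.incrementAndGet(); // count every call, including ones that throw
        return delegate.apply(arg);
    }

    /** Number of times apply has been invoked so far. */
    public long calls() {
        return calls.get();
    }
}
```

Run the controlled test, then compare `calls()` against the number you expected.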
you might want to look at https://github.com/clojure-goes-fast/clj-async-profiler too
which can even avoid safe points
I think flight recorder is the gold standard now, and I believe it also doesn’t have the safepoint problem.
I remember Tom Crayford talking about it at EuroClojure a couple of years back.
https://www.youtube.com/watch?v=0tUrbf6Uzu8, at about 20 minutes in.
It would be good to capture this knowledge somewhere more accessible, under a heading like “profiling clojure applications”.
I second that recommendation. clj-async-profiler is really handy for quickly getting a grasp of where your CPU time is spent
it's basically "profiling jvm applications" I think
^^ nothing here is clojure-specific
Alex Yakushev has written a ton of great stuff at http://clojure-goes-fast.com/blog/
Hmm, regarding the performance issue I mentioned earlier: I am not much further along in figuring out why one version of the code takes about 10x longer. I changed the slower one so it no longer uses sets of integers as map keys/set elements, only integers, and that version is still 10x slower. So whatever is making it slower has nothing to do with my earlier guess.
while peeking around and experimenting, I did notice that if you use sets or maps as keys in an array-map, there is no identical check when searching for such a key, because equivPred is used, and finds the equiv method for sets or maps, which have no identical check. For a hash-map, it uses Util.equiv(Object, Object) which does have the identical check.
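To illustrate why the identical check matters for collection keys, here is a plain-Java sketch. This is not Clojure's actual code: `CountingKey` and both lookup methods are made up, and `equals` stands in for a deep equiv on a set or map.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class IdentityShortCircuit {

    // Hypothetical key whose equals is a deep comparison, with a counter so we
    // can see how often the deep path runs.
    public static final class CountingKey {
        public static final AtomicInteger deepComparisons = new AtomicInteger();
        private final int[] contents;

        public CountingKey(int... contents) {
            this.contents = contents;
        }

        @Override
        public boolean equals(Object o) {
            deepComparisons.incrementAndGet();
            return o instanceof CountingKey
                    && Arrays.equals(contents, ((CountingKey) o).contents);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(contents);
        }
    }

    // Linear scan that always does the deep comparison.
    public static int indexOfEqualsOnly(Object[] keys, Object key) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i].equals(key)) return i;
        }
        return -1;
    }

    // Linear scan that tries reference identity first and only falls back to
    // the deep comparison when the references differ.
    public static int indexOfIdentityFirst(Object[] keys, Object key) {
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] == key || keys[i].equals(key)) return i;
        }
        return -1;
    }
}
```

Looking up the very same key object costs one deep comparison with the first method and none with the second, and the gap grows with the size of the key.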