clojure-dev

Issues: https://clojure.atlassian.net/browse/CLJ | Guide: https://insideclojure.org/2015/05/01/contributing-clojure/
alexmiller 2019-07-17T00:31:54.449700Z

iirc we looked at sip back when we made the last hash change in 1.6

alexmiller 2019-07-17T00:34:48.450Z

well, my notes are at https://docs.google.com/document/d/1DT2uXlAwH5NstgYSeqbOXb_8K6Gwnw99QksCaUKpCjU/edit, don't see it there, so I guess not

alexmiller 2019-07-17T00:39:16.451200Z

it's funny how I have no memory now of doing any of that work :)

alexmiller 2019-07-17T00:39:22.451400Z

good thing I wrote it down

ghadi 2019-07-17T00:39:56.451600Z

SipHash is slower than Murmur3 (what we use) but Murmur3 is susceptible to hash-flooding

ghadi 2019-07-17T00:39:59.451900Z

hah

ghadi 2019-07-17T00:40:16.452400Z

I also remember SipHash being set aside back then

alexmiller 2019-07-17T00:40:32.452700Z

I do recall at least coming across it, and city, and a few others

alexmiller 2019-07-17T00:40:49.453100Z

I don't remember now why I didn't include them

ghadi 2019-07-17T00:42:56.454300Z

Apparently CityHash is worse than Murmur3 for collision flooding attacks (source: djb)

2019-07-17T00:57:32.455400Z

Breaking down and installing YourKit. Strange I haven't used it before. Perf debugging really not very fun without a decent profiler.

alexmiller 2019-07-17T01:03:20.455800Z

don't believe everything you see with yourkit, particularly around microbenching

alexmiller 2019-07-17T01:04:36.456800Z

it uses safepoints, and also seems to end up with inflated #s for things called more often, particularly if using tracing not sampling

alexmiller 2019-07-17T01:06:27.457900Z

I find it useful for memory debugging and for getting leads on things to look at with perf (or things that are unexpected/surprising) but exercise caution in drawing conclusions only from yk (or any profiler)

2019-07-17T01:09:24.458400Z

Thanks for the advice. Right now just trying to narrow down where the code is spending most of its time.

2019-07-17T01:09:29.458600Z

hoping it can give hints there.

2019-07-17T01:09:42.458900Z

In a function that executes for 5 minutes

alexmiller 2019-07-17T01:11:49.459600Z

if you want the short version, just take thread dumps every 10s or so. if there's a bottleneck, they'll all be the same and the function at the top is it.

alexmiller 2019-07-17T01:12:37.460100Z

this seems dumb, but is remarkably effective at telling you the exact same thing that a sampling profiler will tell you
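A rough sketch of the thread-dump idea in Java, for illustration only. In practice the dumps would usually come from an external tool such as `jstack` or `jcmd <pid> Thread.print`; here `Thread.getStackTrace()` is polled from inside the same JVM instead. The class name, the `hotSpot` workload, and the short sampling interval are all assumptions made for the example (the advice above is one dump every 10s or so).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class ThreadDumpSampler {
    static volatile long sink; // keeps the busy loop's result observable

    // Hypothetical bottleneck: spins until asked to stop.
    static void hotSpot(CountDownLatch started, AtomicBoolean stop) {
        started.countDown();
        long acc = 0;
        while (!stop.get()) {
            acc = acc * 31 + 7;
        }
        sink = acc;
    }

    // Take `samples` stack snapshots of `target` and count which frame is on top.
    static Map<String, Integer> sampleTopFrames(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            for (int i = 0; i < samples; i++) {
                StackTraceElement[] stack = target.getStackTrace();
                if (stack.length > 0) {
                    StackTraceElement top = stack[0];
                    counts.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
                }
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counts;
    }

    // Start the hypothetical workload, sample it, then stop it.
    static Map<String, Integer> demo(int samples, long intervalMillis) {
        AtomicBoolean stop = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> hotSpot(started, stop), "worker");
        worker.setDaemon(true);
        worker.start();
        try {
            started.await();
            return sampleTopFrames(worker, samples, intervalMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new TreeMap<>();
        } finally {
            stop.set(true);
        }
    }

    public static void main(String[] args) {
        // Frames that dominate the tally are the leads worth investigating.
        demo(20, 50).forEach((frame, n) -> System.out.println(n + "  " + frame));
    }
}
```

If almost every sample lands in the same frame (or in something it calls directly), that frame is the likely bottleneck, which is the same signal a sampling profiler gives.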

alexmiller 2019-07-17T01:13:57.461400Z

tracing profilers often give misleading results (but are super useful for examining counts if you control the test). for example, if you're doing something 10k times, and you see a function called 50k times when you expect it to be 10k times, that's a big tell
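The call-count check can also be done by hand in a controlled test, without a tracing profiler. The sketch below is a made-up illustration: `lookup`, `process`, and the accidental five-pass loop are hypothetical names and behavior chosen to reproduce the 10k-versus-50k situation described above.

```java
import java.util.concurrent.atomic.AtomicLong;

class CallCountCheck {
    static final AtomicLong lookupCalls = new AtomicLong();

    // Hypothetical helper we expect to run once per item.
    static int lookup(int key) {
        lookupCalls.incrementAndGet();
        return key * 2;
    }

    // Hypothetical workload with an accidental redundancy:
    // lookup is repeated on each of five passes instead of being done once.
    static int process(int item) {
        int result = 0;
        for (int pass = 0; pass < 5; pass++) {
            result = lookup(item);
        }
        return result;
    }

    // Run the workload `iterations` times and report how often lookup ran.
    static long countLookupsFor(int iterations) {
        lookupCalls.set(0);
        for (int i = 0; i < iterations; i++) {
            process(i);
        }
        return lookupCalls.get();
    }

    public static void main(String[] args) {
        int iterations = 10_000;
        long calls = countLookupsFor(iterations);
        // prints "expected 10000 calls, observed 50000"
        System.out.println("expected " + iterations + " calls, observed " + calls);
    }
}
```

The mismatch between the expected and observed count is the tell; the timing numbers a tracing profiler attaches to those calls are the part to treat with suspicion.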

alexmiller 2019-07-17T01:14:46.461700Z

you might want to look at https://github.com/clojure-goes-fast/clj-async-profiler too

👍 1
alexmiller 2019-07-17T01:15:52.462Z

which can even avoid safe points

cfleming 2019-07-17T03:01:53.462700Z

I think flight recorder is the gold standard now, and I believe it also doesn’t have the safepoint problem.

cfleming 2019-07-17T03:05:23.463100Z

I remember Tom Crayford talking about it at EuroClojure a couple of years back.

cfleming 2019-07-17T03:15:20.463400Z

https://www.youtube.com/watch?v=0tUrbf6Uzu8, at about 20 minutes in.

devn 2019-07-17T04:13:23.465900Z

It would be good to capture this knowledge somewhere more accessible, under a heading like “profiling clojure applications”.

jumar 2019-07-17T05:25:32.466100Z

I second that recommendation. clj-async-profiler is really handy for quickly getting a grasp of where your CPU time is spent

jumar 2019-07-17T05:26:12.466300Z

it's basically "profiling jvm applications" I think

alexmiller 2019-07-17T05:43:40.466500Z

^^ nothing here is clojure-specific

alexmiller 2019-07-17T05:44:54.467200Z

Alex Yakushev has written a ton of great stuff at http://clojure-goes-fast.com/blog/

👍 2
2019-07-17T09:26:54.469Z

Hmm, regarding the performance issue I mentioned earlier: I am not much further along in figuring out why one version of the code takes about 10x longer. I did change the slower one so it no longer uses sets of integers as map keys/set elements, only integers, and that version is still 10x slower. So whatever is making it slower has nothing to do with my earlier guess.

2019-07-17T21:39:53.471300Z

while peeking around and experimenting, I did notice that if you use sets or maps as keys in an array-map, there is no identical? (reference-equality) check when searching for such a key, because equivPred is used, and it finds the equiv method for sets or maps, which has no identical? check. For a hash-map, lookup uses Util.equiv(Object, Object), which does have the identical? check.
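To illustrate why a reference-equality check in front of a deep comparison can matter, here is a standalone Java model. It is not Clojure's code: `BigKey`, `deepEquals`, and both `equiv...` helpers are hypothetical stand-ins for a collection-valued key and for the two lookup paths described above.

```java
class IdentityShortCircuit {
    static long elementComparisons = 0;

    // Hypothetical collection-like key whose equality walks every element
    // and performs no identity check of its own.
    static final class BigKey {
        final int[] items;

        BigKey(int[] items) {
            this.items = items;
        }

        boolean deepEquals(BigKey other) {
            if (items.length != other.items.length) {
                return false;
            }
            for (int i = 0; i < items.length; i++) {
                elementComparisons++;
                if (items[i] != other.items[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    // Models a lookup path that goes straight to the deep comparison.
    static boolean equivWithoutIdentity(BigKey a, BigKey b) {
        return a.deepEquals(b);
    }

    // Models a lookup path that tries reference equality first.
    static boolean equivWithIdentity(BigKey a, BigKey b) {
        return a == b || a.deepEquals(b);
    }

    // Compare a 1000-element key against itself and report the work done.
    static long comparisonsFor(boolean identityFirst) {
        BigKey key = new BigKey(new int[1000]);
        elementComparisons = 0;
        boolean found = identityFirst
                ? equivWithIdentity(key, key)
                : equivWithoutIdentity(key, key);
        if (!found) {
            throw new IllegalStateException("a key must equal itself");
        }
        return elementComparisons;
    }

    public static void main(String[] args) {
        System.out.println("identity first: " + comparisonsFor(true) + " element comparisons");
        System.out.println("deep only:      " + comparisonsFor(false) + " element comparisons");
    }
}
```

In this model, looking up the very same key object costs nothing with the identity check and a full element-by-element walk without it. Whether that difference is measurable in a real array-map depends on key size and how often the same instance is used for lookup.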