Are there any core.async-oriented Clojure profiling tools for ongoing profiling?
Seems like tufte may be the best I've found, but it doesn't have first class support for core.async.
What capabilities are you missing? I'm usually satisfied with clj-async-profiler and VisualVM @jeaye
@ben.sless I haven't used the former, but its documentation seems to be focused on benchmarking, not profiling.
i.e. more criterium than tufte.
The profiler's output is a flame graph of %CPU time on X axis and stack on Y axis. I find it very useful, usually
For VisualVM, I think it could work, but things are made more complicated by me wanting to profile a staging environment within AWS.
Ah, been there 🙂
you need to inject a bunch of flags into your run command
Actually, it might be the same issue with clj-async-profiler. For tufte, everything is done in-process and can then be exposed through an HTTP endpoint.
That was some of the appeal, I think.
It's a bother, but I have done both. Async profiler can also be exposed as an endpoint
If you have a flag or environment variable which lets you know you're in a staging environment, you can condition the flags on it. I recently wrapped one service with a run script:
#!/usr/bin/env bash
opts="-server -Xmx5g -Xms5g"
if [ "ENVTYPE" = "prod" ]; then
echo "running in prod"
else
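# outside prod, add the flags the async profiler needs for accurate stacks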
opts="$opts -XX:+PreserveFramePointer -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints"
fi
jar=path/to/your.jar
cmd="java $opts -jar $jar $*"
echo running "$cmd"
exec $cmd
You can inject whatever arguments you'd like, that way
We're using it to run the async profiler, and we expose it as an endpoint
Thank you. 🙂 We have a whole init setup with docker/ECR, so I can plug stuff in there.
Out of curiosity and thoroughness, you've used VisualVM and clj-async-profiler. Have you used tufte?
Not yet; I have found them adequate for all my use cases so far
Seems like ECS doesn't set perf_event_paranoid, so that may rule async-profiler out.
From what I see tufte is used to profile a specific piece of code, not the entire JVM process?
Right.
For that you can consider using jmh
It's more oriented towards benchmarking, but the results are more detailed, like tufte's
Yeah, benchmarking isn't what I'm after.
I did see jmh-clojure in my research.
Trying to understand which features exactly you're looking for, what you want to check, etc.
I think either whole-JVM process for exploratory flame graphs or tufte's approach, where I add profiling to specific parts of the code to see how long things take. The former would be better for finding surprises; the latter would be better for timing things I already know I want timed.
I wouldn't be lying if I said both sounded good. 🙂
Ah! Why not just use metrics?
I am using prometheus + grafana for metrics.
So, wrapping the piece of code under test in a timer wouldn't give you the same amount of information as tufte?
For tufte, they'd be quite similar, just rendered differently, and tufte would be more convenient than timing things myself and pushing the results into prometheus.
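For reference, wrapping a piece of code with tufte looks roughly like this (parse, do-work, and render are hypothetical placeholders, and the println handler is just one way to get the output):

(require '[taoensso.tufte :as tufte])

;; send aggregated stats to stdout; in a service you'd likely route this
;; to a log or an endpoint instead
(tufte/add-basic-println-handler! {})

(defn handle-request [req]
  (tufte/profile {}                               ; aggregate timings for this form
    (let [parsed (tufte/p :parse (parse req))     ; each p-form is timed separately
          result (tufte/p :work (do-work parsed))]
      (tufte/p :render (render result)))))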
I suppose the biggest gain, now that we talk it all out, would be from whole-JVM process flame graph exploration.
I see. It's a bit of a hack but you don't have to push it to prometheus
You can get all the data from the timer object as an endpoint 🙂
Ah. We already have prometheus and grafana up, so any time series data might as well get rendered nicely through grafana.
Ok, refocusing. If I want a whole-JVM process flame graph for exploring the unknown, what's recommended?
clj-async-profiler
use the (prof/profile-for duration options) function and trigger it with an HTTP request that sends it the options.
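A minimal sketch of such an endpoint, assuming ring/jetty and wrap-params are on the classpath (handler name and port are arbitrary, and profile-for blocks for the duration):

(require '[clj-async-profiler.core :as prof]
         '[ring.adapter.jetty :as jetty]
         '[ring.middleware.params :refer [wrap-params]])

;; GET /?duration=60 blocks while profiling, then returns whatever
;; prof/profile-for produced (the flame-graph file in recent versions)
(defn profile-handler [request]
  (let [duration (Long/parseLong (get (:params request) "duration" "60"))]
    {:status  200
     :headers {"Content-Type" "text/plain"}
     :body    (str (prof/profile-for duration {}))}))

(comment
  ;; 8081 is an arbitrary port; :join? false keeps the calling thread free
  (jetty/run-jetty (wrap-params profile-handler) {:port 8081 :join? false}))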
Pretty simple solution. If your flame graph comes out weird, it means you need to add JVM flags. If you get an exception, you need to set perf_event_paranoid
You can then use it to find the function call you're interested in, zoom in on it, and analyze the CPU usage. If you're using go blocks pervasively, the graph will be slightly harder to analyze, because everything will happen on the same thread pool
I'll look into it. I'm currently concerned about perf_event_paranoid within ECS, but we'll see where I can get.
Yeah, everything we do is within go blocks.
I can tell you from experience I had no issues with it on AWS
ECS as well?
i.e. docker
yes
Docker on EC2
Cool. That's good to know. Thanks a lot, Ben.
🙂
Good luck, let me know if you find something interesting
To satisfy my curiosity, what are you concerned about which led you to need to profile? Think you might be spending too much time with locks or something else?
@ben.sless We just moved our whole back-end to core.async to try to resolve some scalability issues. We're now fully async, with no blocking anywhere, but our scalability is 100x lower than where we need it and where we need it is still a very reasonable number. So I need to figure out what's taking so much time; we may be fully async, but we have the latency of molasses with even 1k clients.
And you suspect you might be blocking on your middleware?
@ben.sless Nope, I don't think there's any blocking going on. We checked pretty thoroughly.
I do think we're being quite slow with some things, though; just trying to figure out what those things are.
Then async profiler will serve you well.
Swapping from compojure to reitit+muuntaja will speed you up. ring-json uses slurp
and regular expressions, a performance killer. I PRed it a few days ago to remove slurp,
but still, switch to #reitit if you want speed. (Also jsonista > cheshire.)
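For reference, a minimal reitit + muuntaja setup looks roughly like this (the route and handler are placeholders):

(require '[reitit.ring :as ring]
         '[reitit.ring.middleware.muuntaja :as muuntaja-mw]
         '[muuntaja.core :as m])

(def app
  (ring/ring-handler
    (ring/router
      [["/ping" {:get (fn [_] {:status 200 :body {:pong true}})}]]
      ;; muuntaja (backed by jsonista) handles content negotiation + encoding
      {:data {:muuntaja   m/instance
              :middleware [muuntaja-mw/format-middleware]}})))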
@ben.sless We have reitit on the list to bring in! Based on some testing we did a few months ago, it gave us a 15% speed up in running all of our tests, which is a surprising amount of time we were spending doing routing.
We're using jsonista already.
yeah, and when the server is under stress it compounds exponentially
Thanks for all the tips. 🙂
I have clj-async-profiler working from an HTTP endpoint now. Just digging into what's going on.
Best of luck! hope you find some performance lying on the floor for quick wins
Then write a blog post 🙂
hehe, maybe. It's been years since my last blog post. This type of stuff is typically quite fun to write about, though.
What I find very weird is that a quarter of the samples seem to include the JVM's JIT compiler. This isn't anything I've seen in other JVM flame graph articles, aside from one, which verified it's the JIT.
Do you typically see that?
did you run with these flags? -XX:+PreserveFramePointer -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints
Not the first one, but the other two.
I'm using -Djdk.attach.allowAttachSelf -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints
I think PreserveFramePointer is required for analyzing JITed code. you should also wait a few minutes before starting the profiler while the JIT does its job
(a few minutes, ~3, should be enough)
Yep, I'll add that flag.
It's probably worth running a load test and then just taking periodic thread dumps. Generally if you take 10 thread dumps and 8 show the same thing, that's the bottleneck
This is dumb, but shockingly effective
heh
are there plans to add some sort of metrics/profiling help to core.async?
No plans, not sure what would be generically useful
Something we watch in production is the latency of the go block thread pool: we periodically throw a task on there that just reports the elapsed time between when it was queued and when it was executed.
It would be interesting to get the queue size out, I forget if I tried to get that
In theory the queue should pretty much always be empty, because tasks run fast, run a little logic, then wait on a channel
So a constant cycle through the queue
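A rough sketch of the go-pool latency probe described above (names are invented; reporting the value to a metrics system is left out):

(require '[clojure.core.async :as async])

(defn go-pool-latency-ms
  "Queues a no-op go block and returns a channel that receives the elapsed
  milliseconds between enqueueing and the block actually running."
  []
  (let [queued-at (System/nanoTime)
        out       (async/chan 1)]
    (async/go
      (async/>! out (/ (- (System/nanoTime) queued-at) 1e6)))
    out))

(comment
  ;; call periodically and push the value into your metrics system
  (async/<!! (go-pool-latency-ms)))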