can i tell, either in clojure or in the jvm, whether the core async threadpool is deadlocked?
I had a lot of problems with core.async thread pool becoming exhausted, causing hangs. I've been pulling my hair out over this until I found that -Dclojure.core.async.pool-size=96 helps. Not a solution, I know. The problem is that in a large application many libraries might make use of core.async, not just the main code.
I reworked every go block in my app to make sure it doesn't perform computation or block on I/O and I've still been getting hangs with the default thread pool size. Apparently the combination of my use of core.async and the libraries causes the issue. Extremely annoying, as this can happen in production after a minute or a month of uptime.
for example, i see this in a jvm thread analyzer. do these eight threads with this stack trace mean deadlock?
the more reliable approach is to never make calls inside go blocks that don't return quickly. you could use jstack (or press Ctrl-\ in the terminal running the JVM, which sends SIGQUIT and makes it print a thread dump) to see all stack traces
at the same time, perhaps i should trust the analyzer
indeed
"waiting on notification" is the opposite of blocked
this is not a small codebase, and i can’t guarantee that blocking calls aren’t being made
ok thanks. one-ish more dumb question: what should i look for in a thread dump, for blocked threads?
calls to IO or CPU intensive methods, or non-parked waiting on locks (which should be very rare in idiomatic clojure code, outside interop)
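A rough sketch of doing that check from inside the running JVM instead of reading a full dump by eye. It lists the state and top stack frame of every thread whose name starts with a given prefix. The default prefix "async-dispatch-" is an assumption about how core.async names its go-block threads (it may differ between versions, so compare against the names in your own dump), and the class and method names are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DispatchThreadReport {
    // Assumption: go-block dispatch threads are named "async-dispatch-N".
    static final String DEFAULT_PREFIX = "async-dispatch-";

    /** One line per matching live thread: name, state, top stack frame. */
    static List<String> report(String prefix) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().startsWith(prefix)) continue;
            StackTraceElement[] stack = e.getValue();
            String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
            lines.add(t.getName() + " " + t.getState() + " at " + top);
        }
        return lines;
    }

    /**
     * Number of threads the JVM reports as part of a lock cycle.
     * This only finds classic monitor/lock deadlocks; a pool whose threads
     * are all stuck in slow or blocking calls is not reported here.
     */
    static int threadsInLockCycle() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        String prefix = args.length > 0 ? args[0] : DEFAULT_PREFIX;
        report(prefix).forEach(System.out::println);
        System.out.println("threads in a lock cycle: " + threadsInLockCycle());
    }
}
```

This only sees threads of the JVM it runs in, so it has to be called from within the application (for example through interop at a REPL) rather than run as a separate process. Pool threads that sit in WAITING or TIMED_WAITING with an executor or queue frame on top are usually just idle; the ones worth a closer look are those whose top frames are in IO, sleeps, or long computations.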
ok--thank you--damn. it makes sense that deadlock is a condition, and not a denoted state.
yeah, by its nature core.async obscures things that would be obvious or impossible in sync code
the way I usually present it to people who are considering adopting it is that async is a liability, and you need to have a big enough benefit from the async to buy into the corresponding liability
yeah--thanks again. looks like we’re not deadlocked, which (ha) means we avoided that landmine, and are back to the drawing board.
if a code base might be deadlocking, and it has gotten so large without the people building it making sure that isn't happening as they are building it, then it might be junk
looks like it’s not; it’s my naivete of reading thread dumps. but i appreciate the feedback.
the big symptom of the core.async threadpool being deadlocked is that when you create a go block that reads from a channel, the part of the go block after the read never runs
that happens if all the threads are blocked
the lesser symptom is a reduced number of go blocks running in parallel (which is harder to quantify and test) and that happens if only some of the threads in the pool are taken away
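The "big symptom" can be turned into a probe: hand the pool a trivial task and see whether it ever runs. The sketch below shows the idea against a plain java.util.concurrent fixed thread pool standing in for the core.async dispatch pool. It does not touch core.async itself, and the PoolProbe class and its responsive method are hypothetical names for this example. In Clojure the analogous probe would be a go block that puts a value on a channel, with the caller doing a blocking take that gives up after a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolProbe {
    /** True if a trivial task submitted to the pool gets to run within timeoutMs. */
    static boolean responsive(ExecutorService pool, long timeoutMs) {
        Future<?> probe = pool.submit(() -> { });
        try {
            probe.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            probe.cancel(false); // do not leave the stale probe queued
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return true; // the task ran, which is all the probe cares about
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch release = new CountDownLatch(1);

        System.out.println(responsive(pool, 200)); // idle pool: true

        // Tie up every pool thread with a blocking call.
        for (int i = 0; i < 2; i++) {
            pool.submit(() -> {
                try { release.await(); } catch (InterruptedException ignored) { }
            });
        }
        System.out.println(responsive(pool, 200)); // fully blocked pool: false

        release.countDown();
        System.out.println(responsive(pool, 200)); // threads freed: true
        pool.shutdown();
    }
}
```

A probe like this only catches the total blockage. The lesser symptom, where some but not all threads are tied up, still lets the probe through; it just takes longer, so the timeout is a judgment call.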
indeed. i worry about it in general because the thing that stops us from using up the threadpool is us knowing what’s going into core.async and what we put on threads. as best as i can tell, we are careful about it.
you can also reduce the size of the threadpool to make a total blockage easier to trigger
interesting--thanks
there is also a newish feature, a property you can set that will warn you if you use blocking channel ops in go blocks
clojure.core.async.go-checking and clojure.core.async.pool-size
clojure.core.async.go-checking is not particularly fancy, so it is prone to false positives
Assuming a positive is an error, I think it’s more prone to false negatives actually
That is, not telling you about a problem, rather than incorrectly telling you there is a problem