core-async

zlrth 2020-04-14T16:52:28.035500Z

can i tell, either in clojure or in the jvm, whether the core async threadpool is deadlocked?

jrychter 2020-04-15T07:40:27.053600Z

I had a lot of problems with core.async thread pool becoming exhausted, causing hangs. I've been pulling my hair out over this until I found that -Dclojure.core.async.pool-size=96 helps. Not a solution, I know. The problem is that in a large application many libraries might make use of core.async, not just the main code.

jrychter 2020-04-15T07:42:35.053800Z

I reworked every go block in my app to make sure it doesn't perform computation or block on I/O and I've still been getting hangs with the default thread pool size. Apparently the combination of my use of core.async and the libraries causes the issue. Extremely annoying, as this can happen in production after a minute or a month of uptime.

zlrth 2020-04-14T16:53:53.036400Z

for example, i see this in a jvm thread analyzer. do these eight threads with this stacktrace mean deadlock?

2020-04-14T16:53:58.036900Z

the more reliable approach is never making calls inside go blocks that don't return quickly - you could use jstack (or the equivalent C-\ shortcut in a terminal) to see all stack traces

zlrth 2020-04-14T16:54:17.037Z

at the same time, perhaps i should trust the analyzer

zlrth 2020-04-14T16:54:24.037500Z

indeed

2020-04-14T16:54:39.037900Z

"waiting on notification" is the opposite of blocked

👍 1
zlrth 2020-04-14T16:54:46.038300Z

this is not a small codebase, and i can’t guarantee that blocking calls aren’t being made

zlrth 2020-04-14T16:55:49.038900Z

ok thanks. one-ish more dumb question: what should i look for in a thread dump, for blocked threads?

2020-04-14T16:57:08.039300Z

calls to IO or CPU intensive methods, or non-parked waiting on locks (which should be very rare in idiomatic clojure code, outside interop)
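A rough sketch of doing that check from inside the JVM rather than by reading a jstack dump by eye. It assumes the core.async dispatch threads carry the `async-dispatch-` name prefix; the class and method names (`DispatchThreadReport`, `describe`, `lockCycleCount`) are made up for illustration. Note that `findDeadlockedThreads` only reports true lock cycles. A pool whose threads are all stuck in blocking calls will not show up there, so the per-thread state and top stack frame are the part worth reading. A thread that is WAITING at the executor's work queue is idle, not stuck.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class DispatchThreadReport {
    // Assumed name prefix of core.async dispatch threads ("async-dispatch-1", ...).
    static final String PREFIX = "async-dispatch-";

    /** One line per live thread whose name starts with prefix: name, state, top frame. */
    static List<String> describe(String prefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || !info.getThreadName().startsWith(prefix)) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
            lines.add(info.getThreadName() + " " + info.getThreadState() + " at " + top);
        }
        return lines;
    }

    /** Number of threads in a JVM-detected lock cycle (0 when none is found). */
    static int lockCycleCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("threads in a lock cycle: " + lockCycleCount());
        describe(PREFIX).forEach(System.out::println);
    }
}
```

If every dispatch thread shows a top frame inside socket I/O, a database driver, or a long computation, that matches the "all threads blocked" case. Frames parked in the thread pool's own queue are the harmless "waiting on notification" case.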

zlrth 2020-04-14T16:59:28.039500Z

ok--thank you--damn. it makes sense that deadlock is a condition, and not a denoted state.

2020-04-14T17:00:13.039700Z

yeah, by its nature core.async obscures things that would be obvious or impossible in sync code

2020-04-14T17:00:50.039900Z

the way I usually present it to people who are considering adopting it is that async is a liability, and you need to have a big enough benefit from the async to buy into the corresponding liability

zlrth 2020-04-14T17:04:15.040100Z

yeah--thanks again. looks like we’re not deadlocked, which (ha) means we avoided that landmine, and are back to the drawing board.

2020-04-14T17:18:19.042300Z

if a code base might be deadlocking, and it has gotten so large without the people building it making sure that isn't happening as they are building it, then it might be junk

zlrth 2020-04-14T17:19:37.043300Z

looks like it’s not; it’s my naivete of reading thread dumps. but i appreciate the feedback.

2020-04-14T17:21:14.044300Z

the large symptom of the core.async threadpool being deadlocked is that when you create a go block that reads from a channel, the part of the go block after the read never runs

👍 1
2020-04-14T17:21:44.045200Z

that happens if all the threads are blocked

2020-04-14T17:22:43.047200Z

the lesser symptom is a reduced number of go blocks running in parallel (which is harder to quantify and test) and that happens if only some of the threads in the pool are taken away
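The "large symptom" can be turned into a liveness probe: hand the pool a trivial task and see whether it ever runs. Below is a minimal sketch of that idea against a plain fixed-size `ExecutorService` standing in for the core.async dispatch pool (this is not the core.async API, and `PoolProbe` / `responsive` are hypothetical names). In a Clojure app the analogous probe would be a trivial go block whose result is awaited, with a timeout, from an ordinary thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class PoolProbe {
    /** True if a no-op task submitted to the pool completes within timeoutMillis. */
    static boolean responsive(ExecutorService pool, long timeoutMillis) {
        Future<?> probe = pool.submit(() -> { });
        try {
            probe.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            probe.cancel(false); // no thread picked the probe up in time
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return true; // the task ran (and failed), so a thread was available
        }
    }

    public static void main(String[] args) {
        int size = 8; // same size as the default core.async pool
        ExecutorService pool = Executors.newFixedThreadPool(size);
        CountDownLatch release = new CountDownLatch(1);

        System.out.println("idle pool responsive: " + responsive(pool, 500));

        // Occupy every thread with a blocking call, as a misbehaving go block would.
        for (int i = 0; i < size; i++) {
            pool.submit(() -> {
                try { release.await(); } catch (InterruptedException e) { }
            });
        }
        System.out.println("saturated pool responsive: " + responsive(pool, 500));

        release.countDown();
        pool.shutdown();
    }
}
```

A probe like this only catches the total case. The "lesser symptom" (some threads taken away) would show up as rising probe latency at best, which is why it is harder to quantify.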

👍 1
zlrth 2020-04-14T17:23:02.047300Z

indeed. i worry about it in general because the thing that stops us from using up the threadpool is us knowing what’s going into core.async and what we put on threads. as best as i can tell, we are careful about it.

2020-04-14T17:24:28.048200Z

you can also reduce the size of the threadpool to make a total blockage easier to trigger

zlrth 2020-04-14T17:25:12.048700Z

interesting--thanks

2020-04-14T17:26:33.049800Z

there is also a newish feature, a property you can set that will warn you if you use blocking channel ops in go blocks

👍 1
2020-04-14T17:27:26.050300Z

clojure.core.async.go-checking and clojure.core.async.pool-size
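Both names are JVM system properties, so they would normally be passed as `-D` flags when the JVM starts. The small helper below just reports what the current JVM has been given. It is a sketch, not part of core.async: `AsyncSettings` is a hypothetical name, the default of 8 is the commonly cited default pool size, and reading the values with `Long.getLong` / `Boolean.getBoolean` is an assumption about how the library interprets them.

```java
final class AsyncSettings {
    // Typical launch line (values here are only examples):
    //   java -Dclojure.core.async.go-checking=true \
    //        -Dclojure.core.async.pool-size=2 -jar app.jar

    /** Dispatch pool size requested via system property, or 8 if unset. */
    static long poolSize() {
        return Long.getLong("clojure.core.async.pool-size", 8L);
    }

    /** True only when the go-checking property is set to "true". */
    static boolean goChecking() {
        return Boolean.getBoolean("clojure.core.async.go-checking");
    }

    public static void main(String[] args) {
        System.out.println("pool-size   = " + poolSize());
        System.out.println("go-checking = " + goChecking());
    }
}
```

Setting these after core.async has already loaded is unlikely to have any effect, so it is safer to treat them as startup-only flags. A small pool combined with go-checking is a combination for development or a test run, not something to leave on in production.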

2020-04-14T17:28:30.051300Z

clojure.core.async.go-checking is not particularly fancy, so it is prone to false positives

alexmiller 2020-04-14T18:33:24.052300Z

Assuming a positive is an error, I think it’s more prone to false negatives actually

alexmiller 2020-04-14T18:33:50.053100Z

That is, not telling you about a problem, rather than incorrectly telling you there is a problem