yada

mccraigmccraig 2020-03-17T18:07:04.033800Z

so we've got a memory leak in a yada api process... i suspect it's in direct memory rather than heap - we get no OOMEs logged, and heap telemetry seems well within limits, but our process gets oom-killed by k8s despite the sum of -XX:MaxDirectMemorySize and -Xmx being somewhat less than the cgroups limit
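[editor's note] one way to check whether the growth is in JVM-accounted direct memory at all: the JVM exposes its NIO direct-buffer usage via `BufferPoolMXBean`. a minimal probe (hypothetical class name; note that depending on Netty's settings its pooled arenas may or may not be counted here):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class DirectMemoryProbe {
    // Returns the bytes currently used by NIO direct buffers, as the JVM
    // accounts them against -XX:MaxDirectMemorySize; -1 if the pool is absent.
    static long directUsed() {
        List<BufferPoolMXBean> pools =
            ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("direct buffers in use: " + directUsed() + " bytes");
    }
}
```

if this counter stays flat while the RSS climbs, the leak is in native memory the JVM doesn't account for.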

mccraigmccraig 2020-03-17T18:10:54.036300Z

i note that we are using :raw-streams? true on our aleph server ('cos streaming uploads don't work without it), but i also note that yada doesn't seem to do any releasing of netty ByteBufs anywhere i can find, so i'm starting to suspect our yada handler is leaking ByteBufs
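[editor's note] the failure mode being suspected here: Netty `ByteBuf`s are reference-counted, and a pooled direct buffer that is never `release()`d keeps its slice of direct memory reserved forever, invisibly to heap telemetry. a toy sketch of that invariant (hypothetical `RefCountedBuf` class standing in for the real `io.netty.buffer.ByteBuf` API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of Netty-style reference counting: a buffer starts with
// refCnt 1 and its memory is only returned to the pool when release()
// drops the count to zero. A handler that consumes a buffer without
// releasing it "leaks" exactly as described above.
class RefCountedBuf {
    static int outstanding = 0;          // bytes still held by the pool
    private final AtomicInteger refCnt = new AtomicInteger(1);
    private final int size;

    RefCountedBuf(int size) {
        this.size = size;
        outstanding += size;
    }

    boolean release() {
        if (refCnt.decrementAndGet() == 0) {
            outstanding -= size;         // memory returned to the pool
            return true;
        }
        return false;
    }
}

public class LeakSketch {
    public static void main(String[] args) {
        RefCountedBuf handled = new RefCountedBuf(1024);
        handled.release();               // well-behaved consumer: no leak

        RefCountedBuf dropped = new RefCountedBuf(4096);
        // handler returns without releasing -> these bytes never come back
        System.out.println("outstanding bytes: " + RefCountedBuf.outstanding);
    }
}
```

with `:raw-streams? true` the handler (or something downstream of it) takes over this release obligation from aleph.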

mccraigmccraig 2020-03-17T18:11:04.036600Z

anyone else noticed anything similar?

mccraigmccraig 2020-03-17T19:24:53.037Z

hmm. maybe it gets buffer-releasing behaviour from ztellman/byte-streams

mccraigmccraig 2020-03-17T19:33:22.037900Z

yeah, looks like byte-streams/to-byte-array will use the transform defined in aleph.netty, which releases ByteBufs

mccraigmccraig 2020-03-17T19:35:50.038300Z

ok, time to get some allocation instrumentation going then
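[editor's note] for the instrumentation step, two built-in options cover both suspects — Netty's own leak detector (logs an allocation stack trace when a `ByteBuf` is GC'd without being released) and JVM native memory tracking. flag names as of recent Netty/JDK versions:

```
# Netty buffer-leak detector (paranoid samples every allocation; costly,
# so for a staging replica rather than production):
-Dio.netty.leakDetection.level=paranoid

# JVM-side native memory tracking, inspected at runtime with jcmd:
-XX:NativeMemoryTracking=summary
#   jcmd <pid> VM.native_memory summary
```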

malcolmsparks 2020-03-17T19:52:16.041300Z

It's quite a while since I wrote the byte buffer streaming code (and some versions of aleph have gone by, which may have changed behaviour), but I do remember double-checking that all buffers were deallocated.

mccraigmccraig 2020-03-17T19:56:00.043800Z

i don't think i'll get any further without some instrumentation - it seems likely it's something in aleph or yada or our usage thereof, since we have no memory leaks in our kafka-streams apps, and they use largely the same model codebase

mccraigmccraig 2020-03-17T19:57:43.044400Z

it's very annoying that we get oom-killed by cgroups rather than getting an OOME though. no clues whatsoever to follow
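[editor's note] the absence of an OOME is itself a clue: exhausting -Xmx or -XX:MaxDirectMemorySize raises an in-JVM OutOfMemoryError, so a silent cgroup kill suggests the growth is in memory the JVM doesn't account against either limit (thread stacks, metaspace, allocator fragmentation, or natively-tracked pool memory). these flags at least make any in-JVM OOME loud and leave a dump behind:

```
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
```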