Ok thank you. We have now increased the snapshot interval
i'm seeing log lines like this: 17-12-20 18:35:51 FATAL [onyx.messaging.aeron.publication-manager:40] [clojure-agent-send-off-pool-63] - Aeron write from buffer error: java.lang.IllegalArgumentException: Encoded message exceeds maxMessageLength of 2097152, length=2107298
might this be because all output segments corresponding to a single input segment are written to one aeron message ?
(and i have a high fanout)
@mccraigmccraig looks like we have a bug in the way we split up our messages. We should have at least thrown an error earlier.
Looks like you’re right at the boundary.
@lucasbradstreet none of my segments are anything like that large, assuming the count is bytes - are multiple segments written to each aeron message ?
and is there anything i can do to increase the aeron max-message-length or change how onyx splits messages ?
Yes, it’ll just add messages up to the batch size on the task it’s outputting to
I think I would need to fix whatever bug is causing the miscomputation. You could reduce the batch size on the tasks it’s emitting to though.
That would prevent it from getting too big.
I will obviously need to fix the bug soon.
i'm stuck on 0.9 atm - so it may already have been fixed
i've got an upgrade to 0.12 in the pipeline, but until we move to our new cluster with more recent docker & kafka it's 0.9
ok, i'll try dropping the batch size
Oh, I didn’t realise that. Yes, it’s almost definitely fixed, because I did add support for re-batching messages based on message sizes
Batch sizes on the receiving task won’t help on 0.9, because that was part of the same feature
Best you can do is reduce the batch sizes on the task that is generating all of these fan out messages, and maybe increase the aeron channel sizes (didn’t suggest that before because I forgot you were running 0.9)
ah - the batch size on the tasks generating the fan-out is already 1, so not much scope for reducing there
how do i increase the aeron channel sizes @lucasbradstreet? i can't see anything in http://www.onyxplatform.org/docs/cheat-sheet/latest/
You can increase aeron.term.buffer.length
via a java property, see https://github.com/real-logic/Aeron/wiki/Configuration-Options
There’s a peer-config option for it in 0.12.
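For reference, a minimal sketch of passing that property to the JVM from a Leiningen project.clj. The project name and version are hypothetical, and the sketch assumes the Aeron media driver runs embedded in the peer JVM; with a standalone media driver the property would need to go on that process instead. The 2097152 limit in the log equals 16 MiB / 8, which fits Aeron capping messages at one-eighth of the term buffer length, but treat that ratio as an assumption to check against the Aeron docs for your version.

```clojure
;; project.clj (Leiningen) -- project name and version are placeholders
(defproject my-onyx-job "0.1.0-SNAPSHOT"
  ;; 32 MiB = 33554432 bytes; Aeron expects a power-of-two term length.
  ;; Assuming max message length = term length / 8, this would raise the
  ;; ceiling from 2097152 (2 MiB) to 4194304 (4 MiB).
  :jvm-opts ["-Daeron.term.buffer.length=33554432"])
```

Larger term buffers also use more shared memory per publication, so the size of /dev/shm (especially under docker) is worth checking before raising this much further.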
The other suggestion I would have is to decrease onyx/max-pending to backpressure more
If it’s large and you have high fan out, it will get very bad very quick.
(in 0.9)
i'm not setting onyx/max-pending anywhere in my project, so it must be defaulting
Yeah, 10000 is the default, which means 10000 messages at the input source can be outstanding at any time, each with their own fan outs
our input message throughput is not very high - rarely above 10 messages a second i think, but our fanout can easily be 20k per message and getting larger
Yeah, this is going to be a really tough case for onyx 0.9
If your fanout is that big then you should probably reduce batch sizes, reduce max-pending to maybe even 1
and increase the channel sizes
but I would recommend moving over to 0.12, as it is much better at handling these situations.
i'll be on 0.12 soon - in a month or so... migrating to an all-new dc/os based cluster isn't something i want to rush though
Understood, which is why I’m trying to give you some short term workarounds 🙂
Reducing max-pending way, way down, and reducing batch sizes are the best bet.
If your fan out is that large it probably won’t hurt performance. I just can’t guarantee that it won’t pop up again
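As a rough sketch of those workarounds in a 0.9 catalog: the task names and the :my.app/fan-out function below are hypothetical, the plugin-specific keys on the input task are left out, and the values are starting points to tune rather than recommendations.

```clojure
;; Hypothetical catalog entries for onyx 0.9; names are placeholders.
[{:onyx/name :read-input
  :onyx/type :input
  ;; Default is 10000; 1 means only one input message (and its whole
  ;; fan-out) is outstanding at a time, which backpressures hard.
  :onyx/max-pending 1
  :onyx/batch-size 1
  ;; plugin-specific keys (:onyx/plugin, :onyx/medium, ...) omitted
  }
 {:onyx/name :fan-out
  :onyx/type :function
  :onyx/fn :my.app/fan-out   ; hypothetical function producing the fan-out
  :onyx/batch-size 1}]
```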
your help is much appreciated @lucasbradstreet - thank you