onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
eelke 2017-12-20T09:27:03.000512Z

Ok thank you. We have now increased the snapshot interval

mccraigmccraig 2017-12-20T19:05:01.000479Z

i'm seeing log lines like this: 17-12-20 18:35:51 FATAL [onyx.messaging.aeron.publication-manager:40] [clojure-agent-send-off-pool-63] - Aeron write from buffer error: java.lang.IllegalArgumentException: Encoded message exceeds maxMessageLength of 2097152, length=2107298

mccraigmccraig 2017-12-20T19:05:45.000478Z

might this be because all output segments corresponding to a single input segment are written to one aeron message?

mccraigmccraig 2017-12-20T19:05:55.000656Z

(and i have a high fanout)

lucasbradstreet 2017-12-20T19:06:04.000219Z

@mccraigmccraig looks like we have a bug in the way we split up our messages. We should have at least thrown an error earlier.

lucasbradstreet 2017-12-20T19:06:09.000128Z

Looks like you’re right at the boundary.

mccraigmccraig 2017-12-20T19:09:27.000605Z

@lucasbradstreet none of my segments are anything like that large, assuming the count is bytes - are multiple segments written to each aeron message?

mccraigmccraig 2017-12-20T19:10:56.000324Z

and is there anything i can do to increase the aeron max-message-length or change how onyx splits messages ?

lucasbradstreet 2017-12-20T19:11:03.000385Z

Yes, it’ll just batch segments into one message, up to the batch size on the task it’s outputting to

lucasbradstreet 2017-12-20T19:11:44.000266Z

I think I would need to fix whatever bug is causing the miscomputation. You could reduce the batch size on the tasks it’s emitting to though.

lucasbradstreet 2017-12-20T19:11:52.000214Z

That would prevent it from getting too big.

lucasbradstreet 2017-12-20T19:12:01.000097Z

I will obviously need to fix the bug soon.

mccraigmccraig 2017-12-20T19:13:18.000247Z

i'm stuck on 0.9 atm - so it may already have been fixed

mccraigmccraig 2017-12-20T19:14:18.000625Z

i've got an upgrade to 0.12 in the pipeline, but until we move to our new cluster with more recent docker & kafka it's 0.9

mccraigmccraig 2017-12-20T19:14:59.000436Z

ok, i'll try dropping the batch size

lucasbradstreet 2017-12-20T19:32:16.000383Z

Oh, I didn’t realise that. Yes, it’s almost definitely fixed, because I did add support for re-batching messages based on message sizes

lucasbradstreet 2017-12-20T19:32:39.000064Z

Batch sizes on the receiving task won’t help on 0.9, because that was part of the same feature

lucasbradstreet 2017-12-20T19:33:20.000291Z

Best you can do is reduce the batch sizes on the task that is generating all of these fan out messages, and maybe increase the aeron channel sizes (I didn’t suggest that before because I forgot you were running 0.9)
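
To make the batch-size knob concrete, here is a minimal sketch of what it could look like in a 0.9 catalog entry; the task name, function, and surrounding keys are hypothetical, and only :onyx/batch-size is the setting under discussion:

```clojure
;; Hypothetical catalog entry for the task that produces the fan-out.
;; A smaller :onyx/batch-size means fewer input segments are processed per
;; batch, so fewer of the resulting output segments get packed into a
;; single Aeron write.
{:onyx/name :expand-segment
 :onyx/fn :my.app/fan-out
 :onyx/type :function
 :onyx/batch-size 1                ;; as small as it can go
 :onyx/doc "Expands one input segment into many output segments"}
```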

mccraigmccraig 2017-12-20T21:18:51.000016Z

ah - the batch size on the tasks generating the fan-out is already 1, so not much scope for reducing there

mccraigmccraig 2017-12-20T21:20:29.000084Z

how do i increase the aeron channel sizes @lucasbradstreet? i can't see anything in http://www.onyxplatform.org/docs/cheat-sheet/latest/

lucasbradstreet 2017-12-20T21:21:43.000361Z

You can increase aeron.term.buffer.length via a java property, see https://github.com/real-logic/Aeron/wiki/Configuration-Options
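
For reference, a sketch of setting that property; the 64 MiB value is illustrative. Aeron requires the term buffer length to be a power of two, and a single message is capped at 1/8 of the term length, which is why the default gives the 2097152-byte limit in the error above:

```clojure
;; Sketch: raise the Aeron term buffer length. Must be set before the
;; media driver starts; could equally be passed as a JVM flag:
;;   -Daeron.term.buffer.length=67108864
;; Max message length is term length / 8, so 64 MiB allows ~8 MiB messages.
(System/setProperty "aeron.term.buffer.length" (str (* 64 1024 1024)))
```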

lucasbradstreet 2017-12-20T21:21:57.000103Z

There’s a peer-config option for it in 0.12.

lucasbradstreet 2017-12-20T21:22:06.000381Z

The other suggestion I would have is to decrease onyx/max-pending to backpressure more

lucasbradstreet 2017-12-20T21:22:15.000397Z

If it’s large and you have high fan out, it will get very bad very quick.

lucasbradstreet 2017-12-20T21:23:03.000129Z

(in 0.9)

mccraigmccraig 2017-12-20T21:23:03.000335Z

i'm not setting onyx/max-pending anywhere in my project, so it must be defaulting

lucasbradstreet 2017-12-20T21:23:20.000289Z

Yeah, 10000 is the default, which means 10000 messages at the input source can be outstanding at any time, each with their own fan outs
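
A sketch of what lowering onyx/max-pending could look like on the input task's catalog entry; the plugin, task name, and the value 100 are hypothetical:

```clojure
;; Hypothetical 0.9 catalog entry for the input task.
;; :onyx/max-pending bounds how many input segments may be outstanding
;; (read but not yet fully acked) at once; with a high fan-out per segment,
;; the default of 10000 lets a huge amount of downstream work pile up.
{:onyx/name :read-input
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :onyx/batch-size 1
 :onyx/max-pending 100             ;; default 10000
 :onyx/doc "Reads segments from Kafka"}
```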

mccraigmccraig 2017-12-20T21:24:40.000045Z

our input message throughput is not very high - rarely above 10 messages a second i think, but our fanout can easily be 20k per message and getting larger

lucasbradstreet 2017-12-20T21:33:24.000122Z

Yeah, this is going to be a really tough case for onyx 0.9

lucasbradstreet 2017-12-20T21:33:39.000584Z

If your fanout is that big then you should probably reduce batch sizes, reduce max-pending to maybe even 1

lucasbradstreet 2017-12-20T21:33:47.000076Z

and increase the channel sizes

lucasbradstreet 2017-12-20T21:34:02.000342Z

but I would recommend moving over to 0.12, as it is much better at handling these situations.

mccraigmccraig 2017-12-20T21:46:33.000134Z

i'll be on 0.12 soon - in a month or so... migrating to an all-new dc/os based cluster isn't something i want to rush though

lucasbradstreet 2017-12-20T21:52:09.000614Z

Understood, which is why I’m trying to give you some short term workarounds 🙂

lucasbradstreet 2017-12-20T21:52:34.000216Z

Reducing max-pending way, way down, and reducing batch sizes are the best bet.

lucasbradstreet 2017-12-20T21:52:49.000568Z

If your fan out is that large it probably won’t hurt performance. I just can’t guarantee that it won’t pop up again

mccraigmccraig 2017-12-20T22:06:22.000008Z

your help is much appreciated @lucasbradstreet - thank you