onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
2018-09-23T15:36:59.000100Z

is there a rule of thumb on how much space to give the Aeron media driver? seeing a few of these pop up in our logs: io.aeron.exceptions.RegistrationException: Insufficient usable storage for new log of length=50332096 in /dev/shm (shm)

2018-09-23T15:50:55.000100Z

I’m noticing people have posted the exact error message in here in the past. Is 50332096 the length of the buffer per vpeer?

2018-09-23T18:35:17.000100Z

it's more likely that you're allocating too much space to Aeron than not having enough available

2018-09-23T18:36:09.000100Z

are you running in docker, by any chance?

2018-09-23T18:41:54.000100Z

@lmergen we are running in docker on kubernetes. We’ve been giving 2GB per replica, but we were running a lot of peers per replica. Recently we reduced the number of peers per replica and added more replicas. We are currently running 20 vpeers per replica

lucasbradstreet 2018-09-23T18:44:39.000100Z

The largest term buffers should only be 2MB by default, but it’s possible you’re running a version from before that was set as the default? http://www.onyxplatform.org/docs/cheat-sheet/latest/#peer-config/:onyx.messaging/term-buffer-size.segment
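For reference, a minimal sketch of what setting that option could look like in a peer config (assuming a recent Onyx version where the option exists, per the cheat sheet link; the map is abbreviated and the value simply spells out the 2MB default):

```clojure
;; Sketch only: peer-config fragment for the documented option.
;; 2MB mirrors the default mentioned above; other peer-config keys omitted.
{:onyx.messaging/impl :aeron
 :onyx.messaging/term-buffer-size.segment (* 2 1024 1024)}
```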

lucasbradstreet 2018-09-23T18:45:41.000100Z

Though that said, it will allocate three of those for one channel, so it’s more like 6MB

2018-09-23T18:46:25.000200Z

@lucasbradstreet this is on v0.9.x. We are in the process of upgrading to 0.13.x, but it will take us a while to move it all over, so we are just trying to keep this up and running while we do so.

lucasbradstreet 2018-09-23T18:46:31.000100Z

One of these is required for each peer-to-peer connection, so I’m mostly responding to that registration exception above.

lucasbradstreet 2018-09-23T18:46:32.000100Z

Ah ok

lucasbradstreet 2018-09-23T18:48:07.000200Z

In that case your best bet is probably to reduce aeron.term.buffer.length in https://github.com/real-logic/aeron/wiki/Configuration-Options#common-options

lucasbradstreet 2018-09-23T18:48:17.000100Z

Via a Java property
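A hedged sketch of what that could look like, assuming the property needs to be visible to the process that owns the Aeron buffers (the peer here, per the advice above); the 16MB value is illustrative:

```clojure
;; Illustrative only: set the Aeron term buffer length before any Aeron
;; resources are created in this process. 16MB here is just an example value.
;; The equivalent JVM flag would be -Daeron.term.buffer.length=16777216.
(System/setProperty "aeron.term.buffer.length" (str (* 16 1024 1024)))
```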

lucasbradstreet 2018-09-23T18:48:35.000100Z

For 0.9.x this will limit the max size of your batch in bytes.

lucasbradstreet 2018-09-23T18:50:32.000100Z

In 0.9.x, Onyx is not smart enough to chunk your batch before sending it, so if you have a batch size of, say, 20, then the term buffer needs to be big enough to hold all 20 messages. A term buffer of 16MB = 16/8 = 2MB max batch size.
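To make that arithmetic concrete, a quick back-of-the-envelope (the 20-segment batch is only an illustrative figure):

```clojure
;; Sizing rule described above: max message size = term buffer / 8,
;; and on 0.9 the whole batch travels as one message.
(let [term-buffer (* 16 1024 1024)   ; 16MB term buffer
      max-batch   (/ term-buffer 8)  ; => 2MB max batch size on 0.9
      batch-size  20]                ; illustrative batch size
  (/ max-batch batch-size))          ; => roughly 100KB budget per segment
```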

lucasbradstreet 2018-09-23T18:50:51.000100Z

So if you expect large messages you will need to reduce the batch size (in 0.9)

2018-09-23T18:51:33.000100Z

gotcha. that makes sense.

2018-09-23T18:52:53.000100Z

and aeron.term.buffer.length is an arg to the media driver, correct?

lucasbradstreet 2018-09-23T18:54:57.000100Z

It should be to the peer, since you can change the buffer size client side.

2018-09-23T18:56:58.000100Z

ok, even easier, then. Forgive me, I’ve been staring at this stuff for 2 days now. In your calculation above, where is the 8 in 16/8 coming from?

lucasbradstreet 2018-09-23T18:57:43.000100Z

Aeron allows a max message size of term buffer / 8

lucasbradstreet 2018-09-23T18:58:16.000100Z

In Onyx 0.13, it can split up a batch into multiple messages, so the term buffer / 8 max applies per segment. 0.9 can’t do this.

2018-09-23T18:59:47.000100Z

ok thanks, appreciate the help.

2018-09-23T22:36:07.000100Z

besides the error I posted above, what would the effect of using a buffer that large be? If the default is now 2MB, is there a performance hit or some other reason a smaller buffer is preferred? I’m just curious as to how this has affected our system to date, although we used to have a lot more peers on a single instance, so I assume the issue was less pronounced until recently

lucasbradstreet 2018-09-23T22:39:22.000100Z

The new default is partially because of the switch to a new fault tolerance model, which requires more connections between peers. In 0.9 the connections were multiplexed, which also restricted the number of peer-to-peer connections. The new model meant more connections and more buffer use, so we had to reduce the default buffer sizes to compensate. That said, buffer use still grows as the number of nodes scales up, so you are probably seeing similar effects now as you scale up

lucasbradstreet 2018-09-23T22:41:49.000100Z

The consequences of a smaller buffer are primarily: 1. a reduced max message size, which we discussed, and 2. some throughput effects. However, those haven’t tended to be much of an issue in 0.13, so you can likely reduce the buffers without much of a consequence. I’m not sure how large the effect will be since we didn’t test 0.9 in these ways.

2018-09-23T22:48:53.000100Z

ok, appreciate the information. based on the calculation you posted earlier and some napkin calculations with our batch size and what each segment generally contains, I think we can reduce the size of those buffers by quite a bit. is there a way to calculate the total number of buffers that need to be created on a node? I assumed it was vpeer count * size of buffer, but it sounds like it may also be based on the overall size of the cluster?

2018-09-23T22:52:07.000100Z

sorry, that doesn’t make sense. what I’m trying to ensure is that there is sufficient space in /dev/shm to accommodate the buffers that need to be created

lucasbradstreet 2018-09-23T22:54:32.000100Z

If I remember correctly, 0.9 multiplexes the connections so you end up with each node having a connection to every other node. So each node needs (n-1) * term buffer size * 3 (each connection has three term buffers). I’m kinda skeptical that this is true, however, as you said you’re giving it 2GB of SHM, so I’m not sure why you would be running out if what I say is true.
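As a rough illustration of that formula, assuming the 50332096-byte log in the original error corresponds to roughly three 16MB term buffers:

```clojure
;; (n - 1) connections per node, each backed by 3 term buffers.
(let [n           30                  ; nodes in the cluster
      term-buffer (* 16 1024 1024)]   ; 16MB term buffer
  (/ (* (dec n) 3 term-buffer) 1024.0 1024 1024))
;; => ~1.36 GB of /dev/shm just for these logs; if the node also hosts
;; log buffers for the incoming side, 2GB gets tight quickly.
```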

2018-09-23T23:01:57.000100Z

where n is node count, correct? as part of this scaling effort, we went from running a few very large nodes to a lot of smaller nodes, so as of now we have 30 total nodes in the cluster. so 1) it sounds like that might not be a great strategy, and 2) I think that means we would need over 4GB, given it was trying to create 50MB buffers

2018-09-23T23:05:32.000100Z

reducing that to 16MB, which should be more than enough space for our batches, I believe gets us well under the 2GB we have allocated, but I’m curious if we wouldn’t be better off moving back to bigger nodes

lucasbradstreet 2018-09-23T23:05:51.000100Z

Yes, n is node count

lucasbradstreet 2018-09-23T23:05:58.000100Z

Ah that makes sense then

lucasbradstreet 2018-09-23T23:06:54.000100Z

Knowing nothing about your application, my preference would be for 5 bigger nodes rather than 30 smaller ones.

lucasbradstreet 2018-09-23T23:07:32.000100Z

Partially because Onyx short circuits locally, which reduces overhead a lot because there’s no networking or serialisation. But to do so, peers need to be co-located.

2018-09-23T23:08:23.000100Z

is that something that is configurable?

lucasbradstreet 2018-09-23T23:13:25.000100Z

Short circuiting is on by default, but there’s not really any good way to encourage more short circuiting other than increasing the number of peers on each node and decreasing the node count
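For completeness, the peer-config flag behind that behaviour (it defaults to true, so there is normally nothing to change; shown here only to make the setting concrete):

```clojure
;; Short circuiting is controlled by this peer-config flag; true is the
;; default and it only takes effect when the communicating peers share a node.
{:onyx.messaging/allow-short-circuit? true}
```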

2018-09-23T23:14:00.000100Z

gotcha. well looks like I did exactly the wrong thing. good to know

lucasbradstreet 2018-09-23T23:15:37.000100Z

To me, more nodes is good because it can help with fault tolerance if a node goes down, but 30 does feel excessive if you already have 5 and could scale vertically

2018-09-23T23:17:07.000100Z

makes sense. was going for fault tolerance, but given what I learned here, I think fewer nodes is probably better for throughput

2018-09-23T23:18:32.000100Z

thanks for the help, this has been enlightening to say the least

lucasbradstreet 2018-09-23T23:20:18.000100Z

You’re welcome. Good luck. There were a lot of improvements to heartbeating and health checks since 0.9, so you will need to do your best without them being built in. I understand why not switching fault tolerance models as you scale up is the preference right now, though.

2018-09-23T23:25:44.000100Z

yeah, luckily all new work is being done on 0.13, which we are about to release to production, but we have some plugins to port to 0.13 before we can move all of our jobs over