is there a rule of thumb on how much space to give the aeron media driver? seeing a few of these pop up in our logs: io.aeron.exceptions.RegistrationException: Insufficient usable storage for new log of length=50332096 in /dev/shm (shm)
I’m noticing people have posted the exact error message in here in the past. Is 50332096 the length of the buffer per vpeer?
it's more likely that you're allocating too much space to Aeron, than not having enough available
are you running in docker, by any chance?
@lmergen we are running in docker on kubernetes. We’ve been giving 2GB per replica, but we were running a lot of peers per replica. Recently we reduced the number of peers per replica and added more replicas. We are currently running 20 vpeers per replica
The largest term buffers should only be 2MB by default, but it’s possible you’re running a version from before that was set as the default? http://www.onyxplatform.org/docs/cheat-sheet/latest/#peer-config/:onyx.messaging/term-buffer-size.segment
That said, it will allocate three of those for one channel, so it’s more like 6MB
@lucasbradstreet this is on v0.9.x. We are in the process of upgrading to 0.13.x
but it will take us a while to move it all over so we are just trying to keep this up and running while we do so.
One of these is required for each peer to peer connection, so I’m mostly responding to that registration exception above.
Ah ok
In that case your best bet is to probably reduce aeron.term.buffer.length in https://github.com/real-logic/aeron/wiki/Configuration-Options#common-options
Via a java property
For 0.9.x this will limit the max size of your batch in bytes.
In 0.9.x, Onyx is not smart enough to chunk your batch before sending it, so if you have a batch size of say 20, then the term buffer needs to be big enough to hold all 20 messages. A term buffer of 16MB gives 16/8 = 2MB max batch size.
So if you expect large messages you will need to reduce the batch size (in 0.9)
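Something along these lines on the peer JVM should do it. This is just a rough sketch: aeron.term.buffer.length is the property from the Aeron wiki above, but the launcher class and the way the peers get started are placeholders for whatever your deployment actually does.

```java
// Rough sketch: lowering the Aeron term buffer via a system property.
// Equivalent to passing -Daeron.term.buffer.length=16777216 to the peer JVM.
// Note: the property has to be set before any Aeron classes are loaded,
// so passing the -D flag at JVM startup is usually the safer route.
public class PeerLauncher {                  // hypothetical launcher, purely for illustration
    public static void main(String[] args) {
        long termBuffer = 16L * 1024 * 1024;  // 16MB
        System.setProperty("aeron.term.buffer.length", Long.toString(termBuffer));

        long maxMessage = termBuffer / 8;     // Aeron caps a single message at term buffer / 8
        System.out.println("Max message (and max batch on Onyx 0.9) in bytes: " + maxMessage);

        // ... start the Onyx peers / embedded media driver as usual ...
    }
}
```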
gotcha. that makes sense.
and aeron.term.buffer.length is an arg to the media driver, correct?
It should be to the peer, since you can change the buffer size client side.
ok, even easier then. Forgive me, I’ve been staring at this stuff for 2 days now. In your calculation above, where is the 8 in 16/8 coming from?
Aeron allows a max message size of term buffer / 8
In onyx 0.13, it can split up a batch into multiple messages, so max of term buffer / 8 per segment. 0.9 can’t do this.
ok thanks, appreciate the help.
besides the error I posted above, what would the effect of using a buffer that large be? If the default is now 2MB, is there a performance hit or some other reason a smaller buffer is preferred? I’m just curious how this has affected our system to date. We used to have a lot more peers on a single instance, so I assume the issue was less pronounced until recently
The new default is partially because of the switch to a new fault tolerance model, which requires more connections between peers. In 0.9 connections were multiplexed, and peers were also able to restrict their number of peer-to-peer connections. The new model means more connections and more buffer use, so we had to reduce the buffer sizes to compensate. That said, buffer use will still grow as the number of nodes scales up, so you are probably still seeing similar effects now as you scale.
The consequences of a smaller buffer are primarily: 1. Reduced max message size, which we discussed. 2. Some throughput effects; however, these haven’t tended to be much of an issue in 0.13, so you can likely reduce the buffers without much consequence. I’m not sure how large the effect will be since we didn’t test 0.9 in these ways.
ok, appreciate the information. Based on the calculation you posted earlier and some napkin calculations with our batch size and what each segment generally contains, I think we can reduce the size of those buffers by quite a bit. Is there a way to calculate the total number of buffers that need to be created on a node? I assumed it was vpeer count * size of buffer, but it sounds like it may also be based on the overall size of the cluster?
sorry, that doesn’t make sense. what I’m trying to ensure is that there is sufficient space in /dev/shm to accommodate the buffers that need to be created
If I remember correctly, 0.9 multiplexes the connections, so you end up with each node having a connection to every other node. So each node needs (n-1) * term buffer size * 3 (each connection has three term buffers). I’m kinda skeptical of this though, as you said you’re giving it 2GB of SHM, so I’m not sure why you would be running out if what I say is true.
where n is node count, correct? as part of this scaling effort, we went from running a few very large nodes to a lot of smaller nodes, so as of now we have 30 total nodes in the cluster. so 1) it sounds like that might not be a great strategy, and 2) I think that means we would need over 4GB, given it was trying to create 50MB buffers
reducing that to 16MB, which should be more than enough space for our batches, I believe gets us well under the 2GB we have allocated, but I’m curious whether we wouldn’t be better off moving back to bigger nodes
Yes, n is node count
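If it helps, here’s that estimate written out as a quick sketch. The node count and term buffer length are just the example numbers from this thread, so plug in your own values.

```java
// Sketch of the /dev/shm estimate above: each node holds a connection to every
// other node, and each connection allocates three term buffers.
public class ShmEstimate {
    public static void main(String[] args) {
        int nodeCount = 30;                         // n: example value from this thread
        long termBufferLength = 16L * 1024 * 1024;  // aeron.term.buffer.length
        long perConnection = 3 * termBufferLength;  // three term buffers per connection
        long requiredBytes = (nodeCount - 1) * perConnection;
        System.out.printf("~%d MB of /dev/shm per node%n", requiredBytes / (1024 * 1024));
        // 29 connections * 48MB = 1392MB with a 16MB term buffer
    }
}
```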
Ah that makes sense then
Knowing nothing about your application, my preference would be for 5 bigger nodes rather than 30 smaller ones.
Partially because onyx short circuits locally, which reduces overhead a lot because there’s no networking or serialisation. But to do so peers need to be collocated.
is that something that is configurable?
Short circuiting is on by default but there’s not really any good way to encourage more short circuiting other than increasing the number of peers on each node, and decreasing node count
gotcha. well looks like I did exactly the wrong thing. good to know
To me, more nodes is good because it helps with fault tolerance if a node goes down, but 30 does feel excessive if you could get by with 5 and scale vertically
makes sense. I was going for fault tolerance, but given what I learned here, I think fewer nodes is probably better for throughput
thanks for the help, this has been enlightening to say the least
You’re welcome. Good luck. There have been a lot of improvements to heartbeating and health checks since 0.9, so you will need to do your best without them being built in. I understand why you’d prefer not to switch fault tolerance models while you’re scaling up, though.
yeah, luckily all new work is being done on 0.13, which we are about to release to production, but we have some plugins to port to 0.13 before we can move all of our jobs over