I'm curious how onyx handles restart with stateful tasks. I understand why the number of peers assigned to a stateful task can't change while the job is running, but what happens when you kill and restart. Can it redistribute the tasks to a different number of peers?
It does not currently have the ability to re-partition state. It’s been on our backlog for a while.
It wouldn’t really be all that hard to implement a feature which would use the resume-point functionality to recover and repartition state.
That would be a great feature.
Agreed. Our current suggestion is to over-partition your stateful peers to begin with. We know it’s not great.