Do I have to do something to resume points if I change the number of peers assigned to a stateful task?
After increasing the peers, I was getting a "key not found" exception from S3 during resume. I changed the resume mode to :initialize
for the offending window and the exception went away, though obviously the window state wasn't recovered. In that original exception expected in this case?
The bad news is that we don’t currently support rescaling peers with resume points. It’s near the top of my list to implement.
I should have a little time over the next week so I might try implementing it, as it’s been on the backlog for too long.
I’ve been holding off because I wanted a good solution where all peers wouldn’t have to read all of the state from all peers in order to repartition. I think if we make it part of building a resume point it will help people get unstuck, though it’s not my ideal solution for the long term.