For handling redeployments of an ongoing job, I've been manually assigning a new job ID, then using the previous job ID to look up the job snapshot coordinates, which are then used to create the resume point for the updated job. I have a feeling I'm making this too hard. How is this generally handled?
In the last year we added the ability to look up the job history by a :job-name
that is supplied in the job map
You can use https://github.com/onyx-platform/onyx/blob/0.13.x/src/onyx/api.clj#L298 to get the job-id / tenancy history or, even better, get the snapshot coordinates by name here: https://github.com/onyx-platform/onyx/blob/0.13.x/src/onyx/api.clj#L308
The latter is smart enough to deal with cases where a job never managed to checkpoint successfully, e.g. one that was deployed with a bug
It walks back until it finds a snapshot
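A rough sketch of that walk-back idea, in plain Clojure with no Onyx calls. Everything here is hypothetical for illustration: `history` stands in for the job-id / tenancy history of a :job-name (assumed oldest first), `lookup` stands in for whatever reads the snapshot coordinates of one deployment, and the coordinate maps are made-up sample data. It is not the code behind the linked function.

```clojure
(defn latest-snapshot-coordinates
  "Walks `history` from the newest entry to the oldest and returns the first
  non-nil result of `lookup`, or nil if no deployment ever checkpointed.
  `history` is assumed to be ordered oldest first (an assumption of this sketch)."
  [history lookup]
  (some lookup (reverse history)))

;; Hypothetical sample data: three deployments of the same named job.
(def history
  [{:tenancy-id "prod" :job-id "job-1"}
   {:tenancy-id "prod" :job-id "job-2"}
   {:tenancy-id "prod" :job-id "job-3"}])

;; job-3 was deployed with a bug and never wrote a checkpoint.
(def snapshots
  {"job-1" {:job-id "job-1" :epoch 4}
   "job-2" {:job-id "job-2" :epoch 9}})

(defn lookup
  "Stand-in for reading the snapshot coordinates of one deployment."
  [{:keys [job-id]}]
  (get snapshots job-id))

(def found (latest-snapshot-coordinates history lookup))
```

With this sample data, `found` is the `job-2` entry: the newest deployment is skipped because it has no snapshot.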
Nice
Thanks, that's a big improvement over what I was doing before.
Great. Had to dog food the resume stuff a bit before coming up with a good pattern for that 🙂
If you’re multi-node and submit on startup, you’ll still want to inject a stable job-id into all of your peers, but other than that it should just work
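One possible way to get a stable job-id on every node, sketched under assumptions: each node derives the same UUID from values it already shares, so no coordination is needed at startup. The `release` argument and the `with-stable-job-id` helper are hypothetical, and placing the id under `[:metadata :job-id]` is my assumption about where Onyx expects it, so check that against the Onyx docs for your version.

```clojure
(import 'java.util.UUID)

(defn stable-job-id
  "Derives a deterministic (name-based) UUID from the job name and a release
  tag, so every node that starts the same release computes the same id.
  `release` is a hypothetical value such as a git SHA or build number."
  [job-name release]
  (UUID/nameUUIDFromBytes (.getBytes (str job-name "/" release) "UTF-8")))

(defn with-stable-job-id
  "Attaches the derived id to a job map. The [:metadata :job-id] path is an
  assumption of this sketch, not something stated in the conversation."
  [job job-name release]
  (assoc-in job [:metadata :job-id] (stable-job-id job-name release)))

;; Two nodes starting the same release compute the same id independently.
(def node-a-id (stable-job-id "my-job" "release-42"))
(def node-b-id (stable-job-id "my-job" "release-42"))
```

A new release yields a new id, which matches the redeployment flow described at the top: new job ID, resume point built from the previous deployment's snapshot.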
I’ll try to work that into the resume point onyx example