Another instance of similar behavior: got the "forcefully timed out" exception from S3, the :lifecycle/handle-exception
function was called, and Onyx tried to restart the job. However, it hit the same exception during the restart (see above), with no subsequent restart attempts. Is that expected?
Definitely should keep restarting after that exception. There may be a bug in the supervision if that is happening
I could see that not hitting handle-exception because it’s not user code, but it really should be restarting the peer without prejudice
As in the previous case, after restarting the peers, the job does not start, but runs if I submit it by hand.
OK. It sounds like the handle-exception flow is not being used and it's writing out a kill-job log message.
Could you add a println/log message to handle-exception and check whether it's ever called?
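(For reference, a minimal sketch of what that logging handler could look like, assuming the usual Onyx calls-map shape and the four-argument handle-exception signature from the lifecycle docs; the namespace, task, and function names here are made up:)
```clojure
(ns my.app.lifecycles
  (:require [taoensso.timbre :as log]))

;; Hypothetical calls map: log every invocation of handle-exception,
;; then ask Onyx to restart the task rather than kill the job.
(def log-and-restart-calls
  {:lifecycle/handle-exception
   (fn [event lifecycle lifecycle-phase e]
     (log/warn e "handle-exception called, phase:" lifecycle-phase)
     :restart)})

;; Wired into the job via its :lifecycles entry, e.g.
;; {:lifecycle/task :write-to-s3
;;  :lifecycle/calls :my.app.lifecycles/log-and-restart-calls}
```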
I have one in there. It gets called once, logs the message, and the next thing in the log is the snippet above. At least in the one peer where the exception occurred. If it happens again I'll be sure to collect logs from all of the peers.
K thanks.
hi guys, if I had a legacy onyx cluster from several versions back, and wanted to stand up a new one on the latest version... could they run simultaneously on the same zookeeper?
@jgerman yes, they just need their own tenancy-id
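Roughly what that looks like in the peer-configs (a sketch only; key names are as in recent Onyx releases, older versions used :onyx/id, and the addresses and values are placeholders):
```clojure
;; Two clusters sharing one ZooKeeper ensemble, isolated by tenancy-id.
(def legacy-peer-config
  {:zookeeper/address "zk1:2181,zk2:2181,zk3:2181"
   :onyx/tenancy-id "legacy-cluster"
   :onyx.messaging/impl :aeron
   :onyx.peer/job-scheduler :onyx.job-scheduler/greedy})

;; The new cluster points at the same ensemble but uses its own tenancy.
(def new-peer-config
  (assoc legacy-peer-config :onyx/tenancy-id "new-cluster"))
```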
great, I was wondering since I'd read something about being able to submit jobs even with no peers, as long as there was a previous peer-group on that zookeeper, and thought maybe there was some tenancy-independent structure
thanks for the response, this’ll help us move to current
Another option is to completely jail onyx in zookeeper by adding a path to the zookeeper connection string
But it’s not strictly necessary and has some downsides for migration
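For completeness, a sketch of that chroot-style jailing, using the same hypothetical config shape as above; as far as I know the chroot node generally has to exist in ZooKeeper before the peers connect:
```clojure
;; Same config shape, but everything Onyx writes is "jailed" under
;; /onyx-new via the chroot path appended to the connection string.
(def chrooted-peer-config
  {:zookeeper/address "zk1:2181,zk2:2181,zk3:2181/onyx-new"
   :onyx/tenancy-id "new-cluster"
   :onyx.messaging/impl :aeron
   :onyx.peer/job-scheduler :onyx.job-scheduler/greedy})
```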