onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
sparkofreason 2018-08-05T14:01:10.000054Z

Back on the exception/restart topic: it happened again. The lifecycle exception handler was called several times, and Onyx logged a warning a few times in there as well. The last thing was the Onyx warning "Caught exception inside task lifecycle :lifecycle/offer-heartbeats.", and then everything shut down.

sparkofreason 2018-08-05T14:05:24.000053Z

And as before, after the peers are restarted the job does not restart on its own, and requires manual resubmission.

lucasbradstreet 2018-08-05T17:33:21.000028Z

Thanks. There must be a bug in the supervision where handle-exception isn’t invoked under certain circumstances (probably in offer-heartbeats)

lucasbradstreet 2018-08-05T17:33:45.000016Z

I assume handle-exception is set for :all and always returns :restart?
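
For reference, a lifecycle that is set for :all and always returns :restart usually looks something like the sketch below. This is only an illustration of the setup being asked about, not the code from this thread; the namespace `my.app.lifecycles` and the var names `restart-calls` and `lifecycles` are made up.

```clojure
(ns my.app.lifecycles) ; hypothetical namespace

;; Calls map: the handler receives the event map, the matching lifecycle
;; entry, the lifecycle phase the exception was thrown from, and the
;; exception itself. It must return :restart, :kill, or :defer.
(def restart-calls
  {:lifecycle/handle-exception
   (fn [event lifecycle lifecycle-name e]
     :restart)})

;; Lifecycle entry: :all attaches the calls map to every task in the job.
(def lifecycles
  [{:lifecycle/task :all
    :lifecycle/calls ::restart-calls}])
```

The `lifecycles` vector would go under `:lifecycles` in the job map passed at submission.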

sparkofreason 2018-08-05T17:54:40.000002Z

I believe so, code above, let me know if I missed something. It did actually restart successfully several times.

lucasbradstreet 2018-08-05T17:55:08.000011Z

Looks right to me. Just double checking stuff before going digging.

lucasbradstreet 2018-08-05T17:55:46.000017Z

Do you know whether the old job moved to the killed key in the cluster replica? 99% sure it did.

sparkofreason 2018-08-05T21:50:47.000007Z

How would I check?

lucasbradstreet 2018-08-05T21:51:37.000092Z

On my phone so I haven’t checked this. If you have onyx peer http query you can query /replica and see if the latest job-id is under killed-jobs

lucasbradstreet 2018-08-05T21:51:57.000053Z

Either that, or look under killed-jobs in the replica you get from however you’ve played back the log for diagnostics
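
Once the replica is in hand as a Clojure map, the check itself is small. A rough sketch, assuming the killed job ids sit in a collection under `:killed-jobs` as described above; `job-killed?` is a hypothetical helper and the sample replica and UUID are invented for illustration.

```clojure
;; Hypothetical helper: true if job-id appears under :killed-jobs.
(defn job-killed?
  [replica job-id]
  (boolean (some #{job-id} (:killed-jobs replica))))

;; Made-up replica with a single killed job, for illustration only.
(def sample-replica
  {:jobs []
   :killed-jobs [#uuid "00000000-0000-0000-0000-000000000001"]})

(job-killed? sample-replica #uuid "00000000-0000-0000-0000-000000000001")
;; => true
```

Make sure the job id you compare has the same type as the ids in the replica (UUID versus string), since that depends on how the replica was fetched or parsed.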