onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
lucasbradstreet 2018-07-30T01:50:05.000126Z

The job should be able to stay running regardless, assuming the lifecycles are set up right. You just may have a period where one node prevents progress. We have generally used 30 seconds in the past but it really depends on your needs

sparkofreason 2018-07-30T13:46:55.000403Z

:lifecycle/handle-exception is set, and got called several times with `io.aeron.exceptions.ConductorServiceTimeoutException`, followed by a couple of `org.agrona.concurrent.AgentTerminationException`. I'm guessing what follows in the log snippet above is Onyx attempting to restart and not succeeding, as the lifecycle handler does not get called again and subsequent exceptions are logged from Onyx code.

lucasbradstreet 2018-07-30T13:50:45.000205Z

Yup, definitely looks like it’s not succeeding

lucasbradstreet 2018-07-30T13:51:14.000184Z

Do you know if a kill-job log entry is written out though? If so, the job should come back up without a new job submission if the container is cycled

sparkofreason 2018-07-30T14:00:39.000184Z

I don't know specifically if kill-job was written (getting monitoring wired up in production is next on the task list), but I don't think the job came back up on its own after recycling the container. I waited awhile after the peers started and nothing seemed to be happening, so I resubmitted the job.

lucasbradstreet 2018-07-30T14:02:02.000072Z

Same tenancy?

sparkofreason 2018-07-30T14:11:38.000259Z

Yes