onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
sparkofreason 2018-08-12T15:07:46.000022Z

I removed the job garbage collection to avoid confusing things. Had one successful restart on an S3 failure and things ran for a while longer. Then, on another attempted restart after an S3 failure, Onyx threw (logged as WARN):
Caught exception inside task lifecycle :lifecycle/offer-heartbeats. Rebooting the task. with peer-id: #uuid "8eb1c72c-d43c-395f-67a7-5af1e7639642"
This is followed by INFO messages:
- Peer Group Action: 72614020-5285-6fcf-a6dd-0a570ca1c58f :restart-vpeer 8eb1c72c-d43c-395f-67a7-5af1e7639642
- Peer Group Action: 72614020-5285-6fcf-a6dd-0a570ca1c58f :restart-peer 3fdfb0f5-1695-96f9-f6f1-27ce17dc5453
- Peer Group Action: 72614020-5285-6fcf-a6dd-0a570ca1c58f :stop-peer 3fdfb0f5-1695-96f9-f6f1-27ce17dc5453
- Stopping Virtual Peer 8eb1c72c-d43c-395f-67a7-5af1e7639642
- Peer Group Action: 72614020-5285-6fcf-a6dd-0a570ca1c58f :start-peer 3fdfb0f5-1695-96f9-f6f1-27ce17dc5453
- Starting Virtual Peer 392eb3e7-c20d-a3a7-50b9-ddf7a2b2807f
and then the usual bunch of messages around everything shutting down. Since I got rid of the job GC, I was able to verify that the job is moved to the killed key.