Whittling down the production issues. Recently got a "host not found" exception from S3, which I assume is just S3 being flaky. Looks like Onyx tried several times to connect, then eventually gave up and shut everything down. Is there a setting or something to avoid a full shutdown in this case (or some pattern for auto-restarting the jobs)?
If you return :restart from your handle-exception lifecycle, it should just keep rebooting the peer until S3 is reachable again.
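For anyone reading later, here is a rough sketch of what that lifecycle might look like. The namespace, the `transient-network-error?` helper, and the choice to restart only on `java.net.UnknownHostException` are assumptions for illustration, not something from this thread; the simplest version just returns `:restart` unconditionally. It also assumes the S3 failure shows up somewhere in the exception's cause chain, which is worth checking against your own stack traces.

```clojure
(ns my.app.lifecycles) ; hypothetical namespace

(defn transient-network-error?
  "Hypothetical helper: true if e or any of its causes is an
  UnknownHostException (assumed here to be the 'host not found' error)."
  [^Throwable e]
  (boolean
   (some #(instance? java.net.UnknownHostException %)
         (take-while some? (iterate #(.getCause ^Throwable %) e)))))

(defn handle-exception
  "Called by Onyx when a lifecycle or task function throws.
  Returns :restart to reboot the task, :kill to stop the job."
  [event lifecycle lifecycle-phase e]
  (if (transient-network-error? e)
    :restart
    :kill))

;; Lifecycle calls map, referenced by keyword from the job's lifecycles.
(def calls
  {:lifecycle/handle-exception handle-exception})

;; Entry to add to the job's :lifecycles vector; :all applies it to every task.
(def lifecycles
  [{:lifecycle/task :all
    :lifecycle/calls :my.app.lifecycles/calls}])
```

Restarting only on errors you believe are transient keeps a genuine bug from putting the job into an endless restart loop.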
I’ve seen some of those transient host not found issues, and I was never sure whether it was S3 or a DNS issue within the container.
Thanks, completely blanked on the handle-exception lifecycle.
@dave.dixon I've found that S3 becomes unreachable quite a lot. Lucas has already answered, but yes, the lifecycle restart on handle-exception is a life saver 🙂