@lellis I set :lifecycle/handle-exception to return :restart.
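For reference, a minimal sketch of that kind of lifecycle. It assumes Onyx's four-argument handle-exception signature (event, lifecycle, lifecycle phase, exception). The namespace my-job.lifecycles and the task name :my-task are hypothetical placeholders.

```clojure
;; Hypothetical namespace, for illustration only.
(ns my-job.lifecycles)

;; Invoked by Onyx when an exception escapes a lifecycle phase.
;; Returning :restart asks Onyx to restart the task rather than
;; kill the job (which :kill would do).
(defn restart-on-exception
  [event lifecycle lifecycle-phase e]
  :restart)

;; Lifecycle calls map; Onyx resolves it from the keyword below.
(def restart-calls
  {:lifecycle/handle-exception restart-on-exception})

;; Lifecycle entry attaching the calls map to a hypothetical task.
(def lifecycles
  [{:lifecycle/task :my-task
    :lifecycle/calls ::restart-calls}])
```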
@lucasbradstreet Thanks, I'll look into that. In the meantime, it looks like I could use onyx-peer-http-query to set up a liveness check in k8s, so that k8s restarts the peer if the /health endpoint returns a 500. Presumably, if I catch "bad" latency early enough, recycling the peer will let the job keep running.
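A rough sketch of the peer side of that setup, assuming the onyx-peer-http-query key names below (taken from its README as best recalled; verify them against the version in use). The k8s livenessProbe would then point its httpGet at path /health on the same port. On the Kubernetes side, the probe's periodSeconds and failureThreshold decide how many consecutive failed checks trigger a restart. The base config and tenancy id are hypothetical.

```clojure
;; Key names assumed from the onyx-peer-http-query README; verify for your version.
(def http-query-peer-config
  {:onyx.query/server? true          ; start the HTTP query server inside the peer
   :onyx.query.server/ip "0.0.0.0"   ; bind all interfaces so the kubelet can reach it
   :onyx.query.server/port 8080})    ; port the k8s liveness probe would target

;; Hypothetical existing peer config; the query settings are merged into it.
(def base-peer-config {:onyx/tenancy-id "my-tenancy"})

(def peer-config (merge base-peer-config http-query-peer-config))
```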
Any thoughts on what threshold to choose?