Hi there. Can anybody help me with one thing in Onyx? It’s about :onyx/pending-timeout and the sentence below: “Asynchronous Barrier Snapshotting fault tolerance technique does not depend on retrying individual segments on a timeout.” OK, it does not depend on retrying, but is there still a timeout somewhere in the code after which segments are submitted for retry? If so, is it possible or advisable to disable or increase this timeout somewhere? Thanks, Luis
the keyword here is "individual" -- it does not depend on retrying individual segments
however, periodically, it will send a control signal, a barrier, which is then stored on both input and output storage
this essentially makes sure that input and output both agree on what data has been processed
this happens periodically, e.g. every 15 seconds
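to make the barrier idea concrete, here is a toy sketch in Python. It is not Onyx code: `ToyPipeline`, `Checkpoint`, and `barrier_every` are made-up names, and a segment count stands in for the periodic timer. It only shows input and output recording their positions together each time a barrier passes.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    input_offset: int   # segments the input had emitted when the barrier passed
    output_count: int   # segments the output had written when the barrier passed


@dataclass
class ToyPipeline:
    segments: list
    barrier_every: int  # hypothetical stand-in for "every 15 seconds"
    output: list = field(default_factory=list)
    checkpoints: list = field(default_factory=list)

    def run(self):
        for offset, segment in enumerate(self.segments, start=1):
            # The "task": double each segment and write it to the output.
            self.output.append(segment * 2)
            if offset % self.barrier_every == 0:
                # Barrier: input and output positions are stored together,
                # so both sides agree on what has been processed.
                self.checkpoints.append(Checkpoint(offset, len(self.output)))
        return self.checkpoints


pipeline = ToyPipeline(segments=list(range(10)), barrier_every=4)
pipeline.run()
```

with 10 segments and a barrier after every 4, two checkpoints get stored, and the last 2 segments are not yet covered by any checkpoint.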
i think, however, that :onyx/pending-timeout might be more of an artifact of the pre-ABS days
Yes, it is from the pre-ABS days. My concern is: if a block of segments eventually takes longer than 60 seconds, it will hit a timeout and then be retried, right? Is that defined somewhere?
yes if the block passes 60s, a timeout occurs and an exception for the task is thrown
Onyx then attempts to restart the task
and recover from the last checkpoint
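roughly, the timeout / restart / recover loop looks like this toy Python sketch. This is only an illustration under assumptions, not how Onyx implements it: `run_task`, `run_with_recovery`, and `BlockTimeout` are hypothetical names, block durations are simulated instead of measured, and the checkpoint advances after every block to keep it short.

```python
class BlockTimeout(Exception):
    """Raised when a block of segments takes longer than the timeout."""


TIMEOUT_S = 60  # the 60s figure from the discussion above


def run_task(blocks, durations, checkpoint):
    """Process blocks starting at the checkpointed position.

    `durations` maps block index -> simulated processing time in seconds.
    """
    for i in range(checkpoint["next_block"], len(blocks)):
        if durations.get(i, 0) > TIMEOUT_S:
            # The block passed the timeout: the task throws.
            raise BlockTimeout(f"block {i} exceeded {TIMEOUT_S}s")
        checkpoint["results"].append(sum(blocks[i]))
        checkpoint["next_block"] = i + 1  # checkpoint only after success


def run_with_recovery(blocks, durations_per_attempt):
    """Restart the task after each timeout, resuming from the last checkpoint."""
    checkpoint = {"next_block": 0, "results": []}
    restarts = 0
    for durations in durations_per_attempt:
        try:
            run_task(blocks, durations, checkpoint)
            return checkpoint["results"], restarts
        except BlockTimeout:
            restarts += 1  # task restarted; the checkpoint is kept
    raise RuntimeError("task never completed")


blocks = [[1, 2], [3, 4], [5, 6]]
# First attempt: block 1 takes 75s and times out. Second attempt: all fast.
results, restarts = run_with_recovery(blocks, [{1: 75}, {}])
```

block 0 is not reprocessed after the restart, because the checkpoint already covers it. Only the work since the last checkpoint is redone.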
the code to handle this is fairly deep inside Onyx, but you can see it surface in e.g. all the plugins that have to implement these checkpointing and recovery functions