Hi, I'm considering using Onyx for a project. In our case there would have to be jobs that could "revisit" an earlier "job handler" based on certain criteria. For example: a user uploads two files, one video file and one PDF. The ideal flow would look like this: 1) Extract text from the PDF. 2) Perform analysis on the extracted text. 3) Send the entire segment onwards to a job for speech recognition. This job would make use of the earlier text analysis to more accurately predict which words are likely to occur in the speech. 4) The segment now contains the recognized text from the video. Now the segment would have to "revisit" step 2 to perform analysis on this text as well. I hope this example is clear. Is it possible to construct such workflows with Onyx?
yes, that type of workflow is very much possible with onyx
depending on the exact requirements, it could be either a single job with multiple tasks, or multiple jobs
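For reference, a single-job version of the flow described above could be sketched in Onyx's data-driven workflow/catalog format. This is a hypothetical sketch, not from this thread: the task names and the `:my.app/analyze-text` function are assumed. Since an Onyx workflow is a DAG, "revisiting" step 2 is modeled as a second analysis task that reuses the same `:onyx/fn`:

```clojure
;; Hypothetical sketch of a single Onyx job covering steps 1-4.
;; The "revisit" of the analysis step is a second task
;; (:analyze-speech-text) pointing at the same function.
(def workflow
  [[:in                  :extract-pdf-text]
   [:extract-pdf-text    :analyze-pdf-text]
   [:analyze-pdf-text    :speech-recognition]
   [:speech-recognition  :analyze-speech-text]
   [:analyze-speech-text :out]])

(def catalog
  [{:onyx/name :analyze-pdf-text
    :onyx/fn   :my.app/analyze-text   ; assumed analysis function
    :onyx/type :function
    :onyx/batch-size 20}
   {:onyx/name :analyze-speech-text
    :onyx/fn   :my.app/analyze-text   ; same analysis fn, reused
    :onyx/type :function
    :onyx/batch-size 20}
   ;; ...remaining catalog entries for the input, PDF extraction,
   ;; speech recognition, and output tasks...
   ])
```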
@lmergen Good to know! I will dive into the documentation then 🙂
So I've got an issue currently where the zookeeper connection is closing. The exception is "Connection reset by peer", and zookeeper is configured with a max of 500 connections and a timeout of 60000ms. The workflow is simply a Sequence input that I'm feeding 600k triples into, followed by a transform function and output. The workflow fails while loading the sequence into zookeeper, after about 10 seconds. Any ideas?
My first guess is you’re probably hitting the 1MB cap on zookeeper payloads. If you’re feeding in that much data, you’re probably better off keeping it out of the job data entirely and just passing parameters that point to what you’re generating.
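As a hedged sketch of that rework (the plugin and parameter names here are assumed, not from this thread): rather than embedding the 600k triples in the job data that Onyx writes to ZooKeeper, the input task's catalog entry can carry only a reference to where the data lives, and a custom input reader fetches it at runtime. This keeps each znode well under ZooKeeper's default ~1MB node-size limit (`jute.maxbuffer`):

```clojure
;; Hypothetical sketch: store a pointer in the catalog, not the data.
;; :my.app/triple-reader and :triples/uri are assumed names for a
;; custom input plugin and its parameter.
(def catalog
  [{:onyx/name :in
    :onyx/plugin :my.app/triple-reader
    :onyx/type :input
    :onyx/medium :custom
    ;; reference to the triples, fetched at runtime by the reader
    :triples/uri "s3://my-bucket/triples/batch-0001.nt"
    :onyx/batch-size 500}])
```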
Thanks for the quick response! I'll rework the job to batch things better
It’d be good for us to have a way to support side-channel submissions of data to S3 rather than ZK for this sort of thing