onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
jlmr 2018-06-20T11:21:12.000014Z

Hi, I'm considering using Onyx for a project. In our case there would have to be jobs that could "revisit" an earlier "job handler" based on certain criteria. For example: a user uploads two files, one video file and one PDF. The ideal flow would look like this: 1) Extract text from the PDF. 2) Perform analysis on the extracted text. 3) Send the entire segment onwards to a job for speech recognition. This job would make use of the earlier text analysis to more accurately predict which words are likely to occur in the speech. 4) The segment now contains the recognized text from the video. Now the segment would have to "revisit" step 2 to perform analysis on this text as well. I hope this example is clear. Is it possible to construct such workflows with Onyx?

lmergen 2018-06-20T11:34:09.000106Z

Yes, that type of workflow is very much possible with Onyx.

lmergen 2018-06-20T11:34:40.000233Z

depending upon the exact requirements, it could either be a single job with multiple tasks, or multiple jobs
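
As a rough illustration of the single-job option, here is a minimal Python sketch, assuming the workflow has to stay a directed acyclic graph. Under that assumption, "revisiting" step 2 is modeled as a second analysis task that reuses the same function rather than as a loop back to the first one. All task and function names are hypothetical, and the edge list only mimics the shape of a workflow; it is not Onyx code.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical task names; each edge is an (upstream, downstream) pair.
workflow = [
    ("in", "extract-pdf-text"),
    ("extract-pdf-text", "analyze-pdf-text"),
    ("analyze-pdf-text", "recognize-speech"),
    ("recognize-speech", "analyze-speech-text"),
    ("analyze-speech-text", "out"),
]

# Both analysis tasks point at the same (hypothetical) function,
# so step 2 is reused without introducing a cycle.
task_fn = {
    "analyze-pdf-text": "my.app/analyze-text",
    "analyze-speech-text": "my.app/analyze-text",
}


def execution_order(edges):
    """Return the tasks in dependency order; raises CycleError on a loop."""
    graph = {}
    for upstream, downstream in edges:
        graph.setdefault(upstream, set())
        graph.setdefault(downstream, set()).add(upstream)
    return list(TopologicalSorter(graph).static_order())


def is_acyclic(edges):
    """True when the edge list describes a DAG."""
    try:
        execution_order(edges)
        return True
    except CycleError:
        return False


if __name__ == "__main__":
    print(execution_order(workflow))
    # A literal edge back to the first analysis task would create a cycle.
    looped = workflow + [("recognize-speech", "analyze-pdf-text")]
    print(is_acyclic(workflow), is_acyclic(looped))
```

The multiple-jobs option would instead split the edge list at the point where the segment has to go back for a second pass.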

jlmr 2018-06-20T12:12:48.000385Z

@lmergen Good to know! I will dive into the documentation then 🙂

daniel-tcgplayer 2018-06-20T19:27:09.000033Z

So I've got an issue currently where the ZooKeeper connection is closing. The exception is "Connection reset by peer", and ZooKeeper is configured with a max connections of 500 and a timeout of 60000ms. The workflow is simply a Sequence input that I'm feeding 600k triples into, followed by a transform function and an output. The workflow fails after 10 seconds, while loading the sequence into ZooKeeper. Any ideas?

lucasbradstreet 2018-06-20T19:28:36.000612Z

My first guess is you’re probably hitting the 1MB cap on zookeeper payloads. If you’re feeding that much data in you’re probably better off doing it outside of the job data by just passing the parameters to what you’re generating.
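
A minimal Python sketch of the idea, assuming a payload limit of roughly 1 MB and using JSON as a stand-in for whatever serialization the job data really goes through (so the byte counts are only indicative). The helper names and the triple format are hypothetical. It compares embedding 600k triples in the job data with passing only the parameters needed to generate them.

```python
import json

# Approximate cap on a single ZooKeeper payload (about 1 MB).
ZK_PAYLOAD_LIMIT = 1024 * 1024


def payload_size(value):
    """Serialized size in bytes, with JSON standing in for the real format."""
    return len(json.dumps(value).encode("utf-8"))


def fits_in_zookeeper(value, limit=ZK_PAYLOAD_LIMIT):
    """True when the serialized value stays under the payload limit."""
    return payload_size(value) < limit


def generate_triples(params):
    """Lazily produce hypothetical triples from a small parameter map."""
    for i in range(params["count"]):
        yield [f"s{i}", f"p{i}", f"o{i}"]


# Option 1: embed the whole sequence in the job data.
params = {"count": 600_000}
inline_job_data = {"elements": list(generate_triples(params))}

# Option 2: submit only the parameters and generate the data inside the task.
param_job_data = {"params": params}

if __name__ == "__main__":
    print(payload_size(inline_job_data), fits_in_zookeeper(inline_job_data))
    print(payload_size(param_job_data), fits_in_zookeeper(param_job_data))
```

With the second option the submitted data stays a few dozen bytes no matter how many triples are produced.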

daniel-tcgplayer 2018-06-20T19:32:58.000171Z

Thanks for the quick response! I'll rework the job to batch things better.

lucasbradstreet 2018-06-20T19:37:54.000016Z

It’d be good for us to have a way to support side-channel submissions of data to S3 rather than ZK for this sort of thing.