onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
michaeldrogalis 2018-01-23T02:19:04.000225Z

@atwrdik Ran out of time tonight -- going to have to take a look tomorrow. Sorry! ๐Ÿ˜•

๐Ÿ‘ 1
sparkofreason 2018-01-23T15:09:26.000833Z

Starting work on a use-case where a user will upload a file that needs to be converted and streamed into our Onyx system. We're currently just uploading straight to S3. Before I go reinventing any wheels, if anyone has any tips or pointers to useful libraries for this task, it would be much appreciated. Thanks.

michaeldrogalis 2018-01-23T15:27:37.000907Z

@dave.dixon S3 -> notification into Lambda -> read file inside Lambda and stream to Kafka -> into Onyx is one route. From what I've heard about your use case previously, Kafka's not ideal here though.

michaeldrogalis 2018-01-23T15:28:04.000574Z

Being able to make use of onyx-amazon-s3 directly off of S3 change notifications would be pretty nice

sparkofreason 2018-01-23T15:50:21.000539Z

We'll get explicit notification of the file upload in our onyx job. I was going to have that task stream in the file, parse/transform and output to another topic (the contents are transformed into the same commands used in other parts of the application). I think that should be fine, just that one task will be spewing a lot of segments. Will that work? The main thing to avoid would be having to vacuum up the whole file, or having to accumulate all output segments and push them at once, and I'm not entirely clear how to manage that.

sparkofreason 2018-01-23T15:57:24.000950Z

Or maybe that requires a plugin, and I need to kick off another parameterized job?

michaeldrogalis 2018-01-23T16:04:47.000550Z

I think the main risk in that approach is knowing what to do if one of your Onyx nodes gets kicked over. Are those notifications replayable?

michaeldrogalis 2018-01-23T16:05:25.000059Z

I think it'd work okay if you can parallelize the number of processes on the input task doing the reading activity.

sparkofreason 2018-01-23T18:16:55.000224Z

If a task returns a seq of segments, is it evaluated lazily?

michaeldrogalis 2018-01-23T18:20:56.000053Z

How late does the evaluation need to occur for it to be "lazy"? ๐Ÿ™‚

michaeldrogalis 2018-01-23T18:21:06.000330Z

Or rather, by whom as the evaluator?

sparkofreason 2018-01-23T18:23:44.000187Z

Presumably the evaluator is whatever is distributing the segments to downstream tasks. Just trying to get my head around how a task could output a large batch of segments without having to surface them all in memory.

michaeldrogalis 2018-01-23T19:05:57.000270Z

Im not sure that it's handled lazily all the way through. I could be wrong though. @lucasbradstreet?

lucasbradstreet 2018-01-23T19:10:11.000460Z

If a task returns a seq of segments itโ€™ll realize it before sending it on. It may split the seq into multiple batches on the messenger, but itโ€™ll have to realize it all at once.

lucasbradstreet 2018-01-23T19:10:38.000232Z

The exception is if you use something like onyx-seq. With onyx-seq input tasks the plugin can support realizing a lazy seq incrementally.