onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
dbernal 2018-03-13T14:17:22.000232Z

what's the best way to aggregate a list of segments using group-by-key without knowing the size of the list? Basically I have an incoming set of segments that are related through their ids and I need to have them grouped. I'm having a hard time wrapping my head around using state as a way to group segments

michaeldrogalis 2018-03-13T15:47:57.000060Z

@dbernal Windowing to the rescue! Why do you think you need to know the size of the collection?

dbernal 2018-03-13T15:59:50.000747Z

ok, that's what I was starting to look into. I was mostly looking at the group-by-key test code and seeing how it was merging maps. I thought I needed to know the size of the collection so that the task would know when to emit the full aggregated collection. For a batch job, would I use the global window type and use the state map to handle the grouping of segments?

michaeldrogalis 2018-03-13T16:20:28.000894Z

@dbernal Correct, yeah. You definitely want windows and triggers there.

michaeldrogalis 2018-03-13T16:20:55.000336Z

group-by-key will segment data across windows, but the windows themselves will maintain the state, and triggers will emit aggregations
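
For reference, the setup described here might be declared roughly as follows. This is a minimal sketch, not code from the conversation: the task name `:group-segments`, the window id `:collect-by-id`, and the `:id` grouping key are hypothetical, and the peer and flux-policy settings are assumptions that should be checked against the cheat sheet for the Onyx version in use.

```clojure
;; Catalog entry for the grouping task (hypothetical names throughout).
;; :onyx/group-by-key routes all segments sharing an :id to the same peer.
(def group-task
  {:onyx/name :group-segments
   :onyx/fn :clojure.core/identity
   :onyx/type :function
   :onyx/group-by-key :id          ; assumes every segment has an :id key
   :onyx/flux-policy :recover      ; assumption: grouped tasks need a flux policy
   :onyx/min-peers 1               ; assumption: fixed peer count for grouping
   :onyx/max-peers 1
   :onyx/batch-size 20})

;; A global window over that task. Because the task is grouped, window
;; state is kept per group key, so the collection size never has to be known.
(def windows
  [{:window/id :collect-by-id
    :window/task :group-segments
    :window/type :global
    :window/aggregation :onyx.windowing.aggregation/conj}])
```

The `conj` aggregation simply accumulates every segment seen for a key; a trigger declared against `:collect-by-id` then decides when that accumulated state is flushed.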

michaeldrogalis 2018-03-13T16:21:09.000350Z

You can also access window contents without triggering via HTTP

dbernal 2018-03-13T16:25:49.000213Z

@michaeldrogalis is there a particular way to trigger at the end of the batch? How would I be able to trigger after all segments have been processed by the task?

michaeldrogalis 2018-03-13T16:27:02.000406Z

@dbernal What input storage are you reading out of?

dbernal 2018-03-13T16:27:26.000559Z

SQL

michaeldrogalis 2018-03-13T16:29:50.000087Z

@dbernal The job will complete after all segments have been read out of a SQL table, and you'll receive a trigger event for completion. You can flush them then

dbernal 2018-03-13T16:31:44.000902Z

@michaeldrogalis is there a particular trigger type I would use?

michaeldrogalis 2018-03-13T16:32:33.000779Z

Shouldn't particularly matter if all you're doing is looking to flush at the end - otherwise just make it no-op based on non-matching event types.

lucasbradstreet 2018-03-13T16:34:29.000083Z

Almost all of the triggers count job-completed as a reason to trigger e.g. https://github.com/onyx-platform/onyx/blob/0.12.x/src/onyx/triggers.cljc#L84

dbernal 2018-03-13T16:37:07.000106Z

ok cool. Got it! Thanks for all the help @michaeldrogalis @lucasbradstreet

michaeldrogalis 2018-03-13T16:37:18.000777Z

Anytime!

dbernal 2018-03-13T18:00:43.000677Z

is it possible to flush out aggregated segments to a downstream task?

lucasbradstreet 2018-03-13T18:01:40.000843Z

@dbernal yep, use :trigger/emit, see http://onyxplatform.org/docs/cheat-sheet/latest/

lucasbradstreet 2018-03-13T18:02:17.000650Z

generally you will want to use :onyx/type :reduce on the task that does the sending downstream, because you typically don’t want to pass down the segments that are being reduced
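
Putting the last few messages together, a trigger that flushes each group downstream at job completion could look something like the sketch below. The namespace, function name, trigger id, and window id are hypothetical. The five-argument shape of the `:trigger/emit` function follows the cheat sheet as I understand it, and returning `nil` to skip emission is an assumption to verify for your Onyx version.

```clojure
(ns my.app.grouping)   ; hypothetical namespace

;; Function referenced by :trigger/emit. It receives the event map, the
;; window, the trigger, a state-event map, and the current window state.
;; It is a no-op for every event type except :job-completed.
(defn emit-group
  [event window trigger {:keys [event-type group-key]} extent-state]
  (when (= :job-completed event-type)
    ;; One downstream segment per group, holding everything collected for it.
    {:id group-key
     :segments extent-state}))

;; Trigger on the (hypothetical) :collect-by-id window. The trigger type and
;; threshold matter little here because emit-group ignores other event types.
(def triggers
  [{:trigger/id :flush-groups
    :trigger/window-id :collect-by-id
    :trigger/on :onyx.triggers/segment
    :trigger/threshold [1000 :elements]
    :trigger/emit ::emit-group}])

;; The windowed task's catalog entry would use :onyx/type :reduce so that only
;; the emitted aggregates, not the raw segments, flow to the downstream task.
```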

dbernal 2018-03-13T18:03:34.000490Z

@lucasbradstreet ok thank you!