what's the best way to aggregate a list of segments using group-by-key without knowing the size of the list? Basically I have an incoming set of segments that are related through their ids and I need to have them grouped. I'm having a hard time wrapping my head around using state as a way to group segments
@dbernal Windowing to the rescue! Why do you think you need to know the size of the collection?
ok I that's what I was starting to look into. I was mostly looking at the group-by-key test code and seeing how it was merging maps. I thought I needed to know the size of the collection so that the task would know when to emit the full aggregated collection. For a batch job, would I use the global window type and use the state map to handle the grouping of segments?
@dbernal Correct, yeah. You definitely want windows and triggers there.
group-by-key will segment data across windows, but the windows themselves will maintain the state, and triggers will emit aggregations
You can also access window contents without triggering via HTTP
@michaeldrogalis is there a particular way to trigger at the end of the batch? How would I be able to trigger after all segments have been processed by the task?
@dbernal What input storage are you reading out of?
SQL
@dbernal The job will complete after all segments have been read out of a SQL table, and you'll receive a trigger event for completion. You can flush them then
@michaeldrogalis is there a particular trigger type I would use?
Shouldn't particularly matter if all you're doing is looking to flush at the end - otherwise just make it no-op based on non-matching event types.
Almost all of the triggers count job-completed as a reason to trigger e.g. https://github.com/onyx-platform/onyx/blob/0.12.x/src/onyx/triggers.cljc#L84
ok cool. Got it! Thanks for all the help @michaeldrogalis @lucasbradstreet
Anytime!
is it possible to flush out aggregated segments to a downstream task?
@dbernal yep, use :trigger/emit
, see http://onyxplatform.org/docs/cheat-sheet/latest/
generally you will want to use :onyx/type :reduce
on the task that does the sending downstream, because you typically don’t want to pass down the segments that are being reduced
@lucasbradstreet ok thank you!