Is it possible for a downstream task, say :c in a workflow like [[:a :c] [:b :c]], to determine when all of its upstream tasks are done emitting segments so that it can begin an aggregation?
We needed to do something similar and added a feature to the kafka plugin to emit a message downstream for each partition once it hits some target offset. It gets a little tricky, as you need to design your job to maintain the message ordering. Given you’re using onyx-sql, your best bet is to wait for the job-completion trigger event; however, job-complete and trigger/emit do not play nicely together.
Let's say it's two function tasks. My issue right now is that it seems like one upstream task signals job completion, and my downstream aggregation task is then triggered to flush the segments it has, even though it hasn't yet received the complete set of segments from all upstream tasks. Would it be possible to maintain state in an atom and only flush once two job-completion events have been seen?
Hmm, that’d be an Onyx bug if I understand what you’re saying.
Onyx will wait for all inputs to signal completion before the downstream task signals completion.
I see, let me make sure this is the case then. It's just a hunch right now.
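For reference, the atom idea asked about above could look roughly like the sketch below. It is plain Clojure, not Onyx API: upstream-tasks, new-state, add-segment! and mark-done! are hypothetical names made up for illustration, and how a completion event would actually reach mark-done! is left open. It only models the rule "flush once every upstream task has signalled completion", and it assumes Clojure 1.9+ for swap-vals!.

```clojure
;; Hypothetical sketch, not Onyx API: buffer segments and flush only after
;; every upstream task has reported completion.
(def upstream-tasks #{:a :b}) ; the inputs to :c in [[:a :c] [:b :c]]

(defn new-state
  "Fresh state: which upstream tasks are done, plus the buffered segments."
  []
  (atom {:done #{} :buffer []}))

(defn add-segment!
  "Buffers one incoming segment."
  [state segment]
  (swap! state update :buffer conj segment)
  nil)

(defn mark-done!
  "Records that `task` has completed. Returns the buffered segments exactly
  once, on the call that completes the set of upstream tasks; otherwise nil."
  [state task]
  (let [[old new] (swap-vals! state update :done conj task)]
    (when (and (not= (:done old) upstream-tasks)
               (= (:done new) upstream-tasks))
      (:buffer new))))

;; Usage
(def state (new-state))
(add-segment! state {:n 1})
(mark-done! state :a)        ; nil, :b is still running
(add-segment! state {:n 2})
(mark-done! state :b)        ; [{:n 1} {:n 2}]
```

Keep in mind that an atom is only visible inside one process, so this would not help if the aggregation task runs on more than one peer. If Onyx already waits for all inputs as described above, a workaround like this should not be needed.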