Hi @lmergen, and thanks for your reply No not deduplication, actually each stream contains different info but that info may be about the same thing I am getting stock market info from different sources (scrapping some, receiving others from rabbitmq, others comming from different source through tcp socket, etc) And the system is supposed to compute some financial indicators (that may need data from these different sources)
I wanted to merge related segments from different sources and then send them down stream
that's a lot easier 🙂
:)) Yeah What I intend to do is use state management and triggers
Dunno if that’s the right way, though
so just for my understanding, say you get event X from stream A, you want to find related event X' from stream B and send those downstream ?
Yeah
and if no related events are found, none are send downstream ?
Yes
sounds almost like a kind of windowing
where you cache all events in the window
Yeah, so that is the appropriate way, then ?
well maybe you can model it as just an aggregate as well
depends on your time window an exact requirements
but what i would do is use onyx' :aggregation
keys
Hmm ... what did you mean by aggregarate (am I missin something) ?
i think this example code might be of use for your case: https://github.com/lbradstreet/onyx-reducers/blob/master/src/onyx_reducers/deduplicate.clj
Hmmm thanks alot @lmergen
well it's actually more like a reducer, that emits events only when it was found before
About the trigger + window solution I have another problem : But trigger/pred doesn’t appear to be getting the segment How am I supposed to decide when to send the event down stream ?
yeah i think a window might not be the best abstraction here, because it only allows you to flush down the entire window state (i think, i do not have extensive experience with onyx + windows)
the reduction/aggregation as seen in the code sample above is better, because it allows you to separate the state (seen events) from what you want to emit (similar events that have been seen before)
Thanks a lot Will check it right away
I'd use a session window, triggering on segment count = 2
We had a similar problem but we needed to send downstream even if some sources were missing and we used session window + time gap trigger