Then how can one implement this? After all, in the documentation for the watermark trigger it states "This is a shortcut function for a punctuation trigger that fires when any piece of data has a time-based window key that is above another extent, effectively declaring that no more data for earlier windows will be arriving." And do I understand right that if you introduce more peers, your window would be fragmented? Is there a way to combine those windows in the case of many peers, to get the actual window of all according data?
So you can do it if you know the group key for the window: send a segment that will be routed to the right set of windows. However, that is not possible with many peers and no routing.
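(for reference, a rough sketch of what a fixed window plus watermark trigger looks like in the catalog-style maps; the key names follow the Onyx windowing docs, while the task name and sync fn are placeholders I made up:)
```clojure
;; rough sketch -- :my-windowed-task and ::dump-window! are placeholders,
;; the rest are standard Onyx window/trigger keys
(def windows
  [{:window/id          :collect-segments
    :window/task        :my-windowed-task
    :window/type        :fixed
    :window/aggregation :onyx.windowing.aggregation/conj
    :window/window-key  :event-time   ;; each segment must carry this key
    :window/range       [5 :minutes]}])

(def triggers
  [{:trigger/window-id  :collect-segments
    :trigger/refinement :onyx.refinements/accumulating
    :trigger/on         :onyx.triggers/watermark
    ;; fires when a segment's :event-time is above a window's upper bound,
    ;; i.e. no more data for that earlier window is expected
    :trigger/sync       ::dump-window!}])
```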
Since you are online, here is my follow-up noob question: a setting of :onyx/max-peers 1 on the task that the window definition refers to will prevent this, correct?
That’s correct
(Assuming that no grouping is used. If grouping is used you will need a segment for each group)
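(for anyone reading along later, the relevant catalog entry is roughly this; the task and fn names are placeholders, the point is :onyx/max-peers 1 on the windowed task:)
```clojure
{:onyx/name       :my-windowed-task
 :onyx/fn         :my.app/process-segment
 :onyx/type       :function
 :onyx/max-peers  1        ;; pin the windowed task to a single peer
 :onyx/batch-size 20}
;; if grouping is used you would also add :onyx/group-by-key (or :onyx/group-by-fn)
;; plus :onyx/flux-policy, and send one closing segment per group as noted above
```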
Okay, thanks for the clarification. In my application I do use grouping, but each group is guaranteed to emit at least one segment in every window.
I inherited an Onyx application (0.9.x; we are upgrading, but we have to make do with 0.9.x for a while) some time back, and I have some basic questions on vpeer / task allocation. We are using the job percentage scheduler. If we add more replicas that spin up more vpeers after the job has been submitted, do tasks dynamically get allocated to the vpeers on those replicas, or do the peers have to be up at the time the job is submitted for them to get allocated?
at the moment the task allocation is static; if a job has, say, 4 vpeers for a single task, that will remain "fixed"
dynamic scaling is a bit tricky in the context of aggregates / window functions, but not completely undoable
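(context for anyone else reading: the percentage scheduler is selected in the peer config, and each job declares its share when it is submitted. A rough sketch, with placeholder ids/addresses and, if I recall the 0.9.x naming right, :onyx/id rather than the later :onyx/tenancy-id:)
```clojure
;; peer config selects the job scheduler
(def peer-config
  {:onyx/id "my-cluster-id"                       ;; placeholder
   :zookeeper/address "127.0.0.1:2188"            ;; placeholder
   :onyx.peer/job-scheduler :onyx.job-scheduler/percentage})

;; each job then asks for a share of the available vpeers
(def job
  {:workflow [[:in :process] [:process :out]]
   :catalog []                                    ;; catalog entries elided
   :percentage 50                                 ;; claim 50% of the cluster's vpeers
   :task-scheduler :onyx.task-scheduler/balanced})
```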
ok, so if I understand correctly, if I add vpeers via another replica after the jobs have been submitted and started, those new vpeers will not pick up any tasks?
@lmergen am I getting that right, or am I misunderstanding what you mean, there
@djjolicoeur yes this is correct
it will recover from crashed servers, among other things, but it will not be able to dynamically scale up or down
thanks, just validated that. we were submitting jobs, then spinning up replicas, which meant we had a ton of unused vpeers
yeah that's not the correct approach -- first spin up the peers, then submit jobs
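(so concretely, something like this ordering, using the standard onyx.api calls and the peer-config/job sketched above:)
```clojure
(require '[onyx.api])

;; 1. bring the vpeers up first...
(def peer-group (onyx.api/start-peer-group peer-config))
(def v-peers    (onyx.api/start-peers 4 peer-group))  ;; 4 is just an example count

;; 2. ...then submit; tasks are allocated to the peers that exist at submission
;; time, and that allocation stays fixed for the life of the job
(onyx.api/submit-job peer-config job)
```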