@matt.t.grimm The way you perceived it is the way we designed it. We didn't have ad-hoc queries in mind -- we were building for an append-only table style of flow.
@matt.t.grimm we’d accept a PR to do more specialised queries, but the primary problem is how to handle checkpointing and recovery
That's because with more specialised queries it becomes harder to split the work up into key ranges. I'm open to versions that don't even try to checkpoint and just restart from scratch, too.
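To make the key-range point concrete, here's a rough sketch of why the append-only flow checkpoints cleanly; `id-ranges` is an illustrative helper, not the plugin's actual API:

```clojure
;; With a monotonically increasing primary key, the table can be cut into
;; fixed-size id ranges up front; each range is an independent, resumable
;; unit of work, so recovery is just re-reading any range that wasn't acked.
(defn id-ranges
  "Split [min-id, max-id] into consecutive chunks of up to rows-per-segment ids."
  [min-id max-id rows-per-segment]
  (for [low (range min-id (inc max-id) rows-per-segment)]
    {:low low
     :high (min (+ low rows-per-segment -1) max-id)}))

;; (id-ranges 1 10 4)
;; => ({:low 1, :high 4} {:low 5, :high 8} {:low 9, :high 10})
```

An ad-hoc query has no natural key ordering to cut on, which is what makes checkpointing it hard.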
Thanks for the replies. The question mostly came from a misunderstanding about onyx on my part, and your answer helped me clear a mental hurdle. It did lead to one more question related to onyx's paradigm: is it expected that an entry in the database queue contains all the data necessary for the job that picks it up? Or might the job use parameters from the queue entry to go out and do additional I/O before executing the pure functional part of the job?
@matt.t.grimm not necessarily - you can use onyx’s windowing/aggregation functionality to statefully reduce over multiple data segments (i.e. rows)
that allows you to compute over rows in relation to each other
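roughly like this: a global window that conjs each incoming segment into window state, plus a trigger that fires after every N segments. The keys follow the Onyx windowing docs, though exact names can vary by Onyx version, and `:process-rows`/`flush!` are hypothetical:

```clojure
(def windows
  [{:window/id :collect-rows
    :window/task :process-rows                           ; task the window is attached to
    :window/type :global
    :window/aggregation :onyx.windowing.aggregation/conj}])

(def triggers
  [{:trigger/window-id :collect-rows
    :trigger/id :flush-rows
    :trigger/refinement :onyx.refinements/accumulating   ; keep accumulating state
    :trigger/on :onyx.triggers/segment
    :trigger/threshold [100 :elements]                   ; fire every 100 segments
    :trigger/sync ::flush!}])

(defn flush! [event window trigger state-event state]
  ;; `state` holds every row conj'd so far; compute across rows here
  (println "rows so far:" (count state)))
```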
I'm thinking more along the lines of reaching out to, e.g., a lookup table in a database to pull in some data to merge with the segment.
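In Onyx that sort of enrichment is usually done by injecting the connection through a lifecycle and doing the lookup inside the task function, ahead of the pure part. A rough sketch, where `connect!` and `lookup-row` are hypothetical helpers:

```clojure
;; The map returned by a :lifecycle/before-task-start fn is merged into the
;; event map; :onyx.core/params prepends args to the task function's arg list.
(defn inject-conn [event lifecycle]
  {:onyx.core/params [(connect! "jdbc:postgresql://...")]})

(def conn-calls
  {:lifecycle/before-task-start inject-conn})
;; wired up in the lifecycles vector as
;; {:lifecycle/task :enrich :lifecycle/calls ::conn-calls}

(defn enrich [conn segment]
  (let [extra (lookup-row conn (:lookup-id segment))]   ; impure: extra I/O
    (merge segment extra)))                             ; pure: merge + transform
;; catalog entry: {:onyx/name :enrich :onyx/fn ::enrich ...}
```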