I'm attempting to use the results of a JDBC query as the input for my workflow. What is the recommended way of accomplishing that? The onyx-sql plugin looks like it would pull massive amounts of unnecessary data over the wire, and I lose my connection to ZooKeeper when using the onyx-seq plugin unless I severely limit the results from my query.
@halcyon Are you looking to execute a single query and pump those results through the workflow once? Or recurringly query and continuously put results in?
With respect to losing your ZK connection, you probably need to switch your checkpoint storage to :s3 instead of :zookeeper, and you'll be fine.
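Something like this in your peer-config (a sketch from memory; check the cheat sheet for the exact keys, and the bucket/region values are placeholders):
```clojure
;; Peer-config sketch: switch checkpoint storage from ZooKeeper to S3.
{:onyx.peer/storage :s3                              ; instead of :zookeeper
 :onyx.peer/storage.s3.bucket "my-onyx-checkpoints"  ; placeholder bucket
 :onyx.peer/storage.s3.region "us-east-1"}           ; placeholder region
```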
single query and pump those results through the workflow once
Probably onyx-seq or onyx-kafka. If the results fit in memory, use onyx-seq; if not, spool them into a Kafka topic. Either way, definitely make sure checkpointing is using S3.
ZK is only for development
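For the onyx-seq route it would look roughly like this (an untested sketch following the onyx-seq README pattern; the db-spec, query, and namespace are placeholders, and exact keys can vary by plugin version):
```clojure
(ns my.app.jobs
  (:require [clojure.java.jdbc :as jdbc])) ; assumes org.clojure/java.jdbc

;; Placeholder connection details.
(def db-spec {:dbtype "postgresql" :dbname "mydb"})

(defn inject-query-results
  "Runs the query once and hands the result seq to the onyx-seq reader."
  [event lifecycle]
  {:seq/seq (jdbc/query db-spec ["SELECT id, name FROM users"])})

(def in-calls
  {:lifecycle/before-task-start inject-query-results})

(def catalog-entry
  {:onyx/name :in
   :onyx/plugin :onyx.plugin.seq/input
   :onyx/type :input
   :onyx/medium :seq
   :onyx/max-peers 1
   :onyx/batch-size 100
   :onyx/doc "Feeds JDBC query results into the workflow"})

(def lifecycles
  [{:lifecycle/task :in
    :lifecycle/calls :my.app.jobs/in-calls}])
```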
thank you!
Anytime! Let us know if you hit any trouble.
After switching to S3 for checkpointing, will onyx still require ZK for anything else?
Yeah, it will for coordination
It just won't be storing large amounts of data inside of it, which is why you were hitting those problems
Off for the night now. 🙂
Got it, thanks again - good night!
Hey, I was wondering whether it would be possible for the onyx-kafka plugin to have the number of peers per task scale with cluster size, up to :n-partitions.
Currently it is required that :onyx/min-peers equals :onyx/max-peers, or that :onyx/n-peers is set while :onyx/min-peers and :onyx/max-peers are left unset (see the catalog sketch below).
The most desirable behavior would be for n-peers to scale up and down automatically as instances are added and removed. I'm not sure that is feasible; I suspect it would require completely reassigning partitions to peers on every scaling action. If it is possible and fits the spec, I am willing to help develop it.
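For reference, this is roughly what the pinned setup looks like today (a sketch; topic, hosts, and the deserializer fn are placeholders, and the broker/ZK discovery key varies by onyx-kafka version):
```clojure
;; Catalog entry sketch: the peer count is fixed up front and cannot
;; float with cluster size today.
{:onyx/name :read-events
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :kafka/topic "events"                       ; placeholder topic
 :kafka/zookeeper "127.0.0.1:2181"           ; placeholder; key varies by version
 :kafka/offset-reset :earliest
 :kafka/deserializer-fn :my.app/deserialize  ; hypothetical fn
 :onyx/n-peers 8                             ; = the topic's :n-partitions, fixed
 :onyx/batch-size 100}
```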
That would be a nice addition, @eelke. I’ve always had to bring the running job down, reconfigure, and bring it back up again.
Yeah, I think it is nice to have if you want to use autoscaling. Our use case has quite a wide throughput range, so at moments of low throughput we would like to run fewer instances, and the other way around.
agreed
@eelke Onyx would need to support repartitioning state. Each peer is set up with some checkpoint state about each partition's offset. Changing how partitions are assigned to peers means we'd need a better way to use resume points that translate between the before/after configurations.
It's completely possible, but not on our short term roadmap.
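If anyone wants to dig in, the building blocks are the resume point API, roughly like this (from memory, so double-check the docs; peer-config, tenancy-id, and the job maps are assumed to be bound already):
```clojure
;; Sketch: carry an old job's checkpoint coordinates into a resubmitted job.
;; Rescaling would additionally need to translate offsets across partitions.
(require '[onyx.api])

(let [coords       (onyx.api/job-snapshot-coordinates peer-config tenancy-id old-job-id)
      resume-point (onyx.api/build-resume-point new-job coords)]
  (onyx.api/submit-job peer-config (assoc new-job :resume-point resume-point)))
```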
Are there any open source projects using onyx in production (rather than reference implementations) that are known for having good best practices and structure? Would love to read through some codebases to see how they've structured onyx, even better if they've been vetted by Distributed Masonry.
It wouldn’t be so hard to repartition the Kafka offset state, so it’s certainly more possible than general rescaling of window state.
Ok sounds good. Let's do it 😉