That’s definitely my overall impression of onyx-sql
onyx-kafka, onyx-seq, and onyx-core-async (for testing), onyx-amazon-s3, onyx-amazon-sqs, and onyx-http are the most popular plugins overall
Ranking between those is tricky
onyx-kafka is definitely at the top, followed by sqs and s3 probably
@rustam.gilaztdinov not sure whether it fits in your architecture, but perhaps it's a better idea to use Kafka Connect or something like that to source data from PostgreSQL into Kafka, and then read from Kafka directly in Onyx
What about https://debezium.io
I’ve heard a lot of good things for CDC with debezium. Then onyx-kafka
The CDC case is pretty big and hasn’t been paid enough attention to imo
nice, didn't know about debezium
that's pretty sweet
The CDC direction to streaming is a pretty low risk way to start integrating systems into the streaming model
yes, and it makes a lot of sense -- i see that debezium uses actual database xlogs to capture changes
i still remember the insane old days of PostgreSQL + Slony
I should really dust off the twitter plugin work that I did.
this is a very frustrating behavior of sql plugin. Still doesn’t solve it
@jasonbell actually i've brought it up to date until 0.12.7, i think it's compatible with 0.14 as well
@rustam.gilaztdinov you mean the NPE?
@lmergen hello, sorry for the delay, can we pls try to fix this? how can I know slot-id? ranges is right, I guess -- [0 99] [100 199] [200 299]...
(take-nth n-peers (drop slot-id ...))
-- in my case, n-peers = 3, so, I choose every 3rd value in sequence
yep
do you know which line of code is triggering the NPE? that would help
Handling uncaught exception thrown inside task lifecycle :lifecycle/read-batch. Killing the job.
java.lang.Thread.run Thread.java: 748
java.util.concurrent.ThreadPoolExecutor$Worker.run ThreadPoolExecutor.java: 624
java.util.concurrent.ThreadPoolExecutor.runWorker ThreadPoolExecutor.java: 1149
...
clojure.core.async/thread-call/fn async.clj: 441
onyx.peer.task-lifecycle/start-task-lifecycle!/fn task_lifecycle.clj: 1155
onyx.peer.task-lifecycle/run-task-lifecycle! task_lifecycle.clj: 550
onyx.peer.task-lifecycle.TaskStateMachine/exec task_lifecycle.clj: 1070
onyx.peer.task-lifecycle/wrap-lifecycle-metrics/fn task_lifecycle.clj: 1097
onyx.peer.task-lifecycle/build-read-batch/fn task_lifecycle.clj: 651
onyx.peer.read-batch/read-input-batch read_batch.clj: 49
onyx.peer.read-batch/read-input-batch/fn read_batch.clj: 54
onyx.plugin.sql.SqlPartitioner/poll! sql.clj: 114
clojure.core/first core.clj: 55
...
clojure.core/take-nth/fn core.clj: 4271
clojure.core/seq core.clj: 137
...
clojure.core/drop/fn core.clj: 2924
clojure.core/drop/step core.clj: 2921
great, that helps, let me see
ohhh, i see what's going on already
Excellent news @lmergen thanks for letting me know
https://github.com/solatis/onyx-sql/tree/master i've just pushed a fix for this issue there, could you try it out ?
no, not helped(
are you sure you're using the correct version? is the error still exactly the same ?
yep, I’m sure, and error the same
this is weird, lein clean
not help
error on onyx.plugin.sql.SqlPartitioner/poll! sql.clj: 115
ok, it could be that partition-table
returns nil
could you try to change the if-let at line 115 to this:
(if-let [part (and rst
(first @rst))]
no, doesn’t help 😞
this makes no sense at all
:thisisfine:
wait! :thinking_face:
could it be that there is a sequence with lazy side-effects here
which is triggered by the first
...
hmm
ok i don't have the time to debug this issue atm, but i'm fairly sure the issue is with the drop
function inside the partition-table
function
👀
i will check it
there's probably an incorrect slot-id, or an empty ranges, or something like that
what happens at that function is that onyx takes your total input table, and divides it over the number of peers you have assigned to the input task
that way, you get automatic partitioning
it does this by creating N partitions, and then dropping the first M partitions (where M == our own peer id)
there is probably something off in one of those things
if you could identify what the inputs of that function are, specifically slot-id and the ranges, that would be very useful