onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
2018-05-05T04:50:50.000032Z

If you need lots of streaming functions, but not much raw processing power (small data) what options are out there? I like the onyx model, but it seems like onyx-local-rt isn’t built to be used for more than testing. Is there a middle ground where its possible to get the dataflow model but scale it up and down from dataflow models running in threads to dataflow models running across distributed nodes.

2018-05-05T04:53:02.000010Z

I’m also curious what, if anything, needs to be done to move onyx-local-rt into a position where it could be used in a production system as a lightweight tool for realtime dataflow processing. I would be excited to do the work if necessary.

lucasbradstreet 2018-05-05T05:20:50.000048Z

running onyx in single node mode should be perfectly fine, though you will start to top out with the current defaults if you run too many peers on one node.

lucasbradstreet 2018-05-05T05:21:15.000038Z

It’s smart enough not to serialize messages that it’s sending to peers on the same node, so there’s not that much overhead, it’s just a bit aggressive when idling to reduce latency.

lucasbradstreet 2018-05-05T05:22:41.000065Z

The main thing that I would like to see for more efficient scale up on a single node under the data flow model is for onyx peers to share worker threads.

lucasbradstreet 2018-05-05T05:23:46.000102Z

the peer state machine is written in a non-blocking way, so when instead of idling it’d be easy enough to just switch to another task.

joelsanchez 2018-05-05T15:32:09.000005Z

any ideas?

lucasbradstreet 2018-05-05T18:31:20.000007Z

Hmm. It might be restricting all messages that flow to all tasks to ::error? segments

lucasbradstreet 2018-05-05T18:31:38.000013Z

might be a weird handling case with :all and short circuit / error handling flow conditions

lucasbradstreet 2018-05-05T18:31:48.000089Z

I am not sure you would ever want :all for those

joelsanchez 2018-05-05T18:33:30.000080Z

I was under the impression that :all in this context meant "all possible tasks that follow :transact", hence it would equal [:out] in my case

lucasbradstreet 2018-05-05T18:34:00.000079Z

Yeah, it might be due to how the short circuited / error handling conditions are special cased

lucasbradstreet 2018-05-05T18:34:09.000120Z

You’re right that that’s probably how it should work.

lucasbradstreet 2018-05-05T18:34:19.000087Z

Create an issue on github for me and I’ll give it a look

joelsanchez 2018-05-05T18:35:00.000103Z

I might have found something! 🙂 it's not big deal though, I just used [:out], but it was confusing (going to make that issue now)

lucasbradstreet 2018-05-05T18:35:51.000035Z

I might just block :all from being used on the short circuit exceptions

lucasbradstreet 2018-05-05T18:36:16.000131Z

as it doesn’t make a lot of sense since the point is generally to restrict flow to only certain tasks

lucasbradstreet 2018-05-05T18:36:27.000107Z

I’ll have to think about it a bit more though.

joelsanchez 2018-05-05T18:37:09.000038Z

in my case what I wanted to achieve is to auto-handle exceptions across all tasks

joelsanchez 2018-05-05T18:37:45.000026Z

the only way to do it (that I know of) is to create one flow condition for each, :all doesn't seem to work

joelsanchez 2018-05-05T18:38:48.000062Z

@joelsanchez uploaded a file: https://clojurians.slack.com/files/U5LPUJ7AP/FAK3PQD4M/-.clj

lucasbradstreet 2018-05-05T18:40:03.000078Z

Oh, don’t you want :flow/from :all, :flow/to :out?

lucasbradstreet 2018-05-05T18:40:19.000017Z

with a short circuited flow condition?

lucasbradstreet 2018-05-05T18:40:30.000100Z

That should send all error exceptions down to out while leaving everything else working as normal

joelsanchez 2018-05-05T18:40:42.000142Z

that would be it, yes, trying now

joelsanchez 2018-05-05T18:44:14.000053Z

so...it kind of works but at the same time it doesn't

joelsanchez 2018-05-05T18:44:33.000117Z

it works if the task that errored is the one that's connected to out (transact)

joelsanchez 2018-05-05T18:44:49.000081Z

but I get nothing if it's another one that's not connected to out

lucasbradstreet 2018-05-05T18:44:56.000106Z

Ah. Right, all tasks will need to be connected to out

lucasbradstreet 2018-05-05T18:45:11.000078Z

If you’re trying to achieve that it should be pretty easy with code

joelsanchez 2018-05-05T18:45:21.000082Z

ah but I can prevent them from going to out with a flow condition, and connect all of them to out

lucasbradstreet 2018-05-05T18:45:28.000104Z

yes

joelsanchez 2018-05-05T18:46:24.000024Z

ok I'll try 🙂 thanks

joelsanchez 2018-05-05T19:37:55.000017Z

got it working!

🦜 1
joelsanchez 2018-05-05T19:38:20.000088Z

@joelsanchez uploaded a file: https://clojurians.slack.com/files/U5LPUJ7AP/FAJVAL43C/-.clj

joelsanchez 2018-05-05T19:38:40.000072Z

just realized the shortr typo...anyway, this works 100% as expected

joelsanchez 2018-05-05T19:38:59.000055Z

I wire all tasks to out and catch exceptions with this flow condition, for all tasks

lucasbradstreet 2018-05-05T19:39:14.000028Z

Great

2018-05-05T20:38:11.000057Z

@lucasbradstreet The problem i'm trying to solve is that often times we would like to leverage the dataflow model (windows, triggers, etc...) but our processing needs are best served in a none distributed (multiple servers) model. I feel i could craft together the time widowing aspect with coreasync or go channels (in golang) but i'm surprised i dont see this being done already.

lucasbradstreet 2018-05-05T21:06:17.000113Z

Right. I guess I’m saying that if you run embedded aeron and ZooKeeper you’re pretty much getting that (though you still need s3 checkpointing). The main problem is that onyx peers are currently thread heavy.

sparkofreason 2018-05-05T21:50:01.000082Z

Can the onyx-kafka plugin be used without direct access to ZK, such as with a managed kafka service like https://github.com/CloudKarafka/java-kafka-example?

lucasbradstreet 2018-05-05T21:50:43.000012Z

It can take bootstrap servers rather than looking up via ZK. you’ll still need ZK for onyx but it doesn’t have to touch the kafka ZK servers

sparkofreason 2018-05-05T22:24:18.000034Z

That worked great, thanks!

lucasbradstreet 2018-05-05T22:25:31.000053Z

:)