onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
sparkofreason 2018-04-24T01:05:48.000067Z

I'd like to have a trigger that fires periodically, but only if new segments have arrived. Is that possible?

lucasbradstreet 2018-04-24T01:23:45.000027Z

@dave.dixon yes, you’ll need to write a new trigger that uses some of the logic from the timer trigger as well as setting a flag when a segment comes through. See https://github.com/onyx-platform/onyx/blob/0.12.x/src/onyx/triggers.cljc#L56
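A rough sketch of what such a trigger could look like, assuming the 0.12-era trigger API in which a trigger is a var holding a map of `:trigger/init-state`, `:trigger/next-state`, and `:trigger/trigger-fire?`, and that `:trigger/next-state` is applied before `:trigger/trigger-fire?` for each state event. The names `periodic-when-active` and `period->ms` are hypothetical, and real code would reuse the unit conversion from `onyx.windowing.units` rather than this simplified helper:

```clojure
(ns my.app.triggers
  "Sketch: a periodic trigger that only fires when at least one new
   segment has arrived since the last firing.")

(defn- period->ms
  "Simplified unit conversion; hypothetical stand-in for
   onyx.windowing.units."
  [[n unit]]
  (case unit
    (:millisecond :milliseconds) n
    (:second :seconds) (* n 1000)
    (:minute :minutes) (* n 60 1000)))

(def periodic-when-active
  {:trigger/init-state
   (fn [trigger]
     ;; state: [next-allowed-fire-time segment-seen? fire-now?]
     [(+ (System/currentTimeMillis) (period->ms (:trigger/period trigger)))
      false
      false])

   :trigger/next-state
   (fn [{:keys [trigger/period] :as trigger}
        [fire-time seen? _]
        {:keys [event-type] :as state-event}]
     (let [now (System/currentTimeMillis)
           ;; trigger state can be recovered on a different node, so
           ;; reset the wall-clock deadline rather than trusting the
           ;; checkpointed one
           fire-time (if (= event-type :recovered)
                       (+ now (period->ms period))
                       fire-time)
           seen? (or seen? (= event-type :new-segment))
           due?  (and seen? (>= now fire-time))]
       (if due?
         ;; firing: start a fresh period and clear the activity flag
         [(+ now (period->ms period)) false true]
         [fire-time seen? false])))

   :trigger/trigger-fire?
   (fn [trigger [_ _ fire-now?] state-event]
     fire-now?)})
```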

lucasbradstreet 2018-04-24T01:24:26.000064Z

You can build your own triggers and keep them as part of your code, but if you get it all worked out we’d be happy to add it to Onyx.

sparkofreason 2018-04-24T01:27:39.000126Z

Thanks, did not realize you could do custom triggers. Very cool.

lucasbradstreet 2018-04-24T01:30:55.000021Z

Yep. The different event types that you can choose to act upon are listed here: http://www.onyxplatform.org/docs/cheat-sheet/latest/#state-event/:event-type

lucasbradstreet 2018-04-24T01:31:49.000100Z

The trigger states are checkpointed and may be recovered on a different node, so be careful to reset any timers on recovery.

lucasbradstreet 2018-04-24T01:32:27.000077Z

The current timer trigger just chooses to fire on recovery and reset the timer that way.

sparkofreason 2018-04-24T03:48:30.000042Z

So generally, if a namespace-qualified keyword like :onyx.triggers/timer appears as a value in a job-definition map, is it safe to assume that refers to a symbol, and thus the associated attribute is a potential extension point?

lucasbradstreet 2018-04-24T04:04:07.000176Z

Yeah, it might have been nice to support symbols there too, but that’s right.
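To illustrate that resolution pattern, here is a hypothetical trigger entry for a job's `:triggers` vector: the namespace-qualified keyword under `:trigger/on` is resolved to the var of the same name, so any var you own can back it (all the `my.app.*` names below are assumptions, not Onyx API):

```clojure
;; Onyx resolves :my.app.triggers/periodic-when-active to the var
;; my.app.triggers/periodic-when-active, a custom trigger definition.
{:trigger/window-id :collected
 :trigger/id :flush-periodically
 :trigger/on :my.app.triggers/periodic-when-active
 :trigger/period [5 :seconds]
 :trigger/sync :my.app.core/deliver!}
```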

lucasbradstreet 2018-04-24T15:51:05.000757Z

Hmm. Interesting. Won’t that only fire when it sees a new segment and enough time has passed? So if not enough time has passed and then a segment comes, it won’t fire in the future if no more segments arrive?

sparkofreason 2018-04-24T15:57:56.000247Z

Yes, that's the behavior I was looking for. But what you describe may be more generally useful.

lucasbradstreet 2018-04-24T16:05:57.000702Z

Cool. That way you're more likely to have fresh data when you do emit.

lucasbradstreet 2018-04-24T17:52:11.000396Z

Ah cool, you’re able to handle both cases then.

lucasbradstreet 2018-04-24T17:53:31.000687Z

One quick non-obvious thought is that if you are going to be using group-by-key/group-by-fn with a high cardinality of keys, you may want to switch the trigger-state from a map to a vector. This is because you will be storing the keywords as strings so your state will be larger when serialized. It’s not really a big deal otherwise though.

lucasbradstreet 2018-04-24T17:54:05.000168Z

That’s the main reason all of the other triggers destructure vectors instead.

lucasbradstreet 2018-04-24T17:54:37.000597Z

As the state gets bigger those keywords will compress better though, so overall it might not hurt so much.
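The map-versus-vector trade-off above can be sketched like this (the key names are hypothetical; the point is the shape of the state, which is stored once per group-by key):

```clojure
;; Map-shaped trigger state: the keyword keys are serialized with
;; every checkpoint, once per group-by key, inflating the state.
{:fire-time 1524600000000 :seen? true}

;; Vector-shaped state: the same information, positional, and the
;; trigger functions can destructure it just as readably:
;; (fn [trigger [fire-time seen?] state-event] ...)
[1524600000000 true]
```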

sparkofreason 2018-04-24T17:57:05.000627Z

Ah. I was wondering why vectors were used. I'll probably make that change, with the optimism we'll have so many customers it will matter.

lucasbradstreet 2018-04-24T17:57:59.000269Z

🙂

lucasbradstreet 2018-04-24T17:58:25.000408Z

It’d be more ready for inclusion in Onyx too.

lucasbradstreet 2018-04-24T17:59:09.000039Z

Assuming you want to create a PR

sparkofreason 2018-04-24T18:01:06.000597Z

I'll probably do that.

lucasbradstreet 2018-04-24T18:03:13.000640Z

Great

lucasbradstreet 2018-04-24T18:04:33.000008Z

This is something that I’ve wanted for a while, as it makes it easy to amortise the cost of state updates at the cost of a little latency.

dbernal 2018-04-24T18:09:24.000403Z

is it normal for an Onyx job to increasingly spawn threads? I'm trying to run a job with production type settings and it keeps running out of memory. Wondering why this could be happening or if I'm setting something wrong

lucasbradstreet 2018-04-24T18:10:46.000574Z

It’ll only increasingly spawn threads if your peers are dying and being rebooted, and even then the old threads should be shutting themselves down. Do you have a log of the behaviour? What’s the general usage pattern: which plugins, are you using windows, do you have large windows, etc?

dbernal 2018-04-24T18:12:44.000571Z

Using the SQL plugin as input and an elasticsearch plugin as output. I'm using a window for aggregation as soon as input tasks are done so it's possible it might be too big?

lucasbradstreet 2018-04-24T18:13:54.000061Z

@dbernal yeah, if you’re building up a lot of intermediate state before flushing / emitting it, then it seems like you could easily run out of memory.

lucasbradstreet 2018-04-24T18:14:48.000498Z

By default all of that state is kept in the memory state store, which is an atom

lucasbradstreet 2018-04-24T18:15:15.000163Z

If you’re using onyx-peer-http-query, you can query the /metrics endpoint and look at the checkpoint_size metric. That’ll give you a good indication of how much state you’re building up over time.

lucasbradstreet 2018-04-24T18:16:22.000309Z

We have an experimental disk backed state store that uses LMDB, but you will likely want to evaluate whether what you’re doing is sensible as there are pretty big costs in managing/checkpointing that much state.

dbernal 2018-04-24T18:24:24.000106Z

@lucasbradstreet ok thank you. I'll go ahead and see if that happens to be the problem. It seems like I might be using Onyx with windows the wrong way as I'm essentially trying to do a group-by-key on a SQL database and then flushing segments to be saved somewhere else. Is it sensible for me to approach that problem using windows or should I rethink my overall design? If what you say about maintaining state happens to be the issue then using windows for long batch aggregations might be the wrong way to go about it

lucasbradstreet 2018-04-24T18:31:39.000776Z

That strategy makes sense, but maybe you’re building up too much state before flushing, or you’re not evicting after the trigger fires? Hard to tell without seeing the code though. You should probably confirm that you’re building up a lot of state via that metric I mentioned first though.
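The eviction point above can be sketched as a window/trigger pair. This is a hypothetical configuration (task names, sync function, and threshold are assumptions); the key piece is `:trigger/post-evictor [:all]`, which clears the window contents after each firing so per-key state does not accumulate indefinitely:

```clojure
;; Hypothetical global window accumulating per-key state.
(def windows
  [{:window/id :per-key-agg
    :window/task :aggregate
    :window/type :global
    :window/aggregation :onyx.windowing.aggregation/conj}])

;; Flush every 1000 elements, then evict what was flushed.
(def triggers
  [{:trigger/window-id :per-key-agg
    :trigger/id :flush
    :trigger/on :onyx.triggers/segment
    :trigger/threshold [1000 :elements]
    :trigger/post-evictor [:all]
    :trigger/sync :my.app.core/write-batch!}])
```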

dbernal 2018-04-24T18:32:33.000369Z

ok, I'll go ahead and check that first

dbernal 2018-04-24T20:07:57.000317Z

@lucasbradstreet It turns out that wasn't the issue after all. Looks like I'm creating a new pooled connection in one of my function tasks every time it's called with a new segment. Would it be possible to maintain an active SQL connection pool for a given peer so that it can be reused for new segments being processed by a task?

lucasbradstreet 2018-04-24T20:08:14.000181Z

Aha! Ok yeah that’s bad too :)

lucasbradstreet 2018-04-24T20:08:47.000384Z

You need the connection in your onyx/fn?

dbernal 2018-04-24T20:09:43.000668Z

it needs to reach out to SQL to find out some more information about the segment that came in

lucasbradstreet 2018-04-24T20:10:01.000467Z

If so, you can use a lifecycle with lifecycle/before-task-start to inject the connection into :onyx.core/params, which allows you to pass the connection into your onyx/fn.

lucasbradstreet 2018-04-24T20:10:40.000222Z

We provide a datomic lifecycle that does something similar here: https://github.com/onyx-platform/onyx-datomic/blob/0.12.x/src/onyx/plugin/datomic.clj#L408

lucasbradstreet 2018-04-24T20:10:55.000150Z

You will want to also implement after-task-stop to close it when the peer gets stopped.
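Putting those two lifecycle hooks together might look like the sketch below, modeled on the datomic plugin's pattern. `make-pool` and `close-pool` are hypothetical stand-ins for your pooling library (HikariCP, c3p0, ...), and `:sql/db-spec` is an arbitrary key carried in the lifecycle map, not Onyx API:

```clojure
(ns my.app.lifecycles
  "Sketch: one SQL connection pool per peer, injected via lifecycles.")

(defn make-pool
  "Placeholder: a real implementation would build a pooled datasource."
  [db-spec]
  (atom {:db-spec db-spec :open? true}))

(defn close-pool [pool]
  (swap! pool assoc :open? false))

(defn inject-conn-pool
  [{:keys [onyx.core/params] :as event} lifecycle]
  (let [pool (make-pool (:sql/db-spec lifecycle))]
    ;; conj the pool onto :onyx.core/params so it is passed to the
    ;; task's :onyx/fn as a leading argument, before each segment;
    ;; stash it in the event map so after-task-stop can close it
    {:onyx.core/params (conj params pool)
     ::pool pool}))

(defn close-conn-pool
  [{:keys [::pool] :as event} lifecycle]
  (close-pool pool)
  {})

(def conn-pool-calls
  {:lifecycle/before-task-start inject-conn-pool
   :lifecycle/after-task-stop close-conn-pool})

;; Job entry wiring the calls to a task:
;; {:lifecycle/task :enrich
;;  :lifecycle/calls :my.app.lifecycles/conn-pool-calls
;;  :sql/db-spec {:subprotocol "mysql" ...}}
```

Because the pool is created in `before-task-start` and never travels through the task map, it is never serialized, which avoids the serialization issue mentioned above.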

dbernal 2018-04-24T20:11:17.000558Z

ah ok, I'll take a look. I had tried to do that but I ran into some issues trying to serialize it. I assumed everything passed as a param needed to be serialized

lucasbradstreet 2018-04-24T20:11:41.000584Z

Ah. You were probably trying to do it through param injection via the task map, which does get serialized.

lucasbradstreet 2018-04-24T20:11:55.000081Z

That’s the easier approach but is not amenable to injecting stuff like connections

dbernal 2018-04-24T20:13:07.000149Z

ok gotcha. I'll give it another shot then. Thanks for all the help!

lucasbradstreet 2018-04-24T20:13:22.000069Z

No problem. Glad you figured it out and it was an easy problem to fix :)