clara

http://www.clara-rules.org/
2018-02-28T13:51:36.000604Z

I'm doing some stress testing of Clara and I'm seeing some weird behaviour

2018-02-28T13:51:57.000427Z

the background is that I want to build a sort of access control engine

2018-02-28T13:52:31.000296Z

there are 3 fact types: a consent, a processor and a processingallowed fact

2018-02-28T13:52:53.000516Z

there's just one rule which inserts a processingallowed fact if there is a sufficient amount of consent for a given processor

2018-02-28T13:53:24.000666Z

when I insert 1 million random consents, firing the rules takes just a few seconds

2018-02-28T13:53:47.000085Z

if I insert 1 million more into the previous session, firing the rules takes over an hour (I never waited long enough for it to complete)

2018-02-28T13:54:08.000019Z

if I insert 2 million random consents in an empty session, firing the rules still only takes a few seconds

2018-02-28T13:56:22.000331Z

while this isn't a truly realistic scenario (updates to consent would arrive in batches of a few hundred once initialised), I'm still kind of surprised by this behaviour

2018-02-28T14:57:16.000004Z

@wdullaer that is interesting. it would seem that the insert 1 mil then 1 mil later is triggering some heavy “retraction” work

2018-02-28T14:57:38.000250Z

However, with that sort of time being taken up, it’d be good to have some profiling

2018-02-28T14:58:11.000106Z

there is no rule which retracts facts

2018-02-28T14:58:19.000296Z

Unless you can reproduce it in a small case, which may not be easy. I wonder if you could take a CPU sampling snapshot while it is running excessively, via a tool like VisualVM or similar

2018-02-28T14:58:27.000636Z

I haven't found a good way to do detailed profiling / tracing yet, so all suggestions are welcome 🙂

2018-02-28T14:58:58.000752Z

Well, retracts can happen due to logical inserts (normal insert! in Clara). Also, accumulators may be updated and have to retract previous results. There are quite a few reasons to have “implicit” retracts happening in working memory

2018-02-28T14:59:20.000137Z

makes sense

2018-02-28T14:59:31.000143Z

I am using an all accumulator in the rule
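For reference, a rule along those lines might look roughly like the sketch below. The record names, their fields, and the sufficient-consent? predicate are hypothetical stand-ins, not taken from the thread:

(ns access-control.rules
  (:require [clara.rules :refer [defrule insert!]]
            [clara.rules.accumulators :as acc]))

;; Hypothetical fact types standing in for the three mentioned above.
(defrecord Consent [subject purpose attribute])
(defrecord Processor [id purpose required-attributes])
(defrecord ProcessingAllowed [processor-id])

(defn sufficient-consent?
  "Hypothetical check: do the accumulated consents cover the required attributes?"
  [consents required-attributes]
  (every? (set (map :attribute consents)) required-attributes))

;; One rule: gather every Consent for the processor's purpose with acc/all
;; and, if they suffice, logically insert! a ProcessingAllowed fact.
;; Because insert! is truth-maintained, each new Consent that changes the
;; accumulated result makes Clara retract the old ProcessingAllowed and
;; re-derive it -- the "implicit" retraction work mentioned above.
(defrule processing-allowed
  [Processor (= ?id id) (= ?purpose purpose) (= ?required required-attributes)]
  [?consents <- (acc/all) :from [Consent (= ?purpose purpose)]]
  [:test (sufficient-consent? ?consents ?required)]
  =>
  (insert! (->ProcessingAllowed ?id)))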

2018-02-28T14:59:38.000071Z

VisualVM is free. I've mostly used it just because it's been good enough for seeing many things like this.

2018-02-28T14:59:51.000242Z

but the funny thing is that it works just fine when doing 2 million in bulk on an empty session

2018-02-28T15:00:00.000537Z

I'll give that a spin

2018-02-28T15:00:07.000907Z

That isn’t entirely surprising to me - if you’ve found a bad retraction sort of scenario

2018-02-28T15:00:17.000940Z

In bulk, Clara does a lot of work in batches

2018-02-28T15:00:43.000334Z

Which has one benefit of avoiding “rework” - aka retractions/insertions to update “incorrect” earlier working memory states

2018-02-28T15:01:18.000489Z

I'm comparing 2 million in 1 go, vs 2 batches of 1 million

2018-02-28T15:01:33.000894Z

(I was expecting memory usage to be a bottleneck, but it's doing pretty fine on that front)

2018-02-28T15:01:41.000924Z

anyway, visualvm it is 🙂

2018-02-28T15:01:59.000253Z

So, running with the VisualVM idea: 1) download it, 2) open it, 3) find the JVM process running the bad-performance case (there's a list of processes in the UI)

2018-02-28T15:02:45.000077Z

(4) click “Sampler” tab (5) click “CPU” (6) start running whatever is taking a long time

2018-02-28T15:03:08.000715Z

(7) let it run for a bit, perhaps like a minute or so, then click “snapshot”

2018-02-28T15:03:39.000700Z

👍

2018-02-28T15:05:24.000807Z

Once you have a snapshot, you can dig around the call stack to see where the time is going. If you want help with that, you can explain what is taking such a long time in the Clara functions of the call stack and perhaps I (or someone) will know enough from that. Or you can "export to" via an icon in the UI and it'll give you an nps file to share. If there's proprietary stuff you don't want to show (I don't think much proprietary info shows up in these, but still), you could try to find screenshots that cut the call stack down to just the Clara parts.

2018-02-28T15:05:40.000532Z

I have to run now, but I'll do the test later today and post back with the results

2018-02-28T15:05:45.000131Z

thanks a lot already!

2018-02-28T15:05:53.000169Z

It’s probably not a bad idea to get some familiarity with digging through CPU sampling snapshot results though. It can definitely be useful for finding bottlenecks quickly

2018-02-28T15:05:59.000779Z

No problem

2018-02-28T16:21:39.000062Z

@wdullaer are you inserting the facts like this:

(-> session (insert first-fact) .... (insert millionth-fact)) 
or like this:
(insert session [first-fact .. millionth-fact])
It seems odd to me that simply breaking a single batch into two would have such a huge performance impact - OTOH smaller batching could do it. Arguably Clara should try to batch the first case together anyway, but it doesn’t. See how facts are added into the “pending-operations” on insertion at https://github.com/cerner/clara-rules/blob/0.17.0/src/main/clojure/clara/rules/engine.cljc#L1818 and removed from the queue in fire-rules at https://github.com/cerner/clara-rules/blob/0.17.0/src/main/clojure/clara/rules/engine.cljc#L1870

2018-02-28T16:26:16.000215Z

If a simple division into two batches really does increase runtime like that I’d be curious to see a reproducing case and if there is some performance optimization we could do - at first glance it doesn’t seem like it should happen, but as always it is hard to say without a concrete case. On profiling, seconding mikerod’s suggestion of VisualVM. Another possible route would be to look at the trace of the rule execution for a smaller number of facts - manual examination wouldn’t be practical for millions of facts, but a batch of 2 facts vs 2 batches of 1 fact each might show insights. I’d probably start with VisualVM though especially since traces generally require some understanding of Clara’s internals to yield much insight. https://github.com/cerner/clara-rules/blob/0.17.0/src/main/clojure/clara/tools/tracing.cljc
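For the tracing route, a minimal sketch of capturing a trace for a tiny batch. Here empty-session and ->Consent are placeholders for your compiled session and fact constructor (reusing the hypothetical record sketched earlier); the tracing fns live in clara.tools.tracing:

(require '[clara.rules :refer [insert-all fire-rules]]
         '[clara.tools.tracing :as tracing])

;; Attach the tracing listener, insert a tiny batch, fire the rules, then
;; pull the recorded rete events (insertions, retractions, activations) out.
(def traced-session
  (-> empty-session
      tracing/with-tracing
      (insert-all [(->Consent "s1" :marketing :email)
                   (->Consent "s2" :marketing :phone)])
      fire-rules))

(tracing/get-trace traced-session)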

2018-02-28T16:32:00.001022Z

@wparker sounded like he had batches of 1 million. so not really “small” hah

2018-02-28T16:32:33.000750Z

however, I was thinking that the scenario was “insert 1 million, then fire”, later on, “insert 1 million, then fire”

2018-02-28T16:32:45.000816Z

Trying to reuse an old session state basically

2018-02-28T16:33:08.000429Z

It certainly sounds like the batching behavior was critical to the performance of the scenario they were having.

2018-02-28T16:34:00.000166Z

It seems this has potentially uncovered some badly performing case in terms of updating working memory state. Also, I think I assumed this was on the JVM. Not sure if that was actually stated before. I’m guessing so, given the millions of facts and the discussion around using VisualVM

2018-02-28T16:34:42.000052Z

My point was that I think it would be possible to think the batch was a million, but actually end up being one at a time when rules fire, so just wanted to make sure that it was batched

2018-02-28T16:34:53.000299Z

yeah, true

2018-02-28T16:35:23.000635Z

Perhaps Clara should take (-> session (insert A) (insert B) fire-rules) and turn it into a single batch - but I think it’s 2 batches at present

2018-02-28T16:35:44.000137Z

Yeah

2018-02-28T16:36:00.000904Z

If they were inserting a million things though, they probably did it in a single insert

2018-02-28T16:36:14.000841Z

otherwise they’d have to do something like (reduce insert session million-things)

2018-02-28T16:36:39.000642Z

I don’t think they’d write out a million (insert thing1) by hand 😛

2018-02-28T16:36:57.000465Z

anyway a VisualVM snapshot would be informative, spending time in memory updates would be a possible culprit

2018-02-28T16:40:54.000378Z

yeah, I thought it would be good to see

2018-02-28T16:40:59.000936Z

hopefully some can be obtained

2018-02-28T16:41:34.000551Z

I agree in general though that Clara should just batch the work from multiple insert calls

2018-02-28T16:41:40.000813Z

Should batch as much as possible

alex-dixon 2018-02-28T16:42:11.000247Z

Would insert-all be preferable here?

2018-02-28T16:43:39.000025Z

If you have a big collection of things to insert, yes insert-all

2018-02-28T16:43:52.000834Z

or you can call insert with varargs if the facts are known at compile time

2018-02-28T16:44:05.000289Z

You could also do apply insert, but there’s no reason to since insert-all exists
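To make the batched forms concrete (session and the fact names here are placeholders, not from the thread):

(require '[clara.rules :refer [insert insert-all]])

;; A runtime collection of facts -> insert-all
(insert-all session facts)

;; Facts known at the call site -> insert with varargs
(insert session fact-a fact-b fact-c)

;; apply insert also works, though insert-all already covers this case
(apply insert session facts)

;; All of these hand Clara a single batch to propagate, unlike
;; (reduce insert session facts), which queues one operation per fact.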

2018-02-28T16:44:32.000851Z

Batched propagation through the rete network can be a big performance boost

2018-02-28T16:44:41.000440Z

So it’s best to be as batch’ful as you can be

alex-dixon 2018-02-28T16:46:05.000641Z

A while ago I think I noticed retracts were slow. Not the result of insert logical though

2018-02-28T16:46:34.000099Z

to clarify, external retract or RHS retract!?

2018-02-28T16:48:39.000685Z

For no good reason (that I’m aware of) there isn’t a retract-all for external retraction right now. So you should use apply retract if you have a batch there.

2018-02-28T17:10:07.000493Z

that's exactly it

2018-02-28T17:12:28.000359Z

here's roughly what's happening:

2018-02-28T17:13:30.000695Z

(-> session
    (insert-all (generate-consents 1000000 test-purposes test-attributes))
    (fire-rules)
    (insert-all (generate-consents 1000000 test-purposes test-attributes))
    (fire-rules))

2018-02-28T17:13:39.000738Z

with some log statements to see where I am

2018-02-28T17:13:42.000208Z

this is on the jvm

2018-02-28T17:14:20.000460Z

I still have to take the snapshot

2018-02-28T17:14:40.000427Z

maybe I'll try to reduce the code to the size of a gist so I can share it

2018-02-28T17:16:24.000377Z

> maybe I’ll try to reduce the code to the size of a gist so I can share it
Yeah, that is the best-case scenario

2018-02-28T17:32:10.000410Z

I'll report back tomorrow

alex-dixon 2018-02-28T18:56:19.000444Z

External

2018-02-28T19:05:55.000352Z

So yep, apply retract

2018-02-28T19:06:20.000020Z

(we should get a retract-all added for convenience and symmetry with insert-all)

alex-dixon 2018-02-28T19:06:40.000589Z

Faster than doseq?

2018-02-28T19:07:12.000544Z

I don’t know how you would doseq an external retract - it returns a new version of the session, it doesn’t mutate it

2018-02-28T19:07:48.000011Z

If you have a collection of facts to externally retract, instead of (reduce retract session facts) do (apply retract session facts)
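i.e., roughly the following, where session and facts are placeholders for a fired session and the batch to remove:

(require '[clara.rules :refer [retract fire-rules]])

;; One retract call per fact -- each call queues its own pending operation:
(reduce retract session facts)

;; One batched external retraction (there is no retract-all yet):
(-> (apply retract session facts)
    fire-rules)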