Hi! So we're debugging a performance issue, and it seems like, in our example, fast-token-compare takes 142 seconds of self time, while PriorityQueue.remove() takes 139 seconds of self time.
It's not clear from VisualVM's sampling, but it looks like this is referring to the PriorityQueue.remove(Object) method, which is consistent with the fast-token-compare times.
This method is linear for a priority queue. It should be possible to wrap PriorityQueue to make it log(n) by checking for removed values when they are extracted. Is there some reason this won't work?
Alternately, are we doing something wrong to get runtime dominated by PriorityQueue.remove(Object)?
@eraserhd
> Alternately, are we doing something wrong to get runtime dominated by PriorityQueue.remove(Object)?
Seems like there is a lot of thrash of rules getting activated, but then subsequently “deactivated”
> This method is linear for priority queue. It should be possible to wrap PriorityQueue to make it log(n) by checking when values are extracted. Is there some reason this won’t work?
I don’t understand what this is and how it’d make it log(n).
and lastly, fast-token-compare may be slowing down due to large tokens in these rules that are struggling to determine whether they are satisfied/not satisfied, perhaps from large data accumulated via something like (acc/all)’s or similar
@mikerod if remove/1 stores objects in a hash set, and remove/0 and add/1 skip removed tokens, then remove/1 becomes O(1) and I think remove/0 is still O(log n)
We mostly avoid hashing objects
Dijkstra's algorithm is frequently implemented like this, because removing from the priority queue is expensive.
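Roughly what I have in mind (a rough sketch, untested; the names are made up, and it assumes no duplicate elements and tokens that hash reasonably):

(import '(java.util PriorityQueue HashSet))

(defn lazy-pq []
  {:pq (PriorityQueue.) :removed (HashSet.)})

(defn lazy-add [q x]
  (let [^PriorityQueue pq (:pq q)
        ^HashSet removed (:removed q)]
    ;; re-adding a tombstoned element just cancels its tombstone
    (when-not (.remove removed x)
      (.add pq x))))

(defn lazy-remove [q x]
  ;; O(1): record a tombstone instead of the O(n) heap scan
  ;; that PriorityQueue.remove(Object) does
  (.add ^HashSet (:removed q) x))

(defn lazy-poll [q]
  (let [^PriorityQueue pq (:pq q)
        ^HashSet removed (:removed q)]
    ;; still O(log n) amortized: each tombstoned entry is popped once
    (loop []
      (when-some [x (.poll pq)]
        (if (.remove removed x)
          (recur)
          x)))))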
because then you hit perf barriers with slow hash code impl’s
and often that comes down to the facts being used in the session
in the particular case where the prio queue is used, the object to remove is a new instance too, so it wouldn’t be able to have a reusable cached hashcode
Hmm.
not to say it isn’t worth consideration at any time, just saying hashing isn’t always as cheap as it seems
fast-token-compare avoids deep equals most of the time too
it uses identity-based equivalence as a first-pass, so via identical? in clj semantics
and typically that is all that is needed to find things during the fire-rules loop at least
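i.e. something along these lines (my sketch of the idea, not the actual clara-rules source):

(defn fast-compare [a b]
  ;; cheap reference check first; fall back to full structural
  ;; equality only when the identity check fails
  (or (identical? a b)
      (= a b)))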
I’ve seen this problem of large tokens being slow though
the activation stuff is what uses the prio queue; it has to do with how, when rules are satisfied, they are “staged” as available to have their RHS action executed at some future time
it is a prio queue since you can have precedence on which rules should be staged ahead of one another - e.g. :salience
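e.g. (a hypothetical rule and fact type, just to show where :salience goes):

(defrecord Alarm [level])

(r/defrule handle-critical-first
  {:salience 100} ;; staged ahead of lower-salience activations
  [Alarm (= :critical level)]
  =>
  (println "handle the critical alarm before anything else"))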
I think there are some related issues out there (though not explained the same) https://github.com/cerner/clara-rules/issues/385
well, that’s the only one I see at the moment
I know I’ve messed around with this sort of problem before though, not sure I ever had a good answer
sometimes you could just avoid a rule setup that is causing it though
Those stats look very much like ours.
your hashing thing could be attempted at least, just have to consider a lot of things if it ends up looking like a good path
the issue above talks about a “distinct accumulator”, I think that may be overly specific
probably a more general problem
and I’m fairly sure you have big tokens involved
a big token would come from a rule left-hand side (LHS) that had a lot of facts associated with a single activation of the rule
Take for example something like
;; assumes the usual aliases: [clara.rules :as r] [clara.rules.accumulators :as acc]
(r/defrule big-activation-maker
  [?xs <- (acc/all) :from [X]]
  =>
  (do-stuff ?xs))
now if you can get into a situation where the set of all Xs changes a lot, and there are a lot of Xs in working memory overall, then this rule may keep activating/cancelling/re-activating as the set of X grows - and each time there is a big token comparison to do these removals
in a rule as simple as above, it typically wouldn’t thrash though
Most of the time, the network’s “batched propagation” semantics would lead you to a situation where all the Xs are around in one go with only 1 rule activation resulting
but certain setups can cause thrashing when more conditions and things are involved
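e.g. a derived-fact setup like this (hypothetical types and rule, just to illustrate the multi-phase insert pattern):

(defrecord X [v])
(defrecord Y [v])

(r/defrule x-maker
  ;; each Y derives an X; these insertions can land in later
  ;; propagation phases, so the (acc/all) over X in
  ;; big-activation-maker may activate, cancel, and re-activate
  ;; as the set of X keeps growing
  [Y (= ?v v)]
  =>
  (r/insert! (->X ?v)))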
perhaps the distinct accumulator case was a situation like this one, I just don’t remember
This sounds like there is one thing we can write as a custom accumulator to avoid retriggering a whole slew of rules.
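e.g. something along these lines (a sketch built on clara's acc/accum; whether a small summary like a count is enough depends on what the RHS actually needs):

(def count-of-xs
  ;; accumulate a small summary value instead of every fact, so the
  ;; resulting token stays small and comparisons stay cheap
  (acc/accum
   {:initial-value 0
    :reduce-fn (fn [n _fact] (inc n))
    :retract-fn (fn [n _fact] (dec n))}))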
perhaps
or play with salience, though as a last resort
yeah, sometimes that’s useful
sometimes you can look at the rules and sort of see where an accumulator rule may be getting retriggered in multiple phases of insert
and try to prevent that - there is salience to help with this, as well as potentially just restructuring the rule logic