clara

http://www.clara-rules.org/
mauricio.szabo 2019-01-10T15:05:40.042100Z

Does clara supports creation of a session and firing rules in different threads? I'm having an exception trying to do it:

Exception in thread "async-dispatch-24" java.lang.IllegalArgumentException: find not supported on type: clara.rules.engine.Token$reify__26254

2019-01-10T15:28:28.042700Z

@mauricio.szabo I’d expect it to be ok. Can I see (1) more stack trace and (2) an (perhaps simplified) example of how you are setting up for this?

eraserhd 2019-01-10T15:40:23.043300Z

Hmm, this seems to say that we have retractions, right? (We shouldn’t.):

ethanc 2019-01-10T15:47:22.043600Z

I think I have seen this behavior before, but i need to verify my suspicions before I could say for sure.

eraserhd 2019-01-10T15:48:52.043800Z

We are currently investigating. What should we look for?

ethanc 2019-01-10T15:55:57.044Z

If it is what i'm thinking, its a scenario where facts are inserted but have no corresponding nodes to activate. Thus meaning all work in prior rules was "unnecessary" and can be retracted. It sounds weird making this comment, but i think it relates to keeping the memory as low as possible... Again, i need to verify this and i am writing a simple set of rules to test it

eraserhd 2019-01-10T15:58:18.044200Z

Ahhh, interesting. We probably do have a lot of facts that aren't used by rules.

eraserhd 2019-01-10T16:10:07.044400Z

we've verified that we're not intentionally retracting any datoms in this batch.

eraserhd 2019-01-10T16:11:27.044600Z

scratch that, no we're seeing something weird

2019-01-10T16:21:53.044800Z

alpha-retract happens from the truth maintenance as well.

👍 1
2019-01-10T16:22:26.045Z

so using logical insert (the default) can/will lead to them as the TMS keeps the working state’s “truth” consistent

2019-01-10T16:23:32.045200Z

Seeing your screenshot there though - relative to the other times I see - it looks like a lot of time is being spent in equality comparisons.

2019-01-10T16:24:07.045400Z

However, the memory updating functions comparing tokens and “remove first of each” do not seem to be as high.

2019-01-10T16:24:33.045600Z

I’d have to expand on the callstack trees of these a bit more perhaps to really know what’s going on there though

eraserhd 2019-01-10T16:25:13.045800Z

@ethanc we made a mistake in a parameter in the call we are timing. After fixing, we don't see a high alpha-retract time.

👍 1
mauricio.szabo 2019-01-10T16:32:57.046200Z

Ok, I'll run some code here and then give you the results

mauricio.szabo 2019-01-10T16:33:41.046400Z

@eraserhd what are you using to profile Clara? I've tried to use VisualVM but my code hang even with small group of facts...

eraserhd 2019-01-10T16:34:16.046600Z

I'm using the CPU sampler in VisualVM. I've never successfully used the profiler.

eraserhd 2019-01-10T16:34:45.046800Z

The profiler always hangs for me no matter what I try.

2019-01-10T16:37:25.047Z

Yeah, VisualVM CPU sampler is typically pretty useful

2019-01-10T16:37:47.047200Z

I don’t try to use the more invasive profiler there - I don’t even know that I’d call that “recommended”

2019-01-10T16:38:25.047400Z

@eraserhd the example you showed did not look to have a high time spent in alpha-retract, so confused why you considered it a problem there

2019-01-10T16:38:36.047600Z

Or perhaps your screenshot was just not indicative of what you are looking at.

2019-01-10T16:39:38.047900Z

or maybe I was reading this wrong. I tend to look at the callstack based view - so nvm I think

ethanc 2019-01-10T16:40:08.048100Z

yeah, @mikerod thats seems to be a hotspot view

2019-01-10T16:41:12.048500Z

yep, just noticed. Harder to read that to me. but Yeah, that means it is the “entry point” to a lot of the stuff taking the higher self-times below

eraserhd 2019-01-10T16:41:27.048700Z

Let me post a corrected screenshot...

2019-01-10T16:41:29.048900Z

I prefer the other view to see a direct hierarchical/callstack visual on it

2019-01-10T16:42:19.049100Z

just easier to pinpoint the more specific path to trace through

eraserhd 2019-01-10T16:52:00.049300Z

2019-01-10T16:55:09.049600Z

@eraserhd It’s a bit cutoff and always a challenge to really know. From this, it seems to suggest that you have a rule with a negation - :not that is initially true due to no matches at some point in fire-rules. Later, it gets matches so becomes false. Due to order of rule firing, when it is becomes false, there was already staged activations of the rule from when it was true that need to be discarded now that that is no longer the case - via the truth maintenance

2019-01-10T16:55:36.049800Z

The activations being discarded may have large tokens as one possibility - perhaps there is a large data gathered via an accumulator in this rule?

2019-01-10T16:56:14.050Z

Alternatively, I may have to see more stack to know (or can’t really know), the rule had a lot of combinations of matches resulting in many activations that now all have to be discarded

eraserhd 2019-01-10T16:57:41.050200Z

OK, my reading also. We're going to try to isolate which rule(s) cause it.

2019-01-10T16:58:42.051200Z

Maybe you can find one of that structure. The negation May be your actual last condition of the role, although that isn’t strictly true in all cases I believe.

2019-01-10T16:59:25.052400Z

May be able to do some listener tracing to find more details too. I’d have to play around to come up with a useful example of how to use it in this sort of case.

ethanc 2019-01-10T17:08:55.052600Z

To my original comment, the scenario i was remembering was something like:

(defrule simple-rule
 [A (= ?f f)]
 [?bs <- (acc/all) :from [B (= f ?f)]] 
 =>
(insert! (->C)))
This does cause an retract on the production node, but thats due to new facts being add. So not as i remembered.

2019-01-10T17:29:33.052800Z

Are you creating more than one session at a time? The compiler uses eval calls internally, and the Clojure compiler itself isn’t thread-safe.

👍 1
2019-01-10T17:37:48.053200Z

yeah, this is why I’m looking for some example of how it is being tried

2019-01-10T17:39:49.053400Z

Regarding tracing - I wrote some helpers at one point for this that I really need to clean up some and push to master - but basically you get an entry in a list for every interaction in the rules engine, most of which have a node ID attached. You can then use that ID to look up the node in the session and see what rules/queries it corresponds to. So if you see an ID come up a lot those rules are doing a lot of work. Since it is ordered, you can investigate in more complex ways, but often a “lots of stuff is happening in this rule” can be enough to narrow things down enough

👍 1
2019-01-10T17:45:30.053700Z

would be good to get some doc examples of this sort of thing

👍 2
mauricio.szabo 2019-01-10T18:08:04.054100Z

@wparker I've tried creating more than one, and don't creating it, and they both gave me the same error

2019-01-10T18:10:38.054300Z

@mauricio.szabo can you give at least an outline of hwo you are doing this

2019-01-10T18:10:55.054500Z

like how you create the session, and then how you distribute it to other threads

2019-01-10T18:11:14.054700Z

I’m guessing the other threads have the same classloaders too?

2019-01-10T18:11:22.054900Z

an example would make that easier to track through

mauricio.szabo 2019-01-10T18:12:04.055100Z

Okay, just give me some minutes 🙂

2019-01-10T18:13:49.055300Z

oh, no problem. not time sensitive (for me at least) hah

mauricio.szabo 2019-01-10T18:53:40.055700Z

@mikerod I'm running the code in the following way:

(let [session (-> (mk-session 'events-reports.reports.deferred
                              'events-reports.reports.normalize-data
                              :fact-type-fn :fact))
      res (jdbc/query db "SELECT * FROM my_table WHERE ...")
      location-ids (map :id res)]

  (doseq [id location-ids]
    (async/go
     (println "Emitting for" id)
     (jdbc/with-db-connection [db @smart/db]
        (let [facts (search-facts db id)
              rows (-> session
                       (insert-all facts)
                       (fire-rules)
                       (query detailed-anual-report)
                       (->> (map make-detailed-anual-row)))]
         (println "Saving for" id)
         (save-rows rows))))))

mauricio.szabo 2019-01-10T18:54:53.055900Z

(it's a simplified version. search-facts is simply a bunch of JDBC queries)

mauricio.szabo 2019-01-10T19:05:54.056100Z

Exception in thread "async-dispatch-4" java.lang.IllegalArgumentException: find not supported on type: clara.rules.engine.Token$reify__26254
        at clojure.lang.RT.find(RT.java:863)
        at clojure.core$select_keys.invokeStatic(core.clj:1539)
        at clojure.core$select_keys.invoke(core.clj:1532)
        at clara.rules.engine$propagate_items_to_nodes$fn__26782.invoke(engine.cljc:154)
        at clara.rules.platform$group_by_seq$fn__26030.invoke(platform.cljc:39)
        at clojure.core.protocols$naive_seq_reduce.invokeStatic(protocols.clj:62)
        at clojure.core.protocols$interface_or_naive_reduce.invokeStatic(protocols.clj:72)
        at clojure.core.protocols$fn__7852.invokeStatic(protocols.clj:169)
        at clojure.core.protocols$fn__7852.invoke(protocols.clj:124)
        at clojure.core.protocols$fn__7807$G__7802__7816.invoke(protocols.clj:19)
        at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:31)
        at clojure.core.protocols$fn__7835.invokeStatic(protocols.clj:75)
        at clojure.core.protocols$fn__7835.invoke(protocols.clj:75)
        at clojure.core.protocols$fn__7781$G__7776__7794.invoke(protocols.clj:13)
        at clojure.core$reduce.invokeStatic(core.clj:6748)
        at clojure.core$reduce.invoke(core.clj:6730)
        at clara.rules.platform$group_by_seq.invokeStatic(platform.cljc:38)
        at clara.rules.platform$group_by_seq.invoke(platform.cljc:27)
        at clara.rules.engine$propagate_items_to_nodes.invokeStatic(engine.cljc:154)
        at clara.rules.engine$propagate_items_to_nodes.invoke(engine.cljc:147)
        at clara.rules.engine.LocalTransport.send_elements(engine.cljc:176)
        at clara.rules.engine.AlphaNode.alpha_activate(engine.cljc:538)
        at clara.rules.engine$flush_updates$flush_all__26940.invoke(engine.cljc:269)
        at clara.rules.engine$flush_updates.invokeStatic(engine.cljc:277)
        at clara.rules.engine$flush_updates.invoke(engine.cljc:252)
        at clara.rules.engine$fire_rules_STAR_.invokeStatic(engine.cljc:1825)
        at clara.rules.engine$fire_rules_STAR_.invoke(engine.cljc:1720)
        at clara.rules.engine.LocalSession.fire_rules(engine.cljc:1909)
        at clara.rules$fire_rules.invokeStatic(rules.cljc:44)
        at clara.rules$fire_rules.invoke(rules.cljc:29)

2019-01-10T19:33:27.056600Z

thanks for the details

2019-01-10T19:33:37.056800Z

nothing immediately is striking me as odd there

2019-01-10T19:34:18.057Z

looking some at what could be oging on

2019-01-10T19:35:21.057200Z

This type clara.rules.engine.Token$reify__26254 is weird to me

2019-01-10T19:35:30.057400Z

I’m not aware of any reify going on in relation to tokens

2019-01-10T19:35:54.057600Z

which, I’d think that’s what produces a name like that

2019-01-10T19:37:02.057800Z

actually, must be thinking about that wrong, since you can’t reify a clara.rules.engine.Token, it’s a record

2019-01-10T19:39:19.058Z

that is just relating to a reify happening from within the clj record impl I see - disregard those comments then,

mauricio.szabo 2019-01-10T19:41:11.058200Z

Do you want me to open an issue?

2019-01-10T19:45:30.058400Z

I suppose we can for tracking

2019-01-10T19:45:47.058600Z

I think I’ve recreated it to some minimal degree as far as what could cause that type of exception

2019-01-10T19:46:02.058800Z

(find (.getLookupThunk (map->Token {}) :bindings) :x)

2019-01-10T19:46:21.059Z

I believe the getLookupThunk can happen via clj keyword lookup compile-time inlining stuff

2019-01-10T19:46:56.059200Z

so this ends up coming from a call (:bindings <token-obj>) in the fn propagate-items-to-nodes

2019-01-10T19:51:36.059400Z

Also, can you give me your clj/java/core.async versions?

mauricio.szabo 2019-01-10T20:02:07.059600Z

Clojure version: Clojure 1.9.0 JVM version: Eclipse OpenJ9 VM 1.8.0_181-b13 Core.Async version: 0.4.490

mauricio.szabo 2019-01-10T20:02:15.059800Z

Also, I've opened an issue

2019-01-10T20:47:07.060Z

thanks, I seeit

2019-01-10T20:47:57.060200Z

this problem is weird. asked a bit on #clojure-dev to see if there were any thoughts on how you could end up with a keyword callsite messing up like this. The only possible thing that comes to mind so far is just making sure that you aren’t somehow requiring clara.rules.engine (or other ns’s) multiple times

2019-01-10T20:48:15.060400Z

so somehow getting different versions of the same classes loaded into successive clj dynamic class loaders

2019-01-10T20:56:56.060600Z

It sort of looks to me now that you may have clojure.rules.engine being loaded more than once, and causing a duplicate class of clojure.rules.engine.Token being created.