clara

http://www.clara-rules.org/
2018-03-01T08:33:46.000058Z

So I fired up VisualVM

2018-03-01T08:34:16.000328Z

and it seems to be spending all the time in clara.rules.engine$retract_accumulated.invoke()

2018-03-01T08:35:04.000110Z

which lines up with the theory that implicit retractions are causing this

2018-03-01T08:36:02.000430Z

I can share the snapshot if that would be interesting for people

2018-03-01T08:36:20.000283Z

I'll spend some time trimming the example down to a bare bones app I can put in a gist

1👍
2018-03-01T12:49:19.000178Z

the gist can be found here: https://gist.github.com/wdullaer/cecf88b3266ba0ac90b4f060eefe5208

2018-03-01T12:49:59.000339Z

when making this I noticed that the 2 million case is not exactly the same as 2x 1 million, because of how the attributes are randomly generated

2018-03-01T12:50:17.000241Z

2x 1 million will generate many more Consent facts for a given personId

2018-03-01T12:50:27.000159Z

(giving the accumulator more work)

2018-03-01T12:50:45.000395Z

2x 1 million goes fast enough if I make sure they generate distinct sets

2018-03-01T12:51:18.000143Z

however 1x 2 million tweaked to have the same overlap in consent as 2x 1 million still goes an order of magnitude faster
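
As a rough sketch of the overlap being described (not the actual gist code; the Consent record, gen-consents helper, pool sizes, and purpose values are all made up here for illustration):

```clojure
;; Hypothetical fact type; the real gist may use different names/fields.
(defrecord Consent [person-id purpose attribute])

(defn gen-consents
  "Generate n random Consent facts drawn from the given pool of person ids."
  [n id-pool]
  (repeatedly n #(->Consent (rand-nth id-pool)
                            (rand-nth ["marketing" "analytics" "support"])
                            (rand-int 100))))

;; 2x 1 million from the same pool: the second batch lands on (and grows) the
;; accumulated collections already built for almost every person-id.
(def overlapping-batches
  [(gen-consents 1000000 (vec (range 1000)))
   (gen-consents 1000000 (vec (range 1000)))])

;; Disjoint pools: the second batch builds fresh collections instead of
;; retracting and rebuilding the existing large ones, so it stays fast.
(def distinct-batches
  [(gen-consents 1000000 (vec (range 0 1000)))
   (gen-consents 1000000 (vec (range 1000 2000)))])
```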

2018-03-01T12:52:13.000098Z

I know Clara doesn't deduplicate facts, but I think in this case deduplication would prevent me from running into this scenario

2018-03-01T14:10:08.000639Z

@wdullaer thanks for the details

2018-03-01T14:10:58.000690Z

I will take a look. It may be about an hour or so though. Doing a few other things at the moment.

2018-03-01T14:11:09.000206Z

no worries, take all the time you need

2018-03-01T14:11:21.000582Z

I can supply the vm snapshot as well (just need to find a quick place to upload it)

2018-03-01T14:11:46.000426Z

Yeah, if you have one you can share, that’d be good to take a glance at.

2018-03-01T14:11:50.000447Z

How to share hmm

2018-03-01T14:13:06.000036Z

I'll put it into dropbox, just a sec

2018-03-01T14:14:34.000632Z

ok

2018-03-01T16:23:28.000585Z

@wdullaer I’ve only got to look at this briefly so far today

2018-03-01T16:25:29.000582Z

It seems like these collections created via acc/distinct would be fairly large. Also, there are many of them after the first 1mil facts. There’d be a big acc/distinct collection associated with each personId + purpose grouped Consent facts. When the next 1mil facts come in, I believe they will contribute to many (or perhaps even all) of those same big collections.
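
For concreteness, a minimal sketch of the kind of rule being described (assuming fact and field names like Consent, person-id, and purpose, which may differ from the gist), where one acc/distinct collection is accumulated per person-id + purpose group:

```clojure
(ns consent-rules
  (:require [clara.rules :refer [defrule insert!]]
            [clara.rules.accumulators :as acc]))

;; Hypothetical fact types; the real gist may use different names/fields.
(defrecord Consent [person-id purpose attribute])
(defrecord ConsentSummary [person-id purpose consents])

;; Each distinct person-id + purpose binding gets its own accumulated
;; collection. With millions of Consent facts these collections get large,
;; and any later insert that touches a group forces its old collection to be
;; retracted and replaced.
(defrule summarize-consent
  [?consents <- (acc/distinct) :from [Consent (= ?person-id person-id)
                                      (= ?purpose purpose)]]
  =>
  (insert! (->ConsentSummary ?person-id ?purpose ?consents)))
```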

2018-03-01T16:26:14.000125Z

The work from the first 1 mil facts is stored in working memory. Clara builds the new collections on the 2nd wave of inserts. It then must remove the old accumulated work it did from working memory - in order to replace it with the updates.

2018-03-01T16:26:51.000144Z

I think the fact that there are many of these large collections in working memory is making it really expensive to remove them all. Clara tries to be efficient in finding and removing things in memory. I think huge collections may be a pitfall.

2018-03-01T16:28:19.000796Z

I’d like to look a little deeper at it, but haven’t been able to yet. I’m not sure what a reasonable workaround may be, other than inserting everything in one batch (as you’ve found).
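
To make the comparison concrete, a hedged sketch of the two insert patterns (first-million and second-million stand in for the hypothetical 1-million-fact batches from the earlier sketch, and consent-rules for the hypothetical rule namespace above):

```clojure
(require '[clara.rules :refer [mk-session insert-all fire-rules]])

;; Two waves: the second insert-all forces Clara to retract the large
;; accumulated collections built from the first wave and rebuild them,
;; which is where retract_accumulated shows up in the profile.
(-> (mk-session 'consent-rules)
    (insert-all first-million)
    (fire-rules)
    (insert-all second-million)
    (fire-rules))

;; Single batch: everything is accumulated once, so there is no large
;; retract-and-replace step.
(-> (mk-session 'consent-rules)
    (insert-all (concat first-million second-million))
    (fire-rules))
```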

2018-03-01T16:28:37.000405Z

It’s an interesting case to explore some more for sure.

2018-03-01T17:41:42.000602Z

thanks for the input

2018-03-01T17:41:56.000349Z

I can see how this is a worst case scenario

2018-03-01T17:43:03.000104Z

we probably won’t see massive updates of this size in a real-life scenario, and if we do (say a big data migration), we can always consider starting from scratch

2018-03-01T17:43:49.000255Z

it was a bit surprising that updating these structures is a lot more expensive than creating them

2018-03-01T17:44:19.000074Z

if I can help in any way here, just let me know

2018-03-01T17:44:34.000475Z

I have a few other benchmarks I’m going to run over the coming days as well

2018-03-01T17:58:27.000104Z

I need to look deeper at what is happening to have a stronger sense of what could be done

2018-03-01T19:10:39.000179Z

I’ve logged https://github.com/cerner/clara-rules/issues/385 for this @mikerod @wdullaer, I’d suggest that we post findings there when we have them

2018-03-01T19:15:10.000398Z

From the snapshot it looks like the memory is the bottleneck, which isn’t too surprising. From my first glance over it I suspect there are some performance optimizations we can make for cases like this, but it may be a bit before I have time to write my thoughts down in a sufficiently articulate form. The benchmark and snapshot are really helpful and make diagnosing things like this much easier - thanks for that. It is useful to have benchmarks of realistic rule patterns that stress-test Clara.

2👍