asami

Asami, the graph database https://github.com/threatgrid/asami
jfntn 2020-12-05T20:21:06.213600Z

👋 was wondering if asami had support for datalog rules?

quoll 2020-12-05T20:21:24.213900Z

It’s done through Naga

quoll 2020-12-05T20:21:49.214400Z

Asami was originally part of Naga. It got extracted as a graph DB as it gained more features

quoll 2020-12-05T20:22:06.214600Z

https://github.com/threatgrid/naga

jfntn 2020-12-05T20:22:30.215300Z

Nice, are there any examples on how to integrate the two?

quoll 2020-12-05T20:29:18.215700Z

By default, it’s going to use Asami

quoll 2020-12-05T20:29:36.216200Z

It takes a little effort to ask for Datomic, though it works with that as well

quoll 2020-12-05T20:32:08.216700Z

It’s not really a command line tool. It’s designed as an API

quoll 2020-12-05T20:36:59.217600Z

The command-line program was built as an example of how to use it. Have a look at the run-all function for this: https://github.com/threatgrid/naga/blob/main/src/naga/cli.clj#L74

jfntn 2020-12-06T15:37:06.222600Z

Thanks for the detailed explanation! What’s the best way to learn about what’s supported in rules? I was looking at the schemas in zuko but I’m still not quite sure if what I’m trying to do is supported.

jfntn 2020-12-06T15:38:41.224700Z

(I was looking to implement a rule whose head would increment a counter matched in the body, but not sure those eval style patterns are valid)

quoll 2020-12-06T15:40:52.227700Z

Zuko was just a place to put internal libraries that were shared between Naga and Asami so that there wasn’t an interdependency between them (no one wants to load up Asami if they’re doing rules on Datomic)

quoll 2020-12-06T15:41:13.228100Z

That is possible, actually

quoll 2020-12-06T15:44:21.232200Z

Using the rule macro (so not in predicate form): [?entity :value' ?new-v] :- [?entity :type "counter"] [?entity :value ?v] [(inc ?v) ?new-v]

quoll 2020-12-06T15:49:36.232400Z

Pulling that apart:

[?entity :type "counter"] [?entity :value ?v]

This part just finds the entity (I just chose to use “type counter”, but you may have an ID, or something else), and picks up the numerical value in ?v.

[(inc ?v) ?new-v]

This increments the value of ?v and binds it to ?new-v.

Then the head of the clause uses:

[?entity :value' ?new-v]

Note how the :value property is annotated with a trailing quote character? This means “update” the property, rather than just asserting it. If it were simply asserted, then the entity would have 2 :value attributes (which is legal), and you’d have to look for the max of them. But by using the update annotation it will remove the old value and add the new one.
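
For reference, this is roughly how that rule might be written in code with Naga’s rule macro — a sketch only; the `r` macro’s namespace and exact argument shape here are assumptions, so check the Naga README for the real signature:

```clojure
;; Sketch only: the `r` macro location and signature are assumptions.
(require '[naga.rules :refer [r]])

;; Head uses :value' (trailing quote) to UPDATE the property,
;; replacing the old :value rather than asserting a second one.
(def increment-counter
  (r "increment-counter"
     [?entity :value' ?new-v]
     :-
     [?entity :type "counter"]
     [?entity :value ?v]
     [(inc ?v) ?new-v]))
```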

quoll 2020-12-06T15:52:57.232600Z

The problem here is how often the rule runs! This rule will be triggerable by:
• another rule setting an entity to be a counter (unlikely)
• an entity being given a new value

quoll 2020-12-06T15:53:56.232800Z

Updates break the seminaïve reasoning mechanism, so if anyone else updates their :value then it won’t retrigger this rule (hey, that’s lucky!)

quoll 2020-12-06T15:54:39.233Z

(when I said “entity being given a new value” I meant a value that didn’t exist before. Updates won’t affect it)

quoll 2020-12-06T15:55:54.233200Z

In general, you could only expect that rule to be run one time per engine-invocation. I’m just explaining the edge cases to cover my bases 😉

Steven Deobald 2020-12-05T20:56:02.218500Z

🌊

jfntn 2020-12-05T21:01:23.218600Z

Interesting, do I understand correctly that in this example the idb facts derived by naga get transacted back to asami and will be available for both a) querying idb facts directly in asami, and b) deriving more facts in another pass of naga?

jfntn 2020-12-05T21:13:14.218800Z

Also curious about time-space complexities, I assume each naga execution will compute rules against all facts in asami? I.e. there’s no rete-style network that would only consider facts inserted since the previous run?

quoll 2020-12-05T21:23:43.219Z

On the first question…
a) The data is transacted back into Asami and can be accessed via querying.
b) It iterates until completion. You won’t get any more facts unless you input more data.
That second one raises an important issue: Naga extends Datalog to allow declarations of new entities. This means that you can shoot yourself in the foot if you create a loop. For instance, don’t say:

Parent(X,Y) :- Person(X).
Person(Y) :- Parent(X, Y).
i.e. All people have parents, and all parents are people. This will result in an OOM
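
A rough simulation in plain Clojure (not Naga) of why this diverges: every pass invents a fresh parent entity for each person, and each new parent is itself a person, so the fact set grows without bound:

```clojure
;; Plain-Clojure sketch, not Naga: Person(X) declares a new Parent(X, Y),
;; and Person(Y) :- Parent(X, Y) makes every new parent a person again.
(defn step [people]
  (into people (map (fn [p] (gensym (str p "-parent"))) people)))

(map count (take 5 (iterate step #{:alice})))
;; => (1 2 4 8 16) — doubling every pass, until the engine runs out of memory
```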

quoll 2020-12-05T21:25:29.219200Z

> I assume each naga execution will compute rules against all facts in asami? I.e. there’s no rete-style network that would only consider facts inserted since the previous run?
This is incorrect. There is a rete-style network. However, instead of populating each node of the network with data that flows through it, it uses the indexes to do this work instead. i.e. the memory required for each node in the network is actually a slice out of one of the indexes
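
One way to picture “a node’s memory is a slice of an index” (a toy sketch, not Asami’s actual implementation): resolving a body pattern like [?e :value ?v] is just a lookup into a nested attribute → entity → values index, so no separate per-node working memory has to be maintained:

```clojure
;; Toy sketch, not Asami's code: an attribute -> entity -> #{values} index.
(def index
  {:type  {:c1 #{"counter"}, :c2 #{"counter"}}
   :value {:c1 #{41}, :c2 #{7}}})

;; The "node memory" for the pattern [?e attr ?v] is just this index slice:
(defn matches [idx attr]
  (for [[e vs] (get idx attr), v vs] [e v]))

(matches index :value)
;; => ([:c1 41] [:c2 7])
```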

jfntn 2020-12-05T21:41:41.219400Z

> There is a rete-style network. However, instead of populating each node of the network with data that flows through it, it uses the indexes to do this work instead
Very cool, is this what the queues and constraint data achieve with the dirty tracking?

quoll 2020-12-05T23:02:23.219600Z

I gather that you’re talking about the code in naga.engine?

quoll 2020-12-05T23:10:35.219800Z

Quick overview…
1. The initialization finds the connections between the productions (or heads) of rules and the dependencies (or bodies) of other rules.
2. Every rule gets scheduled by putting it on the queue, with each element of its body marked as “dirty”.
3. The engine picks up the head of the queue, and checks each element in the body to see if it’s dirty. If the queue is empty, then the engine is finished.
4. If everything is clean, then the engine moves on to the next rule.
5. Otherwise, the engine checks the dirty elements to see if the data they rely on has actually changed (“dirty” just means that they potentially changed).
6. All “dirty” elements that haven’t experienced a change will be marked as clean. If there have been no changes, then move on to the next rule.
7. If there were any changes, then execute the rule, inserting the productions.
8. Mark all downstream dependencies as dirty, and schedule those rules by adding them to the queue.
9. Return to step 3.
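
Those steps can be sketched as a loop in plain Clojure (heavily simplified; all the names here are mine, not naga.engine’s internals):

```clojure
;; Simplified sketch of the dirty-flag scheduling loop; not naga.engine's code.
;; rules:      seq of {:name ... :body [elements ...]}
;; changed?:   fn [element] -> did the underlying data really change?
;; execute!:   fn [rule]    -> run the rule, insert its productions
;; downstream: fn [rule]    -> rules whose bodies depend on this rule's head
(defn run-engine [rules changed? execute! downstream]
  (loop [queue (vec rules)                        ; 2. schedule every rule,
         dirty (set (mapcat :body rules))]        ;    all body elements dirty
    (if (empty? queue)
      :done                                       ; 3. empty queue: finished
      (let [[rule & more] queue
            dirty-elts (filter dirty (:body rule))]
        (cond
          (empty? dirty-elts)                     ; 4. all clean: next rule
          (recur (vec more) dirty)

          (not-any? changed? dirty-elts)          ; 5/6. dirty but unchanged:
          (recur (vec more)                       ;      mark clean, next rule
                 (reduce disj dirty dirty-elts))

          :else                                   ; 7. changed: execute the rule
          (let [deps (downstream rule)]
            (execute! rule)                       ; 8. dirty downstream deps
            (recur (into (vec more) deps)         ;    and reschedule them
                   (into (reduce disj dirty dirty-elts)
                         (mapcat :body deps)))))))))  ; 9. back to step 3
```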

quoll 2020-12-05T23:16:16.220100Z

There are a couple of features not being used…
• Salience. “Way back when…” I was thinking that it would be nice to offer some rules priority over other rules. This is the “salience” value. Higher values get inserted into the queue before anything of a lower value. This works, but for now everything has the same salience value, so it just works as a FIFO.
• Productions. At the moment the only kind of production is to insert data. The plan was always to offer a function that could be called with the data that led to the production. The most flexible thing here would be a message queue or a pub/sub on Redis. This should still happen, but I’m still waiting for someone to ask for it 🙂
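
The salience ordering described above could be pictured like this (a toy sketch; not Naga’s actual queue implementation):

```clojure
;; Toy sketch: higher-salience rules sort ahead of lower ones;
;; Clojure's sort-by is stable, so equal salience keeps FIFO order.
(defn schedule [queue rule]
  (vec (sort-by (comp - :salience) (conj queue rule))))

(map :name
     (-> []
         (schedule {:name "a" :salience 0})
         (schedule {:name "b" :salience 10})
         (schedule {:name "c" :salience 0})))
;; => ("b" "a" "c")
```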

quoll 2020-12-05T23:17:52.220300Z

I gave a description of how all of this works at Clojure/conj 2016: https://youtu.be/8rRzESy0X2k?list=PLZdCLR02grLofiMKo0bCeLHZC0_2rpqsz