rdf

2021-02-15T09:38:53.015200Z

@simongray Naga has a data syntax for rules; it’s not the Datomic syntax, but instead looks to be a direct translation of the Datalog Horn-clause syntax Naga supports: https://github.com/threatgrid/naga/blob/964a71e300c947ead28f1b344ed9bccdbf97ff9b/test/naga/test_rules.cljc#L21 I’m not familiar with Datomic rules, and have only played with Naga briefly, but after looking at them again it looks like the main differences are: 1. (I think) Naga rules must be either unary or binary predicates. 2. Naga rules are materialised once at startup; Datomic’s are, AFAICT, materialised at query time. I’m guessing Datomic’s rules, through a process similar to macro-expansion, just expand into Datomic query fragments… presumably with gensyms etc. to prevent unintended bindings.

🤘 1
simongray 2021-02-15T09:40:52.016200Z

Ooh, that Clojure data Prolog style syntax looks interesting.

simongray 2021-02-15T09:43:38.019600Z

I was asking about Zach Oakes' O'Doyle Rules, though, which uses Datomic-style triples/facts, but otherwise doesn't have anything to do with Datomic.

2021-02-15T10:07:02.022Z

ah sorry, I don’t know… IIRC Clara is based on the RETE algorithm… so you might learn more by looking at the differences between RETE and Datalog. Not sure about O’Doyle; I only heard of it the other day… Looking at the README, it implies it’s possible to create infinite loops in it, though, which means it may be Turing complete. Datalog guarantees termination (and consequently so should Naga).
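The termination claim is easy to see in a toy bottom-up evaluator: with finitely many constants and no function symbols, the set of derivable facts is finite, so iterating the rules to a fixpoint must stop. A minimal sketch (not Naga’s algorithm; just naive evaluation of an ancestor rule):

```python
# Naive bottom-up Datalog evaluation of:
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
# The fact set can only grow and is bounded by the finite constant pool,
# so the loop is guaranteed to terminate.
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}

ancestor = set(parent)                      # base case
while True:
    derived = {(x, z)
               for (x, y) in parent
               for (y2, z) in ancestor
               if y == y2}
    if derived <= ancestor:                 # fixpoint: nothing new derivable
        break
    ancestor |= derived

print(sorted(ancestor))
```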

2021-02-17T14:39:46.068600Z

I wouldn’t doubt that for a second, given that RETE is more powerful than Datalog; but consequently it misses Datalog’s safety properties (re: termination guarantees, etc.)

quoll 2021-02-17T16:53:30.068900Z

Definitely. It’s essentially a rule scheduler. Datalog imposes policies that ensure boundary conditions on that scheduling

quoll 2021-02-17T17:08:18.069100Z

Actually, that was an interesting thing for me to learn about. I implemented RETE, and I needed to configure what each rule did, and what the scheduling dependencies between the rules were. At that point, I was describing all of that using an RDF graph. But the rule descriptions were in Horn clauses (because that’s how everyone wrote them at the time, and I was used to writing them that way too). I was thinking that instead of writing Horn clauses on paper, then translating from Horn to the graph representation, and entering all the graph data, I should write a parser that could read Horn clauses and convert them into the graph representation automatically. The only problem was figuring out the connections between them. But then I had my epiphany… I knew how to automatically find those dependencies! (I can even tell you which street in Chicago I was walking along when I realized this.)

quoll 2021-02-17T17:09:46.069300Z

So I implemented that, and it worked really well.

quoll 2021-02-17T17:10:18.069500Z

At this point, I realized that, due to some of the things I’d done in the implementation of RETE, I had a fully compliant Datalog engine.

quoll 2021-02-17T17:10:34.069700Z

It was a total surprise 🙂

quoll 2021-02-17T17:11:19.070Z

So I put in the effort to learn more about Datalog

quoll 2021-02-17T17:11:31.070200Z

And my implementation experience informed me a lot

2021-02-17T17:15:53.070400Z

Sorry, not following what the epiphany was?

quoll 2021-02-17T17:16:16.070600Z

That I could identify the dependencies between rules automatically.

quoll 2021-02-17T17:16:28.070800Z

Those dependencies are an important part of configuring RETE

2021-02-17T17:16:29.071Z

ok by querying for them?

quoll 2021-02-17T17:19:32.071200Z

No. The form of a Horn clause is:

pred_y(v1,v2) :- pred_x(v3,v4), ...
If the head of one clause matches any expression in the body of another clause, then this forms a dependency of the latter on the former.

2021-02-17T17:20:09.071400Z

yup

2021-02-17T17:20:23.071600Z

(I did a bit of prolog many moons ago)

quoll 2021-02-17T17:21:19.071800Z

“Matching” means that for each of the 3 values in pred(v1,v2), either both sides contain identical concrete values (atoms, numbers, etc.), or at least one side is a variable.

quoll 2021-02-17T17:21:43.072Z

Note that pred is included in that match, which means that 2nd order expressions are also allowed
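The matching just described can be sketched mechanically: two terms match when either is a variable or both are the same constant, and rule B depends on rule A when A’s head matches some pattern in B’s body. A minimal Python sketch (the `?`-prefix variable convention and the example rules are my own, not Naga’s):

```python
def is_var(t):
    # Assumed convention: variables are strings starting with '?'
    return isinstance(t, str) and t.startswith("?")

def terms_match(a, b):
    # Two terms match if either is a variable, or both are the same constant.
    return is_var(a) or is_var(b) or a == b

def patterns_match(p, q):
    # The predicate position participates in matching too, which is what
    # permits 2nd-order expressions (a variable in predicate position).
    return len(p) == len(q) and all(terms_match(a, b) for a, b in zip(p, q))

def dependencies(rules):
    # rules: {name: (head_pattern, [body_patterns])}
    # Rule b depends on rule a when a's head matches a pattern in b's body.
    deps = set()
    for a, (head_a, _) in rules.items():
        for b, (_, body_b) in rules.items():
            if any(patterns_match(head_a, p) for p in body_b):
                deps.add((b, a))
    return deps

rules = {
    "grandparent": (("grandparent", "?x", "?z"),
                    [("parent", "?x", "?y"), ("parent", "?y", "?z")]),
    "parent":      (("parent", "?x", "?y"),
                    [("father", "?x", "?y")]),
}
print(dependencies(rules))  # → {('grandparent', 'parent')}
```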

quoll 2021-02-17T17:22:30.072200Z

and the simple translation of pred(v1,v2) to [v1 pred v2] simplifies this significantly

quoll 2021-02-17T17:23:19.072400Z

Also, pred(v) => [v :type pred]

2021-02-17T17:23:40.072600Z

yes I saw that in naga

quoll 2021-02-17T17:26:22.072800Z

and if you really want it… pred(v1,v2,v3...) => [v1 pred ?gen2] [?gen2 :tg/first v2] [?gen2 :tg/rest ?gen3] [?gen3 :tg/first v3] …
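The translation scheme in the last few messages can be sketched as a small function: unary predicates become a `:type` triple, binary predicates become a single triple, and higher arities chain through generated nodes. A hedged Python sketch (the gensym naming is illustrative; per a later remark in this thread, the chain is left unterminated rather than ending in nil):

```python
import itertools

_gen = itertools.count(2)

def gensym():
    # Illustrative naming only; real gensyms would be guaranteed fresh.
    return f"?gen{next(_gen)}"

def to_triples(pred, *args):
    """Translate pred(args...) into triples, per the scheme above."""
    if len(args) == 1:                # pred(v)      => [v :type pred]
        return [(args[0], ":type", pred)]
    if len(args) == 2:                # pred(v1,v2)  => [v1 pred v2]
        return [(args[0], pred, args[1])]
    # pred(v1,v2,v3,...) => [v1 pred ?gen2] [?gen2 :tg/first v2]
    #                       [?gen2 :tg/rest ?gen3] [?gen3 :tg/first v3] ...
    node = gensym()
    triples = [(args[0], pred, node)]
    rest = args[1:]
    for i, v in enumerate(rest):
        triples.append((node, ":tg/first", v))
        if i < len(rest) - 1:         # chain left unterminated (no nil tail)
            nxt = gensym()
            triples.append((node, ":tg/rest", nxt))
            node = nxt
    return triples

print(to_triples("parent", "?x", "?y"))
print(to_triples("person", "?x"))
print(to_triples("between", "?a", "?b", "?c"))
```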

2021-02-17T17:26:50.073100Z

rdf lists 😬

2021-02-17T17:26:55.073300Z

but yeah makes sense

quoll 2021-02-17T17:27:17.073500Z

slightly different… I don’t terminate them with nil. I found it was just not needed

quoll 2021-02-17T17:27:35.073700Z

(this is internally, where I can presume a closed world)

2021-02-17T17:29:31.073900Z

Is that the epiphany, that n-ary predicate matching is simplified by reducing everything to triples?

quoll 2021-02-17T17:29:46.074100Z

no

quoll 2021-02-17T17:30:07.074300Z

The epiphany was just that I could find the dependencies automatically

2021-02-17T17:30:26.074500Z

oh ok 🙂

quoll 2021-02-17T17:30:40.074700Z

Because everything was unary and binary predicates (expressed as triples), it was easy to see this

2021-02-17T17:30:49.074900Z

yeah

quoll 2021-02-17T17:31:26.075100Z

With n-ary predicates it wasn’t clear. But since I was working with the triples architecture, it became very obvious

2021-02-17T17:31:31.075300Z

I guess that’s kind of what I meant by querying for them

quoll 2021-02-17T17:32:34.075600Z

weeeeeeell… yes, but… this was while parsing and creating an internal representation. It wasn’t yet in a graph, and therefore wasn’t queryable

quoll 2021-02-17T17:33:12.075800Z

Naga is my 3rd implementation of this (my first was in Java, and my second was in Clojure).

2021-02-17T17:33:55.076Z

ok yeah — I guess you’ll want to unify all the clauses to the same identity when they match.

quoll 2021-02-17T17:34:08.076200Z

The dependency identification in Naga is done here: https://github.com/threatgrid/naga/blob/main/src/naga/engine.cljc#L149

quoll 2021-02-17T17:35:40.076500Z

Recursive work through the body is done in the function https://github.com/threatgrid/naga/blob/main/src/naga/rules.cljc#L174

quoll 2021-02-17T17:36:22.076800Z

Back when it was simply pred1(v1,v2), pred2(v3,v4)…, it was much easier

2021-02-17T17:37:18.077Z

thanks for the explanation 🙇

quoll 2021-02-17T17:37:36.077200Z

but now the body can be: [?a :attr ?b] [?b :attr2 "x"] (not (or [?b :attr3 "y"] [?a :attr3 "z"]))

quoll 2021-02-17T17:38:12.077400Z

Also, the head can contain multiple statements as well. So that’s an extra complexity 🙂

quoll 2021-02-17T17:41:25.077600Z

But yeah… you’ll see that the code I just gave you can process rules that have bodies with expressions like this. No “graph” is created to describe the rule, just standard Clojure structures.

quoll 2021-02-17T17:41:55.077800Z

It has shortcomings, but in general I’m a little proud of it 😊

2021-02-17T17:42:34.078Z

ah ok so the rules aren’t actually stored in the graph

2021-02-17T17:43:04.078200Z

you just use the representation to materialise the RETE network

quoll 2021-02-17T17:43:06.078400Z

correct

quoll 2021-02-17T17:43:25.078600Z

Though, that very first implementation in Java did so.

quoll 2021-02-17T17:44:05.078800Z

But it did the whole analysis of the dependencies first, and only when it had all of the information, then it inserted it all into the graph in one step

quoll 2021-02-17T17:45:32.079200Z

Strictly speaking, it could have done it in 2 steps… insert the basic rules first, then use queries to extract what was needed to determine dependencies, and then insert those dependencies. But it would actually put more work on the system to do that

👍 1
2021-02-17T17:45:45.079600Z

I’d imagine it’d be trivial to implement a subset of OWL/RDFS in Naga

quoll 2021-02-17T17:47:08.079900Z

Here you go. This is SKOS, using a subset of OWL

quoll 2021-02-17T17:47:33.080300Z

RDFS inferencing was actually my test case for the first implementation (in Java)

2021-02-17T17:47:51.080500Z

fantastic

quoll 2021-02-17T17:48:14.080700Z

Most of it is declarative. Lines 126 onwards are the most interesting

quoll 2021-02-17T17:49:40.080900Z

Like I said, I was very proud of this 🙂

2021-02-17T17:49:57.081100Z

ah I forgot you can use predicates as variables

quoll 2021-02-17T17:50:24.081300Z

That’s what I was referring to when I mentioned “2nd order expressions”

2021-02-17T17:50:39.081500Z

yeah that’s handy here

2021-02-17T17:50:54.081700Z

I thought those horn clauses looked a little strange at first!

quoll 2021-02-17T17:51:07.081900Z

😄

quoll 2021-02-17T17:51:14.082100Z

They definitely do!

quoll 2021-02-17T17:51:25.082300Z

That’s really tough to pull off in Prolog

2021-02-17T17:51:29.082500Z

does this perform well?

quoll 2021-02-17T17:51:45.082700Z

(though, I’m still learning how to implement Prolog)

2021-02-17T17:52:02.082900Z

cut is a mind melter

quoll 2021-02-17T17:52:10.083100Z

Oh yes. It performs VERY well. After all, it’s well bounded and it runs in RETE

quoll 2021-02-17T17:55:04.083400Z

My first attempt at building the rules structures was for Mulgara (that SKOS code was actually from the Mulgara project). The directory containing it all is: https://github.com/quoll/mulgara/tree/master/src/jar/krule/java/org/mulgara/krule

2021-02-17T18:05:48.083600Z

ah thanks for sharing

2021-02-17T18:05:54.083800Z

what is the license for those rules?

quoll 2021-02-17T18:06:14.084Z

It was part of Mulgara, which is OSL

2021-02-17T18:06:14.084200Z

ok OSL

quoll 2021-02-17T18:06:27.084400Z

But it’s copyright to me. I should explicitly CC them

quoll 2021-02-17T18:08:20.084600Z

I want to point out something else, since I’m here. Naga includes adapters to different databases, has a priority queue implementation, and various things in other files. But if you keep it to JUST the structure definitions used for rules, and the code that executes the engine, then:

$ wc -l rules.cljc engine.cljc schema/structs.cljc 
     215 rules.cljc
     176 engine.cljc
      76 schema/structs.cljc
     467 total

quoll 2021-02-17T18:09:25.084800Z

Meanwhile, the code that defines the structures and executes the engine (with far fewer features) in Java is:

$ wc -l *.java
     139 ConsistencyCheck.java
    1079 KruleLoader.java
      52 KruleStructureException.java
     243 QueryStruct.java
     389 Rule.java
     360 RuleStructure.java
    2262 total

quoll 2021-02-17T18:12:09.085Z

I had to trace through the Mulgara code to figure out how rules get executed, how they’re loaded, etc., and it took me over 10 minutes. (It’s in a whole lot of other directories.) But that’s all glue to incorporate it into the database system. The meat of it is in that single directory. But the whole thing is embarrassingly complex!

👍 1
quoll 2021-02-17T18:16:50.085300Z

That’s my little love letter to Clojure ❤️

2021-02-15T10:07:59.022200Z

AFAIK you can do the same with RETE (create infinite recursions)

quoll 2021-02-15T15:16:23.054200Z

Ah, I just addressed some of this in the thread above (sorry... I’m slow on my phone)

quoll 2021-02-15T15:17:32.056Z

Naga was Datalog, with the restriction of binary/unary predicates. (Higher arity actually isn’t hard to do, but I never needed it 🤷‍♀️)

quoll 2021-02-15T15:18:00.056800Z

But it’s been extended in ways that are described in the thread

quoll 2021-02-15T15:18:29.057400Z

Full query syntax is allowed in the rules too.

quoll 2021-02-15T15:20:44.059800Z

Including not, or, filters, bindings, etc. The head can also have multiple triples to be asserted for a single match (allowing multiple attributes for a single generated entity)

quoll 2021-02-15T15:26:20.066800Z

Also, Naga rules are materialized as needed. If you’re using a macro, then yes, that has to be at startup in ClojureScript (because ClojureScript macros are evaluated as Clojure code during compilation), but not in Clojure. But the r macro is just a convenience around the rule function, which takes:
• head: a seq of triple patterns
• body: a query :where clause expression
• name: optional
This function can be called at runtime, even in Cljs.

quoll 2021-02-15T15:27:03.067900Z

We do it at Cisco. Build rules on the fly, then execute them against a connection to Asami

simongray 2021-02-18T09:46:43.094800Z

Just wanted to say thank you for this massive explanation 🙂

quoll 2021-02-15T20:52:46.068400Z

Datalog can be implemented via RETE