asami

Asami, the graph database https://github.com/threatgrid/asami
2021-02-18T09:37:02.057600Z

@quoll picking up on this thread: https://clojurians.slack.com/archives/C09GHBXRC/p1613605836089200?thread_ts=1613605226.087200&cid=C09GHBXRC I get that some users might be more familiar with having a “mutable” database/connection, and that this mirrors datomic… However I find this sort of thing makes naga/asami harder to use for these use cases. It’s much less obvious how to wire things together, as we have to learn asami’s connection APIs; for negative value compared to something more value-oriented, e.g.

(->> asamival/empty-db
     (asamival/transact {:tx-data [[:db/add :foo :bar :baz]]})
     (nagaval/materialise-inferences program)
     (asami/q '[:find ,,,,]))
In terms of API design I feel like the “clojure way” for creating maximally useful APIs is to essentially avoid creating singletons, def’ing atoms etc… from my perspective state management is largely an application concern, not a library concern. That’s not to say there isn’t value in providing those sorts of interfaces; but I feel that’s the distinction between a library and a framework, i.e. frameworks take on application concerns by providing organisational conventions around them. Frameworks get in the way however if their paradigm isn’t what suits your application… for example what if I wanted to use clojure agents to change asami graphs, or refs to transact them with other in-memory state, or provide my own storage layer with a shape different to asami’s protocols? I feel like asami and naga are possibly doing too much by being frameworks rather than libraries, and their use and implementation could be simpler by doing less.

@quoll… Sorry to criticise by the way, naga and asami look absolutely fantastic and I really like what I see here. I also know it’s possible to work around these issues, but it would be great to be able to support the in-memory use case without this sort of friction. I don’t know if it’s possible at this stage to extract the pure stuff into a smaller separate library, and leave the frameworky bits elsewhere?

Similarly I noticed naga’s project.clj pulls in some rather large deps, which I think should either be moved into separate profiles or put in a “provided” scope, essentially making them optional; i.e. the library use case doesn’t need a CLI or datomic-free or postgresql. Anyway thanks again for all the hard work here, I’ve really been enjoying playing with these libraries, and may even be able to contribute relevant changes after I’ve learned more. 🙇

dominicm 2021-02-18T10:18:06.061900Z

I also ran into these issues fwiw 🙂. I avoided them by going to the lower-level API underneath, but I found that easier as I’d used naga before the conn concept was introduced and knew where to poke.

2021-02-18T10:19:10.062100Z

Yeah not done any digging yet, but good to know.

2021-02-18T10:26:41.065500Z

@quoll: For fun I had a go at defining rdfs entailment in naga, as I think your skos.rlog definition was missing some rdfs rules, e.g.:

rdf:type(YYY,XXX) :- rdfs:domain(AAA, XXX), AAA(YYY, ZZZ).    /* rdfs2 */
rdf:type(ZZZ,XXX) :- rdfs:range(AAA, XXX), AAA(YYY, ZZZ).     /* rdfs3 */
One small issue is that I don’t think rules like rdfs1 https://www.w3.org/TR/rdf11-mt/#rdfs-entailment are expressible in naga (or at least pabu). I can obviously pretty easily materialise these instances myself, but I was wondering if an extension might be possible, to allow arbitrary clojure predicate functions to be called as unary predicates in consequent positions. Logically I guess they’d just be wrapped to return success or failure goals and essentially trigger backtracking (though not sure how these concepts map into your RETE implementation). I guess if you were to do this, there’d need to be another restriction that the variables used here would need to be ground — but I expect the compiler could figure that out. I’ve not dug into the grammar yet so I don’t know how you’d represent such a thing syntactically, but I was imagining you could just use clojure.core/requiring-resolve, and that with something like this it might be possible to express rdfs1 as something approximating:
rdf:type(A, rdfs:Datatype) :- A(B,C), clojure.core/keyword?(A) .
rdf:type(B, rdfs:Datatype) :- A(B,C), clojure.core/keyword?(B) .
rdf:type(C, rdfs:Datatype) :- A(B,C), clojure.core/keyword?(C) .

2021-02-18T10:41:14.068600Z

I guess such a thing might be better expressed in the naga representation than the pabu one though… as pabu rules could then be kept compatible with other datalogs, whereas it would be reasonable to assume that if you’re using the clojure rules representation you have access to clojure… Actually, do you have something like this already? I seem to recall you mentioned negation and (or ,,,)

2021-02-18T10:51:31.069500Z

:thinking_face: hmm, looks like mulgara might have had some special predicates/hacks for this sort of thing too, e.g. mulgara:UriReference

2021-02-18T10:55:06.072200Z

Actually I’m curious: it looks like mulgara “fixes” a frustration I’ve had in the past with the rdfs entailments… i.e. in rdfs IIRC literally everything is an rdfs:Resource, which means that Literals and URIs are kind of indistinguishable. It looks like mulgara’s entailments might try and keep these distinct? Is that what I’m looking at here @quoll? https://github.com/quoll/mulgara/blob/36ee68b9cccaca26f55a39d37511fb4664b004e0/rules/rdfs.dl#L38-L39 i.e.:
- 4a says any subject is a resource (therefore must be a URI, as you can’t speak of literals)
- 4b (appears to) say that any object is a resource iff it’s an object and of type uri?

2021-02-18T10:58:10.073200Z

ok actually going to have to tear myself away and stop digging any further, and do some real work. 😢

quoll 2021-02-18T14:14:45.073600Z

I’ll confess that I haven’t been looking at making that skos stuff work. It was lifted out of Mulgara. But since I know Naga/Asami can do everything Mulgara can do, I figured it just needed a little tweaking, if anything.

quoll 2021-02-18T14:21:54.073800Z

Also, it’s skos, so it’s not supposed to infer rdfs. A lot of rdfs inferences are trivial. For instance, I found the inference that everything is an rdfs:Resource to be useless from a practical perspective, because if it’s in the database, then it’s a resource.

quoll 2021-02-18T14:22:43.074Z

As for these rules, I don’t believe that they are valid.

quoll 2021-02-18T14:22:50.074200Z

Nor consistent

quoll 2021-02-18T14:23:54.074400Z

If you’re doing RDF (which Asami can approximate, but isn’t trying to be), then resources can only be datatypes if they’re IRIs (or URIReferences)

quoll 2021-02-18T14:25:13.074600Z

However, if you’re in Clojure, then keywords are basically QNames, and can be used as such. In that case, your inferences are OK, and are consistent, but I don’t believe they’re valid?

quoll 2021-02-18T14:28:34.074800Z

Actually, it’s hard to make something invalid when you’re only in RDFS, so I guess it’s valid, but it isn’t useful. e.g. This will infer that <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> (QName rdf:type or as a keyword :rdf/type) is an rdfs:Datatype. That makes the following syntactically correct, but nonsense: "foo"^^<rdf:type>

quoll 2021-02-18T14:30:35.075Z

Yes. Rules are actually just a :where clause, and a projection to groups of triples for assertion. Pabu is just a parser that takes Datalog and creates such a thing, but because the parsing hasn’t been looked at for a long time, it has limited syntactic capabilities.

quoll 2021-02-18T14:32:49.075200Z

If I were to expand Pabu, I should really do it with instaparse. At the moment it uses Parsatron, which was really fun and cool to play with at the time. I just needed a quick way to parse something, and I remembered a previous colleague of mine had ported Parsatron from Haskell parser combinators, so I grabbed it quickly, and 45 minutes later the first version of Pabu existed. It was really only that well thought out 🙂

quoll 2021-02-18T14:33:42.075400Z

SPARQL does specifically allow for extensions like this, so hopefully it’s not a “hack” 🙂

quoll 2021-02-18T14:35:42.075600Z

URIReference, but yes.

quoll 2021-02-18T14:39:49.075800Z

RDF does not allow literals as a subject or predicate, and that avoids the type error that would result. Asami lets you do this, which actually allows for some interesting data expressions. Like:

12 :math/factor 6
12 :math/factor 4
12 :math/factor 3
12 :math/factor 2
It also let someone use strings as a kind of “magic” predicate in a graph that was being rendered (keywords were for attributes on objects, and strings were edges between the objects). That was definitely a hack, but it allowed a previous datastructure to be ported into Asami with no effort.
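As a sketch, those literal-subject triples can be loaded and queried with the lower-level graph API that appears later in this thread (index/empty-graph, graph/graph-transact, resolve-triple). Treat this as an untested illustration of the idea, not confirmed Asami usage:

```clojure
(require '[asami.graph :as graph]
         '[asami.index :as index])

;; Load the factor triples (note the integer 12 as a subject, which plain
;; RDF would not allow) into an empty in-memory graph, retracting nothing.
(def factors
  (graph/graph-transact index/empty-graph 0
                        [[12 :math/factor 6] [12 :math/factor 4]
                         [12 :math/factor 3] [12 :math/factor 2]]
                        nil))

;; Resolve the pattern [12 :math/factor ?f]; only the unbound ?f column
;; should come back, i.e. the factors themselves.
(graph/resolve-triple factors 12 :math/factor '?f)
```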

quoll 2021-02-18T14:41:32.076800Z

Originally, everything was done via a graph protocol, which is reasonably small:

(defprotocol Graph
  (new-graph [this] "Creates an empty graph of the same type")
  (graph-add [this subj pred obj tx] "Adds triples to the graph")
  (graph-delete [this subj pred obj] "Removes triples from the graph")
  (graph-transact [this tx-id assertions retractions] "Bulk operation to add and remove multiple statements in a single operation")
  (graph-diff [this other] "Returns all subjects that have changed in this graph, compared to other")
  (resolve-triple [this subj pred obj] "Resolves patterns from the graph, and returns unbound columns only")
  (count-triple [this subj pred obj] "Resolves patterns from the graph, and returns the size of the resolution"))
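To make the “returns unbound columns only” contract of resolve-triple concrete, here is a toy, linear-scan version in plain Clojure. This is purely illustrative — Asami’s real implementation is index-based — and the convention that ?-prefixed symbols are variables is assumed here:

```clojure
;; Toy illustration of resolve-triple's contract: match a [s p o] pattern
;; against a collection of triples, returning only the columns that were
;; unbound (?-symbols) in the pattern.
(defn resolve-triple*
  [triples s p o]
  (let [var?    #(and (symbol? %) (= \? (first (name %))))
        pattern [s p o]]
    (for [triple triples
          ;; a triple matches when every bound position is equal
          :when (every? identity (map #(or (var? %1) (= %1 %2)) pattern triple))]
      ;; project out only the positions that were variables
      (vec (keep-indexed (fn [i v] (when (var? (nth pattern i)) v)) triple)))))

(resolve-triple* [[:a :p :b] [:a :p :c] [:x :q :y]] :a :p '?o)
;; => ([:b] [:c])
```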

quoll 2021-02-18T14:43:18.078700Z

originally, graph-transact and graph-diff weren’t there. graph-diff came about because people wanted to see what had changed after Naga had run on it. I consider it optional, as nothing in Asami uses it.

quoll 2021-02-18T14:46:39.081100Z

graph-transact is much more recent. Originally, there were 2 functions in the query namespace (not a great place for them, but :woman-shrugging: )

(defn add-to-graph
  [graph data]
  (reduce (fn [acc d] (apply graph/graph-add acc d)) graph data))

(defn delete-from-graph
  [graph data]
  (reduce (fn [acc d] (apply graph/graph-delete acc d)) graph data))
As you can see, these just called graph-add or graph-delete for each statement

quoll 2021-02-18T14:48:05.082400Z

The new graph-transact function does exactly this… applying deletions first, then assertions. It also takes a transaction ID, which is not (currently) used in the in-memory database.
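From that description, graph-transact could be sketched roughly as follows — this is a reconstruction of the behaviour as described, not Asami’s actual source:

```clojure
(require '[asami.graph :as graph])

;; Rough sketch of the described graph-transact behaviour: apply the
;; retractions first, then the assertions. The tx-id is threaded through
;; to graph-add but unused by the in-memory database.
(defn graph-transact*
  [g tx-id assertions retractions]
  (as-> g g'
    (reduce (fn [acc [s p o]] (graph/graph-delete acc s p o)) g' retractions)
    (reduce (fn [acc [s p o]] (graph/graph-add acc s p o tx-id)) g' assertions)))
```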

quoll 2021-02-18T14:49:25.083300Z

Anyway, I believe that the `Graph` protocol is the API you want @rickmoynihan

quoll 2021-02-18T14:51:21.084600Z

What I found was that no one I worked with felt comfortable learning Asami. So I wrapped that Graph API in a Database/Connection façade, and voilà! They started using it! 🙂

quoll 2021-02-18T14:51:34.084900Z

But it’s all still a Graph under the covers

quoll 2021-02-18T14:53:49.086500Z

If you have a Connection, then you get the most recent Database using (asami.core/db connection). If you have a Database, then you get the graph for it using (asami.core/graph database).
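Putting those accessors together with as-connection, the round trip between the two worlds would look something like this (function names and arities taken from the messages in this thread; an untested sketch):

```clojure
(require '[asami.core :as asami])

;; Connection -> Database -> Graph, and back to a Connection.
(let [conn  (asami/connect "asami:mem://example")
      _     @(asami/transact conn {:tx-data [[:db/add :foo :bar :baz]]})
      g     (asami/graph (asami/db conn))   ; unwrap to the raw graph
      conn2 (asami/as-connection g)]        ; re-wrap as a Connection
  (asami/q '[:find ?e ?a ?v :where [?e ?a ?v]] (asami/db conn2)))
```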

quoll 2021-02-18T14:56:01.088400Z

You can then do whatever you want with the graph (querying it the same as you do a Database… in fact, the q function only works on graphs, and calls graph on a database to get the graph). When you’re done, you can make it look like a Connection/Database again by calling (asami.core/as-connection graph)

quoll 2021-02-18T14:56:57.089500Z

Optionally, you can give it a URI for the connection to associate it with:

(asami/as-connection graph "asami:mem://my-data")

quoll 2021-02-18T14:57:43.090400Z

This will actually replace any registration of prior connections at that URI, which is probably something to be aware of, and also really useful

quoll 2021-02-18T15:00:15.091700Z

So, referring back to that original code:

(-> index/empty-graph
    (graph/graph-transact 0 [[:foo :bar :baz]] nil)
    asami/as-connection
    (naga-engine/run my-program)
    db
    graph
    (graph-q '[:find ?e ?a ?v :where [?e ?a ?v]]))

quoll 2021-02-18T15:00:40.091900Z

Should this all go in the Wiki?

2021-02-18T15:17:57.092Z

:thumbsup: Yes, I’m aware of most of that (though you’re right to pick me up on misunderstanding rdfs:Datatype and rdfs1 — it’s about literals/datatype-uris, and not quite what I thought: the classes of things you can speak of (i.e. BNodes/IRIs)). Re: RDFS, I know it’s not part of skos… I just want a combination of some rdfs inferences (mostly domains/ranges, subproperties etc.), and likely also some or all of the skos you have too. Though at this stage I’m really just tinkering, rather than having concrete plans for any of this — so I was mainly wanting to write the rdfs rules out as an excuse to play with naga. The suggestion above came out of that tinkering. My suggestion really wasn’t about RDF though; it was primarily: do you think it might be useful to support arbitrary clojure predicates in naga logic programs like this?

2021-02-18T15:21:16.092400Z

:thumbsup: yup I’m aware of this. matcha is similar.

2021-02-18T15:21:56.092900Z

Thanks for posting all the above… I’ll try and digest it all later 🙇

2021-02-18T15:26:48.093Z

true 🙂

quoll 2021-02-18T16:27:32.094Z

Also… I just remembered: Yes, you’re right. The datomic dependency is entirely optional and should be in a separate profile. I’ve been too lazy to do this.

quoll 2021-02-18T16:28:09.094800Z

When you say that it brings in a lot, that’s only because of Datomic. Remove that, and there’s VERY little to come in. Most of it is code that I’ve written myself

quoll 2021-02-18T16:38:04.094900Z

Quick answer: yes

🥳 1
quoll 2021-02-18T16:38:51.095200Z

They need to be identified as such (and not as edges to be searched for in the database), and then they get turned into filters instead

👍 1
quoll 2021-02-18T16:41:26.095500Z

a trivial way to do this might be to look for namespacing with a / character, instead of a : character
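That heuristic could be as simple as the following — purely a sketch of the idea being discussed, not anything that exists in Naga or Pabu:

```clojure
(require '[clojure.string :as str])

;; Sketch of the "/ means a Clojure function, : means a URI/QName" idea:
;; a predicate like clojure.core/keyword? would become a filter, while
;; something like rdf:type stays an edge to be matched in the graph.
(defn clojure-predicate?
  [pred]
  (let [s (str pred)]
    (and (str/includes? s "/")
         (not (str/includes? s ":")))))

(clojure-predicate? 'clojure.core/keyword?) ;; => true
(clojure-predicate? 'rdf:type)              ;; => false
```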

2021-02-18T16:41:51.095700Z

yeah that’s what I was thinking actually

2021-02-18T16:42:25.095900Z

though might be better not to put it in pabu?! And just have it with naga?

quoll 2021-02-18T16:42:34.096100Z

BTW, I just checked in a pabu modification that allows for -- comments 🙂

2021-02-18T16:42:49.096400Z

awesome! 🙂

quoll 2021-02-18T16:42:54.096600Z

Pabu is just a parser that generates Naga rules

quoll 2021-02-18T16:43:04.096800Z

It takes a string and returns Naga rules

2021-02-18T16:45:45.097Z

Incidentally, something else I noticed in the README: /* */ comments aren’t ISO Prolog comments; Prolog uses %. Some implementations, e.g. swi, do additionally support C-style comment blocks, but as far as I know they’re non-standard.

quoll 2021-02-18T16:45:59.097200Z

I was about to show my manager this thing I’d just gotten working, except it was only configurable in code, and I knew him well enough to know that he would immediately ask me to try changing things to see if they worked. That would work best if I had a parser for rules. Which was why I put Pabu together so quickly. It was never supposed to last. But when I saw how it could run all sorts of simple Prolog code… well, I kept it 🙂

quoll 2021-02-18T16:46:40.097400Z

I don’t know Prolog very well, so that’s good to know, thank you

2021-02-18T16:47:15.097600Z

I’m not suggesting you add a 3rd comment form — but might % not be a better choice, as it will probably make your datalog a proper subset of prolog syntax… and thus work properly in emacs prolog-mode etc.

2021-02-18T16:48:49.097800Z

I think sicstus might also support C style ones

quoll 2021-02-18T16:49:08.098Z

errr, well… I’ve already done it, and am just running the regression tests now 😜

2021-02-18T16:49:49.098200Z

oh well the ship has sailed 🚢 👋

quoll 2021-02-18T16:51:51.098400Z

I like the -- style, because I kept seeing it in various places. Also SQL

quoll 2021-02-18T16:52:20.098600Z

plus, it should make those mulgara rules work

quoll 2021-02-18T16:52:49.098800Z

not the mulgara:UriReference function though. I’ll give that one some thought

2021-02-18T16:53:17.099Z

Yes I quite like them visually… just a shame to miss out on free syntax highlighting support

2021-02-18T16:53:33.099200Z

and automatically commenting out blocks

2021-02-18T16:53:54.099400Z

not a big deal though

quoll 2021-02-18T17:43:05.099800Z

A big exception there is Plumatic Schema.

quoll 2021-02-18T17:43:58.100900Z

I’m not sure if I should keep that or not. It was so useful during development, and people looking at my code said that they really appreciated seeing it. That said, it does nothing at runtime

quoll 2021-02-18T19:32:42.101900Z

@rickmoynihan this morning you inspired me to clean up my builds. Try depending on Naga 0.3.12 and tell me what you think about the dependencies

2021-02-23T10:02:35.008900Z

Thanks. That all makes sense. Re: leiningen and modules… Have you considered using tools.deps instead? In my experience it makes having multiple modules within the same project quite a bit simpler. Especially if you don’t need a build. In particular git deps, with roots into the project can make stuff like the CLI work really nicely in an independent way. AFAIK the only build you have is for the clojurescript stuff, but I think that could probably be done quite easily too with just cljs.main and deps.edn.

2021-02-23T10:03:22.009100Z

It definitely makes some things harder though; but I think overall it may be simpler for you and provide you with more advantages than disadvantages for this collection of projects.

quoll 2021-02-23T15:18:42.009300Z

While trying to figure it all out yesterday, I started looking at what leiningen itself does. There’s actually another project within leiningen called leiningen-core. This is built manually, as if it were an unrelated project, and then the main project has a dependency on it. It’s not automated, but I’m actually OK with this approach. So I decided to go with it.

quoll 2021-02-23T15:19:51.009500Z

So now if you go to Naga, the project just builds the library, and nothing else. Inside of it there is a cli directory, which contains a dependency on this library.

quoll 2021-02-23T15:20:47.009700Z

This has the nice effect that the cli is now completely optional, and if you want it then it gets built with full AOT (meaning that it can be run easily as a CLI without needing dependencies set up for the classpath)

quoll 2021-02-23T15:21:13.009900Z

Also… no, I had not considered using tools.deps 🙂

2021-02-23T15:22:19.010100Z

what you have looks good. Definitely better to separate the concerns like that :thumbsup:

quoll 2021-02-23T15:22:36.010300Z

Thanks

quoll 2021-02-23T15:23:42.010500Z

The CLI was never envisioned for this in the first place. Neither was Pabu, to be honest 😄 But I’ll admit that when I took some simple Prolog and it just ran without modification, then I was so incredibly happy.

2021-02-23T15:23:52.010700Z

It’s essentially what I wanted to do in my initial PR. I’d just assumed you preferred the app to be the most prominent thing rather than the library. But as what you have here is mainly an example, inverting that relationship is perfect.

quoll 2021-02-23T15:24:53.011200Z

No, it wasn’t supposed to be prominent. I really just built the CLI to provide a template for people to understand how to call Naga. But then I kept being asked for more features on it :rolling_on_the_floor_laughing:

2021-02-23T15:25:26.011500Z

:thumbsup:

quoll 2021-02-23T15:25:37.011700Z

Sort of like “runnable documentation”

2021-02-23T15:26:12.011900Z

Presumably you can kill this line now too: https://github.com/threatgrid/naga/blob/e1e5e568400601ee1da313570642328ae853b408/project.clj#L8

quoll 2021-02-23T15:32:26.012200Z

Doh!

quoll 2021-02-23T15:32:29.012400Z

Yes

quoll 2021-02-23T15:43:30.012600Z

This is exactly why making everything open source is so valuable. People note my mistakes (ouch!) and I fix things that I wouldn’t have thought of otherwise.

👍 1
quoll 2021-02-24T00:17:06.013100Z

@rickmoynihan you may be amused to note that your PR not only prompted me to rearrange Naga, but it also resulted in submitting a PR to lein-modules 😂

👍 1
quoll 2021-02-18T19:34:01.102Z

Surprisingly, one of Asami’s plugin dependencies (`cider-nrepl`) was included in the Asami release. I have no idea why this would happen. But I don’t use it anyway, so I’ve removed that, and it’ll propagate through later

quoll 2021-02-18T19:51:02.102200Z

BTW, I don’t know if it was clear earlier… there is already external predicate support in Naga (via Asami). So the Pabu-style rules:

rdf:type(A, rdfs:Datatype) :- A(B,C), clojure.core/keyword?(A) .
rdf:type(B, rdfs:Datatype) :- A(B,C), clojure.core/keyword?(B) .
rdf:type(C, rdfs:Datatype) :- A(B,C), clojure.core/keyword?(C) .
Could be generated in code as:
(r [?a :rdf/type :rdfs/Datatype] :- [?b ?a ?c] [(keyword? ?a)])
(r [?b :rdf/type :rdfs/Datatype] :- [?b ?a ?c] [(keyword? ?b)])
(r [?c :rdf/type :rdfs/Datatype] :- [?b ?a ?c] [(keyword? ?c)])

quoll 2021-02-18T19:57:19.102400Z

I figure it was worth calling out since I’ve been explicit in stating my RDF background, and that a lot of design was influenced by Mulgara

2021-02-18T23:21:26.102900Z

Oh absolutely. I’m very grateful for all your explanations.

2021-02-18T23:26:06.103100Z

Ok this is fantastic and is exactly what I was asking for, I didn’t realise you were saying this, so it’s amazing to see you’re already doing it! :thumbsup: I can imagine this will be very handy.