asami

Asami, the graph database https://github.com/threatgrid/asami
2021-02-19T00:56:32.103300Z

Looks much better 👌 You could possibly tidy it a little further like this: https://github.com/threatgrid/naga/pull/134 but arguably you’ll be hitting diminishing returns pretty soon.

quoll 2021-02-19T02:39:18.103600Z

It’s coincidental that you’d be mentioning some of this, since it came up this week.

quoll 2021-02-19T02:41:42.103800Z

I’m tempted to move cli.clj to a separate project entirely, or at least a module under this one (I’ve never played with modules before, so I don’t know how useful that would be). The CLI was actually written as an example of how to use Naga. It ended up doing more than I first expected, and it was kinda cool. But it’s really not supposed to be part of the project.

quoll 2021-02-19T02:44:26.104Z

I’m also tempted to remove Asami as a dependency, since it’s not needed at all if you want to run Naga with Datomic. But since we’re using it here, it wasn’t going to hurt me to leave it where it was. It also makes the CLI useful. Maybe if the Asami and Datomic adapters are made into modules, then the CLI can be separated out entirely, and depend on Naga + Asami-adapter? This is more Leiningen than I know right now.

quoll 2021-02-19T02:47:11.104300Z

The reason it came up this week was because the CLI was being referenced by :main, which compiled Naga and all its dependencies. That was a horrible mistake to discover! 😖 I started by removing :main altogether, but put it back when I discovered the ^:skip-aot metadata. But if I split it out, then a lot of these problems go away.
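For reference, a minimal sketch of how ^:skip-aot is attached in a Leiningen project.clj. The project name, version, dependency and namespace are placeholders, not Naga's actual settings:

```clojure
;; project.clj (hypothetical project, for illustration only)
(defproject example/cli-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.3"]]
  ;; By default Leiningen AOT-compiles the :main namespace, which pulls in
  ;; everything that namespace requires. The ^:skip-aot metadata on the
  ;; namespace symbol turns that implicit compilation off.
  :main ^:skip-aot example.cli)
```

Note that ^:skip-aot does not help if :aot is set to :all or lists the main namespace explicitly.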

quoll 2021-02-19T02:51:21.104500Z

Finally… Zuko needs Cheshire, due to its ability to parse JSON. I could use clojure/data.json but that’s not as fast, so I was reluctant to go that way. The JSON-related code is not actually core Naga functionality, but it’s something we use a lot.

quoll 2021-02-19T02:52:32.104700Z

I’ll give some thought to all of this, and I’ll also look into how modules are put together. If you have any more to contribute I’d love to hear it!

quoll 2021-02-19T02:54:31.104900Z

Honestly… Naga isn’t all that complex. Executing a rule just means turning the body into a where clause, and then projecting the results into groups of 3, which then get inserted as statements for every result row.

quoll 2021-02-19T03:05:28.105100Z

The tricks are in things like:
1. identifying the parts of the where clauses which can be affected by parts of the output from any rule.
2. fill the queue with rules
3. If the queue is empty, exit
4. take the first rule from the queue, and check if any of the parts in its body (the :where clause patterns) have changed. If not, return to step 3.
5. cache the new results of the parts of the :where clause patterns (for comparison next time). We use semi-naïve reasoning, which means that we need only store the count. If we start getting more aggressive about negation operations, then this may need to turn into a hash (which is more expensive)
6. something changed, so run the rule. This executes the :where clause, and projects each row into a group of triples. These are all inserted.
7. check if any rules had parts that can be changed by this rule. If so, add them to the queue. (The queue will ignore any duplicates)
8. go back to step 3
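The steps above can be sketched as a toy loop. This is not Naga's code: the graph is a plain set of triples, every function name is made up for illustration, and the whole rule body is re-evaluated each time (the cached counts only decide whether a rule needs to run at all).

```clojure
;; Toy model: a graph is a set of [entity attribute value] triples, and a rule
;; is a map of {:name ..., :head [triples], :body [patterns]}. All names here
;; are hypothetical and only for illustration.

(defn variable? [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn match-pattern
  "Extends binding map b so that pattern matches triple; nil if it cannot."
  [b pattern triple]
  (reduce (fn [b [p t]]
            (let [p (get b p p)]              ; substitute a bound variable
              (cond (variable? p) (assoc b p t)
                    (= p t)       b
                    :else         (reduced nil))))
          b
          (map vector pattern triple)))

(defn query
  "Joins the patterns of a :where clause; returns a seq of binding maps."
  [graph patterns]
  (reduce (fn [bindings pattern]
            (for [b bindings, triple graph
                  :let [b' (match-pattern b pattern triple)]
                  :when b']
              b'))
          [{}]
          patterns))

(defn run-rule
  "Runs the body as a query and projects every result row through the head."
  [graph {:keys [head body]}]
  (into graph
        (for [row (query graph body), triple head]
          (mapv #(get row % %) triple))))

(defn affects?
  "True if some head triple of producer could match a body pattern of consumer."
  [producer consumer]
  (boolean
   (some (fn [[h p]]
           (every? (fn [[x y]] (or (variable? x) (variable? y) (= x y)))
                   (map vector h p)))
         (for [h (:head producer), p (:body consumer)] [h p]))))

(defn triggers
  "Step 1: for each rule name, the rules whose body that rule may change."
  [rules]
  (into {} (for [r rules]
             [(:name r) (filter #(affects? r %) rules)])))

(defn run-rules [graph rules]
  (let [downstream (triggers rules)]                              ; step 1
    (loop [graph graph
           queue (into clojure.lang.PersistentQueue/EMPTY rules)  ; step 2
           cache {}]
      (if (empty? queue)
        graph                                                     ; step 3
        (let [rule   (peek queue)
              queue  (pop queue)
              counts (mapv #(count (query graph [%])) (:body rule))]
          (if (= counts (cache (:name rule)))                     ; step 4
            (recur graph queue cache)
            (let [queued (set (map :name queue))]
              (recur (run-rule graph rule)                        ; step 6
                     (into queue                                  ; step 7
                           (remove #(queued (:name %))
                                   (downstream (:name rule))))
                     (assoc cache (:name rule) counts)))))))))    ; steps 5 and 8

;; Usage: derive :ancestor from :parent.
(def rules
  '[{:name :base, :head [[?a :ancestor ?b]], :body [[?a :parent ?b]]}
    {:name :step, :head [[?a :ancestor ?c]],
     :body [[?a :parent ?b] [?b :ancestor ?c]]}])

(def result
  (run-rules #{[:a :parent :b] [:b :parent :c] [:c :parent :d]} rules))
```

The loop terminates because a rule is skipped once the counts for its body patterns match what was cached on its previous run.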

quoll 2021-02-19T03:06:56.105300Z

Also, when generating new entities (unbound variables in the head of the rule), the :where clause is updated to exclude any results which would generate an entity that is exactly equal to one that already exists (this is a cute bit of query rewriting, and was actually the impetus to get the not operation into Asami).

quoll 2021-02-19T03:07:22.105500Z

There… now you know how to build a rule engine 🙂

quoll 2021-02-19T03:08:36.105900Z

BTW, I’m not at work tomorrow

pithyless 2021-02-19T17:25:32.107900Z

In https://github.com/threatgrid/asami/wiki/Entity-Structure:

(#datom [:tg/node-10499 :db/ident :tg/node-10498 1 true]
 #datom [:tg/node-10499 :tg/entity true 1 true]
 #datom [:tg/node-10499 :name "Fitzwilliam" 1 true]
 #datom [:tg/node-10499 :home "Pemberley" 1 true])
Am I correctly assuming that's a typo and should be:
#datom [:tg/node-10499 :db/ident :tg/node-10499 1 true]

quoll 2021-02-22T16:09:13.000100Z

Fixed this. Thank you for the feedback!

pithyless 2021-02-22T16:28:03.000300Z

No problem, I was just trying to grok what the :db/ident was (and how it differs from Datomic's :db/ident), and the example made me do a double-take. Two questions that I had unanswered after reading the docs:
1. Why have both :db/id and :db/ident, if I can explicitly set :db/id myself and :db/ident is also treated as a global identifier of a single node? Is it an indexing issue? Or are there certain api functions that expect ident, but not id? For example, I checked MemoryDatabase d/entity but it actually considers both as valid inputs.
2. Is there any interest in supporting a lookup ref syntax à la Datomic (e.g. [:email "x@y.com"]) in the future? Or is that considered out of scope? This also came up as I tried to understand Asami's identities. All I could find was a closed issue without a followup: https://github.com/threatgrid/asami/issues/97

quoll 2021-02-22T17:00:25.000800Z

I’ll address #1 to start with: It’s a little different to Datomic. :db/ident is an explicit attribute added to entities. It can be any value. If you don’t supply one, then Asami allocates it, defaulting to using a loopback on the node. For the in-memory value, that node is represented by a keyword with a prefix of :tg/node-. In Datomic, you might explicitly state that you want a node using datomic.Peer/tempid, and after it’s inserted then it looks like a number (distinguished from actual long values by appearing in the “entity” position of a statement, or if it’s in the “value” position, then it gets determined by the attribute datatype). Asami’s in-memory store just uses these magic keywords. (the on-disk store… which is Real Soon Now… uses long values internally, not keywords). Anyway, the :db/ident will either be what you specify, or it will refer to itself.
:db/id is different. It is an implicit attribute that does not appear in the database. Instead, it’s used to refer to the entity that is represented by the node. To explain this, I want to explicitly describe the entity structures in the graph (I realize that you’ll know a lot of it, but I want to make sure we’re in the same place). Consider a simple entity:

{:db/ident "simple"
 :foo "bar"}
This has 2 attributes: :db/ident and :foo. To represent this in a graph, I need to have a node that will represent them. Let’s call that node my-entity. The graph for this entity can then be specified with the edges:
[my-entity :db/ident "simple"]
[my-entity :foo "bar"]
Allocating a node to represent structures like this means that we can also build nested structures:
{:db/ident "nested"
 :foo "hello"
 :bar {:foo "world"}}
The top level structure will be allocated a node (call it outer) and the nested structure will be allocated its node (call it inner):
[outer :db/ident "nested"]
[outer :foo "hello"]
[outer :bar inner]
[inner :foo "world"]
Of course, in an in-memory database in Asami, outer may be :tg/node-1 and inner might be :tg/node-2. The question is, how can I refer to the node that represents an entity if I am just using the entity map style of structure? The entity has various attributes (like :db/ident and :foo), but no direct way to refer to the node itself. This is what :db/id does. In fact, Datomic uses :db/id to do exactly the same thing when inserting. So if I say that I want to insert an entity of:
{:db/id :my-marvelous-entity
 :db/ident "mine"
 :foo "hello"
 :bar {:foo "world"}}
Then the statements that this will be turned into are:
[:my-marvelous-entity :db/ident "mine"]
[:my-marvelous-entity :foo "hello"]
[:my-marvelous-entity :bar :tg/node-3]
[:tg/node-3 :foo "world"]
I can even add a :db/id to the nested entity.
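The mapping from entity maps to statements described above can be sketched in a few lines. This is only an illustration, not Asami's implementation: flatten-entity and node-allocator are made-up names, and the extra bookkeeping statements Asami adds (such as :tg/entity) are left out.

```clojure
;; Hypothetical helper: hands out fresh node keywords like :tg/node-1.
(defn node-allocator []
  (let [counter (atom 0)]
    #(keyword "tg" (str "node-" (swap! counter inc)))))

(defn flatten-entity
  "Returns [node triples] for a (possibly nested) entity map.
  A :db/id entry names the node directly and is not stored as a triple;
  otherwise a fresh node is allocated."
  [new-node entity]
  (let [node (or (:db/id entity) (new-node))]
    [node
     (reduce (fn [triples [a v]]
               (if (map? v)
                 ;; nested entity: link to its node, then add its own triples
                 (let [[child child-triples] (flatten-entity new-node v)]
                   (-> triples (conj [node a child]) (into child-triples)))
                 (conj triples [node a v])))
             []
             (dissoc entity :db/id))]))

;; Usage: the entity from the message above.
(def example
  (flatten-entity (node-allocator)
                  {:db/id :my-marvelous-entity
                   :db/ident "mine"
                   :foo "hello"
                   :bar {:foo "world"}}))
```

Here the nested map has no :db/id, so it gets the first node the allocator hands out; giving it a :db/id of its own would name that node instead.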

quoll 2021-02-22T17:22:10.001Z

I hadn’t really thought about the ref syntax, but it should be doable. It’s similar to using :db/ident in that it has to do a lookup. I just need to make sure I don’t forget any codepaths that could be affected by it

quoll 2021-02-22T17:24:05.001200Z

Actually… :db/ident is already a kind of lookup ref, so the machinery is basically there

quoll 2021-02-22T17:26:00.001400Z

The main difference is that I used the {:db/ident value} syntax for that, instead of [:db/ident value] (I don’t recall when Lookup Refs were introduced into Datomic, but they weren’t there early on. Asami reflects an older set of APIs in Datomic)

pithyless 2021-02-22T17:26:05.001600Z

Yeah, thanks for the explanation. The way I see it, the primary difference is :db/ident in Asami is a "global" lookup ref and in Datomic is a "namespaced" lookup ref.

quoll 2021-02-22T17:26:56.001800Z

I’ll probably just do the lookup ref, look up the data, and if there’s more than one value, throw an ex-info

quoll 2021-02-22T17:28:13.002Z

That’ll offer the best compatibility, I think
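A minimal sketch of that idea, assuming the graph is just a collection of [entity attribute value] triples. resolve-lookup-ref is a hypothetical name, not an Asami function:

```clojure
(defn resolve-lookup-ref
  "Finds the single node whose attribute a has value v.
  Returns nil when nothing matches, and throws when the ref is ambiguous."
  [graph [a v]]
  (let [nodes (distinct (for [[e a' v'] graph
                              :when (and (= a a') (= v v'))]
                          e))]
    (case (count nodes)
      0 nil
      1 (first nodes)
      (throw (ex-info "Lookup ref matches more than one entity"
                      {:lookup-ref [a v] :nodes (vec nodes)})))))

;; Usage against a tiny hand-written graph.
(def graph
  #{[:tg/node-1 :email "x@y.com"]
    [:tg/node-1 :name "Jill"]
    [:tg/node-2 :name "Jill"]})

(def found (resolve-lookup-ref graph [:email "x@y.com"]))
```

Because there is no schema to declare an attribute unique, the uniqueness check can only happen at lookup time, which is why the ambiguous case throws rather than picking a node.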

pithyless 2021-02-22T17:31:21.002200Z

I was wondering if perhaps it's more difficult, because of the open-world assumption and no schema that explicitly states that some attribute will be used as a reference lookup-ref

pithyless 2021-02-22T17:32:32.002400Z

But offering a way to do a lookup + throwing error if more than one exists may be a nice way for compatibility.

pithyless 2021-02-22T17:46:33.002600Z

This is my understanding of the difference:

;; Transactions - Datomic - need to make sure :person/name is unique for people

 [{:db/id "parent"
   :person/name "Jill"}
  {:person/name "Susie"
   :person/parent "parent"}]

 ;; or
 [{:person/name "Susie"
   :person/parent {:person/name "Jill"}}]

 ;; or, assuming Jill already exists...
 [{:person/name "Susie"
   :person/parent [:person/name "Jill"]}]

pithyless 2021-02-22T17:46:44.002800Z

;; Transactions - Asami - need to make sure :db/ident is unique for all datoms

 [{:db/ident "jill"
   :person/name "Jill"}
  {:person/name "Susie"
   :person/parent {:db/ident "jill"}}]

 ;; or
 [{:person/name "Susie"
   :db/ident "susie"
   :person/parent {:db/ident "jill"
                   :person/name "Jill"}}]

 ;; or, assuming Jill already exists...
 [{:person/name "Susie"
   :db/ident "susie"
   :person/parent {:db/ident "jill"}}]

pithyless 2021-02-22T17:49:01.003400Z

;; Datomic - queries (:where patterns only)
 [[?e :person/parent [:person/name "Jill"]]
  [?e :person/name ?name]]

 ;; Asami - queries (:where patterns only)
 [[?p :db/ident "jill"]
  [?e :person/parent ?p]
  [?e :person/name ?name]]

quoll 2021-02-22T17:50:21.003600Z

Are you sure about queries allowing that?

quoll 2021-02-22T17:51:06.003800Z

From the Datomic docs:
> Lookup refs have the following restrictions:
> - The specified attribute must be defined as either :db.unique/value or :db.unique/identity.
> - When used in a transaction, the lookup ref is evaluated against the specified attribute’s index as it exists before the transaction is processed, so you cannot use a lookup ref to lookup an entity being defined in the same transaction.
> - Lookup refs cannot be used in the body of a query though they can be used as https://docs.datomic.com/on-prem/query/query.html#multiple-inputs.

quoll 2021-02-22T17:51:21.004Z

That last thing suggests that you can’t use them in a query

pithyless 2021-02-22T17:53:24.004500Z

I'm pretty sure queries allow it, because we write a lot of code that depends on that sugar syntax :]

pithyless 2021-02-22T17:53:37.004700Z

(sorry, need to go afk ~1h)

quoll 2021-02-22T17:58:40.004900Z

It’s a reasonably easy transformation on the query, but a surprising one, given that there is explicit documentation that says that you can’t do it 🙂

pithyless 2021-02-22T18:43:14.005100Z

@quoll from the Datomic docs:
> Resolving Entity Identifiers in V Position
> Datomic performs automatic resolution of https://docs.datomic.com/on-prem/schema/identity.html#entity-identifiers, so that you can generally use entity ids, idents, and lookup refs interchangeably.
https://docs.datomic.com/on-prem/query/query.html#limitations

pithyless 2021-02-22T18:44:28.005300Z

ah, I see now: > Lookup refs cannot be used in the body of a query though they can be used as https://docs.datomic.com/on-prem/query/query.html#multiple-inputs.

quoll 2021-02-22T18:48:41.005500Z

I think it might actually be easier to just let them through in a query. So instead of their example of:

(q '[:find ?artist-name
     :in $ ?country
     :where [?artist :artist/name ?artist-name]
            [?artist :artist/country ?country]]
   db [:country/name "Belgium"])
it would actually be easier for me to just accept this instead:
(q '[:find ?artist-name
     :in $
     :where [?artist :artist/name ?artist-name]
            [?artist :artist/country [:country/name "Belgium"]]]
   db)
If I accept that, then it would automatically support the parameter query as well
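One way such a rewrite could look, as a sketch only: each lookup ref in a pattern is swapped for a fresh variable, and an extra pattern binds that variable. expand-lookup-refs and the ?ref variable names are hypothetical, and treating every two-element vector that starts with a keyword as a lookup ref is an assumption that a real implementation would need to check more carefully.

```clojure
(defn lookup-ref?
  "Assumption: any two-element vector starting with a keyword is a lookup ref."
  [x]
  (and (vector? x) (= 2 (count x)) (keyword? (first x))))

(defn expand-lookup-refs
  "Rewrites :where patterns so that each lookup ref becomes a fresh variable
  plus an additional pattern that binds it."
  [patterns]
  (let [counter (atom 0)
        fresh   #(symbol (str "?ref" (swap! counter inc)))]
    (vec
     (mapcat (fn [pattern]
               ;; pair each element with the lookup ref it replaces (or nil)
               (let [expanded (mapv (fn [x]
                                      (if (lookup-ref? x) [(fresh) x] [x nil]))
                                    pattern)]
                 (cons (mapv first expanded)
                       (for [[sym [a v :as ref]] expanded :when ref]
                         [sym a v]))))
             patterns))))

;; Usage: the :where clause from the second query above.
(def rewritten
  (expand-lookup-refs
   '[[?artist :artist/name ?artist-name]
     [?artist :artist/country [:country/name "Belgium"]]]))
;; rewritten is
;; [[?artist :artist/name ?artist-name]
;;  [?artist :artist/country ?ref1]
;;  [?ref1 :country/name "Belgium"]]
```

A lookup ref passed in as a query parameter could then go through the same path once the parameter is substituted into the pattern.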

pithyless 2021-02-22T18:50:57.005700Z

^ I just verified in a REPL that both versions of that kind of query work in Datomic (assuming :country/name is a unique identity)

pithyless 2021-02-22T18:51:11.005900Z

so apparently the docs are a little misleading (or not up-to-date)

quoll 2021-02-22T18:51:19.006100Z

probably not up to date

quoll 2021-02-22T18:51:26.006300Z

Datomic’s API keeps evolving

pithyless 2021-02-22T18:51:44.006500Z

lookup-refs were introduced here: https://docs.datomic.com/on-prem/changes.html#0.9.4556

quoll 2021-02-22T18:51:52.006700Z

Which is one reason for certain missing features. I haven’t kept up with Datomic over time

pithyless 2021-02-22T18:52:13.006900Z

But maybe it changed b/c of this? > Allow lookup refs for V position in users of VAET index, including :db.fn/retractEntity. https://docs.datomic.com/on-prem/changes.html#0.9.4766

pithyless 2021-02-22T18:53:22.007100Z

I've been using them like that in queries since I can remember; although your links to the docs had started to make me doubt my sanity. 😅

quoll 2021-02-22T18:54:02.007300Z

That’s OK

quoll 2021-02-22T18:54:51.007500Z

I was using Datomic several years before Lookup refs came out. I’ve learned a few new things since, but I haven’t put the time in to learn everything

quoll 2021-02-22T18:55:00.007700Z

(they came out in February 2014)

quoll 2021-02-22T18:56:38.007900Z

Anyway, it’s not going to be top of my queue, but I’ve added this: https://github.com/threatgrid/asami/issues/112

pithyless 2021-02-22T18:58:42.008300Z

Thanks for taking the time and sorry for the long tangents! 🙂

quoll 2021-02-22T19:32:24.008500Z

No problem. I want to interact with people more, even if it’s just to convince everyone that the project is alive 🙂

quoll 2021-02-22T19:32:31.008700Z

… and responsive

quoll 2021-02-19T18:09:43.109200Z

Huh... yes. I must have copy pasted from something else and then tried to hand tweak for consistency

quoll 2021-02-19T18:10:02.109900Z

I’m in a car right now. I’ll try to fix when I get home