Hello,
How should I model the schema of entities whose identity is predicated on several attributes, some of which are optional?
Assuming the attributes are :a0, :a1, :opt0, and :opt1, the following are valid distinct entities:
{:a0 0 :a1 1}, {:a0 0 :a1 1 :opt0 0}, {:a0 0 :a1 1 :opt1 1}, {:a0 0 :a1 1 :opt0 0 :opt1 1}
Hi :) with Crux you can use these maps directly as explicit IDs
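For example, a minimal sketch assuming a started Crux node bound to node (Crux accepts maps as :crux.db/id values):

(require '[crux.api :as crux])

;; the identifying attributes double as the explicit entity ID
(crux/submit-tx node
                [[:crux.tx/put
                  {:crux.db/id {:a0 0 :a1 1 :opt0 0}
                   :a0 0 :a1 1 :opt0 0}]])

;; look the document up later by the same map
(crux/entity (crux/db node) {:a0 0 :a1 1 :opt0 0})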
I'm still not sure which implementation I'll be going with. It's good to know Crux supports it. I can always artificially collect these into a keyword in order to create an identifier in the system. How's Crux's read performance? I'm considering using Datalog for a very read-intensive system, whereas bitemporality is less important.
Crux's read performance is pretty great because it delegates so much of the real work to the KV store. LMDB can be up to 3x faster than RocksDB for reads in some of our measurements, so it might be the better choice for you. I can't comment on comparative performance with the other Datalog engines you might be looking at, but we run graph query "stress test" benchmarks every night and hold up fairly well against the likes of RDF4J and Neo4j (which have benefited from decades of performance engineering).
Well, in the meantime I'm in the proof-of-concept stage and more worried about my domain modelling. If you could point me to the specific part of the docs I'd be much obliged 🙂
A lot of Crux users seem to be happy doing the schema modelling purely in spec. What Crux enforces is very minimal, so you probably want to write a few transaction functions to enforce any invariants you might need: https://www.opencrux.com/reference/transactions.html It is usually best to model relationships (ref attributes) as reverse references, where child entities point to their parents (child->parent). The trade-off is that modelling in the forward direction (parent->child) benefits from native sorting in the indexes.
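A rough sketch of such a transaction function, assuming the Crux 1.x transaction-function API described in the docs linked above (the :put-checked id is made up):

;; install a transaction function that aborts when :a0 or :a1 is missing
(crux/submit-tx node
  [[:crux.tx/put
    {:crux.db/id :put-checked
     :crux.db/fn '(fn [ctx doc]
                    (if (and (contains? doc :a0) (contains? doc :a1))
                      [[:crux.tx/put doc]]
                      false))}]])   ; returning false aborts the transaction

;; use it instead of a plain put
(crux/submit-tx node
  [[:crux.tx/fn :put-checked
    {:crux.db/id {:a0 0 :a1 1 :opt0 0} :a0 0 :a1 1 :opt0 0}]])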
In systems that are similar to Datomic (Datascript, Datalevin, Datahike), the entity id is nothing but a system generated integer, so you really don’t need anything special to model “entities whose identity is predicated on several attributes”, because that’s already the case.
you can directly transact this data (d/transact conn [{:db/id -1 :a0 0 :a1 1}, {:db/id -2 :a0 0 :a1 1 :opt0 0}, {:db/id -3 :a0 0 :a1 1 :opt1 1}, {:db/id -4 :a0 0 :a1 1 :opt0 0 :opt1 1}])
in Datascript, Datalevin or Datahike, and it will work, as schema is optional for these systems.
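A runnable version of that, as a minimal sketch using Datascript's in-memory conn (the Datalevin and Datahike calls are analogous):

(require '[datascript.core :as d])

(def conn (d/create-conn {}))   ; empty schema is fine

(d/transact! conn
             [{:db/id -1 :a0 0 :a1 1}
              {:db/id -2 :a0 0 :a1 1 :opt0 0}
              {:db/id -3 :a0 0 :a1 1 :opt1 1}
              {:db/id -4 :a0 0 :a1 1 :opt0 0 :opt1 1}])

;; find entities that have :opt0, regardless of the other attributes
(d/q '[:find ?e ?v
       :where [?e :a0 0] [?e :a1 1] [?e :opt0 ?v]]
     @conn)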
If you care about read performance on durable storage, give Datalevin a try; it stores data solely in LMDB. As far as I know, LMDB is the fastest KV store for read-intensive workloads.
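Getting started is a few lines; a sketch assuming the datalevin.core API and a writable directory path of your choosing:

(require '[datalevin.core :as d])

;; the path is just an example; Datalevin persists the LMDB files there
(def conn (d/get-conn "/tmp/datalevin/poc"))

(d/transact! conn [{:db/id -1 :a0 0 :a1 1 :opt0 0}])

(d/q '[:find ?e :where [?e :a0 0] [?e :a1 1]] (d/db conn))

(d/close conn)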
Alright, I'll give it a shot, thank you! If I were to use a system where schemas aren't optional, how would I do it? How would I distinguish when querying between entities with the same required attributes which might not even have the optional attributes? (one entity has to have it for the use case to be interesting, ofc)
Datalevin is specifically optimized for LMDB to maximize performance, unlike other systems that try to enable pluggable storage.
Datalevin is more like Datomic in terms of schema: schema is optional, but if you want to do range queries on an attribute, you should define the :db/valueType for it, as the keys are otherwise compared bitwise.
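For instance, a sketch using the attribute names from the start of the thread:

(require '[datalevin.core :as d])

;; typing :a1 as a long enables proper numeric range queries
(def schema {:a1 {:db/valueType :db.type/long}})

(def conn (d/get-conn "/tmp/datalevin/poc-typed" schema))

(d/transact! conn [{:db/id -1 :a0 0 :a1 1}
                   {:db/id -2 :a0 0 :a1 5}])

;; range predicate on the typed attribute
(d/q '[:find ?e ?v
       :where [?e :a1 ?v] [(< ?v 3)]]
     (d/db conn))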
I am not sure Datalog cares about whether an attribute is required or not; that seems to be an application-level concern.
your application code should maintain such a constraint. I don't think Datalog has a facility to let you say "this attribute is required for this entity", since entities are generic, not typed.
but in Clojure, we do use some conventions to facilitate this, for example namespaced keywords: a "sales" entity will have attributes such as :sales/company, :sales/category, etc.
but these are just conventions; the database is not aware of them
in your application code, you can use libraries such as prismatic/schema or clojure.spec to maintain such constraints, but the database is not going to enforce them.
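For example, with clojure.spec (a sketch using the attribute names from the start of the thread; the spec names are made up):

(require '[clojure.spec.alpha :as s])

(s/def :entity/a0 int?)
(s/def :entity/a1 int?)
(s/def :entity/opt0 int?)
(s/def :entity/opt1 int?)

;; required vs. optional attributes, checked before transacting
(s/def ::my-entity (s/keys :req-un [:entity/a0 :entity/a1]
                           :opt-un [:entity/opt0 :entity/opt1]))

(s/valid? ::my-entity {:a0 0 :a1 1 :opt0 0})   ;; => true
(s/valid? ::my-entity {:a0 0 :opt1 1})         ;; => false, :a1 missing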
the only constraints the Datomic flavor of Datalog systems enforce are uniqueness, references, and data types, if you specify them.
of course, you can always write transaction functions to check whatever properties you want, including "this kind of entity requires this key"
Right now, Datalevin doesn't support persistent transaction functions; it's on our roadmap though