datomic

Ask questions on the official Q&A site at https://ask.datomic.com!
Ivar Refsdal 2020-11-12T08:08:36.380100Z

Hi. And thanks for a fine piece of software! I'm having a problem with excision and on-prem: My history database (eavto) looks like this:

[17592186045418 :m/info "secret data that should be removed" 13194139534313 true]
[17592186045418 :m/info "secret data that should be removed" 13194139534315 false]
[17592186045418 :m/info "OK data" 13194139534315 true]
Then I execute excision:
{:db/excise 17592186045418,
 :db.excise/attrs [:m/info],
 :db.excise/beforeT 13194139534315}
After waiting for syncing of excision, my history database looks like this:
[17592186045418 :m/info "secret data that should be removed" 13194139534315 false]
[17592186045418 :m/info "OK data" 13194139534315 true]
Thus the secret data is still present in the history, but only as a retraction, which does not make sense in my opinion. Is it possible to fix this, so that the retraction is removed as well? Here is a gist that reproduces this issue: https://gist.github.com/ivarref/f92d9efd45d1c0cbd2d239bf4904a323 Thanks and kind regards.

Ivar Refsdal 2020-11-13T08:09:32.396800Z

I agree it's a grey area. I wouldn't be too concerned about performance, as excision is rarely performed, but I do not know the performance or implementation implications of this suggestion. Are you sure the rule is too simple? Why would the first transaction of an entity's attribute (cardinality-many) have any retractions? It's the equivalent of:

@(d/transact conn [[:db/retract "new-item" :m/many "data-1"]
                   [:db/retract "new-item" :m/many "data-2"]])
which does not make sense. Or did I miss something?

favila 2020-11-13T16:30:05.401500Z

what I mean is that the retracts may be spread throughout the transaction history after the excision time. You need to know what values used-to-be asserted at moment T, and you need to look for the first retraction or assertion of any of those values forward in time. For cardinality-many, there won’t be just one transaction. You can terminate early if all values are accounted for, not on the first transaction

favila 2020-11-13T16:30:42.401800Z

in the worst-case, a value is never retracted later, so you scan all of time
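
The forward scan described above can be sketched in plain Clojure. This is only a model, not Datomic's implementation: the later history is assumed to be a sequence of `[v tx added?]` tuples in tx order, and `first-later-events`, `later`, and the tx numbers are hypothetical names and values made up for illustration.

```clojure
(defn first-later-events
  "For each value in `live-values` (the values asserted at the excision
  point), finds the first later datom that mentions it. Stops scanning as
  soon as every value is accounted for."
  [live-values later-history]
  (reduce (fn [found [v tx added?]]
            (let [found (if (and (contains? live-values v)
                                 (not (contains? found v)))
                          (assoc found v {:tx tx :added? added?})
                          found)]
              ;; Early termination: all values seen, no need to read further.
              (if (= (count found) (count live-values))
                (reduced found)
                found)))
          {}
          later-history))

;; Hypothetical later history for one entity and one cardinality-many attribute.
(def later
  [["data-1" 1001 false]
   ["data-3" 1002 true]
   ["data-2" 1005 false]
   ["data-1" 1009 true]])

(first-later-events #{"data-1" "data-2"} later)
;; => {"data-1" {:tx 1001, :added? false}, "data-2" {:tx 1005, :added? false}}
```

If one of the values is never mentioned again, the `reduce` runs to the end of the sequence, which corresponds to the worst case of scanning all of time.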

Ivar Refsdal 2020-11-12T08:12:10.380200Z

CC @ornulf.risnes @schmandle

favila 2020-11-12T12:14:57.380600Z

BeforeT is not inclusive

favila 2020-11-12T12:15:41.381900Z

The history items that remain have a tx == your beforeT argument to the excision
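
A small model of why those two datoms survive, assuming datoms are represented as plain `[e a v tx added?]` vectors and that excision with `beforeT` drops only datoms whose tx is strictly less than the bound. `excise-before` is a hypothetical helper for illustration and does not call Datomic.

```clojure
;; The history from the original question, as plain vectors.
(def history
  [[17592186045418 :m/info "secret data that should be removed" 13194139534313 true]
   [17592186045418 :m/info "secret data that should be removed" 13194139534315 false]
   [17592186045418 :m/info "OK data" 13194139534315 true]])

(defn excise-before
  "Model only: removes datoms of entity `e` with an attribute in `attrs`
  whose tx is strictly less than `before-t` (exclusive bound)."
  [datoms e attrs before-t]
  (remove (fn [[de da _ tx _]]
            (and (= de e)
                 (contains? attrs da)
                 (< tx before-t)))
          datoms))

(excise-before history 17592186045418 #{:m/info} 13194139534315)
;; => ([17592186045418 :m/info "secret data that should be removed" 13194139534315 false]
;;     [17592186045418 :m/info "OK data" 13194139534315 true])
```

Because the retraction and the new assertion share the same tx, no choice of bound in this model separates them.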

avocade 2020-11-12T14:32:02.385800Z

Hey guys! Anyone else having an issue when using expound and datomic dev-local's (d/db conn) value in specs (either directly, or using guardrails/ghostwheel which wraps expound)? We filed an issue on it here for reference: https://github.com/bhb/expound/issues/205

2020-11-12T16:10:36.386200Z

these are logically equivalent. are they equivalent from a perf standpoint?

(d/q '[:find ?e
         :in $ ?id
         :where [?e :e/id ?id]]
       db id)

  ;; vs

  (d/q '[:find ?e
         :in $ ?e]
       db [:e/id id])
where `[:e/id id]` is a lookup ref

zilti 2020-11-12T16:21:14.386700Z

Is there a usable tutorial somewhere on how to set up Metabase with Presto?

tatut 2020-11-12T16:23:52.386800Z

I don’t think they are completely equivalent, the first will return a :db/id number and the latter will just return the lookup ref as is in the results

2020-11-12T16:28:26.387Z

ah, you're right.

2020-11-12T16:29:22.387200Z

i should have included a pull pattern in the example

tatut 2020-11-12T16:29:36.387400Z

so I would think the latter should be faster as it does nothing

2020-11-12T16:29:50.387600Z

my question is more around whether it matters to pass in the unique identifier or the lookup ref

tatut 2020-11-12T16:29:50.387800Z

you can give the latter a non-existing lookup ref and it just happily returns it

2020-11-12T16:31:25.388Z

(d/q '[:find (pull ?e pull-pattern)
         :in $ ?id pull-pattern
         :where [?e :e/id ?id]]
       db id pull-pattern)

  ;; vs

  (d/q '[:find (pull ?e pull-pattern)
         :in $ ?e pull-pattern]
       db [:e/id id] pull-pattern)

2020-11-12T16:31:41.388200Z

do you think there would be a performance difference in the above?

tatut 2020-11-12T16:32:16.388400Z

feels to me that there shouldn’t be, but I don’t really know

tatut 2020-11-12T16:32:44.388600Z

and if there is, it is likely negligible

tatut 2020-11-12T16:33:45.388800Z

but in both cases, if you have a lookup ref, wouldn’t you just use (d/pull db pattern id) instead of q?

zilti 2020-11-12T16:35:55.389200Z

I've set everything up, but all I get is

Nov 12 16:35:28 the-network java[12958]: 2020-11-12 16:35:28,337 ERROR driver.util :: Database connection error
Nov 12 16:35:28 the-network java[12958]: java.io.EOFException: SSL peer shut down incorrectly

2020-11-12T16:41:21.389300Z

you could. it's a bad example, sorry. in truth, the real code has additional where clauses, so it's a real query.

2020-11-12T17:38:25.389500Z

@favila (I'm a colleague of @ivar.refsdal) Thank you for the response. If you look at Ivar's example, you will see that the problem is that the retraction of the problematic datom that we want to excise and its benign counterpart that we want to keep - they have the same tx. This typically happens when we add a new value to an attribute with cardinality one. So - beforeT isn't expressive enough to distinguish between the retraction of the problematic value and the adding of the benign value.

favila 2020-11-12T17:41:32.389800Z

Ah I understand your problem now.

favila 2020-11-12T17:41:48.390Z

Yes, it’s not expressive enough. There’s no way to get exactly what you want.

2020-11-12T18:03:00.390200Z

@favila Thank you again. Since the first entry of the datom in the post-excision history is now a (logically invalid) retraction, we were hoping for some kind of "garbage collection" mechanism to rescue us here and help us get rid of the problematic value completely. Will send a question about possible workarounds to Datomic support. (Cc @jaret)

favila 2020-11-12T18:04:42.390400Z

I suspect any gc or reindex mechanisms, even if they remove the item from the index, will not remove them from the tx-log

favila 2020-11-12T18:05:02.390600Z

excision is special in that it alters tx-log entries; even noHistory doesn’t do that

Ivar Refsdal 2020-11-12T18:35:08.391Z

Thanks @favila and @ornulf.risnes I've noticed the following: retraction is about existing data, thus it does not make sense to keep [17592186045418 :m/info "secret data that should be removed" 13194139534315 false] in the history database. Ref: https://docs.datomic.com/on-prem/transactions.html#retracting-data If I do @(d/transact conn [[:db/retract "item-that-does-not-yet-exist" :m/info "secret data"]]) this will be silently discarded, which is OK, though I would prefer an exception. It does not end up in the history database. In this respect I think there is a mismatch between retract and excision, and I think the excision logic should be improved with the following: the new database history of the excised entity and attribute should never contain a retraction in the first transaction. This simple rule would solve the problem (I think!). Thanks and kind regards again.
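
The proposed rule can be sketched over plain data for the cardinality-one case in this thread. This is a model of the suggestion, not something Datomic does: datoms are assumed to be `[e a v tx added?]` vectors for a single entity and attribute, and `drop-leading-retractions` is a hypothetical name.

```clojure
(defn drop-leading-retractions
  "Model of the proposed rule: in the remaining history of one entity and
  attribute, removes any retraction that sits in the earliest remaining tx."
  [datoms]
  (if (empty? datoms)
    datoms
    (let [first-tx (apply min (map #(nth % 3) datoms))]
      (remove (fn [[_ _ _ tx added?]]
                (and (= tx first-tx) (not added?)))
              datoms))))

;; The history that remained after the excision in the original question.
(def remaining
  [[17592186045418 :m/info "secret data that should be removed" 13194139534315 false]
   [17592186045418 :m/info "OK data" 13194139534315 true]])

(drop-leading-retractions remaining)
;; => ([17592186045418 :m/info "OK data" 13194139534315 true])
```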

respatialized 2020-11-12T19:05:52.394800Z

data modeling question: is there any semantics for disjoint attributes in Datomic - something like "an entity can have attribute x or attribute y, but not both"? Or is that anathema to the open composition of attributes that Datomic's data model encourages and those constraints should be left up to the application?

favila 2020-11-12T19:23:40.395200Z

I don’t speak for cognitect, but because this alters transactions which happened after the beforeT, I can see this as a semantic grey area about the meaning of excision

favila 2020-11-12T19:23:56.395400Z

it’s probably also a performance concern because many more datoms and transactions need inspection

favila 2020-11-12T19:25:03.395600Z

your rule is also too simple for cardinality-many attributes

benoit 2020-11-12T20:30:32.396500Z

You cannot express this constraint with the Datomic schema attributes but you can always enforce it with a custom database function. Whether it is a good idea from a logical perspective, I'm not sure. This looks like a sum type to me. You can also think about other ways to implement it like creating a ref attribute that points to an entity that can have the x or y attribute.
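
A minimal sketch of the kind of check such a function might perform, written as a pure Clojure function so it runs without a database. Everything here is an assumption for illustration: `check-disjoint` is a hypothetical name, `existing-attrs` stands in for whatever a real database function would read from the db value, and the `:thing/...` attributes are made up.

```clojure
(defn check-disjoint
  "Returns `new-attr` if adding it keeps at most one member of
  `disjoint-attrs` on the entity; throws ex-info otherwise.
  `existing-attrs` is the set of attributes the entity already has."
  [existing-attrs new-attr disjoint-attrs]
  (let [conflicts (disj (set (filter disjoint-attrs existing-attrs)) new-attr)]
    (if (and (contains? disjoint-attrs new-attr) (seq conflicts))
      (throw (ex-info "Disjoint attributes violated"
                      {:new-attr new-attr :conflicts conflicts}))
      new-attr)))

;; Allowed: the entity has no other member of the disjoint set.
(check-disjoint #{:thing/name} :thing/x #{:thing/x :thing/y})
;; => :thing/x

;; Rejected: the entity already has :thing/y, so this call throws.
;; (check-disjoint #{:thing/name :thing/y} :thing/x #{:thing/x :thing/y})
```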