Hi. And thanks for a fine piece of software! I'm having a problem with excision and on-prem: My history database (eavto) looks like this:
[17592186045418 :m/info "secret data that should be removed" 13194139534313 true]
[17592186045418 :m/info "secret data that should be removed" 13194139534315 false]
[17592186045418 :m/info "OK data" 13194139534315 true]
Then I execute excision:
{:db/excise 17592186045418,
:db.excise/attrs [:m/info],
:db.excise/beforeT 13194139534315}
After waiting for syncing of excision, my history database looks like this:
[17592186045418 :m/info "secret data that should be removed" 13194139534315 false]
[17592186045418 :m/info "OK data" 13194139534315 true]
Thus the secret data is still present in the history, but only as a retraction, which does not make sense in my opinion.
Is it possible to fix this? To also get rid of the retraction information?
Here is a gist that reproduces this issue:
https://gist.github.com/ivarref/f92d9efd45d1c0cbd2d239bf4904a323
Thanks and kind regards.
I agree it's a grey area. I wouldn't be too concerned about performance, as excision is rarely done, but I don't know the performance or implementation implications of this suggestion. Are you sure the rule is too simple? Why would the first transaction of an entity's attribute (cardinality-many) have any retractions? It's the equivalent of:
@(d/transact conn [[:db/retract "new-item" :m/many "data-1"]
[:db/retract "new-item" :m/many "data-2"]])
which does not make sense.
Or did I miss something?
What I mean is that the retractions may be spread throughout the transaction history after the excision time. You need to know which values were asserted at moment T, and you need to look forward in time for the first retraction or assertion of each of those values. For cardinality-many, there won’t be just one transaction. You can terminate early once all values are accounted for, but not on the first transaction.
in the worst-case, a value is never retracted later, so you scan all of time
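A rough sketch of that forward scan, in plain Clojure over `[e a v tx added?]` vectors. This is only an illustration of the idea, not how Datomic implements anything; `dangling-retractions` and its arguments are hypothetical names.

```clojure
;; Hypothetical helper, not part of Datomic.
;; live-values:  set of values asserted for one entity/attribute just before T.
;; later-datoms: [e a v tx added?] vectors for that entity/attribute with
;;               tx >= T, sorted by tx ascending.
;; Returns the retractions that excision would leave without a prior assertion.
(defn dangling-retractions [live-values later-datoms]
  (loop [pending live-values
         datoms  (seq later-datoms)
         found   []]
    (if (or (empty? pending) (nil? datoms))
      found ; stop early once every value is accounted for, or history is exhausted
      (let [[_ _ v _ added? :as d] (first datoms)]
        (if (contains? pending v)
          ;; first later mention of this value: keep it only if it is a retraction
          (recur (disj pending v)
                 (next datoms)
                 (if added? found (conj found d)))
          (recur pending (next datoms) found))))))
```

If a value in `live-values` is never mentioned again, `pending` never empties and the loop walks the whole of `later-datoms`, which is the worst case described above.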
BeforeT is not inclusive
The history items that remain have a tx == your beforeT argument to the excision
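To make the exclusive bound concrete, here is a small simulation in plain Clojure using the datoms from the original question. It only models the "tx strictly less than beforeT" rule described above; `excise-before` is a hypothetical helper, not a Datomic API.

```clojure
;; History modeled as plain [e a v tx added?] vectors; no Datomic required.
(def history
  [[17592186045418 :m/info "secret data that should be removed" 13194139534313 true]
   [17592186045418 :m/info "secret data that should be removed" 13194139534315 false]
   [17592186045418 :m/info "OK data" 13194139534315 true]])

;; Hypothetical helper: drop datoms of entity `e` with an attribute in `attrs`
;; whose tx is strictly less than `before-t` (the bound is exclusive).
(defn excise-before [datoms e attrs before-t]
  (remove (fn [[de da _ tx _]]
            (and (= de e)
                 (contains? attrs da)
                 (< tx before-t)))
          datoms))

(def remaining
  (vec (excise-before history 17592186045418 #{:m/info} 13194139534315)))
```

Only the original assertion (tx 13194139534313) is dropped. The retraction and the new assertion both carry tx 13194139534315, so neither is "before" the bound and both stay in `remaining`.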
Hey guys! Anyone else having an issue when using `expound` and datomic dev-local's `(d/db conn)` value in specs (either directly, or using `guardrails`/`ghostwheel`, which wraps expound)?
We filed an issue on it here for reference: https://github.com/bhb/expound/issues/205
these are logically equivalent. are they equivalent from a perf standpoint?
(d/q '[:find ?e
:in $ ?id
:where [?e :e/id ?id]]
db id)
;; vs
(d/q '[:find ?e
:in $ ?e]
db [:e/id id])
where `[:e/id id]` is a lookup ref
Is there a usable tutorial somewhere on how to set up Metabase with Presto?
I don’t think they are completely equivalent: the first will return a `:db/id` number and the latter will just return the lookup ref as is in the results
ah, you're right.
i should have included a pull pattern in the example
so I would think the latter should be faster as it does nothing
my question is more around whether it matters to pass in the unique identifier or the lookup ref
you can give the latter a non-existing lookup ref and it just happily returns it
(d/q '[:find (pull ?e pull-pattern)
:in $ ?id pull-pattern
:where [?e :e/id ?id]]
db id pull-pattern)
;; vs
(d/q '[:find (pull ?e pull-pattern)
:in $ ?e pull-pattern]
db [:e/id id] pull-pattern)
do you think there would be a performance difference in the above?
feels to me that there shouldn’t be, but I don’t really know
and if there is, it is likely negligible
but in both cases, if you have a lookup ref, wouldn’t you just use `(d/pull db pattern id)` instead of `q`?
I've set everything up, but all I get is
Nov 12 16:35:28 the-network java[12958]: 2020-11-12 16:35:28,337 ERROR driver.util :: Database connection error
Nov 12 16:35:28 the-network java[12958]: java.io.EOFException: SSL peer shut down incorrectly
you could. it's a bad example, sorry. in truth, the real code has additional where clauses, so it's a real query.
@favila (I'm a colleague of @ivar.refsdal) Thank you for the response. If you look at Ivar's example, you will see that the problem is that the retraction of the problematic datom that we want to excise and its benign counterpart that we want to keep - they have the same tx. This typically happens when we add a new value to an attribute with cardinality one. So - beforeT isn't expressive enough to distinguish between the retraction of the problematic value and the adding of the benign value.
Ah I understand your problem now.
Yes, it’s not expressive enough. There’s no way to get exactly what you want.
@favila Thank you again. Since the first entry of the datom in the post-excision history now is a (logically invalid) retraction, we were hoping for some kind of "garbage collection" mechanisms to rescue us here, and help us get rid of the problematic value completely. Will send a question about possible workarounds to Datomic support. (Cc @jaret)
I suspect any gc or reindex mechanisms, even if they remove the item from the index, will not remove them from the tx-log
excision is special in that it alters tx-log entries; even noHistory doesn’t do that
Thanks @favila and @ornulf.risnes
I've noticed the following:
retraction is about existing data, thus it does not make sense to keep `[17592186045418 :m/info "secret data that should be removed" 13194139534315 false]` in the history database.
Ref: https://docs.datomic.com/on-prem/transactions.html#retracting-data
If I do
@(d/transact conn [[:db/retract "item-that-does-not-yet-exist" :m/info "secret data"]])
this will be silently discarded, which is OK, though I would prefer an exception. It does not end up in the history database.
In this respect I think there is a mismatch between retract and excision, and I think the excision logic should be improved with the following: the new database history of the excised entity and attribute should never contain a retraction in the first transaction. This simple rule would solve the problem (I think!).
Thanks and kind regards again.
data modeling question: are there any semantics for disjoint attributes in Datomic, something like "an entity can have attribute `x` or attribute `y`, but not both"? Or is that anathema to the open composition of attributes that Datomic's data model encourages, and should those constraints be left up to the application?
I don’t speak for cognitect, but because this alters transactions which happened after the beforeT, I can see this as a semantic grey area about the meaning of excision
it’s probably also a performance concern because many more datoms and transactions need inspection
your rule is also too simple for cardinality-many attributes
You cannot express this constraint with the Datomic schema attributes but you can always enforce it with a custom database function.
Whether it is a good idea from a logical perspective, I'm not sure. This looks like a sum type to me.
You can also think about other ways to implement it, like creating a ref attribute that points to an entity that can have the `x` or `y` attribute.
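For the custom-function route, the check itself can be tiny. Below is a minimal sketch of just the predicate, in plain Clojure over an entity map; wiring it into an actual database function is left out, and the names `exclusive-attrs-ok?`, `check-exclusive!`, `:m/x` and `:m/y` are made up for illustration.

```clojure
;; :m/x and :m/y (used in the examples) are hypothetical attribute names.
(defn exclusive-attrs-ok?
  "True unless the entity map carries both attributes at once."
  [entity attr-a attr-b]
  (not (and (contains? entity attr-a)
            (contains? entity attr-b))))

(defn check-exclusive!
  "Throws if the entity map violates the constraint; otherwise returns the map."
  [entity attr-a attr-b]
  (when-not (exclusive-attrs-ok? entity attr-a attr-b)
    (throw (ex-info "Entity may have only one of the two attributes"
                    {:entity entity :attrs [attr-a attr-b]})))
  entity)
```

For example, `(check-exclusive! {:m/x 1} :m/x :m/y)` returns the map unchanged, while a map holding both `:m/x` and `:m/y` throws.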