This quick and dirty hack seems to solve it for now:
#!/bin/sh
# Quick hack: move all schema datoms (entities that have a :db/ident)
# to the top of a datahike export so the schema is imported first.
if [ -z "${1}" ] || [ -z "${2}" ]
then
echo "sort schema of datahike export to top"
echo "usage: $0 infile outfile"
exit 1
fi
if [ ! -f "${1}" ]
then
echo "infile ${1} not found"
exit 1
fi
# collect the "#datahike/Datom [<eid>" prefixes of all schema entities,
# with a trailing space so the prefix match stays exact
fgrep db/ident "${1}" | cut -d' ' -f1,2 | sed 's/$/ /g' > /tmp/schema-to-top.tmp
# write the schema datoms first, then everything else
fgrep -f /tmp/schema-to-top.tmp "${1}" > "${2}"
fgrep -vf /tmp/schema-to-top.tmp "${1}" >> "${2}"
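Called e.g. as: sh schema-to-top.sh eavt-dump.edn sorted-dump.edn (file names made up).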
If I understand correctly, the export should happen in transaction order (i.e. sorted by tx in [e a v tx op]). However, I see that the export code uses the :eavt index to dump the file: https://github.com/replikativ/datahike/blob/73438a612205e27ff712e35f3d61c56c2978f0da/src/datahike/migrate.clj#L12
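For comparison, re-sorting an existing export by tx would look roughly like this in Clojure (a sketch, assuming one #datahike/Datom [e a v tx op] literal per line; the tagged literal is simply read back as a plain vector here):

(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

(defn sort-export-by-tx
  "Rewrites a datahike export so its datoms are ordered by tx,
   which puts the schema transactions first."
  [in-file out-file]
  (let [read-datom #(edn/read-string {:readers {'datahike/Datom vec}} %)
        datoms (with-open [r (io/reader in-file)]
                 (->> (line-seq r)
                      (map read-datom)
                      (sort-by #(nth % 3)) ; tx is the 4th element of [e a v tx op]
                      doall))]
    (with-open [w (io/writer out-file)]
      (doseq [d datoms]
        (.write w (str "#datahike/Datom " (pr-str d) "\n"))))))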
You are right!
Hi @magra, I'm working on a more sophisticated migration mechanism that tries to avoid ID clashes. I had to do this for data transfer between Datomic and Datahike.
I am seeing a bug I can't narrow down: I use the schema
#datahike/Datom [12 :db/cardinality :db.cardinality/many 536870913 true]
#datahike/Datom [12 :db/ident :legacy/id 536870913 true]
#datahike/Datom [12 :db/index true 536870913 true]
#datahike/Datom [12 :db/unique :db.unique/identity 536870913 true]
#datahike/Datom [12 :db/valueType :db.type/uuid 536870913 true]
for a datahike db into which I pulled data from SQL CSV files.
To do bulk transforms of this data I use a query to get a list of entity ids, then do the transformations on them, pulling them by lookup ref [:legacy/id xxx]. About 100 of them work fine, then one responds:
Execution error (ExceptionInfo) at datahike.db/entid-strict (db.cljc:900).
Nothing found for entity id [:legacy/id #uuid "80eab21d-93a8-415d-94b4-fc8beb6979d1"]
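For reference, the access pattern is roughly the following (a sketch via datahike.api; the uuid is the one from the error above, while the query, @conn, and the numeric entity id are made up for illustration):

(require '[datahike.api :as d])

;; collect the legacy ids of the entities to transform
(d/q '[:find [?id ...]
       :where [?e :legacy/id ?id]]
     @conn)

;; pulling one of them by lookup ref -- the call that suddenly fails
(d/pull @conn '[*] [:legacy/id #uuid "80eab21d-93a8-415d-94b4-fc8beb6979d1"])

;; pulling the same entity by its numeric entity id works
(d/pull @conn '[*] 123) ; eid made up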
When I query on the entity-id it shows the :legacy/id and all is well. When I query on the lookup-ref I get the error.
But hundreds of similar datoms work fine and behave as expected. On one db the error went away after export and reimport. On the current one the error stays. But every attempt to shrink the import file to a minimal reproducible case makes the error go away.
Any hints on how to debug this? The db has 112000 datoms.
@konrad.kuehne Great!! I am looking forward to that. And I have a workaround till then 😉
I'm also working on SQL import but this is a little bit more complicated.
Great!!
@magra Can you open an issue with an example to reproduce?
@whilo I will try. At the moment it seems to be reproducible with 120000 datoms of people-related data. Anonymizing kills reproducibility at the moment.
Can you shrink the size first maybe?
I will keep on trying tomorrow.