@whilo Ok, I got something. First I have to correct myself: it was not 120000 datoms but 120000 entities. I now have an export file of 10000 lines, 880K unzipped. Data like email addresses has been removed, and I have a really small program that produces 13 errors out of 2358 lookup-refs on my machine. I am reluctant to put the db on GitHub but am willing to share it with you privately. I can put the test code on GitHub. I can, of course, try to make it smaller still, but that would reduce the number of errors further. If you prefer more errors to look for patterns, then I should not shrink it down any further.
I get the set of all :legacy/id values with a query and then pull every single one of them by lookup-ref:
;; Assumes the aliases d = datahike.api and m = datahike.migrate.
(require '[datahike.api :as d]
         '[datahike.migrate :as m])

(d/create-database "datahike:file:///..path..anonymized-10000")
(def conn-anon-10 (d/connect "datahike:file:///..path..anonymized-10000"))
(def counter (atom 0))
(def error-count (atom 0))
(m/import-db conn-anon-10 "anonymized-10000")

;; lid is a [legacy-id entity-id] tuple from the query below.
(defn test-pull [conn lid]
  (swap! counter inc)
  (try
    (d/pull @conn '[*] [:legacy/id (first lid)])
    (catch Exception e
      (swap! error-count inc)
      (println (str @counter " " @error-count " " (.getMessage e) " for " (second lid))))))

(def uuids-anon-10 (d/q '[:find ?lid ?e
                          :where
                          [?e :legacy/id ?lid]]
                        @conn-anon-10))

(count uuids-anon-10) ;; => 2358
(run! (partial test-pull conn-anon-10) uuids-anon-10)
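If it helps with looking for patterns, the failures could also be collected as data instead of only printed. This is just a rough sketch: `collect-failures` and `lookup-fn` are hypothetical names that are not part of the test code, and the helper only assumes that each item is a [legacy-id entity-id] tuple like the ones the query returns.

```clojure
;; Hypothetical helper, not part of the original test code.
;; Calls (lookup-fn lid) for every [lid eid] tuple and returns a vector
;; of maps describing the lookups that threw an exception.
(defn collect-failures [lookup-fn tuples]
  (reduce (fn [acc [lid eid]]
            (try
              (lookup-fn lid)
              acc
              (catch Exception e
                (conj acc {:legacy/id lid
                           :eid       eid
                           :message   (.getMessage e)}))))
          []
          tuples))
```

With the definitions above it might be called as (collect-failures #(d/pull @conn-anon-10 '[*] [:legacy/id %]) uuids-anon-10), and the resulting maps could then be sorted or grouped by :message.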
Hmm. I might be able to get something way simpler.
I think I have a minimal case (at least on my machine) here: https://github.com/markusalbertgraf/datahike-uuid-lookup
I have also opened issue #122 on GitHub.