datahike

https://datahike.io/, Join the conversation at https://discord.com/invite/kEBzMvb, history for this channel is available at https://clojurians.zulipchat.com/#narrow/stream/180378-slack-archive/topic/datahike
magra 2020-02-15T14:54:00.135600Z

@whilo Ok, I got something. First I have to correct myself: it was not 120000 datoms but 120000 entities. I now have an export file of 10000 lines, 880K unzipped. Data like email addresses has been removed, and I have a really small program that produces 13 errors out of 2358 lookup-refs on my machine. I am reluctant to put the db on GitHub but am willing to share it with you privately. I can put the test code on GitHub. I can, of course, try to make it smaller still, but that would reduce the number of errors further. If you prefer more errors to look for patterns, then I should not shrink it down any further.

magra 2020-02-15T15:01:58.136600Z

I get the set of all :legacy/id values with a query and then pull every single one of them by lookup-ref:

magra 2020-02-15T15:02:58.136900Z

;; Assumed aliases; the requires were not part of the original snippet.
(require '[datahike.api :as d]
         '[datahike.migrate :as m])

(d/create-database "datahike:file:///..path..anonymized-10000")

(def conn-anon-10 (d/connect "datahike:file:///..path..anonymized-10000"))
(def counter (atom 0))
(def error-count (atom 0))

(m/import-db conn-anon-10 "anonymized-10000")

;; lid is one query result tuple: [legacy-id entity-id]
(defn test-pull [conn lid]
  (swap! counter inc)
  (try
    (d/pull @conn '[*] [:legacy/id (first lid)])
    (catch Exception e
      (swap! error-count inc)
      (println (str @counter " " @error-count " " (.getMessage e) " for " (second lid))))))

(def uuids-anon-10 (d/q '[:find ?lid ?e
                          :where
                          [?e :legacy/id ?lid]]
                        @conn-anon-10))

(count uuids-anon-10) ;; => 2358

(run! (partial test-pull conn-anon-10) uuids-anon-10)

magra 2020-02-15T15:52:04.137600Z

hmmm. I might be able to get something way simpler.

magra 2020-02-15T16:24:08.138200Z

I think I have a minimal case (at least on my machine) here: https://github.com/markusalbertgraf/datahike-uuid-lookup

magra 2020-02-15T16:24:35.138700Z

I have also opened issue #122 on GitHub.