datahike

https://datahike.io/, Join the conversation at https://discord.com/invite/kEBzMvb, history for this channel is available at https://clojurians.zulipchat.com/#narrow/stream/180378-slack-archive/topic/datahike
magra 2020-02-14T09:47:59.100500Z

This quick and dirty hack seems to solve it for now:

#!/bin/sh
# Move the schema datoms of a datahike export to the top of the file,
# so the schema is (re)transacted before the data that depends on it.

if [ -z "${1}" ] || [ -z "${2}" ]
 then
  echo "sort schema of datahike export to top
usage: $0 infile outfile"
  exit 1
fi

if [ ! -f "${1}" ]
 then
  echo "infile ${1} not found"
  exit 1
fi

# Collect the "#datahike/Datom [<eid>" prefix of every entity that has a
# :db/ident (i.e. every schema entity); the trailing space added by sed keeps
# e.g. "[12 " from also matching "[120 ".
fgrep db/ident "${1}" | cut -d' ' -f1,2 | sed 's/$/ /' >/tmp/schema-to-top.tmp

# Write the schema datoms first, then everything else.
fgrep -f /tmp/schema-to-top.tmp "${1}" > "${2}"
fgrep -vf /tmp/schema-to-top.tmp "${1}" >> "${2}"
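
Saved as e.g. schema-to-top.sh (the file names here are just placeholders), it would be run as:

./schema-to-top.sh datahike-export.dump datahike-export-sorted.dump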

cjsauer 2020-02-14T14:08:06.108600Z

If I understand correctly, it seems that the export should happen in transaction order (i.e. sorted by tx in [e a v tx op]). However, I see that the export code is using the :eavt index to dump the file: https://github.com/replikativ/datahike/blob/73438a612205e27ff712e35f3d61c56c2978f0da/src/datahike/migrate.clj#L12
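
A transaction-ordered dump could look roughly like this. This is only a sketch, not the actual migrate code; it assumes the datascript-style (d/datoms db :eavt) API and that datoms expose their fields via :e :a :v :tx :added keyword lookup:

(require '[datahike.api :as d]
         '[clojure.java.io :as io])

;; Sketch: dump all datoms sorted by their tx id instead of :eavt order,
;; so schema datoms (transacted first) end up at the top of the file.
(defn export-ordered-by-tx [db path]
  (with-open [w (io/writer path)]
    (doseq [datom (sort-by :tx (d/datoms db :eavt))]
      (.write w (pr-str [(:e datom) (:a datom) (:v datom) (:tx datom) (:added datom)]))
      (.write w "\n"))))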

magra 2020-02-14T14:17:52.109100Z

You are right!

kkuehne 2020-02-14T14:26:37.117800Z

Hi @magra, I'm working on a more sophisticated migration mechanism that tries to avoid the ID clashes. I had to do this for data transfer between Datomic and Datahike.

magra 2020-02-14T14:29:14.119600Z

I am seeing a bug I can't narrow down: I use this schema

#datahike/Datom [12 :db/cardinality :db.cardinality/many 536870913 true]
#datahike/Datom [12 :db/ident :legacy/id 536870913 true]
#datahike/Datom [12 :db/index true 536870913 true]
#datahike/Datom [12 :db/unique :db.unique/identity 536870913 true]
#datahike/Datom [12 :db/valueType :db.type/uuid 536870913 true] 
for a datahike db into which I pulled SQL CSV data. To do bulk transforms of this data I use a query to get a list of entity ids, then do transformations on them, pulling them by lookup ref [:legacy/id xxx]. About 100 of them work fine, then one responds:
Execution error (ExceptionInfo) at datahike.db/entid-strict (db.cljc:900).
Nothing found for entity id [:legacy/id #uuid "80eab21d-93a8-415d-94b4-fc8beb6979d1"]
When I query on the entity id it shows the :legacy/id and all is well; when I query on the lookup ref I get the error. But hundreds of similar datoms work fine and behave as expected. On one db the error went away after export and reimport. On the current one the error stays, but every attempt to shrink the import file to get a minimal reproducible case kills the error. Any hints on how to debug this? The db has 112000 datoms.
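
For context, a hypothetical minimal repro of the calls involved (conn and the entity id are placeholders; the uuid is the one from the error message above):

(require '[datahike.api :as d])

;; placeholder entity id, standing in for one of the ids returned by the query
(def eid 12345)

;; pulling by entity id works and shows the :legacy/id
(d/pull @conn '[:db/id :legacy/id] eid)

;; pulling the same entity by lookup ref throws
;; "Nothing found for entity id [:legacy/id #uuid ...]"
(d/pull @conn '[:db/id :legacy/id]
        [:legacy/id #uuid "80eab21d-93a8-415d-94b4-fc8beb6979d1"])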

magra 2020-02-14T14:29:53.120400Z

@konrad.kuehne Great!! I am looking forward to that. And I have a workaround till then 😉

kkuehne 2020-02-14T14:30:23.121Z

I'm also working on SQL import but this is a little bit more complicated.

magra 2020-02-14T14:31:38.121500Z

Great!!

whilo 2020-02-14T18:01:01.122700Z

@magra Can you open an issue with an example to reproduce?

magra 2020-02-14T18:50:50.125Z

@whilo I will try. At the moment it seems to be reproducible with 120000 datoms of people-related data, but anonymizing kills the reproducibility.

whilo 2020-02-14T19:02:30.125300Z

Can you shrink the size first maybe?

magra 2020-02-14T19:26:56.129600Z

I will keep on trying tomorrow.