datahike

https://datahike.io/. Join the conversation at https://discord.com/invite/kEBzMvb. History for this channel is available at https://clojurians.zulipchat.com/#narrow/stream/180378-slack-archive/topic/datahike
csm 2019-11-09T00:46:18.134Z

I made another attempt at storing datahike in DynamoDB (basically, just the tree roots go there) and S3 (everything else). I’m running a large test against real DynamoDB tables and S3 buckets, and it seems to be working fairly well. The datahike branch: https://github.com/csm/datahike/tree/aws and the konserve backend: https://github.com/csm/konserve-ddb-s3 . This is using master of https://github.com/replikativ/hitchhiker-tree
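A minimal sketch of the split described above, assuming a key-value store that routes a known set of root keys to one backend and every other key to a second one. Plain dicts stand in for the DynamoDB table and the S3 bucket. `SplitStore`, `root_keys`, and the key names are hypothetical and are not taken from konserve-ddb-s3.

```python
class SplitStore:
    """Route root pointers to one store and all other blobs to another."""

    def __init__(self, root_store, blob_store, root_keys):
        self.root_store = root_store    # stand-in for a DynamoDB table
        self.blob_store = blob_store    # stand-in for an S3 bucket
        self.root_keys = set(root_keys)  # hypothetical: keys that hold tree roots

    def _backend(self, key):
        # Small, frequently updated roots go to the root store;
        # everything else (large tree segments) goes to the blob store.
        return self.root_store if key in self.root_keys else self.blob_store

    def put(self, key, value):
        self._backend(key)[key] = value

    def get(self, key):
        return self._backend(key).get(key)


ddb, s3 = {}, {}
store = SplitStore(ddb, s3, root_keys={"db-root"})
store.put("node-1f3a", b"serialized tree node")  # lands in the blob store
store.put("db-root", {"eavt": "node-1f3a"})      # lands in the root store
```

Writing the segment before the root that points at it means a reader never sees a root that references a missing node, at least in this simplified model.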

csm 2019-11-09T00:47:17.135100Z

I get about 1-1.5s per transaction (unless AWS throttles DynamoDB); the transaction time seems to be dominated by updating the in-memory tree, not by the AWS calls

csm 2019-11-09T00:49:38.136600Z

one significant change I made to datahike was to always store the tree roots as index nodes, so data nodes are wrapped in an index if they’re the root. The reasoning is that DynamoDB isn’t too great for very large items, and I thought S3 storage would work best with large segments
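A rough illustration of the root-wrapping idea, assuming a tree with just two node kinds. `DataNode`, `IndexNode`, and `ensure_index_root` are hypothetical names for this sketch and do not mirror the hitchhiker-tree or datahike code.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Leaf node holding the datoms themselves (potentially large)."""
    datoms: list = field(default_factory=list)


@dataclass
class IndexNode:
    """Inner node holding only references to child nodes (small)."""
    children: list = field(default_factory=list)


def ensure_index_root(root):
    """Return a root that is always an IndexNode.

    A data node acting as the root is wrapped in a one-child index node,
    so the stored root stays small and the data node can be stored as an
    ordinary segment.
    """
    if isinstance(root, IndexNode):
        return root
    return IndexNode(children=[root])


leaf = DataNode(datoms=[(1, ":artist/name", "Nina Simone")])
root = ensure_index_root(leaf)
```

Wrapping is idempotent here: calling `ensure_index_root` on a root that is already an index node returns it unchanged.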

whilo 2019-11-09T06:19:44.138600Z

@csm301 cool! how big are your transactions? the cost of updating the in-memory tree should only dominate if you transact around a thousand datoms.

whilo 2019-11-09T06:20:48.138700Z

this is only a problem for a root node that has not yet overflowed. is it a problem in these rare cases?

csm 2019-11-09T07:08:02.139800Z

Yep, only an issue when the trees have only a data node as the root

csm 2019-11-09T07:10:14.142600Z

The transactions are likely in the range of 1K datoms each. I don’t actually know, since this is a preprocessed dataset (it is derived from the MusicBrainz dataset).

2019-11-09T07:18:14.144Z

@whilo @konrad.kuehne I'm having trouble requiring both datascript and datahike because they have conflicting data-reader mappings. Have you come across this and/or know of a nice solution?

kkuehne 2019-11-10T18:18:46.149600Z

I had the issue with Datomic :db/id and I have to figure out how this could work with Datahike.

knubie 2019-11-09T19:38:47.145900Z

Some time ago I read something on GitHub about replicating datahike with the dat protocol, but I can’t find it now. Does anyone know where that lives?

kkuehne 2019-11-10T18:13:55.148700Z

I started rewriting this post. The latest version can be found here and will be released next week. https://github.com/lambdaforge/lambdaforge.github.io/blob/dat-post/_drafts/database_replication_with_dat.md

kkuehne 2019-11-10T18:15:30.149Z

And code can be found here: https://github.com/kordano/datahike-sync

2019-11-10T18:41:30.149800Z

thanks for the update!

knubie 2019-11-09T21:11:19.146800Z

@sogaiu that’s it, thanks!
