I made another attempt at storing datahike in DynamoDB (basically, the tree roots go there) and S3 (everything else). I’m running a large test against real DynamoDB tables and S3 buckets, and it seems to be working fairly well. The datahike branch is https://github.com/csm/datahike/tree/aws and the konserve backend is https://github.com/csm/konserve-ddb-s3 . This is using master of https://github.com/replikativ/hitchhiker-tree
I get about 1-1.5s per transaction (unless AWS throttles DynamoDB); the transaction time seems to be dominated by updating the in-memory tree, not by the AWS calls
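For readers following along, the roots-in-DynamoDB / everything-else-in-S3 split can be pictured with a toy in-memory model. This is only a sketch under assumptions: `SplitStore`, `put`, `get`, and the `is_root` flag are hypothetical names, plain dicts stand in for the real table and bucket, and none of this is the konserve-ddb-s3 API.

```python
from dataclasses import dataclass, field

# DynamoDB's documented maximum item size is 400 KB; used here only as a guard.
ROOT_ITEM_LIMIT = 400 * 1024


@dataclass
class SplitStore:
    """Toy model: `roots` stands in for a DynamoDB table, `segments` for an S3 bucket."""
    roots: dict = field(default_factory=dict)
    segments: dict = field(default_factory=dict)

    def put(self, key: str, payload: bytes, is_root: bool = False) -> None:
        # Roots go to the small-item store; every other node goes to the blob store.
        if is_root:
            if len(payload) > ROOT_ITEM_LIMIT:
                raise ValueError("root too large for the small-item store")
            self.roots[key] = payload
        else:
            self.segments[key] = payload

    def get(self, key: str) -> bytes:
        # Check the root store first, then fall back to the segment store.
        if key in self.roots:
            return self.roots[key]
        return self.segments[key]
```

In this picture a small root lands in `roots`, while a large segment of any size lands in `segments`.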
One significant change I made to datahike was to always store the tree roots as index nodes, so a data node is wrapped in an index node if it’s the root. The reasoning is that Dynamo isn’t great for very large items, and I thought S3 would work best with large segments
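A minimal sketch of that root-wrapping idea, assuming simplified node types: `DataNode`, `IndexNode`, and `as_index_root` are hypothetical names for illustration, not the hitchhiker-tree or datahike implementation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class DataNode:
    """Leaf node holding the actual datoms (can grow large)."""
    datoms: List[tuple]


@dataclass
class IndexNode:
    """Inner node holding only references to children (stays small)."""
    children: List[Union["IndexNode", DataNode]]


def as_index_root(root: Union[IndexNode, DataNode]) -> IndexNode:
    # An index node is already a valid root; a bare data node gets wrapped
    # so the stored root is always a small index node.
    if isinstance(root, IndexNode):
        return root
    return IndexNode(children=[root])
```

With this shape the root written to the small-item store never carries datoms directly; the wrapping only matters while the tree still consists of a single data node.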
@csm301 cool! how big are your transactions? the cost of updating the in-memory tree should only dominate if you transact around a thousand datoms.
this is only a problem for a root node that has not yet overflowed. is it a problem in these rare cases?
Yep, only an issue when the trees have only a data node as the root
The transactions are likely in the range of 1K Datoms each. I don’t actually know, since this is a preprocessed dataset (it is derived from the musicbrainz dataset).
@whilo @konrad.kuehne I'm having trouble requiring both datascript and datahike because they have conflicting data-reader mappings. Have you come across this and/or know of a nice solution?
I had the issue with Datomic :db/id and I have to figure out how this could work with Datahike.
Some time ago I read something on GitHub about replicating datahike with dat protocol, but I can’t find it now. Anyone know where that lives?
@steedman87 do you mean this? https://github.com/kordano/kordano.github.io/blob/master/_drafts/database_replication_with_dat.md
I started rewriting this post. The latest version can be found here and will be released next week. https://github.com/lambdaforge/lambdaforge.github.io/blob/dat-post/_drafts/database_replication_with_dat.md
And code can be found here: https://github.com/kordano/datahike-sync
thanks for the update!
@sogaiu that’s it, thanks!