datahike

https://datahike.io/, Join the conversation at https://discord.com/invite/kEBzMvb, history for this channel is available at https://clojurians.zulipchat.com/#narrow/stream/180378-slack-archive/topic/datahike
whilo 2020-11-14T06:26:46.070600Z

@geodrome Yes, the root node contains part of the transaction log, in the sense Rich Hickey describes in the video you linked when he points out that it is already logged to storage. I am not sure whether Datomic also uses a fractal data structure; it does not sound like it. We definitely do not have the problem of creating new paths in the tree for every chunk of data that is transacted, in the way Rich describes, and we therefore also do not create a lot of garbage. You can also think of us as doing what he says incrementally and automatically through the fractal tree, I think. Our write throughput with our latest upsert branch is not as high as Datomic's yet, but it would be super interesting to explore how we compare by just using the hitchhiker-tree with a distributed key-value store, e.g. postgres or redis, and either streaming information about new root nodes whenever they happen or just pulling the root whenever a peer needs it. If this is not efficient enough, we can easily tweak the hitchhiker-tree or introduce another level of buffering along the lines of Datomic.

whilo 2020-11-14T06:28:27.071400Z

@konrad.kuehne probably can help us with integrating it into datahike-server.

whilo 2020-11-14T06:33:53.072300Z

For the "pull" approach, we only need to sidestep the cache of konserve so that the value is loaded from the distributed key-value store when we query on a peer.
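
A rough sketch of what "sidestepping the cache" could look like, in Python for brevity. This is not konserve's API: `CachedStore`, `get`, and the `bypass_cache` flag are hypothetical names, and a plain dict stands in for the distributed key-value store. The assumption is that only the root key ever changes, so only that read needs to skip the cache.

```python
from typing import Any, Dict


class CachedStore:
    """Hypothetical read-through cache in front of a shared key-value store."""

    def __init__(self, backend: Dict[str, Any]) -> None:
        self.backend = backend            # stands in for postgres, redis, ...
        self.cache: Dict[str, Any] = {}   # peer-local cache

    def get(self, key: str, bypass_cache: bool = False) -> Any:
        # Normal reads are served from the local cache when possible.
        if not bypass_cache and key in self.cache:
            return self.cache[key]
        # Otherwise go to the shared store and refresh the local copy.
        value = self.backend[key]
        self.cache[key] = value
        return value


shared = {"root": "root-v1"}
peer = CachedStore(shared)

first = peer.get("root")                     # fills the peer's cache
shared["root"] = "root-v2"                   # a writer publishes a new root
stale = peer.get("root")                     # cached read still sees the old root
fresh = peer.get("root", bypass_cache=True)  # "pull": read past the cache
```

With this shape, a peer would call the bypassing read once at the start of a query and use ordinary cached reads for everything reachable from that root.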

2020-11-14T10:35:43.075400Z

@whilo Thank you for all the clarifications. You say “Yes, the root node contains a part of the transaction log in the sense that Rich Hickey in the video you linked describes when he points out that it is already logged to storage.” Perhaps I am misunderstanding something, but the root node accumulates the novelty in memory, doesn’t it? How do you ensure durability of every transaction? Don’t you have to write out this novelty to storage somehow? Either as a separate tx log in storage or by writing the root node itself to storage on every tx?

whilo 2020-11-14T19:44:51.076Z

Yes, we write the root node after every transaction. It contains the current transaction log in the sense that I described.

whilo 2020-11-14T19:47:32.078100Z

@geodrome We could also use a separate transaction log data structure, but the fractal tree is itself a transaction log. By bounding the number of Datoms in the log of each node (currently ~300 in our case), we only write a constant-size tree fragment for most transactions, and on average we need to do log(n) write operations for n Datoms.

whilo 2020-11-14T20:23:23.079Z

Where our logarithm scales much better than that of normal balanced trees.
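
A toy model of the bounded per-node log, in Python for brevity. It is not hitchhiker-tree or datahike code: `Node`, `receive`, `route`, the fixed depth, and the fanout of 4 are all assumptions made for illustration, and only the log bound of 300 comes from the discussion above. Each transaction rewrites the root; only when a node's log overflows is it pushed one level down, which rewrites some children as well. Because the depth is fixed, the sketch shows the amortization (most transactions touch only the root) and not the log(n) growth.

```python
from typing import Dict, List

LOG_SIZE = 300  # per-node log bound, from the "~300" figure above
FANOUT = 4      # assumption: small branching factor for the toy


class Node:
    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.log: List[int] = []    # buffered datoms not yet pushed down
        self.data: List[int] = []   # datoms stored at a leaf
        self.children: List["Node"] = (
            [Node(depth - 1) for _ in range(FANOUT)] if depth > 0 else []
        )


def route(node: Node, key: int) -> int:
    """Pick a child by key digits (toy routing, not a real index)."""
    return (key // FANOUT ** (node.depth - 1)) % FANOUT


def receive(node: Node, keys: List[int]) -> int:
    """Deliver keys to a node; return how many node writes this causes."""
    if not node.children:            # leaf: store the datoms directly
        node.data.extend(keys)
        return 1
    node.log.extend(keys)
    writes = 1                       # this node is rewritten
    if len(node.log) > LOG_SIZE:     # log full: push it one level down
        pending, node.log = node.log, []
        buckets: Dict[int, List[int]] = {}
        for key in pending:
            buckets.setdefault(route(node, key), []).append(key)
        for idx, bucket in buckets.items():
            writes += receive(node.children[idx], bucket)
    return writes


root = Node(depth=2)
# One datom per transaction; record the node writes each transaction costs.
writes = [receive(root, [key]) for key in range(10_000)]

root_only = writes.count(1) / len(writes)
print(f"transactions that only rewrote the root: {root_only:.2%}")
print(f"worst-case node writes in one transaction: {max(writes)}")
```

The design point this is meant to illustrate is that durability per transaction costs one root write in the common case, while the more expensive multi-node writes happen only when a log fills up.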