datahike

https://datahike.io/, Join the conversation at https://discord.com/invite/kEBzMvb, history for this channel is available at https://clojurians.zulipchat.com/#narrow/stream/180378-slack-archive/topic/datahike
2020-11-12T12:00:03.061900Z

@whilo Yes, I’m talking about datoms submitted to transact. When I say “Are there any plans to implement a live in-memory index that accumulates novelty and periodically flushes to storage, similar to Datomic?” I am asking about the process Rich Hickey describes here: https://youtu.be/Cym4TZwTCNU?t=2109

Summary from the video: Since we’re using immutable segments, “maintaining sort live in storage” is inefficient and also creates a lot of garbage in storage, requiring a lot of storage GC. The Datomic solution is to ensure durability by keeping a tx log and accumulating novelty in a live in-memory index. Note that this live in-memory index contains only the novelty since the last flush to storage — that is, all the txs that came in since the last flush. It is separate from the index segments that are pulled in from storage. To satisfy queries, the live in-memory index must be combined with the index segments to produce the current db value. When the live in-memory index grows to X megabytes, a process runs to integrate it with the existing index segments in storage. Again, this is for efficiency: to avoid “maintaining sort live in storage.”

It seems like in datahike the hitchhiker-tree buffer(s) on the root node(s) serve as the “live in-memory index” described above. The key issue I’m trying to understand is this: while the hitchhiker-tree is accumulating novelty in its buffers, how is durability ensured? I was under the impression that there is no tx log, which means at least the root node of the hitchhiker-tree would have to be written to storage on each tx?
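To make the question concrete, here is a toy sketch of the accumulate-and-flush scheme described above. This is purely illustrative: sorted lists stand in for immutable index segments, datoms are plain values, and none of the names (`AccumulateFlushIndex`, `transact`, `current_db`) come from Datomic’s or datahike’s actual APIs.

```python
class AccumulateFlushIndex:
    """Toy model of Datomic-style indexing: durability via a tx log,
    novelty accumulated in a live in-memory index, periodic flush
    that integrates novelty into immutable storage segments."""

    def __init__(self, flush_threshold=1000):
        self.segments = []        # sorted "index segments" persisted in storage
        self.live = []            # live in-memory index: novelty since last flush
        self.tx_log = []          # durable tx log; replayed on recovery
        self.flush_threshold = flush_threshold

    def transact(self, datoms):
        # Durability first: datoms hit the tx log before the in-memory index,
        # so a crash loses nothing that was acknowledged.
        self.tx_log.extend(datoms)
        self.live.extend(datoms)
        if len(self.live) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Integrate the live index into storage in one batch, rather than
        # "maintaining sort live in storage" on every tx.
        self.segments = sorted(self.segments + self.live)
        self.live = []
        self.tx_log = []  # log entries covered by the flush can be truncated

    def current_db(self):
        # Queries must see storage segments merged with live novelty.
        return sorted(self.segments + self.live)
```

The point of the sketch is the durability question at the end of the message: here, un-flushed novelty survives a crash because it sits in the tx log. Without a tx log, the only durable copy of the buffered novelty would be whatever tree nodes (at minimum the root, in a hitchhiker-tree) get written to storage on each tx.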