@csm301 very nice 🙂. one thing that i have not done for the hitchhiker-tree is to flush the dirty segments to disk in parallel before returning the root node. right now they are written in sequence, which is unnecessary. https://github.com/replikativ/hitchhiker-tree/blob/master/src/hitchhiker/tree.cljc#L484
as you have pointed out, only the root needs to be written atomically (and last); all the other nodes can be written in parallel.
this requires two phases: 1. walk the tree and trigger all write operations and 2. walk the tree again and wait for each write operation.
for core.async that is easy to do; unfortunately core.async is a bit slow, so we compile (macro-expand) it away on the JVM. but we could decide not to compile it away here, i.e. use it directly in the code and do the two phases.
maybe it is easier to use futures though; i would have to think about it a bit more.
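A minimal sketch of the two-phase flush described above, in Python, with `concurrent.futures` standing in for core.async channels or Clojure futures. `Node`, `DictStore`, `serialize` and `flush` are hypothetical names, not the hitchhiker-tree or konserve API. It assumes every node's storage key is known before flushing (so no write has to wait on another to learn a key) and that clean nodes, subtrees included, are already persisted. The root is written only after every other write has completed, so it never points at a node that is not yet stored.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
import threading


@dataclass
class Node:
    # Hypothetical in-memory node; `key` is assumed to be assigned before flushing.
    key: str
    children: list = field(default_factory=list)
    dirty: bool = True


class DictStore:
    """Hypothetical stand-in for a key-value backend such as a konserve store."""

    def __init__(self):
        self.data = {}
        self.lock = threading.Lock()

    def write(self, key, value):
        with self.lock:
            self.data[key] = value
        return key


def serialize(node):
    # Hypothetical encoding: a node is stored as the keys of its children.
    return [child.key for child in node.children]


def flush(root, store, pool):
    # Phase 1: walk the tree and start a write for every dirty non-root node.
    # Clean nodes are assumed to be persisted already, subtrees included.
    pending = []
    stack = list(root.children)
    while stack:
        node = stack.pop()
        if node.dirty:
            pending.append((node, pool.submit(store.write, node.key, serialize(node))))
            stack.extend(node.children)
    # Phase 2: wait for each write; result() re-raises if a write failed.
    for node, fut in pending:
        fut.result()
        node.dirty = False
    # Commit point: write the root only after every other node is stored.
    store.write(root.key, serialize(root))
    root.dirty = False
    return root.key


if __name__ == "__main__":
    leaf_a = Node("leaf-a")
    leaf_b = Node("leaf-b", dirty=False)  # persisted by an earlier flush
    root = Node("root", [leaf_a, leaf_b])
    store = DictStore()
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(flush(root, store, pool))  # root
    print(store.data)  # {'leaf-a': [], 'root': ['leaf-a', 'leaf-b']}
```

With core.async the futures would be the channels konserve already returns, and phase 2 would take from each of them in turn before the root is written.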
Konserve returns channels already, though, so it’s not a stretch to think about firing off writes and then collecting the results
I’m surprised to hear that core.async itself is a bottleneck, though
it is when the hitchhiker-tree is used in memory, especially if we use the go-try variant. it seems that exception handling plus dispatching to the thread pool adds roughly an order of magnitude of overhead. @mpenet also did some experiments and came to the same conclusion.
we do not use the hitchhiker-tree for in-memory databases anymore though, but @tonsky’s in-memory indices: https://github.com/tonsky/persistent-sorted-set
so for datahike it would not matter too much anymore.
unless we use redis, hmm.
Yeah, after writing that it occurred to me why that could be a problem
it is pretty cool that you could import a million datoms in half an hour without much tweaking. i think there is still a lot of room for improvement though, and the first step is to write as much as possible in parallel.
Yeah, I was getting discouraged, but that last test went well
what was discouraging?
It does sorely need a GC, though: I spent a good hour cleaning up unused nodes
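The GC wished for here could take the shape of a plain mark-and-sweep over the store: mark every key reachable from the roots you still want to keep, then delete the rest. This is a hypothetical Python sketch, not an existing konserve or hitchhiker-tree function; it assumes the store is a dict mapping each key to the keys of its children. A real konserve store would have to list and decode its keys instead, and a collector running next to writers would also need to spare nodes written after the mark phase started.

```python
def reachable_keys(store, roots):
    # Mark: collect every key reachable from the live roots.
    seen = set()
    stack = list(roots)
    while stack:
        key = stack.pop()
        if key in seen or key not in store:
            continue  # already marked, or a dangling reference
        seen.add(key)
        stack.extend(store[key])
    return seen


def sweep(store, roots):
    # Sweep: delete every stored node the mark phase did not reach.
    live = reachable_keys(store, roots)
    garbage = [k for k in store if k not in live]  # collect first, then mutate
    for k in garbage:
        del store[k]
    return garbage


if __name__ == "__main__":
    # Two tree versions share node "a"; only "root-v2" is still in use.
    store = {"root-v2": ["a", "c"], "root-v1": ["a", "b"], "a": [], "b": [], "c": []}
    print(sweep(store, ["root-v2"]))  # ['root-v1', 'b']
    print(sorted(store))  # ['a', 'c', 'root-v2']
```

Structural sharing is why marking from the roots is needed rather than deleting per version: "a" belongs to both versions and must survive.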
fair point
Oh, I wasn’t getting the import to work well until that point
the consistent key function trick for konserve is neat.