datahike

https://datahike.io/. Join the conversation at https://discord.com/invite/kEBzMvb. History for this channel is available at https://clojurians.zulipchat.com/#narrow/stream/180378-slack-archive/topic/datahike
whilo 2020-04-23T12:05:41.055Z

@alekcz360 Regarding adding new indices, you only need to implement these protocols: https://github.com/replikativ/datahike/blob/master/src/datahike/index.cljc#L64 and the multimethods for empty-index and init-index. We have an outdated FDB prototype here, https://gitlab.com/grischoun/datahike-fdb, that we have not yet plugged into these protocols. We totally agree that having more scalable indices is super interesting, and our core motivation for working on Datahike is bringing all these stores together in one query engine. We have focused on factoring the core of Datahike and providing backends with the hitchhiker tree first, because that provides Clojure's memory model on durable storage. The FDB indices have different isolation levels and are mutable, so they behave significantly differently from immutable snapshots and will not compose in that sense. As you point out, the benefits in terms of write throughput are still definitely worth it, so I totally agree with the FDB value proposition and we would love to make it happen. Unfortunately, we still have a few other things with higher priority at the moment, mostly releasing a network interface to our transactor, support for garbage collection (with the hitchhiker tree), and an interactive editable web interface in the spirit of https://airtable.com/ (but probably much simpler for now). We are able to re-prioritize depending on impact, though. Could you help us get a prototype running in the FDB direction?
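To illustrate the extension pattern described above, here is a rough sketch written in Python rather than Clojure. It pairs an abstract index interface with a registry keyed by backend name, which stands in for the empty-index/init-index multimethods. All names here (`Index`, `SortedListIndex`, `register_backend`, `empty_index`, `init_index`, the `"mem"` key) are hypothetical. They do not reproduce the actual protocol methods or signatures in index.cljc.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical: a datom as a plain tuple, already in the index's sort order.
Datom = Tuple


class Index(ABC):
    """Hypothetical stand-in for an index protocol; not Datahike's real signatures."""

    @abstractmethod
    def insert(self, datom: Datom) -> "Index": ...

    @abstractmethod
    def remove(self, datom: Datom) -> "Index": ...

    @abstractmethod
    def slice(self, start: Datom, end: Datom) -> Iterator[Datom]: ...


class SortedListIndex(Index):
    """In-memory backend. Writes return a new index, so old snapshots stay valid."""

    def __init__(self, datoms: Iterable[Datom] = ()):
        self._datoms: List[Datom] = sorted(set(datoms))  # set() drops duplicates

    def insert(self, datom):
        return SortedListIndex(self._datoms + [datom])

    def remove(self, datom):
        return SortedListIndex(d for d in self._datoms if d != datom)

    def slice(self, start, end):
        # Yield every datom d with start <= d <= end under tuple ordering.
        i = bisect_left(self._datoms, start)
        while i < len(self._datoms) and self._datoms[i] <= end:
            yield self._datoms[i]
            i += 1


# Hypothetical registries standing in for the empty-index / init-index
# multimethods, which dispatch on the configured backend.
_EMPTY: Dict[str, Callable[[], Index]] = {}
_INIT: Dict[str, Callable[[Iterable[Datom]], Index]] = {}


def register_backend(name: str, empty_fn, init_fn) -> None:
    _EMPTY[name] = empty_fn
    _INIT[name] = init_fn


def empty_index(backend: str) -> Index:
    return _EMPTY[backend]()


def init_index(backend: str, datoms: Iterable[Datom]) -> Index:
    return _INIT[backend](datoms)


register_backend("mem", SortedListIndex, SortedListIndex)

idx = init_index("mem", [(2, "name", "Bob"), (1, "name", "Alice")])
idx2 = idx.insert((3, "name", "Carol"))
# Entities 1 and 2 only: (3, "name", "Carol") sorts after the bound (3,).
print(list(idx2.slice((1,), (3,))))
```

A real backend would replace the copied sorted list with a persistent tree (the hitchhiker tree, in Datahike's case), so that writes share structure instead of copying everything.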

adamfeldman 2020-04-26T18:02:57.069700Z

Thank you for sharing the prototype!!

adamfeldman 2020-04-26T18:11:21.069900Z

Thank you for the useful background info! It’s definitely interesting to bring all these backends together in one query engine. I will take some time to learn, research, and think about your words regarding the indices. I am quite early in working to understand immutable structures and the Hitchhiker Tree. I’m interested in how you are approaching the design of the transactor’s network interface. I’ve spent some time looking at Crux and their unbundled architecture. It’s funny you mention Airtable…I am in the earliest of stages building a company and product inspired by Airtable and https://malleable.systems (among other influences).

adamfeldman 2020-04-26T18:13:34.070100Z

(Please don’t feel obligated to respond here to all of my points above. I am watching the Datahike repo issues, and if, for example, you choose to share design info on the transactor, that seems the better forum.)

adamfeldman 2020-04-26T18:27:50.070500Z

With regard to me doing an FDB or other backend prototype, I’m definitely interested! I believe I should continue working to understand Datahike’s internals as they currently exist before moving forward. (I also need to figure out other issues related to what storage backend is most useful to my product strategy 😄)

whilo 2020-04-29T18:49:04.091500Z

The internals can be taken apart into fairly simple primitives, but we need to make this more obvious. Compositionality with clean abstractions is one of our core objectives; Datomic also did this very well. The indices are pattern matched against here: https://github.com/replikativ/datahike/blob/master/src/datahike/db.cljc#L158. All they need to do is provide an (efficient) iterator over triples in EAVT, AVET, and AEVT form. Since we want to have efficient persistent indices, we use the hitchhiker-tree logic and semantics over multiple durable storage backends, which are fairly simple blob stores (konserve provides a lens-like interface in general, though). So, for persistent semantics, one should implement a konserve backend, while for more fine-grained integration into another store (probably without persistent semantics), directly implementing the index interface is the way to go. We then also provide a (new) configuration data-DSL by @timo, which is mapped onto the stores here: https://github.com/replikativ/datahike/blob/master/src/datahike/store.cljc.
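To make the "iterator over triples" point concrete, here is a Python sketch. It is neither Datahike's code nor the actual pattern matching in db.cljc. The same datoms are kept sorted in three orders. A hypothetical `choose_index` picks the ordering whose key prefix covers the bound parts of an `[e a v]` pattern, which turns a lookup into a range scan. `TripleStore`, `choose_index`, and the use of `None` for "unbound" are illustrative assumptions. The transaction component (the T in EAVT) is left out for brevity.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Tuple

# Hypothetical datom shape (entity, attribute, value); the tx part of EAVT is omitted.
Datom = Tuple[int, str, object]

# Each index holds the same datoms, reordered so that the index's key prefix
# comes first. Positions refer to (e, a, v).
ORDERS: Dict[str, Tuple[int, int, int]] = {
    "eavt": (0, 1, 2),
    "aevt": (1, 0, 2),
    "avet": (1, 2, 0),
}


def choose_index(e, a, v) -> str:
    """Pick the ordering whose prefix covers the bound pattern components."""
    if e is not None:
        return "eavt"  # entity known: that entity's datoms are contiguous
    if a is not None and v is not None:
        return "avet"  # attribute and value known: contiguous in AVET
    if a is not None:
        return "aevt"  # attribute only: all entities having it are contiguous
    return "eavt"      # only v or nothing bound: full scan plus filter


class TripleStore:
    def __init__(self, datoms: Iterable[Datom]):
        datoms = list(datoms)  # iterated once per index below
        self.indices: Dict[str, List[tuple]] = {
            name: sorted(tuple(d[i] for i in order) for d in datoms)
            for name, order in ORDERS.items()
        }

    def search(self, e=None, a=None, v=None) -> List[Datom]:
        pattern = (e, a, v)
        name = choose_index(e, a, v)
        order = ORDERS[name]
        # Longest run of bound components, taken in this index's order.
        prefix: List[object] = []
        for i in order:
            if pattern[i] is None:
                break
            prefix.append(pattern[i])
        key = tuple(prefix)
        rows = self.indices[name]
        out: List[Datom] = []
        # Range scan: start at the first row >= key, stop once the prefix differs.
        for row in rows[bisect_left(rows, key):]:
            if row[: len(key)] != key:
                break
            datom = [None, None, None]
            for pos, i in enumerate(order):
                datom[i] = row[pos]
            # Filter bound components the prefix did not cover (e.g. e and v without a).
            if all(p is None or p == d for p, d in zip(pattern, datom)):
                out.append(tuple(datom))
        return out


db = TripleStore([(1, "name", "Alice"), (1, "age", 30), (2, "name", "Bob"), (2, "age", 25)])
print(db.search(a="name", v="Bob"))  # served from AVET
print(db.search(e=1))                # served from EAVT
```

Keeping several sort orders of the same data lets a pattern with bound components become one contiguous range scan. The cost is that every write touches all three indices. In this sketch, values stored under the same attribute must also be mutually comparable so the tuples can be sorted.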