datahike

https://datahike.io/, Join the conversation at https://discord.com/invite/kEBzMvb, history for this channel is available at https://clojurians.zulipchat.com/#narrow/stream/180378-slack-archive/topic/datahike
alekcz 2020-04-21T07:45:22.007200Z

@konrad.kuehne @whilo question about datahike: I managed to get Firebase working as a backend. I'm noticing that all the data is stored in one kv pair. Is that the case in the long term?

alekcz 2020-04-21T07:45:46.007700Z

What happens when the kv pair is at capacity?

alekcz 2020-04-21T10:13:09.008400Z

@adam622 I've written a ton more data. It actually makes multiple entries.

👍 1
adamfeldman 2020-04-21T13:47:22.038400Z

*As a thought experiment, is it possible to map Datahike’s storage needs more directly to the semantics and capabilities of the underlying storage layer?* Specifically, could a _*performant and highly-scalable*_ Datahike be implemented by directly using the features of the <https://foundationdb.github.io/fdb-record-layer/|FoundationDB (FDB) record layer> and/or <https://cloud.google.com/datastore/docs/firestore-or-datastore|Google Cloud Firestore’s _Datastore mode_>? I’ve spent some time exploring these possibilities. I see this has already been considered by <@UB95JRKM3> and <@U1C36HC6N>: <https://github.com/replikativ/datahike/issues/21#issuecomment-486991977>. As I see it, *the potential prize is expanding the server-side use-cases for Datahike to support larger-scale applications.* Relatedly, Datahike on Cloud Firestore’s _Native mode_ could be used to keep data in sync between millions of concurrent clients and the backend, as <@U8KKDKPG8> is no doubt thinking about.
*Does Datahike already have internal interfaces that would allow one to replace Datahike’s hitchhiker-tree indexes with indexes natively supported by the datastore underlying Datahike?* My thought experiment is this: Datahike uses the hitchhiker-tree to create a performant storage abstraction over a simple kv storage interface. If the underlying storage offers _similarly performant_ capabilities (indexing, etc.), then Datahike might benefit from using those capabilities directly. As I am fairly new to distributed systems and database architecture, is my thought process sound?
Both FDB and Firestore in _Datastore mode_ (A) can scale up to millions of operations per second (<https://apple.github.io/foundationdb/scalability.html|FDB>, _<https://cloud.google.com/datastore/docs/firestore-or-datastore#choosing_a_database_mode|Datastore mode>_) and, at the same time, (B) provide serializable isolation over those transactions (<https://apple.github.io/foundationdb/consistency.html|FDB>, _<https://cloud.google.com/datastore/docs/concepts/transactions#isolation_and_consistency|Datastore mode>_). FDB is extremely flexible and tunable, while Cloud Firestore is a “serverless” product that requires little operational effort. My current understanding is that Datahike batches up multiple datoms into index “segments”, each of which is stored as the value for a single key in the storage; this happens as part of the `hitchhiker-tree` layers in Datahike’s implementation. If my recent database-implementation learnings are correct, this is a common strategy in datastore implementations: batching reads and writes amortizes network, disk, and other overhead across the batch of data.
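To make the batching idea concrete, here is a minimal Python sketch. It is not Datahike's or the hitchhiker-tree's actual code: `CountingKV`, `write_per_datom`, `write_segments`, the key names, and the segment size of 100 are all hypothetical, and datoms are simplified to `(entity, attribute, value)` tuples. It only illustrates how grouping sorted datoms into segments reduces the number of writes to a simple kv store.

```python
from typing import Dict, List, Tuple

# Simplified datom: (entity id, attribute, value). Hypothetical shape for illustration.
Datom = Tuple[int, str, str]


class CountingKV:
    """Toy in-memory kv store that counts how many writes it receives."""

    def __init__(self) -> None:
        self.data: Dict[str, object] = {}
        self.writes = 0

    def put(self, key: str, value: object) -> None:
        self.data[key] = value
        self.writes += 1


def write_per_datom(store: CountingKV, datoms: List[Datom]) -> None:
    """Naive layout: one kv pair (and one write) per datom."""
    for i, datom in enumerate(datoms):
        store.put(f"datom/{i}", datom)


def write_segments(store: CountingKV, datoms: List[Datom], segment_size: int) -> None:
    """Batched layout: sort the datoms, then store each fixed-size run under one key."""
    ordered = sorted(datoms, key=lambda d: (d[0], d[1]))
    for n, start in enumerate(range(0, len(ordered), segment_size)):
        store.put(f"segment/{n}", ordered[start:start + segment_size])


datoms = [(e, "user/name", f"name-{e}") for e in range(1000)]

naive = CountingKV()
write_per_datom(naive, datoms)

batched = CountingKV()
write_segments(batched, datoms, segment_size=100)

print(naive.writes, batched.writes)
```

In this toy setup the per-write overhead is paid once per segment rather than once per datom. A real tree-based index also has to split, merge, and rewrite segments as data changes, which this sketch leaves out.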

adamfeldman 2020-04-21T13:49:56.039400Z

tl;dr: implementing Datahike as a stateless layer over FoundationDB and/or Cloud Firestore could result in a highly-scalable ACID-compliant datastore, and making it work would likely require Datahike to be more tightly coupled to the specific, performant capabilities of the underlying datastore

kkuehne 2020-04-21T16:24:02.040Z

Thanks for the input @adam622. Let me think about that properly and then I'll get back to you.

adamfeldman 2020-04-21T16:24:43.040200Z

Thanks @konrad.kuehne…sorry for the essay, that’s been building up in my head for months 🙂

kkuehne 2020-04-21T16:25:59.040400Z

Great, I'm always happy when people think about new ideas in Datahike. 🙂

❤️ 1