Just finished reading through the https://github.com/smothers/cause repo (thanks @nathan.probst) and it looks promising.
Just thinking out loud here: would it make sense to build a Datomic/Datahike-like database using cause’s nodes instead of datoms?
I think my first answer would be that Datahike (as well as replikativ) is designed more like a set of components that can be adapted to many state distribution scenarios: replicating the full database to all frontends, providing CRDT semantics for attributes in the schema (planned) so as not to rely on a transactor, using a carefully designed classical database backend with read scaling and integration into other services that cannot be openly replicated, or going all the way down to a DataScript/SQLite setup without any transactor or internal interface. The main goal is to establish Datalog as a way to be more declarative in the programming model and then use those constraints to optimize and automate things better. It represents the best tradeoff between expressivity and static analysis for distributed systems that I know of.
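To make the Datalog point a bit more concrete, here is a toy query (made-up attributes, just to illustrate the declarative style); the same query stays valid whether the database lives on a server, is fully replicated into the frontend, or sits behind a transactor:
```
;; Made-up attributes (:author/name, :post/author, :post/title), only to
;; illustrate the declarative style; the query itself is independent of where
;; the index is stored or how it is replicated.
(require '[datahike.api :as d])

(d/q '[:find ?name ?title
       :where
       [?author :author/name ?name]
       [?post   :post/author ?author]
       [?post   :post/title  ?title]]
     @conn) ;; conn is assumed to be an existing Datahike connection
```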
More concretely, I think a good first step is to allow full replication into JS runtimes by exposing the full backend index over a P2P replication scheme like dat or over HTTP, such that frontends run transparently on the local copy and only send transactions to the server. A more fine-grained variant of that would be the "web after tomorrow" architecture, where subsets of Datoms are streamed into the frontend databases. I think from there CRDT semantics can be established to distribute the transactor.
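A rough sketch of what that first step could look like on the frontend (replicate-index! and send-tx! are hypothetical helpers, not existing Datahike functions):
```
;; Sketch only: replicate-index! and send-tx! are hypothetical helpers.
;; Reads run transparently on the locally replicated index, writes are sent
;; to the server, which remains the single transactor.
(defn init-frontend! [conn server-url]
  ;; pull the full backend index over HTTP or a P2P scheme like dat
  (replicate-index! conn server-url))

(defn titles-by-author [db author-name]
  ;; answered entirely from the local copy
  (d/q '[:find ?title
         :in $ ?name
         :where
         [?a :author/name ?name]
         [?p :post/author ?a]
         [?p :post/title  ?title]]
       db author-name))

(defn add-post! [server-url tx-data]
  ;; only the transaction crosses the network
  (send-tx! server-url tx-data))
```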
Ideally an application can migrate between these scenarios, i.e. it can start with the server and full replication for easy prototyping and optimal local caching, and then move to the "web after tomorrow" architecture. Or a traditional backend can factor out a common knowledge base for all users and distribute that to all frontends.
The first step from my perspective is to get the ClojureScript support in Datahike working again. I already ported the hitchhiker-tree to ClojureScript, but maintaining the full stack was too much for us, so we focused on the JVM first.
I am also very interested in incremental view maintenance (and incremental computation in general), which has been explored in 3df. I think it will be necessary in the longer run, but it has its own restrictions and trade-offs, so ideally queries can be declared by the user to be either incrementally maintained or run eagerly.
@steedman87 I am fine with keeping the discussion here. It would be cool if somebody could notify the #datahike channel when a related topic is discussed here. I will try to do so, but I am not always able to follow all the threads 🙂.
Thanks for the detailed reply @whilo! I think the vision you’ve laid out makes a lot of sense to me. I can’t speak for everyone in the channel, but I’d be keen on helping datahike realize this potential.
Getting Datahike running in ClojureScript sounds like a good first step. Do you think that’s something an outside contributor could reasonably take on?
Also, do you have a link where I can learn more about 3df?
As far as getting Datahike working in ClojureScript goes, would that require changes to the API?
@niko963’s work provides good background for 3df: https://www.nikolasgoebel.com/. His master’s thesis also gives a high-level overview and covers related work.
Yes, for eager evaluation of queries we need to handle the asynchronous IO of JavaScript engines. So far we have opted for core.async and it works OK, but it slows down the hitchhiker-tree on the JVM, so we made it optional at macroexpansion time. https://github.com/replikativ/hitchhiker-tree/blob/master/src/hitchhiker/tree/utils/async.cljc
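The trick is roughly this (a simplified sketch, not the actual macro names in hitchhiker.tree.utils.async): a compile-time flag decides whether a body expands into a core.async go block or stays synchronous, so the JVM path pays no channel overhead:
```
;; Simplified sketch; the real macros in hitchhiker.tree.utils.async differ.
;; When async? is false the code expands to plain synchronous forms,
;; when true it expands into core.async go blocks and channel takes.
(def async? false) ;; flipped at compile time when targeting ClojureScript

(defmacro go-maybe [& body]
  (if async?
    `(clojure.core.async/go ~@body)
    `(do ~@body)))

(defmacro <?maybe [ch-or-value]
  (if async?
    `(clojure.core.async/<! ~ch-or-value)
    ch-or-value))
```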
So the synchronous API of DataScript will not work in ClojureScript with IndexedDB. Looking at FRP-style pipelines like Fulcro or Pedestal, it would probably be reasonable to send queries to the local database as if it were a remote system, but we will have to figure out what works best, I think.
I think getting the hitchhiker-tree to run again in ClojureScript would be a good first step that is doable by an external contributor.
The changes to Datahike are not super hard, but we basically need to wrap a lot of code in these optional core.async macros to support compilation to both Clojure and ClojureScript. Alternatively one could build a separate implementation, but we have avoided that so far in our stack.
So for ClojureScript our core API will probably return go-channels.
But I am very open to discuss the options, I think it all depends on the larger programming models used.
So for example d/transact and d/q would return go-channels?
Yes. If they have to do IO eagerly (by pattern matching on index fragments that require async IO) then we need a way to synchronize. I think core.async is still the best option in Clojure-land.
We can also wrap that in a callback API to make it more convenient for outside users. Ideally we should provide JavaScript bindings as well.
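Usage would then look roughly like this (a sketch, assuming the ClojureScript build of d/transact and d/q return go-channels and that conn is an existing connection):
```
;; Sketch, assuming d/transact and d/q return go-channels in the cljs build.
;; go and <! come from clojure.core.async; conn is assumed to exist.
(go
  (<! (d/transact conn [{:user/name "Alice"}])) ;; wait for the tx to land
  (let [names (<! (d/q '[:find ?n :where [_ :user/name ?n]] @conn))]
    (js/console.log names)))

;; A thin callback layer on top could serve plain JavaScript consumers:
(defn q-cb [query db callback]
  (go (callback (<! (d/q query db)))))
```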
> it would probably be reasonable to send queries to the local database as if it were a remote system, but we will have to figure out what works best, I think.
Is this in contrast to keeping a replica in memory, or something else?
That would be one option. Another option would be to have a UI framework that handles the effects automatically, e.g. by supporting core.async or some other execution semantics. That would be much heavier, basically creating a DSL.
Fulcro builds its own internal memory, for instance.
What do you mean by handling the effects automatically?
Fulcro describes effects as mutations, I think. Basically that is a common pattern in functional programming: you describe the effects (e.g. IO) in terms of data and then pass them to an interpreter, for instance a monad. Pedestal and Fulcro have such event loops that allow you to send data descriptions of mutations to other systems and retrieve the results as well. Maybe the most interesting work is Conal Elliott's FRP, and I also know a PhD student working on a distributed version of that.
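As a tiny illustration of what "effects as data" means (made-up keys, not Fulcro's or Pedestal's actual mutation format):
```
;; Made-up effect description, not Fulcro's or Pedestal's actual format.
;; The UI only emits data; an event loop / interpreter performs the IO.
(def mutation
  {:effect/type :db/transact
   :effect/args [{:post/title "Hello" :post/author 42}]})

(defmulti interpret :effect/type)

(defmethod interpret :db/transact [{:keys [effect/args]}]
  (d/transact conn args)) ;; conn assumed to exist; could also return a channel

;; (interpret mutation) ;; called by the event loop, possibly asynchronously
```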