local-first-clojure

knubie 2020-04-29T01:08:56.007200Z

Just finished reading through the https://github.com/smothers/cause repo (thanks @nathan.probst) and it looks promising.

knubie 2020-04-29T01:09:26.008Z

just thinking out loud here: would it make sense to build a datomic/datahike-like database using cause’s nodes instead of datoms?

whilo 2020-04-29T19:31:36.012500Z

I think my first answer would be that Datahike (as well as replikativ) is designed more like a set of components that can be adapted to many state distribution scenarios: replicating the full database to all frontends; providing CRDT semantics for attributes in the schema (planned) so as not to rely on a transactor; a carefully designed classical database backend with read scaling and integration into other services, where data cannot be openly replicated; all the way down to a DataScript/SQLite-style setup without any transactor or internal interface. The main goal is to establish Datalog as a more declarative programming model and then use those constraints to optimize and automate better. To my knowledge it represents the best tradeoff between expressivity and static analysis for distributed systems.
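
To make the Datalog point concrete, here is a minimal sketch against Datahike's public API (in-memory store, schema-on-read; exact config keys vary between versions):

```clojure
(require '[datahike.api :as d])

;; In-memory store with schema-on-read, so we can transact without a schema.
(def cfg {:store {:backend :mem :id "example"}
          :schema-flexibility :read})

(d/create-database cfg)
(def conn (d/connect cfg))

(d/transact conn [{:name "Alice" :age 30}
                  {:name "Bob"   :age 25}])

;; The query is plain data, which is what lets the engine analyze
;; and optimize it.
(d/q '[:find ?n
       :where
       [?e :name ?n]
       [?e :age ?a]
       [(> ?a 26)]]
     @conn)
;; => #{["Alice"]}
```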

whilo 2020-04-29T19:34:49.015300Z

More concretely, I think a good first step is to allow full replication into JS runtimes by exposing the full backend index to a P2P replication scheme like dat, or over HTTP, such that frontends run transparently on the local copy and only send transactions to the server. A more fine-grained variant of that would be the "web after tomorrow" architecture, where subsets of datoms are streamed into the frontend databases. I think from there CRDT semantics can be established to distribute the transactor.
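
A rough sketch of that read/write split, assuming the replication wiring exists; `send-tx-to-server!` is hypothetical, not an existing Datahike API:

```clojure
(require '[datahike.api :as d])

;; Hypothetical transport: could be an HTTP POST or a dat/P2P message
;; to the node that owns the transactor.
(defn send-tx-to-server! [tx-data]
  (println "would forward tx:" tx-data))

;; Reads run transparently against the replicated local copy.
(defn q-local [conn query]
  (d/q query @conn))

;; Writes only go to the server; the resulting datoms stream back
;; through the replication channel.
(defn transact-remote! [tx-data]
  (send-tx-to-server! tx-data))
```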

whilo 2020-04-29T19:36:48.017Z

Ideally an application can migrate between these scenarios, i.e. it can start with the server and full replication for easy prototyping and optimal local caching, and then move to the "web after tomorrow" architecture. Or a traditional backend can factor out a common knowledge base for all users and distribute that to all frontends.

whilo 2020-04-29T19:38:07.018100Z

The first step from my perspective is to get ClojureScript support in Datahike working again. I already ported the hitchhiker-tree to ClojureScript, but maintaining the full stack was too much for us, so we focused on the JVM first.

whilo 2020-04-29T19:40:03.019600Z

I am also very interested in incremental view maintenance (and incremental computation in general) as explored in 3DF. I think it will be necessary in the longer run, but it has its own restrictions and trade-offs; ideally the user can declare whether a query should be incrementally maintained or run eagerly.

whilo 2020-04-29T19:41:28.020700Z

@steedman87 I am fine with keeping the discussion here. It would be cool if somebody could notify the #datahike channel when a related topic is discussed here. I will try to do so, but I am not always able to follow all the threads 🙂.

knubie 2020-04-29T19:42:44.021600Z

Thanks for the detailed reply @whilo! I think the vision you’ve laid out makes a lot of sense to me. I can’t speak for everyone in the channel, but I’d be keen on helping datahike realize this potential.

knubie 2020-04-29T19:43:17.022200Z

Getting Datahike running in ClojureScript sounds like a good first step. Do you think that's something an outside contributor could reasonably take on?

knubie 2020-04-29T19:43:48.022500Z

Also, do you have a link where I can learn more about 3DF?

knubie 2020-04-29T19:45:01.023200Z

As far as getting Datahike working in ClojureScript goes, would that require changes to the API?

whilo 2020-04-29T19:46:18.023900Z

@niko963’s work provides good background on 3DF: https://www.nikolasgoebel.com/; his master's thesis also gives a high-level overview and covers related work.

whilo 2020-04-29T19:48:12.025300Z

Yes, for eager evaluation of queries we need to handle the asynchronous IO of JavaScript engines. So far we have opted for core.async and it works OK, but it slows down the hitchhiker-tree on the JVM, so we made it optional at macroexpansion time. https://github.com/replikativ/hitchhiker-tree/blob/master/src/hitchhiker/tree/utils/async.cljc
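
The linked namespace implements that switch; roughly, the pattern looks like this (a sketch with illustrative names, not the library's actual macros):

```clojure
(ns example.async-utils
  (:require [clojure.core.async :as async]))

;; Compile-time switch; the real library derives this from the target
;; platform instead of hard-coding it.
(def ^:const async? true)

(defmacro go-maybe
  "Expands to a core.async go block when async? is set, otherwise to
  plain synchronous code, so one code base serves both targets."
  [& body]
  (if async?
    `(async/go ~@body)
    `(do ~@body)))

(defmacro <?maybe
  "Takes from a channel in async mode, passes the value through otherwise."
  [x]
  (if async?
    `(async/<! ~x)
    x))
```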

whilo 2020-04-29T19:49:45.027100Z

So DataScript's synchronous API will not work in ClojureScript with IndexedDB. Looking at FRP-style pipelines like Fulcro or Pedestal, it would probably be reasonable to send queries to the local database as if it were a remote system, but we will have to figure out what works best, I think.

whilo 2020-04-29T19:50:29.027600Z

I think getting the hitchhiker-tree to run again in ClojureScript would be a good first step, doable by an external contributor.

whilo 2020-04-29T19:51:37.028900Z

The changes to Datahike are not super hard, but we basically need to wrap a lot of code with these optional core.async macros so it compiles to both Clojure and ClojureScript. Alternatively one could build a separate implementation, but we have avoided that so far in our stack.

whilo 2020-04-29T19:53:14.029600Z

So for ClojureScript our core API will probably return go-channels.

whilo 2020-04-29T19:53:38.030200Z

But I am very open to discussing the options; I think it all depends on the larger programming models used.

knubie 2020-04-29T19:54:24.031Z

So for example d/transact and d/q would return go-channels?

whilo 2020-04-29T19:55:16.031900Z

Yes. If they have to do IO eagerly (by pattern matching on index fragments that require async IO) then we need a way to synchronize. I think core.async is still the best option in Clojure-land.
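
Hypothetically, ClojureScript usage could then look like this (assuming the channel-returning API under discussion; nothing here is final):

```clojure
(require '[clojure.core.async :refer [go <!]]
         '[datahike.api :as d])

;; `conn` as obtained from d/connect; assumes d/transact and d/q
;; return go-channels on this platform.
(go
  (<! (d/transact conn [{:name "Carol"}]))
  (let [names (<! (d/q '[:find ?n :where [?e :name ?n]] @conn))]
    (println "names:" names)))
```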

whilo 2020-04-29T19:56:21.032700Z

We can also wrap that in a callback API to make it more convenient for outside users. Ideally we should also provide JavaScript bindings.
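
For example, a channel-returning query could be adapted to callbacks like this (`q-async` is a hypothetical stand-in for a channel-returning d/q):

```clojure
(require '[clojure.core.async :as async]
         '[datahike.api :as d])

;; Hypothetical stand-in for a channel-returning d/q.
(defn q-async [query db]
  (async/go (d/q query db)))

;; Callback adapter, convenient for JavaScript consumers.
(defn q-cb [query db callback]
  (async/go (callback (async/<! (q-async query db)))))
```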

knubie 2020-04-29T20:03:33.033300Z

> it would probably be reasonable to send queries to the local database as if it were a remote system, but we will have to figure out what works best, I think.
Is this in contrast to keeping a replica in memory, or?

whilo 2020-04-29T20:04:50.034300Z

That would be one option. Another option would be a UI framework that handles the effects automatically, e.g. by supporting core.async or some other execution semantics. That would be much heavier, basically creating a DSL.

whilo 2020-04-29T20:05:05.034600Z

Fulcro builds its own internal memory, for instance.

knubie 2020-04-29T20:06:18.034900Z

What do you mean by handling the effects automatically?

whilo 2020-04-29T21:52:20.037100Z

Fulcro describes effects as mutations, I think. Basically it is a common pattern in functional programming: you describe the effects (e.g. IO) in terms of data and then pass them to an interpreter, for instance a monad. Pedestal and Fulcro have such event loops that let you send data descriptions of mutations to other systems and retrieve the results as well. Maybe the most interesting work is Conal Elliott's FRP; I also know a PhD student working on a distributed version of that.
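
A toy sketch of the effects-as-data idea: the UI emits descriptions of mutations, and a single event loop interprets them (this shows the pattern, not Fulcro's or Pedestal's actual API):

```clojure
(require '[clojure.core.async :as async :refer [go-loop <! chan put!]])

;; Effects are plain data on a channel.
(def effects (chan))

(defmulti run-effect! :effect)

(defmethod run-effect! :transact [{:keys [tx-data]}]
  ;; A real interpreter would call d/transact here.
  (println "transacting" tx-data))

;; The event loop is the only place that performs IO.
(go-loop []
  (when-some [e (<! effects)]
    (run-effect! e)
    (recur)))

;; UI code just sends data; it performs no IO itself.
(put! effects {:effect :transact :tx-data [{:name "Dana"}]})
```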