datascript

Immutable database and Datalog query engine for Clojure, ClojureScript and JS
feelextra 2019-05-20T12:03:02.170600Z

I've been trying to figure out a performant way to persist incremental updates initiated by a client running a DataScript instance to a remote server. In my case every user account has a different DataScript db that is completely separate from other users' dbs. My best idea so far has been to:
1. When the connection is first initiated with a user, spin up that user's DataScript instance on the server.
2. At this point, the server may hold any number of DataScript instances at the same time, depending on how many users are connected.
3. Maintain a long-lasting connection via WebSockets in order to determine when the client will not be sending updates anymore, i.e. when the DataScript instance can be persisted to disk and then discarded from memory.
Any thoughts or improvements on this are welcome 👀
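
A rough sketch of that per-user lifecycle, assuming hypothetical `load-db` and `persist-db!` helpers (`d/conn-from-db` is DataScript's real constructor):

```clojure
(ns server.sessions
  (:require [datascript.core :as d]))

(defonce sessions (atom {})) ; user-id -> DataScript conn

(defn on-connect [user-id load-db]
  ;; spin up (or reuse) this user's instance; load-db is a hypothetical
  ;; fn that reads the user's persisted db value from disk
  (swap! sessions update user-id
         #(or % (d/conn-from-db (load-db user-id)))))

(defn on-disconnect [user-id persist-db!]
  ;; WebSocket closed: persist to disk, then discard from memory
  (when-let [conn (get @sessions user-id)]
    (persist-db! user-id @conn))
  (swap! sessions dissoc user-id))
```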

devurandom 2019-05-20T12:58:25.171500Z

@d4hines You were right. I produced a minimal test case showing that [?e2] does not add ?e2 to the result set.

devurandom 2019-05-20T13:01:54.172400Z

Also, [?a ?b] does not set ?a to the same element(s) as ?b (i.e. a rename). I think I saw that somewhere and thought it would be handy. Sadly, it does not work. 😞

2019-05-20T13:04:56.174400Z

Thanks for reporting back 😄 In regards to [?a ?b], in what scenarios do you think that kind of one-to-one renaming would be useful? I think there’s a function that can do that (or else, I think you can easily pass one in)

2019-05-20T13:05:53.174500Z

You may want to at least check out https://github.com/metasoarous/datsync

2019-05-20T13:06:58.174800Z

It’s focused on Datomic for backend persistence, but I think the idea is that you can swap out different implementations (such as Datahike, or a custom, persisted datascript), without too much difficulty.

devurandom 2019-05-20T13:19:46.176600Z

@d4hines In scenarios like the above, where I want ?e to be the union of ?x and ?y. Or in the (subproject-or-self ?sub ?top) scenario, where I want ?sub to also contain ?top.
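
For what it's worth, that union can usually be expressed with two rule bodies instead of a bare [?e2] clause. A sketch, assuming a hypothetical :project/parent attribute and that DataScript's built-in identity works as a binding function here:

```clojure
(require '[datascript.core :as d])

(def rules
  '[;; base case: ?top itself counts
    [(subproject-or-self ?sub ?top)
     [(identity ?top) ?sub]]
    ;; recursive case: anything whose parent is subproject-or-self of ?top
    [(subproject-or-self ?sub ?top)
     [?sub :project/parent ?mid]
     (subproject-or-self ?mid ?top)]])

(d/q '[:find ?sub
       :in $ % ?top
       :where (subproject-or-self ?sub ?top)]
     @conn rules top-eid) ; top-eid = some project's entity id
```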

2019-05-20T13:21:00.177400Z

Hmm. There are definitely ways of working with sets/vectors in Datascript and Datomic, but that’s beyond my understanding.

feelextra 2019-05-20T13:21:02.177500Z

@d4hines thanks for addressing this question 🙂 I'm not sure I understand the DatSync approach. I have watched the Clojure/West 2016 talk by Christopher Small about it, but I'm still not sure I got the small details right:
1. In the DatSync model, clients send transactions to a "master" Datomic db on the server before committing the changes to the local DataScript db.
2. The server's Datomic executes the transactions coming from clients and sends the transactions it executed back to clients via pub/sub (in transaction order, which Datomic can do with the log it maintains).
3. Clients receive the transactions and execute them locally.
I assume all queries are run on the client's local db without needing to involve the "main" server db. (Rough sketch of step 2 below.)
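
If that reading is right, step 2 might look roughly like this on the server (broadcast! is a hypothetical pub/sub fn; d/transact is Datomic's API):

```clojure
(require '[datomic.api :as d])

(defn handle-client-tx [datomic-conn tx broadcast!]
  ;; run the client's transaction against the master db, then
  ;; rebroadcast the datoms it produced, in log order
  (let [{:keys [tx-data]} @(d/transact datomic-conn tx)]
    (broadcast! tx-data)))
```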

devurandom 2019-05-20T13:21:51.177900Z

Isn't every ?var a set?

devurandom 2019-05-20T13:22:05.178200Z

i.e. the set of :db/id that it was bound to?

feelextra 2019-05-20T13:23:04.179200Z

That said, queries might have to be delayed while transactions that have been sent to the server haven't been transacted locally yet.

2019-05-20T13:24:45.179400Z

I would think you could employ optimistic updates at that point. “Eventual consistency” and all that. But I’m not sure. Is optimistic update not a hard enough line for you?

2019-05-20T13:25:23.179600Z

Also, wouldn’t you encounter this issue in any remote setup?

devurandom 2019-05-20T13:26:42.180200Z

Interesting... I am able to make DataScript select a nil entity...

2019-05-20T13:36:17.180800Z

I’ve definitely seen some stuff that drops down to Clojure code and manipulates vectors directly.

feelextra 2019-05-20T13:38:14.181200Z

@d4hines not sure what you're saying.
1. Optimistic updates can be implemented to eliminate the need to delay queries while transactions are pending, but that requires maintaining a transaction log in the browser, which might be too costly for my application.
2. Some remote setups are easier depending on the stack: a synchronization solution for document stores can be easy with a CouchDB/PouchDB stack or a Meteor stack, but not necessarily easy without it.

feelextra 2019-05-20T13:39:55.181400Z

Thanks for the help. I'm still not sure what DatSync is capable of.

feelextra 2019-05-20T13:49:36.181700Z

Also, just to take note of something I just read on the DatSync wiki:
> Data scoping mechanisms: Currently, assumption is that we sync the whole db
This approach is a trade-off:
- Instead of having to spin up DataScript instances on the server to execute transactions coming from the client, you just replace the entire db every time. That removes the memory-space limitation on the server, allowing cheaper scaling as the number of users grows.
- At the expense of client-to-server requests (and the responses back) carrying a big payload, i.e. the entire db. This implies more resources on the clients to send and read incoming responses, which is expensive, especially on battery life (mobile clients, laptops).

2019-05-20T14:05:35.181900Z

Points taken. Worth noting, though, that this is seen as a large limitation and an active research topic. A number of folks have high hopes for Differential Dataflow solving this problem.

feelextra 2019-05-20T14:42:20.182100Z

@d4hines thank you for the great discussion 🙂 Interesting reference to Differential Dataflow, by the way. Not the first time I've heard about it, but knowing that it's an active research topic gave me some context.

feelextra 2019-05-20T18:42:10.184500Z

Is there a way to serialize DataScript to EDN using the JavaScript API? If not, shouldn't it be possible to do so by exporting a function that invokes ClojureScript's prn-str?
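
The write side, at least, would seem to work that way, since a DataScript db prints as a #datascript/DB tagged literal. A sketch (the namespace and exported name are made up; reading the string back needs the readers discussed below):

```clojure
(ns my-app.ds-edn)

(defn ^:export serialize
  "Serialize a DataScript db value to an EDN string for JS callers."
  [db]
  (pr-str db)) ; prints as #datascript/DB {:schema ... :datoms ...}
```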

2019-05-20T18:45:55.184600Z

Hi @icyiceling184. A few things:
• Datsync server sends just the datoms produced by the transaction to the clients, not the original tx itself. This lets you use transaction functions and whatnot.
• You could do optimistic updates without a full transaction log: transact locally, but keep a reference to the previous db state, and only drop that reference once the tx comes back from the server. You have to get more clever, though, if you want transactions submitted in the interim to be handled properly.
• I've been thinking about query scopes and subscriptions again lately, and we may be putting out some API for this soon.
• Queries don't have to be delayed in the current model if you're using "reactive" queries (Posh or differential datalog or whatever), as the queries will just update once the new data comes in.

2019-05-20T18:47:52.186700Z

@icyiceling184 You may want to look at datascript-transit, which lets you easily serialize the datascript database. You can also just serialize all the datoms, but then you have to rehydrate the database from that yourself using d/init-db, which takes in datoms without running them through all the normal transaction machinery/overhead.
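
Both routes, sketched (dt/write-transit-str and dt/read-transit-str are datascript-transit's entry points; check its README for the exact API):

```clojure
(require '[datascript.core :as d]
         '[datascript.transit :as dt])

;; route 1: datascript-transit round trip
(def serialized (dt/write-transit-str @conn))
(def db1        (dt/read-transit-str serialized))

;; route 2: raw datoms + init-db, skipping the transaction machinery
(def datoms (vec (d/datoms @conn :eavt)))
(def db2    (d/init-db datoms (:schema @conn)))
```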

2019-05-20T18:48:18.187300Z

I just implemented this for a project, and hope to put it into more automated subscription based bootstraps for datsync in the near future.

feelextra 2019-05-20T19:28:15.187400Z

@metasoarous It's inspiring to see your dedication to this project, still sticking around after being involved with it for so long. Really appreciate you taking the time to respond at such length. Having people like you around to collaborate with is one of the biggest reasons to participate in the Clojure ecosystem, from my point of view. I really want to use DataScript for more than just reading data, which is all an application of mine is doing at this point. Trying to grasp your points to get a complete picture:
> Datsync server sends just the datoms produced by the transaction to the clients, not the original tx itself. This lets you use transaction functions and whatnot.
I'm confused about the last part (transaction functions), but as for the first part, do you mean that only the :tx-data part is sent from the server to the client, and not the entire #datascript.db.TxReport map? (which includes :previous-db, :next-db, :tx-data and some other stuff)
> You could do optimistic updates without a full transaction log: transact locally, but keep a reference to the previous db state, and only drop that reference once the tx comes back from the server. You have to get more clever, though, if you want transactions submitted in the interim to be handled properly.
Hmm, interesting. So what you describe is a solution for having a temporary mode after sending a transaction, before it is approved by the server. To keep things simple, one could have this mode block additional transactions until the first one is approved. Not bad, but overall not ideal. It does make it possible to query the new state of the db locally, immediately after the transaction.
> I've been thinking about query scopes and subscriptions again lately, and we may be putting out some API for this soon.
Cool!
> Queries don't have to be delayed in the current model if you're using "reactive" queries (Posh or differential datalog or whatever), as the queries will just update once the new data comes in.
Yeah, not sure about differential datalog yet. I assume you refer to the work done by Nikolas Gobel, particularly clj-3df, which I find fascinating. Cool stuff! I hope to be able to look more into it sometime, but I think it's still evolving fast and is really cutting-edge technology at this point, so I would rather give it time to stabilize. Regarding Posh, I think it's OK for some simple query use cases. Unfortunately, in my case I need recursive queries (using rules), which it's not suited for.

1👍
feelextra 2019-05-20T19:46:07.192500Z

@metasoarous Is this reply aimed at the JavaScript API? If so, it seems that when using transit-js I would still need the reader API available: datascript.db/db-from-reader and datascript.db/datom-from-reader; however, they're not currently exported. Regarding serializing the raw datoms and then ingesting them using d/init-db, I still need to play with that for a bit, as I'm still unfamiliar with how that works. Glad to hear you've been exploring this space recently 🙂

2019-05-20T20:29:34.192700Z

You're very welcome @icyiceling184, and many thanks for the kind words. Always really nice to get acknowledgement for open source work 🙂

2019-05-20T20:30:01.192900Z

Regarding your questions, yes, it's only the tx-data that gets sent; no need to resend the whole db (before & after).

1👍
2019-05-20T20:32:36.193100Z

Regarding the optimistic updates, you block additional transactions as a start, yes. The smarter thing to do is build up a queue of transactions that have been asserted in the tentative db, and if a remote transaction fails, walk back everything that occurred after it, and (possibly, anyway) rerun the later transactions in the queue on the backup db value.
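
A minimal sketch of the simple (blocking) variant of this, before adding the queue; send-to-server! is a hypothetical transport fn:

```clojure
(require '[datascript.core :as d])

(defonce backup (atom nil)) ; db value from before the tentative tx

(defn optimistic-transact! [conn tx send-to-server!]
  (reset! backup @conn)  ; keep a reference to the previous db state
  (d/transact! conn tx)  ; apply locally right away
  (send-to-server! tx))

(defn on-server-result [conn ok?]
  (if ok?
    (reset! backup nil)              ; confirmed: drop the reference
    (do (d/reset-conn! conn @backup) ; rejected: walk back
        (reset! backup nil))))
```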

2019-05-20T20:33:29.193300Z

Obviously, not the most straightforward thing to do, and it may not always be obvious when you'd want to retry those later txs or just scrap them (if they depend on the original remote tx having gone through).

2019-05-20T20:33:56.193500Z

A cleaner solution here is something like a CRDT.

2019-05-20T20:34:34.193700Z

Though you're going to be constrained around certain things like identity, and not able to express "all of datomic/datascript" that way.

2019-05-20T20:35:45.194100Z

3df is super cool, yes; And I did get a chance to meet Nikolas at the Conj, and had a good conversation about future directions.

2019-05-20T20:36:05.194300Z

It's definitely still early days and could use some settling and hardening, but I'm optimistic.

2019-05-20T20:36:57.194500Z

You're right that Posh doesn't do recursive queries/rules, but honestly, it might not be that hard to add in if you're really motivated.

feelextra 2019-05-20T20:58:02.194700Z

@metasoarous If you don't mind, I'd like to pick your brain about a particular problem I'm facing, for which I've been deliberating a solution quite unsuccessfully so far. I'm developing a system where browser clients have their own DataScript instances to operate on, each hosting data entirely separate from the others, but with the same underlying schema (think a personal Wikipedia per user). The average use case would be a few thousand nodes per graph, so this shouldn't be too hard for browsers in terms of memory requirements. However, I need to persist that data somehow, somewhere. I'm thinking a remote server is the best bet, but it doesn't have to be Datomic, since the client can already run its own queries/transactions locally; I just need to replicate them. I guess this isn't the typical DatSync use case, since I don't have a "central big database" serving all these clients. So the two problems are:
1. Finding a performant synchronization solution from client to server (doesn't have to be multi-client; a single client would suffice).
2. Finding a format to persist the data to.
Any thoughts on this? Would be glad to hear 😅

feelextra 2019-05-20T21:23:45.194900Z

One idea I'm thinking about: the client only sends :tx-data to the server; the server adds the :tx-data as a message to a queue, labelled with the user who enacted the transaction. When available, the message at the top of the queue is picked up. An already-existing DataScript instance is d/reset-conn!ed with the data belonging to that message's labelled user, the message's :tx-data is d/transact!ed into it, and the result is serialized to EDN using prn-str.
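
A rough sketch of that queue worker; the file paths, the channel, and the assumption that :tx-data arrives as plain [:db/add e a v] vectors are all mine:

```clojure
(require '[datascript.core :as d]
         '[datascript.db :as db]
         '[clojure.edn :as edn]
         '[clojure.core.async :as a])

(def queue (a/chan 100)) ; messages look like {:user-id ..., :tx-data [...]}

(defn load-db [user-id]
  ;; read back what prn-str wrote, using the readers mentioned earlier
  (edn/read-string {:readers {'datascript/DB    db/db-from-reader
                              'datascript/Datom db/datom-from-reader}}
                   (slurp (str "dbs/" user-id ".edn"))))

(defonce conn (d/create-conn)) ; one reusable instance

(a/thread
  (loop []
    (when-let [{:keys [user-id tx-data]} (a/<!! queue)]
      (d/reset-conn! conn (load-db user-id)) ; swap in this user's db
      (d/transact! conn tx-data)
      (spit (str "dbs/" user-id ".edn") (prn-str @conn)) ; persist as EDN
      (recur))))
```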