I've heard that there can be performance issues with DataScript in single page apps with "large" databases? Is that still true or an old myth? What are your experiences with performance? Are there any types of queries / use cases to avoid? Looking from the perspective of re-frame, possibly official support etc.
Thanks for the reply @huxley. Roughly how big is your DataScript db?
several thousand entities
@huxley nice - if damning - review. It's good to hear from people who have used it in anger (literally, it would seem).
there was once a pretty interesting conversation on this topic here, but the archive doesn't work
lilactown created autonormal https://github.com/lilactown/autonormal
roman01la also mentioned problems with re-posh and datascript in general
joinr commented on this same topic on Reddit
Hah, he even made that comment as a reply to me
I'm a compulsive github browser, and overall what I've noticed recently is that everyone seems to have run into the same problems and everyone is trying to solve them somehow.
one is state management, preferably using as flat a normalized db as possible or using a datalog.
the other is to get rid of unnecessary multiple recalculations using graphs.
third is closer integration with react. rumext, helix or uix are only first examples.
yup, noticed the same thing
been trying to get an overview by compiling a list of the graph stuff: https://github.com/simongray/clojure-graph-resources
Thanks! That is all very helpful. I'll look into all that. We want to improve the state management and subs (graph) story in re-frame, but I doubt we'll ever diverge from reagent as the backwards compat is probably too much of an issue.
@simongray you can also add the mentioned autonormal
there is also no problem to use meander to search graphs, it is even in examples
@huxley yup, I have a few things I need to add, just been a bit busy (just got a kid)
https://github.com/noprompt/meander/blob/epsilon/examples/datascript.cljc
@simongray congrats! How old? I have a 3 month old boy.
6 weeks on thursday
and thank you - you too!
Oh awesome, that is a good milestone. Those initial weeks can be full-on, after that it's all a bit more reasonable.
@superstructor reagent is great. so great that one of the main inspirations for the hooks was reagent itself. However, I understand people who are just now facing the choice of react wrapper and I also understand that they may want to use the leanest one possible. Today reagent doesn't offer as much more than pure react as it did a few years ago.
@simongray congratulations. Even though I don't have children myself, I am glad that others do;)
@huxley That meander datascript example is cool; basically a macro-based conversion of datalog to meander at compile time.
very cool, though not sure how practical it is
@huxley thank you!
I wrote it for fun and to get to know meander better
it is not practical at all and probably has a lot of bugs, but it's just an example that maybe it's stupid but possible ; )
Another anecdotal experience report: I don't have much experience with re-posh, but last week I pitched in to investigate some performance issues with the athensresearch project. The codebase uses re-posh and re-frame and does a lot of recursive pulls, which seems to cause havoc on the posh pull-analyzer. Here's an example: https://github.com/athensresearch/athens/pull/665#issuecomment-790088361
Are you involved with Athens, @pithyless? I'm amazed at how that project came out of a single tweet and just started snowballing.
I came across it by accident when I was reading about all these new org-like tools that are popping up and ended up submitting a couple of PRs. They definitely seem to have a lot of momentum right now, their discord channel has a lot of activity (and IIUC, they have some funding sources); but the competition is fierce. Also, they're definitely going to have to fight through some scaling pains - I mentioned the re-posh stuff and also the way it's now handling durable storage.
I found it interesting when comparing Athens to what the LogSeq project (also CLJS project) is doing; where LogSeq is e.g. using git for their sync layer and OCaml for their Markdown parser (and modeling data at the page, not block level).
@pithyless If you had to pick some stack for modelling data in the frontend, what would you go with?
@simongray not sure what you mean; if you're talking about frameworks/libraries, my go-to stack is fulcro+shadow-cljs (vs say re-frame+figwheel); but you know... it depends. If you're talking about more specifically datastores, Fulcro's generic 3-layer DB approach is fast enough for most cases (since it's just maps and lookup-refs); you can always add reactive mutations if you'd like; and if you put Pathom behind it you're free to swap out and add a more complicated datastore (Datomic / DataScript / SQL / etc). I'm definitely keeping an eye on Asami for its speed and durability promises and I hope to use it in anger sometime.
I think that was kind of rambling; so I usually need a reason not to hide everything behind a Pathom EQL API (irrespective of what ends up resolving the query).
but with that approach, Fulcro's DB map with lookup-refs works nicely for fetching data locally to components
I am pretty new to Datascript but I noticed the not-so-good performance as well. On less than 2000 entities, on a very recent iPhone (React Native), queries can take as much as 100ms+ (!!!) to fetch around 100 entities.
As far as I can tell, the query performance is proportional to the size of the result.
It seems to be about 1ms per entity (that's a very rough ballpark estimate)
One solution I've devised is to use DataScript very carefully and avoid using it as the primary source of reads. Basically I came up with a solution that puts an additional atom as a sort of a cache in front of DataScript. I use that cache atom to do most reads (instead of going directly to DataScript via (query ...) etc. which is quite slow)
But I like the expressive power of the Datalog queries... so it's definitely a trade-off.
I wish I could use it directly as a primary source of reads but it's simply too slow for the needs of my mobile application, where sometimes I need to read values from the app state dozens of times a second.
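A minimal sketch of the "cache atom in front of the database" idea described above. This is not the poster's code: `slow-query`, `cached-read` and `invalidate!` are hypothetical names, and `slow-query` is only a stand-in for an expensive DataScript query so that the example runs with plain Clojure.

```clojure
;; Hypothetical stand-in for an expensive DataScript query; not a real API.
;; It counts its calls so the effect of the cache is observable.
(def query-calls (atom 0))

(defn slow-query [db k]
  (swap! query-calls inc)
  (get db k))

;; The cache atom that sits in front of the slow store.
(def cache (atom {}))

(defn cached-read
  "Return the cached value for k, running the slow query only on a miss."
  [db k]
  (if-let [hit (find @cache k)]
    (val hit)
    (let [v (slow-query db k)]
      (swap! cache assoc k v)
      v)))

(defn invalidate!
  "Drop the given keys from the cache, e.g. after a transaction touched them."
  [ks]
  (swap! cache #(apply dissoc % ks)))

;; Usage: the second read is served from the cache atom.
(def example-db {:user/name "Ivan"})
(cached-read example-db :user/name)
(cached-read example-db :user/name)
```

The hard part in practice is deciding which keys to invalidate after each transaction; the sketch leaves that to the caller.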
Even though datalog is infinitely powerful, it's usually not used to its full potential on the frontend, and thanks to clojure's expressiveness you can achieve the same effect with not much more code. For the more ambitious there is still meander.
@huxley how many datoms approx. do you have in your database when you noticed the slow down? Were you using indices in DataScript?
As much as I was a big proponent of datascript, I currently advise against it for everyone. Datalog on the BE side ❤️. On the frontend side, state is best managed in fulcro or in an identical way to fulcro.
@pithyless I was just wondering what libs you used for handling state and how you handle those transitions between frontend and backend, basically. Thank you for answering.
@raspasov We have several thousand entities in production.
It doesn't make much sense to me that an in-memory db can be that slow.
@huxley Did you have the requirement to run on mobile? That's where I noticed the bulk of the slow down. I tested on non-mobile and the perf. was quite a bit better.
@simongray I was SUPER surprised as well.
yes
if someone really wants a datalog, I recommend asami; it is 100-200x faster
There's an explanation by tonsky here: https://github.com/tonsky/datascript/issues/130
So what does Asami do differently? AFAIK it started as a fork of Datascript.
Just like Datahike and Datalevin
Perhaps the query planner? (I have no knowledge of asami): • Query planner: Queries are analyzed to find an efficient execution plan. This can be turned off.
Asami has a planner that is additionally cached.
Yeah... I felt that explanation by tonsky gives a lot of clarity: "DataScript is in different category, so expect different tradeoffs: query speed depends on the size of result set, you need to sort clauses to have smaller joins, accessing entity properties is not free given its id, etc. As a benefit, you gain ability to query dataset for different projections, forward and reverse reference lookups, joins between different sets, etc. And direct index lookup (`datascript.core/datoms`) is still fast and comparable to lookup in a map (at least comparable, think binary lookup vs hashtable lookup, logarithm vs constant). Queries do much more than that."
This was key for me: "query speed depends on the size of result set"
Can't expect to fetch a giant result set in constant time... It feels more like linear time.
[(datascript-q1) (asdb-q1) (mdb-q1)]
;; => [3.99 0.15 17.51]
[(datascript-q4) (asami-q4) (mdb-q4)]
;; => [169.43 3.58 167.25]
@huxley are those times in ms?
yes
jvm
asdb: asami?
mdb is a simple replica of the datalog in meander, which I posted here
yes
Cool
Any downsides of asami you've noticed?
apart from testing, I have not had the opportunity to use it
It seems around 50x faster
Based on those two queries
I was talking with noprompt from cisco while discussing meander, and they are using asami in production along with re-frame
so it's battle tested
It feels like DataScript is a pretty simple implementation and leaves a lot on the table for improvement.
actually, considering the speed it offers, I'd say it's rather complicated
Databases are tricky things (in memory or not), you need to resort to clever tricks to squeeze performance.
@huxley alright, I haven't explored the internals, so I can't speak; only speculate.
what is simple is to use filter, and it is not particularly slower, despite the lack of indexing
Rrrright
> Any downsides of asami you've noticed? It's not a port of Datascript - it was started independently around the same time - and it doesn't try to be 1:1 feature compatible with Datomic API. So you might be surprised by how certain things are incompatible with your existing queries (e.g. no pull syntax at the moment, db/idents work differently than DS/Datomic, etc.)
Hmmm... giving me a lot of food for thought here; I like the organization of data that datalog provides
And FYI - there is an active #asami channel on Slack ;]
@pithyless just joined, thanks!
There was also a #datalog channel that was created some time ago, meant for this kind of cross-library discussion, but it has been quiet recently
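;; q4 over the flat db: name = "Ivan" and sex = :male,
;; returning :db/id, :last-name and :age for each match.
(defn transduce-q4 []
  (e/qb 1e1
    (into []
          (comp
            (filter (fn [[_ m]] (= "Ivan" (m :name))))
            (filter (fn [[_ m]] (= :male (m :sex))))
            (map (fn [[_ m]] (select-keys m [:db/id :last-name :age]))))
          mdb100k)))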
[(datascript-q4) (asami-q4) (mdb-q4) (transduce-q4)]
;; => [158.78 4.39 151.03 46.03]
transduce-q4 is just regular Clojure transduce code, yes?
yes
you have the code above
@huxley have you tried q4 with specter?
yes
Yes... Well... One "trick" I've resorted to on React Native is runAfterInteractions https://reactnative.dev/docs/interactionmanager (not sure if there's a comparable browser API/trick)
I even have the code, just let me find it
Basically it delays the execution of a given fn after all user interactions have ended
Another option I've explored is to run DataScript in its own worker...
(That would definitely help, but it comes with its own set of challenges)
Aka, you can only communicate with your in-memory db asynchronously... but to be fair... with runAfterInteractions... it's already happening! Lol
@huxley I have a suspicion the destructs [_ m] are killing your perf in transduce-q4
Yes, but it's just a quick write-up
db is of the form
{?id {:db/id ?id ?k ?v ...} ...}
this is slower to query, but pull syntax/eql is lightning fast
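For illustration, a small sketch of that flat db shape in plain Clojure. The sample data and the names `flat-db`, `q4` and `pull-by-id` are made up for this example. `q4` applies the same filters as transduce-q4 but iterates over `(vals db)`, which avoids the `[_ m]` destructuring mentioned above; whether that actually helps would have to be measured. `pull-by-id` shows why pull-style reads are cheap on this shape: they are a single map lookup.

```clojure
;; Tiny stand-in for the flat db shape {?id {:db/id ?id ?k ?v ...} ...}.
(def flat-db
  {1 {:db/id 1 :name "Ivan" :sex :male   :last-name "Petrov"   :age 30}
   2 {:db/id 2 :name "Ivan" :sex :female :last-name "Sidorova" :age 25}
   3 {:db/id 3 :name "Oleg" :sex :male   :last-name "Ivanov"   :age 41}})

(defn q4
  "Same filters as transduce-q4, but over (vals db), so no per-entry
  vector destructuring is needed."
  [db]
  (into []
        (comp (filter #(= "Ivan" (:name %)))
              (filter #(= :male (:sex %)))
              (map #(select-keys % [:db/id :last-name :age])))
        (vals db)))

(defn pull-by-id
  "Pull-style read: one map lookup by id, then keep only the wanted keys.
  Returns nil when the id is unknown."
  [db id ks]
  (some-> (get db id) (select-keys ks)))

(q4 flat-db)
(pull-by-id flat-db 3 [:name :age])
```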
I am really curious where the major slow down in DataScript is compared to other options.
The similarity to Datomic is still very compelling for me, and the power of Datalog + pull syntax is definitely useful.
@huxley have you explored putting :db/index on certain schema elements in DataScript?
I find Fulcro's approach pragmatic - seldom do you need the full power of Datalog when you're re-rendering a component; I think of it as a UI data cache for my EQL-backed data (which can still be a proxy for a DataScript instance running in the browser; just not something that needs to run every animation frame).
[(datascript-q4) (asami-q4) (mdb-q4) (transduce-q4) (specter-q4)]
;; => [168.09 4.36 153.06 48.14 49.84]
I had to rewrite because I lost the q4 with the specter
please note that I am far from proficient with specter
Specter is quite an amazing tool IMO⌠Esp. when it comes down to data transformation (less so for just data reading)
@huxley if you're yak-shaving you may be interested in updating that transduce with some macros from https://github.com/bsless/clj-fast (and bsless also has this library I never played with - https://github.com/bsless/impedance#performance-differences)
but I think it's going to be hard to beat asami, since it looks like the query-planner short-circuits a lot of work in your benchmark
I only joined this channel this morning, so I didn't see any of the questions here until now. If anyone is interested I can explain how Asami works? It's quite different to the structures in DataScript.
Truth be told, if someone had told me about DataScript 5 years ago then I wouldn't have started Asami
(Asami was originally part of Naga, and that project started in 2016)
heya, I'm the sync person from Roam Research
I can't talk about query performance very much, as I mostly operate on transaction semantics and database persistence
I can say that for large databases (50+ mb of datascript transit) transact starts getting slow
proportionally to the database size
transit deserialization is also overall slow, but that's unsurprising
this is all from browser CLJS
@quoll hey there! kinda curious about the asami query planner, is it still efficient when the data keeps changing?
We're still working on durable storage for CLJS, so that'll be a while, sorry
e.g. query query query vs query transact query transact query
Yes, the whole point of the planner is to base the plan on the data
uhm... at Roam we have a mostly generic persistence layer for Datascript
It relies on the "count" of resolution of individual patterns (these get cached too, so it's not hitting the DB too much for this)
Can you explain what you mean by that please?
it syncs datascript transactions as a totally ordered list locally first and then remotely
right now we use it to sync first to indexeddb, then to firebase
but it's based on an abstract driver system
so it was easy to make variants for indexeddb+datomic
the in-memory db is still datascript
but the only things that matter as far as syncing is concerned are the transaction fn and error handling
so that can be abstracted to use asami or anything else (e.g. datahike) as long as it's an in-memory database
it's important to do in-memory because there can be a lot of rollbacks as optimistic transactions are turned into confirmed txs
e.g. two clients doing txs at the same time will have different optimistic orders than the final confirmed order, the sync system "rebases" the optimistic txs on top of the confirmed as these come
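To make the rebasing idea concrete, here is a rough sketch in plain Clojure. It is not Roam's Link code: the state shape and the names `apply-txs`, `current-db`, `optimistic-tx` and `confirm-tx` are assumptions for illustration, and a transaction is modelled simply as a map with an `:id` and a function from db to db.

```clojure
;; A transaction is modelled as {:id ... :f (fn [db] db')}; purely illustrative.
(defn apply-txs [db txs]
  (reduce (fn [acc tx] ((:f tx) acc)) db txs))

(defn current-db
  "In-memory view: confirmed txs in their final order, followed by the
  still-pending optimistic txs."
  [{:keys [base confirmed optimistic]}]
  (apply-txs (apply-txs base confirmed) optimistic))

(defn optimistic-tx
  "Apply a local tx immediately by queueing it as optimistic."
  [state tx]
  (update state :optimistic conj tx))

(defn confirm-tx
  "A confirmed tx arrives (ours or another client's): append it to the
  confirmed log and drop it from the optimistic queue if it was ours.
  Whatever stays optimistic is thereby replayed on top of it."
  [state tx]
  (-> state
      (update :confirmed conj tx)
      (update :optimistic
              (fn [txs] (vec (remove #(= (:id %) (:id tx)) txs))))))

;; Usage: local tx :a is applied optimistically, but remote tx :b is
;; confirmed first, so :a ends up rebased after :b.
(def tx-a {:id :a :f #(conj % :a)})
(def tx-b {:id :b :f #(conj % :b)})
(def s0 {:base [] :confirmed [] :optimistic []})
(def s1 (optimistic-tx s0 tx-a))
(def s2 (confirm-tx s1 tx-b))
(def s3 (confirm-tx s2 tx-a))
```

Recomputing the view from the base on every read is only for clarity; a real system would presumably roll back and reapply incrementally, which is where a cheap in-memory db matters.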
we were thinking of open sourcing this
can asami run as an in-memory db? if so maybe we could work together to make the sync system generic
then you could use arbitrary persistence layers via these drivers
Asami on CLJS is currently only in-memory
oh cool then that'd definitely work
our sync thing (we call it Link) could persist it to indexeddb and other places
I don't have a pull API for it yet. It hasn't been a priority
are you interested in some collaboration if we can provide an open source persistence layer for in-memory asami dbs?
Sure. I keep it open source for a reason
coolio, going to see what I can do WRT making our stuff open source
will keep you posted
I'm doing persistence right now though. Everything is based on a block abstraction that can be stored in anything (the first implementation of this is memory mapped files in the JVM, but the second one is going to be indexeddb... partly implemented now)
Probably better to ask in #asami
the sync persistence we have is just based on the transaction log (and optionally snapshots)
saving immutable serialized transactions instead of mutable data structures
https://clojurians.slack.com/archives/C07V8N22C/p1615898168016200?thread_ts=1615872613.002600&cid=C07V8N22C
@simongray from a high level, one of the main differences I'd seen is that DataScript (and Datomic) store datoms, and then index them. Asami doesn't do that. Instead, it has indexes for the valid statements, without pointing at instances of statements. It's all just nested maps. The main consequence of this is that searching for when statements get created or deleted isn't so straightforward. But so far we haven't needed that.
If you're looking for a :where clause with a single pattern in it, then that might be [entity :my-property '?value]. In this case, both the entity and the attribute have been set. So you can just go to the EAV index, and say: (get-in eav [entity :my-property]) and you have your values. So simple queries that just do a single pattern are literally just a lookup in a map, followed by a lookup in the nested map.
Joins cost a bit more. For instance, [?person :name "Betty"][?person :age 20]. First of all, the optimizer figures out the pattern with a smaller result, and uses the above to get a result. If this first one is people named "Betty", then it will go to the AVE index, to get the set of all person entities. It then iterates over that, and uses it to modify the second pattern, which it then looks up. So the first person named "Betty" may be an entity identified by :node-123, which means that the second pattern gets updated to [:node-123 :age 20]. This is resolved with (get-in eav [:node-123 :age 20]), and if it is true, then that value for ?person gets returned. The same goes for every other person who was resolved as well.
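The lookups described above can be sketched with plain nested maps. This is an illustration of the idea only, not Asami's actual code: the sample triples and the names `index-by`, `eav`, `ave` and `betty-aged-20` are invented here, and the planner step (picking the pattern with the smaller result first) is hard-coded rather than computed.

```clojure
;; Sample statements as [entity attribute value] triples (made-up data).
(def triples
  [[:node-123 :name "Betty"] [:node-123 :age 20]
   [:node-456 :name "Betty"] [:node-456 :age 35]
   [:node-789 :name "Fred"]  [:node-789 :age 20]])

(defn index-by
  "Build a nested-map index {x {y #{z}}}. `order` says which positions of
  the [e a v] triple become x, y and z."
  [order triples]
  (reduce (fn [idx triple]
            (let [[x y z] (mapv triple order)]
              (update-in idx [x y] (fnil conj #{}) z)))
          {}
          triples))

(def eav (index-by [0 1 2] triples)) ; entity -> attribute -> values
(def ave (index-by [1 2 0] triples)) ; attribute -> value -> entities

;; Single pattern [:node-123 :name ?value]: just a nested map lookup.
(get-in eav [:node-123 :name])

(defn betty-aged-20
  "Join of [?person :name \"Betty\"] and [?person :age 20]: resolve the
  first pattern through AVE, then check the second one per entity in EAV."
  []
  (filter (fn [person] (get-in eav [person :age 20]))
          (get-in ave [:name "Betty"])))

(betty-aged-20)
```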
How does this compare to joins in DataScript? I don't actually know! I never looked
@filipematossilva That makes sense. And it's easy to replay. It's not what Asami is using though. I'm saving immutable data structures.
the difference in this model (for asami) is that the in-memory version would be fed the relevant transactions on load, and those transactions would be persisted to disk or network separately from asami
I see.
Well, Asami doesn't store the relevant transactions. That said, they do get returned from a call to transact (like datomic does), meaning that they're easy to accumulate
We use datascript in production, or rather what's left of it. We had to cut out most of the functionality due to tragic performance with more products in the db. Basically, all that's left is the pull syntax.
Datascript, contrary to what we can read on github, has very poor performance. For example datalevin, despite the fact that it uses data stored on disk, is much faster. Almost everything is faster than datascript, even queries written in meander, operating on a flat db in fulcro style. The simplest macro that creates a transducer on the fly beats datascript.
As for re-posh, sometimes it loses changes, especially if you evict data from the db. It also doesn't allow you to use all the possibilities of datascript, and with a bigger number of arguments in :in it loses order, so you have to wrap the arguments in a vector.
much better db, but still having many rough edges, is Asami