Anyone following Asami may notice that there haven’t been any updates in the last couple of months. A few weeks of that were due to some non-Asami priorities I’ve had, but most of it has been due to implementing durable storage. There’s still a way to go with it, but it’s making progress!
What’s the intended use case for Asami durable storage? Is it to store data client side (e.g. in IndexedDB) and then later sync up to another backend DB (like Crux or Datomic)? Or will it serve as a primary storage solution itself? If so, how will it compare to those other alternatives? Going to use Asami in my project, so I appreciate your work!
Well, Asami works in Clojure and ClojureScript. In ClojureScript, the idea is that data accumulated by the client will be kept, and can be added to. I haven’t done a lot on federated querying yet, but ultimately the user will be able to query across multiple stores, including the one in their browser. We’re using Asami in a security product where the user accumulates data from multiple sources (network events, incidents, etc.) and uses this information to determine threats and how to respond to them. Some of this data takes time to accumulate, and the user wants to annotate it, so a short-term goal is to store the user’s session data. A secondary goal is to scale better, since the hashmap indexes slow down after enough information has been accumulated. We also want to start using Asami within the services that speak to the client app, because they’re doing work to accumulate security data and link it wherever possible. That all runs in Clojure on the JVM.
The idea for now is to scale out to multiple gigabytes of storage. There are several longer-term goals after that:
• All written data is immutable. This allows replication which can provide horizontal scaling for queries.
• Querying from multiple stores at once (i.e. properly using the :in parameter), where some of those stores are remote (see the sketch after this list).
• Provisioning to multiple back-end systems, a la Datomic provisioning.
• A new index, built with similar infrastructure, which is optimized for indexing but not updating. This is for analyzing large datasets (it will be more of a JVM focus, though it will run the same in ClojureScript).
• RDF compatibility (it’s most of the way there already)
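To make the multi-store querying goal concrete, here is a rough sketch of the query shape it implies. This is only an illustration of the goal, borrowing Datomic’s multi-source :in syntax; the connection URIs, attribute names, and the ability to name graph sources in :where clauses are assumptions for the example, not current Asami behaviour.
```clojure
(require '[asami.core :as d])

;; Hypothetical: two separate stores, e.g. a local session store and a
;; remote incident store. URIs and attributes are made up for the example.
(def session-db  (d/db (d/connect "asami:mem://session")))
(def incident-db (d/db (d/connect "asami:mem://incidents")))

;; The goal: bind each store to its own source via :in and join across them.
(d/q '[:find ?incident ?note
       :in $incidents $notes
       :where [$incidents ?incident :incident/id   ?id]
              [$notes     ?note     :note/incident ?id]]
     incident-db session-db)
```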
Some of this is wanted by my team. Some of it is stuff I just want to do, because I can do it better with Asami than I could with Mulgara 🙂
Also, I need to improve some of the integration with Naga, which is ironic, since Asami was designed as a part of Naga in the first place 🙂
sounds great, appreciate the details and looking forward to the updates! 🙂
Exciting!
A description of the design is here: https://github.com/threatgrid/asami/wiki/Storage-Whitepaper
I’ve had memory-mapped blocks going for some time. @noprompt is working on the ClojureScript blocks right now. These form an abstraction for the other layers to work on, and last night I got the persistent, transactional AVL trees working. Hopefully, once the ClojureScript blocks show up these will “just work” (that’s the plan, anyway). 🙂
Now that I have the trees, the Data Pool (as described in the whitepaper) should be done shortly. Then the triple indexes.
It’s nice to create a tree, insert lots of data into it, and then get a seq out of it completely ordered, which I can then access later on. It seems to be nice and fast too (which was the point)
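The durable tree isn’t public API, but the behaviour being described (insert in any order, read back fully sorted, keep it around) is the same idea that Clojure’s in-memory sorted collections give you. A tiny stand-in sketch, for anyone who wants to picture it:
```clojure
;; Illustration only: an in-memory sorted-set standing in for the durable
;; AVL tree, to show the "insert lots of data, then get a completely
;; ordered seq out" behaviour. The real tree persists this across runs.
(def tree (into (sorted-set) (shuffle (range 1000))))

(take 5 (seq tree))   ;; => (0 1 2 3 4)
(last (seq tree))     ;; => 999
```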
I’ve even ported Fogus’s LRU cache (in core.cache) to ClojureScript 🙂
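For anyone who hasn’t used it, the original lives in clojure.core.cache; a minimal JVM-side usage sketch of that LRU cache looks like the following, and the ClojureScript port aims for the same behaviour (the keys and threshold here are just examples):
```clojure
(require '[clojure.core.cache :as cache])

;; An LRU cache that holds at most 2 entries.
(def c (cache/lru-cache-factory {} :threshold 2))

(-> c
    (cache/miss :a 1)     ;; add :a
    (cache/miss :b 2)     ;; add :b
    (cache/miss :c 3)     ;; add :c, evicting the least-recently-used :a
    (cache/lookup :a))    ;; => nil
```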
so there’s lots happening
I just wanted to let people know in case anyone was wondering why the project is quiet at the moment
Nice write-up 🙂 What's the context for the 16 million transaction limit?
Just choosing a number of bits to work with. It was somewhat arbitrary
Ah okay, my concern is that for some OLTP use cases it might not be enough. That’s only about a year at ~0.5 TPS.
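For context, the back-of-the-envelope arithmetic behind that figure, assuming (my assumption) that the 16 million comes from a 24-bit transaction id:
```clojure
;; Rough arithmetic behind "about a year at ~0.5 TPS".
;; Assumes the 16 million limit is a 24-bit transaction id.
(def max-txns (bit-shift-left 1 24))      ;; => 16777216
(def seconds-per-year (* 365 24 60 60))   ;; => 31536000
(double (/ max-txns (* 0.5 seconds-per-year)))
;; => ~1.06, i.e. roughly one year of transactions at 0.5 per second
```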
It’s a balance. The more triples supported, the fewer transactions, and vice versa. Overall size is more important, and transactions have a cost anyway, so that seemed like a reasonable balance.
It’s possible to expand it to 5-tuples and have 64 bits for each, but that uses more space
It’s all tradeoffs
Of course, plenty of tradeoffs...plenty of fun! I appreciate the discussion. Now, back to studying RocksDB compaction options for my Saturday evening 😅
I have a related design that’s not described on the wiki yet. It’s for loading and indexing large datasets quickly, specifically for analysis. It should be faster and more compact than the design described on the wiki... BUT it won’t update well. Again, it’s a trade off. Create the graph best suited to your needs.
I’m interested in handling large datasets efficiently, particularly for imported RDF.
Asami is designed to handle RDF well, but to look like Datomic 🙂
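For anyone who hasn’t tried Asami yet, a minimal sketch of that Datomic-style API, using the in-memory connection URI available today (the entity and attribute names are just examples; the durable storage work is intended to sit behind this same API):
```clojure
(require '[asami.core :as d])

;; In-memory database today; the durable back ends are meant to plug in
;; behind the same connect/transact/query calls.
(def conn (d/connect "asami:mem://example"))

@(d/transact conn {:tx-data [{:db/ident     "event-1"
                              :event/type   :network
                              :event/source "sensor-1"}]})

(d/q '[:find ?src
       :where [?e :event/type :network]
              [?e :event/source ?src]]
     (d/db conn))
;; => (["sensor-1"])
```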