asami

Asami, the graph database https://github.com/threatgrid/asami
quoll 2020-10-31T18:57:52.058600Z

Anyone following Asami may notice that there haven’t been updates in the last couple of months. A few weeks of that was due to some non-Asami priorities I’ve had, but most of it has been due to implementing durable storage. There’s still a way to go with it, but it’s making progress!

alidlorenzo 2020-11-01T20:25:54.078Z

What’s the intended use case for Asami durable storage? Is it to store data client-side (e.g. in IndexedDB) and then later sync up to another backend DB (like Crux or Datomic)? Or will it serve as a primary storage solution itself? If so, how will it compare to those other alternatives? Going to use Asami in my project, so appreciate your work!

quoll 2020-11-01T21:20:42.078200Z

Well, Asami works in Clojure and ClojureScript. In ClojureScript, the idea is that data that is accumulated by the client will be kept, and can be added to. I haven’t done a lot on federated querying yet, but ultimately the user will be able to query across multiple stores, including the user’s in the browser.

We’re using Asami in a security product where the user accumulates data from multiple sources (network events, incidents, etc.) and uses this information to determine threats and how to respond to them. Some of this data takes time to accumulate, and the user wants to annotate it, so a short-term goal is to store the user’s session data. A secondary goal is to scale better, since the hashmap indexes slow down after enough information has been accumulated.

We also want to start using Asami within the services that speak to the client app. That’s because they’re doing work to accumulate security data, and link it wherever possible. That all runs in Clojure on the JVM.

2👍
quoll 2020-11-01T21:26:46.078400Z

The idea for now is to scale out to multiple GB of storage. There are several longer-term goals after that:
• All written data is immutable. This allows replication, which can provide horizontal scaling for queries.
• Querying from multiple stores at once (i.e. properly using the :in parameter; see the sketch below), where some of those stores are remote.
• Provisioning to multiple back-end systems, à la Datomic provisioning.
• A new index, built with similar infrastructure, which is optimized for indexing but not updating. This is for analyzing large datasets (this will be more of a JVM focus, though it will run on ClojureScript just the same).
• RDF compatibility (it’s most of the way there already).
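
As a rough illustration of the federated-querying goal: if Asami follows the Datomic convention of extra database arguments bound through :in ($, $2, …), a multi-store query might look like the sketch below. This is a planned feature, not the current API, and `remote-db` and `browser-db` are placeholder values:

```clojure
(require '[asami.core :as d])

;; Hypothetical: query a remote store and the in-browser store together.
;; $ binds to remote-db, $2 to browser-db.
(d/q '[:find ?incident ?note
       :in $ $2
       :where [$  ?incident :type :incident]
              [$2 ?incident :note ?note]]
     remote-db browser-db)
```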

2👍
quoll 2020-11-01T21:27:25.078600Z

Some of this is wanted by my team. Some of it is stuff I just want to do, because I can do it better with Asami than I could with Mulgara 🙂

quoll 2020-11-01T21:28:12.078800Z

Also, I need to improve some of the integration with Naga, which is ironic, since Asami was designed as a part of Naga in the first place 🙂

alidlorenzo 2020-11-02T00:48:08.079200Z

sounds great, appreciate the details and looking forward to the updates! 🙂

1👍
borkdude 2020-11-04T17:56:46.081600Z

Exciting!

1💯
quoll 2020-10-31T18:58:09.059Z

A description of the design is here: https://github.com/threatgrid/asami/wiki/Storage-Whitepaper

2👌
quoll 2020-10-31T19:02:08.061900Z

I’ve had memory-mapped blocks going for some time. @noprompt is working on the ClojureScript blocks right now. These form an abstraction for the other layers to work on, and last night I got the persistent, transactional AVL trees working. Hopefully, once the ClojureScript blocks show up, these will “just work” (that’s the plan, anyway). 🙂

quoll 2020-10-31T19:03:14.063Z

Now that I have the trees, the Data Pool (as described in the whitepaper) should be done shortly. Then the triple indexes.

quoll 2020-10-31T19:05:03.064400Z

It’s nice to create a tree, insert lots of data into it, and then get a seq out of it completely ordered, which I can then access later on. It seems to be nice and fast too (which was the point)
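
Asami’s durable tree itself isn’t public API, but the clojure.data.avl library demonstrates the same persistent-AVL semantics being described here: inserts return a new tree, old versions stay intact, and traversal comes out fully ordered:

```clojure
(require '[clojure.data.avl :as avl])

(def t1 (into (avl/sorted-set) (shuffle (range 100000))))
(def t2 (conj t1 100000))  ; insert returns a new tree; t1 is unchanged

(take 5 t1)   ;; => (0 1 2 3 4) — fully ordered, despite random insertion
(count t1)    ;; => 100000 — the old version is still accessible
```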

quoll 2020-10-31T19:06:02.065100Z

I’ve even ported Fogus’s LRU cache (in core.cache) to ClojureScript 🙂
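
For reference, this is what the JVM original of that cache looks like in clojure.core.cache — a bounded cache that evicts the least-recently-used entry when full:

```clojure
(require '[clojure.core.cache :as cache])

(def c (-> (cache/lru-cache-factory {} :threshold 2)
           (cache/miss :a 1)   ; add :a
           (cache/miss :b 2)   ; add :b — cache is now at its threshold
           (cache/hit :a)      ; touch :a so it becomes most recently used
           (cache/miss :c 3))) ; adding :c evicts :b, the least recently used

(cache/lookup c :a)  ;; => 1
(cache/lookup c :b)  ;; => nil (evicted)
```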

quoll 2020-10-31T19:06:18.065300Z

so there’s lots happening

quoll 2020-10-31T19:06:39.065800Z

I just wanted to let people know in case anyone was wondering why the project is quiet at the moment

3👍
refset 2020-10-31T19:17:22.065900Z

Nice write-up 🙂 What's the context for the 16 million transactions limitation?

quoll 2020-10-31T19:19:20.066800Z

Just choosing a number of bits to work with. It was somewhat arbitrary

refset 2020-10-31T19:26:11.069100Z

Ah okay, my concern is that for some OLTP use-cases it might not be enough. Only enough for about 1 year @ ~0.5 TPS (0.5 TPS × ~31.5M seconds in a year ≈ 15.8M transactions, against a 2^24 ≈ 16.8M limit)

quoll 2020-10-31T19:26:24.069500Z

It’s a balance. The more triples supported, the fewer transactions, and vice versa. Overall size matters more, and transactions each have a cost anyway, so that seemed to be a reasonable balance

quoll 2020-10-31T19:27:05.070700Z

It’s possible to expand it to 5-tuples and have 64 bits for each, but that uses more space
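
To make the tradeoff concrete (this is just the arithmetic in miniature, not Asami’s actual on-disk layout): with fixed-width fields, every bit given to the transaction id is a bit taken from the rest of the record, and 24 bits yields 2^24 ≈ 16.8M transactions:

```clojure
;; Illustrative bit packing, not Asami's real storage format.
(def tx-bits 24)
(def tx-mask (dec (bit-shift-left 1 tx-bits)))  ; 2^24 - 1 = 16,777,215

(defn pack
  "Pack a value and a transaction id into one long."
  [value tx]
  (bit-or (bit-shift-left value tx-bits) (bit-and tx tx-mask)))

(defn unpack-tx [packed]
  (bit-and packed tx-mask))

(unpack-tx (pack 42 123456))  ;; => 123456
```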

quoll 2020-10-31T19:27:16.071100Z

It’s all tradeoffs

1✔️
refset 2020-10-31T19:36:20.071300Z

Of course, plenty of tradeoffs...plenty of fun! I appreciate the discussion. Now, back to studying RocksDB compaction options for my Saturday evening 😅

quoll 2020-10-31T21:47:17.075Z

I have a related design that’s not described on the wiki yet. It’s for loading and indexing large datasets quickly, specifically for analysis. It should be faster and more compact than the design described on the wiki... BUT it won’t update well. Again, it’s a tradeoff. Create the graph best suited to your needs.

quoll 2020-10-31T21:47:55.076Z

I’m interested in handling large datasets efficiently, particularly imported RDF

quoll 2020-10-31T21:48:35.077200Z

Asami is designed to handle RDF well, but to look like Datomic 🙂
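
A minimal sketch of that Datomic-like surface, following the examples in Asami’s README (here with the in-memory asami:mem:// scheme; durable storage would use a different URI):

```clojure
(require '[asami.core :as d])

(def db-uri "asami:mem://example")
(d/create-database db-uri)
(def conn (d/connect db-uri))

;; Transact a Datomic-style entity map...
@(d/transact conn {:tx-data [{:name "event-1" :type :incident}]})

;; ...and query it with Datalog.
(d/q '[:find ?name
       :where [?e :type :incident] [?e :name ?name]]
     (d/db conn))
;; => (["event-1"])
```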

2👍