asami

Asami, the graph database https://github.com/threatgrid/asami
quoll 2021-03-06T03:20:13.135500Z

@here https://github.com/threatgrid/asami/wiki/Asami-2

5🎉
quoll 2021-03-06T03:22:11.137300Z

The JVM version now accepts URLs of the form: asami:local://database-name

quoll 2021-03-06T03:24:04.138800Z

These databases are saved in the current working directory, in a subdirectory with the same name as the database.
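
To make the naming rule concrete, here is a rough Python sketch of how a local URL could map to its storage directory. This is not Asami's code: `local_db_dir` and `PREFIX` are hypothetical names, and the sketch assumes the database name is simply whatever follows `asami:local://`.

```python
from pathlib import Path
from typing import Optional

# Assumed URL scheme prefix for locally stored databases.
PREFIX = "asami:local://"

def local_db_dir(url: str, cwd: Optional[Path] = None) -> Path:
    """Hypothetical helper: map a local database URL to its storage directory."""
    if not url.startswith(PREFIX):
        raise ValueError(f"not a local database URL: {url!r}")
    name = url[len(PREFIX):]
    if not name:
        raise ValueError("database name is empty")
    # The database lives in a directory named after it, under the working directory.
    base = cwd if cwd is not None else Path.cwd()
    return base / name
```

Under these assumptions, `local_db_dir("asami:local://mydb")` points at a `mydb` directory inside whatever directory the process was started from.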

quoll 2021-03-06T03:24:44.139600Z

It’s alpha, so it has a way to go. I would love some feedback, please.

mpenet 2021-03-06T06:17:07.146500Z

Congratulations, it's very impressive, and I am looking forward to trying it out first thing on Monday. I quite enjoyed reading the white paper. These kinds of things are not always very well documented, and the fact that you wrote this early speaks to the attention to detail while designing Storage.

mpenet 2021-03-06T06:45:08.151400Z

About tempids: I was wondering if it would be better to have them returned as strings (or something other than keywords, since they will end up filling the keyword "cache": the weak map used for interning on the JVM, and whatever is used in cljs).

quoll 2021-03-06T11:01:14.157600Z

I have been considering this as well. The storage doesn’t actually care, but of course as you access them you’ll start interning them. I was wondering about creating a type (as Datomic seems to have done) or else using URIs (which are flexible and follow RDF). Strings aren’t really a great idea, as it becomes difficult to distinguish between scalars and entity references.
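
The scalar-versus-reference problem can be illustrated with a small Python sketch. It is only an analogy for the "create a type" option, not Asami's design: `InternalNode` and `is_entity_ref` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InternalNode:
    """Hypothetical dedicated type for entity references (not Asami's actual type)."""
    id: int

def is_entity_ref(value) -> bool:
    # A dedicated type makes the check unambiguous; with plain strings,
    # "node-1" could equally be a reference or ordinary text data.
    return isinstance(value, InternalNode)

# Two values that would look identical if references were stored as strings.
reference = InternalNode(1)
scalar = "node-1"
```

Here `is_entity_ref(reference)` is true and `is_entity_ref(scalar)` is false, which a string-based scheme could not decide without extra conventions. Unlike keywords, instances of such a type are also not held in a global interning table.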

mpenet 2021-03-06T11:24:17.158100Z

Good points.

quoll 2021-03-06T22:28:33.160Z

OK… I have an approach to work on this. I’m adopting a new type. My biggest issue was encoding, but I’ve handled that part now. I’ll do an update when it’s all done 🙂

quoll 2021-03-06T22:31:56.163Z

The memory usage had occurred to me a few years ago, but at that point the entire thing was in memory, so I decided not to worry. I had thought of it again more recently, but decided that since they’re only seen when you pull them in from disk, then it was probably OK for the moment. But it was never going to work long term. However, when I thought about it today a bigger issue came up. These keywords are created with gensym. That means that if the database is opened multiple times, then generating internal nodes has the potential to reuse identifiers. Oops 😳

quoll 2021-03-06T22:32:32.163600Z

I’m adding the id counter to transactions, and creating a new datatype that encodes efficiently.
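
A toy Python simulation of the reuse problem and of the fix described above. It assumes an in-memory counter that restarts each time the database is opened (standing in for gensym) versus a counter written into each transaction record. `Store`, `Connection`, and `ids_after_reopen` are hypothetical names, not Asami's implementation.

```python
class Store:
    """Toy stand-in for durable storage: a list of committed transaction records."""
    def __init__(self):
        self.transactions = []

class Connection:
    def __init__(self, store, persist_counter):
        self.store = store
        self.persist_counter = persist_counter
        if persist_counter and store.transactions:
            # Resume from the counter saved with the latest transaction.
            self.next_id = store.transactions[-1]["next_id"]
        else:
            # In-memory counter only: every open starts from the same value.
            self.next_id = 1

    def transact(self, n_new_nodes):
        """Allocate ids for new internal nodes and commit a transaction record."""
        ids = list(range(self.next_id, self.next_id + n_new_nodes))
        self.next_id += n_new_nodes
        record = {"nodes": ids}
        if self.persist_counter:
            record["next_id"] = self.next_id
        self.store.transactions.append(record)
        return ids

def ids_after_reopen(persist_counter):
    """Open the same store twice and return the ids each session allocated."""
    store = Store()
    first = Connection(store, persist_counter).transact(2)
    second = Connection(store, persist_counter).transact(2)  # simulated reopen
    return first, second
```

With `persist_counter=False` both sessions allocate `[1, 2]`, which is the identifier reuse described above. With `persist_counter=True` the second session continues at `[3, 4]`.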

1👍