asami

Asami, the graph database https://github.com/threatgrid/asami
quoll 2021-03-19T02:52:35.028700Z

Alpha5 is now done. Same as Alpha4, but queries are significantly faster on large datasets

Craig Brozefsky 2021-03-19T13:33:47.029400Z

putting Alpha5 thru some paces

1 👍 1 😬
quoll 2021-03-19T13:35:38.030800Z

Well, it’s alpha so we can find the big problems and address them before it’s called a “release” 🙂

Craig Brozefsky 2021-03-19T13:50:04.031Z

gosh I'm rusty at clojure 8^)

Craig Brozefsky 2021-03-19T13:50:48.031300Z

also, do not buy Brach's 24 Flavours tiny Jelly Beans

Craig Brozefsky 2021-03-19T13:51:25.031900Z

their attempt at Jelly Belly knockoffs... It's like they didn't realize that the flavors must harmonize when you shovel a handful into your maw

1 😆
Craig Brozefsky 2021-03-19T13:57:37.033200Z

ok, think I broked it

Craig Brozefsky 2021-03-20T13:26:50.055300Z

New times:
lein test netgrok.core-test
Importing o1 into asami
Importing to asami
"Elapsed time: 19520.553168 msecs"
Imported 0
Importing o2 into asami
Importing to asami
"Elapsed time: 208748.131329 msecs"
Imported 0

Craig Brozefsky 2021-03-20T13:27:04.055500Z

So yah, I can confirm your estimate on perf win with Alpha6

quoll 2021-03-20T13:28:03.057500Z

I’m guessing that you’re saying “imported 0” to mean a count of tempids?

Craig Brozefsky 2021-03-20T13:29:03.057700Z

yah, just ignored that this time since I haven't updated my tests yet -- still drinking first cup of coffee

quoll 2021-03-20T13:30:25.059300Z

Have a look at the count on tx-data. That’s the number of statements inserted

quoll 2021-03-20T13:31:07.060500Z

If you want the number of entities inserted… do a count on your input 😊

quoll 2021-03-20T13:33:48.063900Z

The tempids map is there so you can provide a negative number for :db/id on an entity, and it will generate an ID for you and tell you what your negative number got mapped to (like Datomic)
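To make the tempids idea concrete, here is a minimal sketch in plain Clojure. It does not call Asami and is not how Asami or Zuko allocate IDs; the function name `resolve-tempids` and the `:example/node-N` naming are assumptions made up for illustration. It only models the behaviour described above: a negative `:db/id` is swapped for a generated ID, and the mapping is reported back.

```clojure
;; Illustrative only: models "negative :db/id -> generated ID, plus a map of
;; what each negative number was mapped to". Not Asami's implementation.
(defn resolve-tempids
  "Replaces negative :db/id values with generated node keywords.
  Returns {:entities [...] :tempids {negative-id generated-node}}."
  [entities]
  (reduce
   (fn [{:keys [tempids] :as acc} {id :db/id :as entity}]
     (if (and (integer? id) (neg? id))
       ;; Reuse the node if this temporary ID was already seen.
       (let [node (or (tempids id)
                      (keyword "example" (str "node-" (inc (count tempids)))))]
         (-> acc
             (assoc-in [:tempids id] node)
             (update :entities conj (assoc entity :db/id node))))
       ;; Entities without a temporary ID pass through untouched.
       (update acc :entities conj entity)))
   {:entities [] :tempids {}}
   entities))

(def result
  (resolve-tempids [{:db/id -1 :name "router"}
                    {:name "no temp id"}
                    {:db/id -2 :name "laptop"}]))
```

Here `(:tempids result)` has one entry per negative number supplied, which is the shape the test in this thread was counting.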

Craig Brozefsky 2021-03-20T13:34:17.064100Z

I have some utility functions for exploring the shape of the data and the schema

Craig Brozefsky 2021-03-20T13:34:51.064300Z

next step is to do some query clause generators

Craig Brozefsky 2021-03-20T13:35:22.064500Z

for functional composition of where clauses...
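One way such clause generators could look, as a rough sketch: each generator returns a vector of where-clause patterns, and a builder concatenates them into a vector-form Datalog query. The helper names `has` and `build-query` are hypothetical, and the attribute names are borrowed from the exported triples elsewhere in this thread; whether they nest this way in the real data is an assumption.

```clojure
;; Hypothetical helpers: plain data manipulation, no Asami calls.
(defn has
  "Generates a single [entity attribute value] pattern, wrapped in a vector
  so generators can return one or many clauses uniformly."
  [e attr v]
  [[e attr v]])

(defn build-query
  "Composes groups of where clauses into a vector-form query."
  [find-vars & clause-groups]
  (-> [:find]
      (into find-vars)
      (conj :where)
      (into (apply concat clause-groups))))

(def flags-query
  (build-query '[?flags]
               (has '?pkt "layers" '?layers)
               (has '?layers "ip.flags" '?flags)))
;; flags-query is
;; [:find ?flags :where [?pkt "layers" ?layers] [?layers "ip.flags" ?flags]]
```

Because the result is ordinary data, it can be inspected or tested before being handed to the query function.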

Craig Brozefsky 2021-03-20T13:35:48.064700Z

just doing export-data a bunch helped me grok what is happening

quoll 2021-03-20T14:11:39.068700Z

export-data gives you a view of everything, but if you insert individual things (or just small numbers of entities) then have a look at the contents of tx-data in the results of the transaction. That shows you the triples that were generated and inserted.

quoll 2021-03-20T14:13:17.070500Z

I’m curious how many triples you got from your data that took 3m28s to load. (I have to work to improve this)

Craig Brozefsky 2021-03-20T14:33:20.070700Z

data coming up...

Craig Brozefsky 2021-03-20T14:36:50.071Z

lein test netgrok.core-test
Importing o1 into asami
Importing to asami
"Elapsed time: 19489.224782 msecs"
Imported 31881 statements
Importing o2 into asami
Importing to asami
"Elapsed time: 209601.057757 msecs"
Imported 296271 statements
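For a sense of scale, the figures just posted can be turned into a load rate. This is plain arithmetic on the numbers in the log output; the function name `statements-per-second` is just for illustration.

```clojure
(defn statements-per-second
  "Load rate given a statement count and an elapsed time in milliseconds."
  [statements elapsed-ms]
  (/ statements (/ elapsed-ms 1000.0)))

;; Numbers copied from the test output above.
(def o1-rate (statements-per-second 31881 19489.224782))   ; roughly 1.6k/sec
(def o2-rate (statements-per-second 296271 209601.057757)) ; roughly 1.4k/sec
```

So the larger import runs at a somewhat lower per-statement rate than the smaller one, on this one pair of measurements.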

quoll 2021-03-20T14:38:29.072Z

Thanks for that

Craig Brozefsky 2021-03-20T14:39:02.072200Z

heading out for brunch

quoll 2021-03-19T13:57:41.033300Z

For anyone wondering, Craig is allowed to push me around in here. He’s no longer at Cisco, but it was his bright idea that I write my own graph database.

quoll 2021-03-19T13:57:51.033500Z

Yup? What’s happened?

Craig Brozefsky 2021-03-19T13:58:58.034200Z

Locked up importing a few thousand objects

Craig Brozefsky 2021-03-19T13:59:12.034400Z

I'll break up the txn

Craig Brozefsky 2021-03-19T13:59:31.034600Z

and we'll see what's happening. I must first eliminate my own stupidity...

quoll 2021-03-19T13:59:46.034800Z

Actually, breaking up the transaction is a bad thing to do. How big is the file that you’re importing?

quoll 2021-03-19T14:00:40.035Z

(bad, because you end up expanding the indexes significantly)

quoll 2021-03-19T14:03:02.035200Z

Also, if you’ve made a mistake, I’d like to know about that too. I should document gotchas, and mitigate some of the more obvious ones

Craig Brozefsky 2021-03-19T14:27:43.035400Z

yah, so it's not locked up, but just slow. Mind you I'm throwing a lot of large complex objects at it

Craig Brozefsky 2021-03-19T14:27:54.035600Z

I'll get data to you shortly

quoll 2021-03-19T14:28:50.035800Z

“large complex” is going to be an issue. Zuko (the module that breaks it up into triples) is now faster than it used to be, but there’s still a lot of work for it to do

Craig Brozefsky 2021-03-19T14:29:08.036Z

Yah, I'm thinking it's a chance to instrument the whole thing with metrics data

Craig Brozefsky 2021-03-19T14:30:31.036200Z

Importing o1 into asami
Importing to asami
"Elapsed time: 53402.866849 msecs"
Imported 4503
Importing o2 into asami
Importing to asami
"Elapsed time: 644524.77233 msecs"
Imported 41484

Craig Brozefsky 2021-03-19T14:31:22.036400Z

FAIL in (load-test) (core_test.clj:21)
Test loading and parsing
expected: (= (count o1) (count (:tempids tx1)))
  actual: (not (= 271 4503))
lein test :only netgrok.core-test/load-test
FAIL in (load-test) (core_test.clj:22)
Test loading and parsing
expected: (= (count o2) (count (:tempids tx2)))
  actual: (not (= 2570 41484))

Craig Brozefsky 2021-03-19T14:31:51.036600Z

So the failures are me expecting the entity count to be the input object count. The difference tells you just how complex some of the objects are, with many nested entities...
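The gap between object count and entity count can be illustrated with a small sketch that pulls nested maps apart into triples, giving every nested map its own node. This is not Zuko's algorithm (it ignores arrays, for one thing); the function name `triples`, the `:ex/node-N` naming, and the sample packets are assumptions made up for the example.

```clojure
;; Illustrative only: a naive nested-map-to-triples walk, not Zuko itself.
(defn triples
  "Breaks a seq of possibly nested maps into [entity attribute value] triples.
  Every map, including nested ones, gets its own generated node ID."
  [objects]
  (let [counter   (atom 0)
        new-node! #(keyword "ex" (str "node-" (swap! counter inc)))]
    (letfn [(emit [m]
              ;; Returns [node-id triples-for-this-map-and-its-children].
              (let [node (new-node!)]
                [node
                 (reduce-kv
                  (fn [acc k v]
                    (if (map? v)
                      (let [[child child-triples] (emit v)]
                        (-> acc
                            (conj [node k child])   ; link parent to child node
                            (into child-triples)))
                      (conj acc [node k v])))
                  []
                  m)]))]
      (vec (mapcat (comp second emit) objects)))))

;; Made-up sample shaped loosely like the packet data discussed here.
(def packets
  [{"layers" {"ip"  {"ip.flags" "0x00000040"}}}
   {"layers" {"eth" {"eth.type" "0x0800"}}}])

(def packet-triples (triples packets))
(def node-count (count (set (map first packet-triples))))
```

Two input objects produce six distinct nodes here, since each level of nesting becomes an entity of its own; deeply nested packet dumps would widen that ratio further.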

quoll 2021-03-19T14:34:28.036800Z

It creates lots of temporary IDs, but unless you ask, I would generally think they’d match the provided objects. 🤔

quoll 2021-03-19T14:34:55.037Z

I’m assuming that some or all of this data can be shared?

Craig Brozefsky 2021-03-19T14:35:11.037200Z

not sure. it's packet dumps from my home network

Craig Brozefsky 2021-03-19T14:36:10.037400Z

I will find some representative data

Craig Brozefsky 2021-03-19T14:36:38.037600Z

The :tempids in the tx would include the nested objects right?

quoll 2021-03-19T14:37:32.037800Z

I didn’t think so (unless you asked it to). So unless I’ve forgotten something it’s a problem

quoll 2021-03-19T14:38:34.038Z

It creates lots of IDs, but that map is supposed to just be for the top level entities, and things you’ve provided your own temporary IDs to

Craig Brozefsky 2021-03-19T14:38:44.038200Z

yah, so that seems wrong then

Craig Brozefsky 2021-03-19T14:38:58.038400Z

I provided no temp IDs for anything

Craig Brozefsky 2021-03-19T14:55:14.038600Z

So, I ran the same thing with in mem db

Craig Brozefsky 2021-03-19T14:55:16.038800Z

lein test netgrok.core-test
Preparing o1
"Elapsed time: 0.005525 msecs"
Preparing o2
"Elapsed time: 8.82E-4 msecs"
Importing o1 into asami
Importing to asami
"Elapsed time: 262.289293 msecs"
Imported 4503
Importing o2 into asami
Importing to asami
"Elapsed time: 2622.369329 msecs"
Imported 41484

Craig Brozefsky 2021-03-19T14:55:32.039Z

Is zuko involved in that too?

quoll 2021-03-19T14:56:36.039200Z

yes

quoll 2021-03-19T14:57:06.039400Z

It’s a library that pulls entities apart into triples

Craig Brozefsky 2021-03-19T14:57:14.039600Z

ok, so it's not in zuko then eh

Craig Brozefsky 2021-03-19T15:52:35.040100Z

So I am coercing string keys in JSON to keywords... Out of... habit?

Craig Brozefsky 2021-03-19T15:52:51.040700Z

Would I be violating any assumptions of Asami if I did not do that?

quoll 2021-03-19T15:53:00.040900Z

no

quoll 2021-03-19T15:53:12.041100Z

or… I hope not 🙂

Craig Brozefsky 2021-03-19T17:42:21.041700Z

I need to check my understanding here:

Craig Brozefsky 2021-03-19T17:42:38.042200Z

[:tg/node-929806 "ip.flags" "0x00000040"]
 [:tg/node-623767 "layers" :tg/node-623768]
 [:tg/node-623767 :db/ident :tg/node-623767]
 [:tg/node-623767 :tg/entity true]

Craig Brozefsky 2021-03-19T17:42:58.042500Z

netgrok.core> (d/entity (d/db (conn)) :tg/node-623767)
{}

Craig Brozefsky 2021-03-19T17:43:05.042800Z

I would not expect that to be an empty entity

Craig Brozefsky 2021-03-19T17:44:06.043200Z

the triples are from: (d/export-data (d/db (conn)))

Craig Brozefsky 2021-03-19T17:44:15.043500Z

this is asami alpha5 running in memory

Craig Brozefsky 2021-03-19T17:45:08.043900Z

(conn) is just (d/db-connect URI) ...

Craig Brozefsky 2021-03-19T17:45:26.044300Z

so I'm making a new connection using the DB uri, and a new DB...

Craig Brozefsky 2021-03-19T18:00:39.044900Z

Ah, ok, if I don't coerce keys to keywords in the structs, entity loading fails
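For reference, the key coercion being discussed can be done with `clojure.walk/keywordize-keys`, which recursively turns string map keys into keywords. The sample map is made up, modelled on the attribute names in the exported triples above; this sketch says nothing about why entity loading fails with string keys.

```clojure
(require '[clojure.walk :as walk])

;; Made-up sample with string keys, as parsed JSON typically arrives.
(def raw {"layers" {"ip" {"ip.flags" "0x00000040"}}})

;; Recursively converts string keys to keywords; values are left alone.
(def cleaned (walk/keywordize-keys raw))
;; cleaned is {:layers {:ip {:ip.flags "0x00000040"}}}
```

Note that a dotted key such as "ip.flags" becomes the single keyword `:ip.flags`, not a namespaced one.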

Craig Brozefsky 2021-03-19T18:00:43.045100Z

So I think that's a boog?

quoll 2021-03-19T18:00:49.045300Z

could be

Craig Brozefsky 2021-03-19T18:01:25.045700Z

interesting!

Craig Brozefsky 2021-03-19T18:01:38.046100Z

this is in memory DB

quoll 2021-03-19T18:01:41.046300Z

I’m working with the cleaned up data right now, so the attributes are all keywords. I’ll try with the strings shortly

quoll 2021-03-19T18:01:51.046600Z

Oh, that’s interesting too

Craig Brozefsky 2021-03-19T18:02:25.046900Z

yah, since the entity func is basically per storage...

Craig Brozefsky 2021-03-19T18:02:40.047100Z

export-data is my new pal

Craig Brozefsky 2021-03-19T18:03:08.047400Z

yah, so the file I sent you, I believe has string keys

quoll 2021-03-19T18:03:50.047700Z

it does, yes

Craig Brozefsky 2021-03-19T18:32:47.048500Z

🥰 this is a pleasant way to explore data

quoll 2021-03-19T18:33:34.048600Z

BTW, the large number of entities in tempids was expected, but I’m revisiting it, and I think they should not be included.

quoll 2021-03-19T18:33:43.048800Z

So I’m going to update Zuko to remove them

Craig Brozefsky 2021-03-19T21:30:15.049400Z

ok, gotten more familiar with the query language. Able to identify all the devices on my network, and start digging into their behavior

Craig Brozefsky 2021-03-19T22:35:53.050100Z

The query language is impressive, Paula

1 💖
Craig Brozefsky 2021-03-19T22:36:13.050500Z

calling it a day tho, so I stop obsessing over it