datahike

https://datahike.io/, Join the conversation at https://discord.com/invite/kEBzMvb, history for this channel is available at https://clojurians.zulipchat.com/#narrow/stream/180378-slack-archive/topic/datahike
whilo 2020-07-11T00:08:28.465700Z

@j.massa Maybe this is part of the slow down of loading Datahike for the first time.

magra 2020-07-11T09:13:47.466400Z

@whilo I put reminders into github. Thanks!

respatialized 2020-07-11T15:14:26.470200Z

I have a use case where I want to record structured information with an enforced schema, but also want to record unstructured information in a way where it can be reliably linked with a structured datahike DB. Specifically, I'm thinking of storing hiccup documents in a database. The metadata is structured: document name, author, origin, etc. The content is obviously not: it could consist of arbitrary hiccup forms. How should I approach this? Just create two DBs, one with schema-on-read and one with schema-on-write? Persist a reference to an atom or some other way of storing the history and current state of the unstructured forms?

magra 2020-07-11T16:05:15.473600Z

In similar cases I hash the blob and put the structured info with the hash into the db, and spit the blob into a file named by the hash. I usually take the first two letters of the hash as the directory to put the file into. A bit like git. I use zfs as the file system so I can do snapshots etc. there.
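A minimal sketch of that content-addressed layout, assuming SHA-1 and a `blobs/` root directory (both hypothetical choices; the thread does not specify the hash function or paths):

```clojure
(ns blob-store
  (:require [clojure.java.io :as io])
  (:import [java.security MessageDigest]))

(defn sha1-hex
  "Hex-encoded SHA-1 of a string blob."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-1")
                        (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" %) digest))))

(defn blob-path
  "git-style path: first two hash chars as directory, rest as file name."
  [root hash]
  (io/file root (subs hash 0 2) (subs hash 2)))

(defn store-blob!
  "Spit the blob under its hash and return the hash,
   which then goes into the structured Datahike entity, e.g.
   (d/transact conn [{:document/name \"...\" :document/content-hash h}])"
  [root blob]
  (let [h (sha1-hex blob)
        f (blob-path root h)]
    (io/make-parents f)
    (spit f blob)
    h))
```

The hash doubles as a stable foreign key: the structured DB stays small and schema-enforced, while the blobs live on a zfs dataset that can be snapshotted independently.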

timo 2020-07-12T11:24:16.480500Z

Hey @magra, do you have some code snippets for this zfs use case? I was looking into this as well.

magra 2020-07-12T13:21:32.480700Z

Wow to the management tool! At the moment I am just building a system for my first prototype customer. So only a handful of users in production. All my machines, including laptops, run FreeBSD. I just use zfs snapshot and zfs send / receive for backup. No live replication. I use datahike with pathom as a backend for a fulcro app. The sophistication of my zfs setup boils down to shell-command history in tmux at the moment (arrow-up, return). I will not need more than a cron entry in the foreseeable future.

magra 2020-07-12T13:34:18.480900Z

And I really like how FreeBSD jails integrate with zfs. I have encrypted datasets and jails for a reverse proxy and the datahike+pathom backend. The datahike jail has access to an encrypted partition with the data. And, totally unrelated to datahike, I boot a laptop from an encrypted disk with a long passphrase. Then come boot environments. A boot environment is a zfs snapshot that includes the kernel etc., and selectively the right parts of /var and so on. So if kernel or package updates go wrong I can just return to an old snapshot. Boot environments are just fine-grained, clever defaults so I do not accidentally roll back a postgres database when I want to roll back the kernel. If you want your kernel to go back to yesterday, you do not want to roll back the emails you wrote since then, or the logs, or a database.

magra 2020-07-12T13:37:27.481100Z

Or here https://markusgraf.net/2020-02-26-Acme.html I use a small shared zfs dataset to share TLS certificates between the reverse proxy and ACME jails for certificate renewal.

magra 2020-07-12T13:43:23.481300Z

The big idea is that each customer has one zfs dataset with their datahike files, data directories and app config. This is encrypted and can be backed up or migrated to another machine. It is independent of the jail that runs the backend with a jvm and the clojure code, which just keeps data like session info or the port of the reverse proxy. The customer dataset gets mounted into the backend jail.

magra 2020-07-11T16:12:11.475200Z

If you want the hiccup not to be stored as one blob but cut into pieces: Rich Hickey mentioned in one talk how he uses Datomic for source control at the s-expression level. I forget which talk, though.

respatialized 2020-07-12T16:44:26.486500Z

thanks!

magra 2020-07-11T16:13:11.475500Z

Or it might have been Stu.

respatialized 2020-07-11T16:48:14.477800Z

Figuring out how to do something like that in datahike would certainly unlock some very interesting applications for managing and discovering Clojure code.

kkuehne 2020-07-11T18:24:11.478600Z

You can also use two datahike databases with different schema flexibility and join both databases in queries.

kkuehne 2020-07-11T18:30:49.479200Z

Multiple inputs can be handled like here: https://github.com/replikativ/datahike/blob/master/examples/basic/src/examples/store.clj#L66
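A sketch of such a two-database join for the hiccup use case, assuming a schema-on-write database for metadata and a schema-on-read one for content, linked by a hash attribute (all attribute names here are hypothetical, not from the linked example):

```clojure
;; (require '[datahike.api :as d])  ; assumes datahike on the classpath

;; Each input source symbol starting with $ names one database
;; that the :where clauses can be scoped to.
(def join-query
  '[:find ?name ?content
    :in $meta $docs
    :where
    [$meta ?m :document/name ?name]
    [$meta ?m :document/content-hash ?h]
    [$docs ?c :content/hash ?h]
    [$docs ?c :content/hiccup ?content]])

;; Pass both database values after the query, in :in order:
;; (d/q join-query @meta-conn @docs-conn)
```

The strict database enforces the metadata schema, while the schemaless one can hold arbitrary hiccup forms; the shared hash attribute is what makes the join possible.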

whilo 2020-07-11T20:10:23.479400Z

That is very cool. How are you using ZFS? I have also been using it on my laptop for the last half year, including encrypted snapshots for backups, and so far it works fine. We are thinking about building a snapshot management tool for it with Datahike at the moment.