@j.massa Maybe this is part of the slowdown when loading Datahike for the first time.
@whilo I put reminders into github. Thanks!
I have a use case where I want to record structured information with an enforced schema, but also want to record unstructured information in a way that can be reliably linked with a structured Datahike DB. Specifically, I'm thinking of storing hiccup documents in a database. The metadata is structured: document name, author, origin, etc. The content is obviously not: it could consist of arbitrary hiccup forms. How should I approach this? Just create two DBs, one with schema-on-read and one with schema-on-write? Persist a reference to an atom, or some other way of storing the history and current state of the unstructured forms?
In similar cases I hash the blob and put the structured info with the hash into the db and spit the blob into a file named by the hash. I usually take the first two letters of the hash as a directory to put the file into. A bit like git. I use zfs as a file system so I can do snaps etc. there.
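A minimal sketch of that hash-then-store approach (file names and paths are illustrative): the blob's content hash names the file, the first two hex characters pick the directory, git-style, and the hash is what you would transact into Datahike alongside the structured metadata.

```shell
# Write an example blob (illustrative path and content).
blob="/tmp/example-blob.edn"
printf '[:div "hello"]' > "$blob"

# The content hash names the file; the first two hex chars pick the directory.
hash=$(sha256sum "$blob" | cut -d' ' -f1)
dir="blobs/$(printf %s "$hash" | cut -c1-2)"
mkdir -p "$dir"
cp "$blob" "$dir/$hash"

# "$hash" is the value you would store in the DB next to the metadata.
echo "$dir/$hash"
```

Because the file name is derived from the content, identical blobs deduplicate for free, and the DB-side reference can never point at silently changed data.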
Hey @magra. do you have some code snippets on this zfs use case? was looking into this as well.
Wow to the management tool! At the moment I am just building a system for my first prototype customer. So only a handful of users in production. All my machines, including laptops, run FreeBSD. I just use zfs snapshot and zfs send / zfs receive for backup. No live replication. I use Datahike with Pathom as a backend for a Fulcro app. The sophistication of my zfs workflow boils down to shell-command history in tmux at the moment (arrow-up, return). I will not need more than a cron entry in the foreseeable future.
And I really like how FreeBSD jails integrate with zfs. I have encrypted datasets and jails for a reverse proxy and the datahike+pathom backend. The datahike jail has access to an encrypted partition with the data. And totally unrelated to datahike, I boot a laptop from an encrypted disk with a long passphrase. Then come boot environments. A boot environment is a zfs snapshot that includes the kernel etc. and, selectively, the right parts of /var and so on. So if a kernel or package update goes wrong I can just return to an old snapshot. Boot environments are just fine-grained clever defaults, so I do not accidentally roll back a postgres database when I want to roll back the kernel. If you want your kernel to go back to yesterday, you do not want to roll back the emails you wrote since then, or the logs, or a database.
Or here https://markusgraf.net/2020-02-26-Acme.html I use a small shared zfs dataset to share TLS certificates between the reverse proxy and acme jails for certificate renewal.
The big idea is that a customer has one zfs dataset with their Datahike files, data directories and app config. This is encrypted and can be backed up or migrated to another machine. It is independent of the jail that runs the backend with a JVM and the Clojure code, which only keeps data like session info or the port of the reverse proxy. The customer dataset gets mounted into the backend jail.
If you want the hiccup not to be stored as one blob but cut into pieces: Rich Hickey mentioned in one talk how he uses Datomic for source control at the s-expression level. I forget which talk, though.
thanks!
Or it might have been Stu.
Figuring out how to do something like that in Datahike would certainly unlock some very interesting applications for managing and discovering Clojure code.
You can also use two Datahike databases with different schema flexibility and join both in queries.
Multiple inputs can be handled like here: https://github.com/replikativ/datahike/blob/master/examples/basic/src/examples/store.clj#L66
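A hedged sketch of such a cross-database join, assuming a strict metadata DB and a schema-on-read content DB; the config maps, connection vars (`meta-conn`, `content-conn`) and attribute names are all illustrative, not a fixed API for this use case:

```clojure
(require '[datahike.api :as d])

;; cfg-strict: schema-on-write DB for document metadata (schema elided)
;; cfg-flex:  schema-on-read DB for arbitrary hiccup-derived facts
(def cfg-strict {:store {:backend :mem :id "meta"}})
(def cfg-flex   {:store {:backend :mem :id "content"}
                 :schema-flexibility :read})

;; Join both DBs on a shared content hash. Datalog src-vars ($meta,
;; $content) select which database each clause runs against.
(d/q '[:find ?name ?form
       :in $meta $content
       :where
       [$meta    ?d :document/name ?name]
       [$meta    ?d :document/content-hash ?h]
       [$content ?c :content/hash ?h]
       [$content ?c :content/form ?form]]
     @meta-conn @content-conn)
```

The shared hash attribute plays the same linking role as in the blob-on-disk approach, just with the unstructured side living in a second database instead of the file system.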
That is very cool. How are you using ZFS? I have also been using it on my laptop for the last half year, including encrypted snapshots for backups, and so far it works fine. We are thinking about building a snapshot management tool with Datahike for it at the moment.