datahike

https://datahike.io/, Join the conversation at https://discord.com/invite/kEBzMvb, history for this channel is available at https://clojurians.zulipchat.com/#narrow/stream/180378-slack-archive/topic/datahike
peterdee 2020-05-26T00:09:26.249500Z

Answering my own question…. My data had a vector containing a vector. I suppose there is no way to interpret that. By the way, thanks for datahike! It is really a nice thing!

whilo 2020-05-26T00:12:02.249800Z

Thank you @peterd!

whilo 2020-05-26T00:14:12.250600Z

We follow the Datomic specification closely so far, so you can also look into their documentation until ours is up (WIP).

Petrus Theron 2020-05-26T10:49:50.257900Z

Hey Guys 🙂 I've been evaluating Datomic vs Crux vs DataScript vs Datahike vs rolling my own EAVT-shaped KV store on top of RocksDB for my project Bridge (https://www.tradebridge.app/). My conclusions so far:
• Datomic is too pricey, On-Prem is hard to tune (memory hungry), and Datomic Cloud is tied to AWS (plus I'm too dumb to deal with Lambda).
• Crux does not seem ready for primetime, makes you do the transactor's work, and Kafka is a whole can o' beans. However, Differential Dataflow integration might twist my arm.
• DataScript doesn't have persistence.
• Datahike looks like the way to go and seems to be progressing nicely.
If I want to deploy Datahike in production and I suddenly have to deal with a lot of data, how should I future-proof things for high read throughput and running out of disk space? Perhaps store all txes in a separate write-only log in case I need to move to something else? How far will flat-file persistence take me? Is there a way to not use Environ? (filed an issue) Lastly, how is datahike-rocksdb coming along? Keep going, guys!

👍 1
mpenet 2020-05-26T10:55:53.261500Z

Fronting with Kafka or some event log makes sense. Otherwise you can use something like https://github.com/mpenet/tape. You can have daily/hourly log rollups that you ship to and store on cheap storage.
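A minimal sketch of the write-only tx log idea from the question above, assuming a flat-file EDN log (the helper name and log format are illustrative, not part of Datahike's API):

```clojure
(require '[datahike.api :as d]
         '[clojure.java.io :as io])

(defn log-and-transact!
  "Append tx-data as one EDN line to an append-only log file before
  transacting, so the full transaction history can later be replayed
  into a different store if needed."
  [conn log-file tx-data]
  (with-open [w (io/writer log-file :append true)]
    (.write w (pr-str tx-data))
    (.write w "\n"))
  (d/transact conn tx-data))
```

Replaying then amounts to reading the log line by line with `clojure.edn/read-string` and transacting each entry into the new store.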

kkuehne 2020-05-26T11:16:04.261900Z

Hi @petrus, regarding the config, you can use just simple hash maps, as described at https://github.com/replikativ/datahike/blob/master/doc/config.md. Environment variables only provide defaults for the stores if nothing else is set. But I can add an option to opt out of the env checks entirely if you like. And I'll probably extend the docs a little to make it clearer how to configure everything.
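For reference, a minimal sketch of such a hash-map configuration (keys follow the linked config.md; exact options may vary between Datahike versions):

```clojure
(require '[datahike.api :as d])

;; Plain hash-map configuration instead of environment variables.
;; :backend :file persists data under the given path; other backends
;; (e.g. :mem) take different keys.
(def cfg {:store {:backend :file
                  :path    "/tmp/datahike-example"}})

(d/create-database cfg)
(def conn (d/connect cfg))
```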

👍 1
kkuehne 2020-05-26T11:17:27.262300Z

Also, an example can be found at https://github.com/replikativ/datahike/blob/master/dev/sandbox.clj#L18

alekcz 2020-05-26T16:09:29.266900Z

A backend for MySQL, Postgres, and H2. Comments welcome: https://github.com/alekcz/konserve-jdbc

👍 3
🎉 4
alekcz 2020-05-26T16:10:16.267100Z

@whilo

whilo 2020-05-26T18:56:41.271Z

Ok, we will extend the header once more to cover compression and encryption, and then we are hopefully good for some time: 1 byte konserve store version, 1 byte serializer type, 1 byte compressor type, 1 byte encryptor type, 4 bytes metadata size (only for stores that keep metadata with the value, such as the file store).
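A rough sketch of that 8-byte header layout (the function name and the `ByteBuffer` encoding are mine for illustration, not konserve's actual implementation):

```clojure
(import '[java.nio ByteBuffer])

(defn header-bytes
  "Encode the proposed konserve header: four 1-byte fields followed by
  a 4-byte metadata size."
  [{:keys [version serializer compressor encryptor meta-size]}]
  (-> (ByteBuffer/allocate 8)
      (.put (byte version))     ; 1 byte: konserve store version
      (.put (byte serializer))  ; 1 byte: serializer type
      (.put (byte compressor))  ; 1 byte: compressor type (0 = none)
      (.put (byte encryptor))   ; 1 byte: encryptor type (0 = none)
      (.putInt (int meta-size)) ; 4 bytes: metadata size
      .array))

;; e.g. store version 1, fressian serializer 1, no compression or
;; encryption: the four leading bytes read 1 1 0 0
(header-bytes {:version 1 :serializer 1
               :compressor 0 :encryptor 0 :meta-size 0})
```

Keeping the fixed-width fields at the front means a reader can always interpret the first bytes the same way, which is what allows the store configuration to change while the store is in use.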

whilo 2020-05-26T18:56:54.271400Z

@alekcz360 Is this reasonable to you?

whilo 2020-05-26T18:57:35.272500Z

I think we will always have the first 4 bytes. This will allow changing the store configuration while it is being used.

whilo 2020-05-26T18:58:04.273Z

Ok, I will have a look later.

alekcz 2020-05-26T18:58:41.273800Z

Yeah. That makes sense. I’ll make the changes. It’ll give us more flexibility moving forward.

kkuehne 2020-05-26T19:02:44.274100Z

When this is done we can think about datahike-jdbc. It should be straightforward, like the other two.

kkuehne 2020-05-26T19:05:58.276400Z

I'll create the skeleton project with the appropriate protocol for it.

alekcz 2020-05-26T19:06:44.277400Z

@whilo should all the bytes initially be 1? i.e. 1111 or should we go with 1000?

whilo 2020-05-26T19:07:56.278100Z

Compressor and encryptor are optional, which I would encode as 0.

whilo 2020-05-26T19:09:14.278700Z

we are creating lookup tables @ferdi.kuehne

alekcz 2020-05-26T21:05:24.280700Z

@konrad.kuehne @ferdi.kuehne I've added the 4 bytes. It's now 1100. I've assumed that string-serializer is 0 and fressian-serializer is 1. I'll make the final switch-over when we have the table.

👍 1