Hi @willier. I thought there was a Datahike backend for DynamoDB... :thinking_face: but it is quite straightforward to write a Datahike backend https://cljdoc.org/d/io.replikativ/datahike/0.3.6/doc/backend-development, I already did it recently for Cassandra: https://github.com/timokramer/datahike-cassandra.
Hi @timo, what does k/new-your-store do? I don't see the source for that, I assume it creates the tables?
@willier I think you need to look at the development branch: https://github.com/TimoKramer/datahike-cassandra/blob/development/src/datahike_cassandra/core.clj
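(Very rough sketch of the shape of such a backend, written from memory rather than taken from the datahike-cassandra code: the multimethod names follow the backend-development guide, but check the guide and the development branch for the exact signatures in your Datahike version. `connect-my-store` below stands in for the konserve constructor, i.e. the `k/new-your-store` placeholder, which is expected to create the underlying tables on first use.)

(ns my-backend.core
  (:require [datahike.store :refer [empty-store delete-store connect-store
                                    release-store scheme->index]]))

;; Stand-in for the konserve constructor (the `k/new-your-store` placeholder);
;; it should set up the underlying tables/files if they do not exist yet.
(defn connect-my-store [config]
  ;; call into your konserve implementation here
  )

(defmethod empty-store :my-backend [config]
  (connect-my-store config))

(defmethod connect-store :my-backend [config]
  (connect-my-store config))

(defmethod delete-store :my-backend [config]
  ;; drop the underlying tables here
  )

(defmethod release-store :my-backend [config store]
  ;; close any open connections held by the store
  )

(defmethod scheme->index :my-backend [_]
  :datahike.index/hitchhiker-tree)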
ah thanks! @taylor.jeremydavid
@brownjoshua490 First of all, the file system backend got some improvements (Java's async NIO, basically) that are not in the official dependencies yet, because we want to provide a seamless migration experience with @konrad.kuehne's work on https://github.com/replikativ/wanderung/tree/8-dh-dh-version-migration, which is almost done. So to get optimal performance you should add [io.replikativ/konserve "0.6.0-alpha3"] first. (The older store had too small buffer sizes and you basically hit Java's FileOutputStreams all the time. Other backends should not be affected by this problem.)
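(Sketch of how that would look in a Leiningen project.clj; the project name and the Clojure version are just placeholders, the point is that the explicit konserve entry overrides the version Datahike pulls in transitively.)

(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.1"]
                 [io.replikativ/konserve "0.6.0-alpha3"]
                 [io.replikativ/datahike "0.3.6"]])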
(ns sandbox
  (:require [datahike.api :as d]
            [taoensso.timbre :as t]))

(comment
  (t/set-level! :warn)

  (def schema [{:db/ident       :age
                :db/cardinality :db.cardinality/one
                :db/valueType   :db.type/long}])

  (def cfg {:store {:backend :file :path "/tmp/datahike-benchmark"}
            :keep-history? false ;; true
            :schema-flexibility :write
            :initial-tx schema})

  (d/delete-database cfg)
  (d/create-database cfg)

  (def conn (d/connect cfg))

  (time
   (do
     (d/transact conn
                 (vec (for [i (range 100000)]
                        [:db/add (inc i) :age i])))
     nil))
  ;; "Elapsed time: 7387.64224 msecs"
  ;; with history: "Elapsed time: 14087.425566 msecs"

  (d/q '[:find (count ?a)
         :in $
         :where [?e :age ?a]]
       @conn) ;; => 100000
  )
This is the behaviour on my machine. It took me around 5 seconds (100k/5 = 20k datoms/sec) to transact this the last time I checked this microbenchmark 3 months ago, so we might have introduced a slight performance regression in the last releases, or my machine got slower.
We currently do not write to the history index in parallel; parallelizing that should bring the with-history number closer to 7 seconds (that is why it is currently approximately double).
Of course this is just a microbenchmark, and as I mentioned, it measures bulk throughput. We know how to add a buffer to the transactor to saturate at a similar speed for finer transaction granularity, but we have not done this work yet.
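(To illustrate the bulk vs fine-grained difference, here is a hedged variation of the snippet above that reuses conn: transacting one datom per d/transact call pays the commit and flush overhead on every call, so throughput drops a lot. The entity ids are offset so they do not clash with the bulk run, and only 1k datoms are used because it is much slower.)

;; Illustrative only: each d/transact call commits and flushes separately.
(time
 (doseq [i (range 1000)]
   (d/transact conn [[:db/add (+ 200000 (inc i)) :age i]])))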
@brownjoshua490 You should definitely see better performance, around 4-5k datoms/sec, even on the old file store though. Maybe your datoms contain a lot of data?
@whilo Thanks for the example. Our datoms are string-heavy, lots of strings that are ~100-200 chars. I ran that example code a couple of times; here are my results
I’m still seeing times 2-4x slower than yours, were you on 0.3.3?
I am on 0.3.6 (current master).
@brownjoshua490 Maybe the string serialization costs us. It could also be our crypto hashing (which is optional). Can you provide either a data set or a representative workload that I can test?
It definitely looks like it is a constant overhead here, because the with-history time is now close to the without-history time.
I created 100k shuffled strings of length 300 from https://github.com/replikativ/zufall/blob/master/src/zufall/core.clj#L4, randomized the insertion order, and used a different filesystem, and I still see similar performance (7.3 secs to transact). @brownjoshua490 Not sure what to make of this.
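(A sketch of that kind of string-heavy benchmark, not my exact code: it uses plain random characters instead of the zufall names, and the :payload attribute name plus the rand-string helper are made up for illustration. It assumes the same ns and d alias as the snippet above.)

(def string-schema [{:db/ident       :payload
                     :db/cardinality :db.cardinality/one
                     :db/valueType   :db.type/string}])

;; ~300-char random lowercase strings as a rough stand-in for the zufall names.
(defn rand-string [n]
  (apply str (repeatedly n #(char (+ 97 (rand-int 26))))))

(def string-cfg {:store {:backend :file :path "/tmp/datahike-benchmark-strings"}
                 :keep-history? false
                 :schema-flexibility :write
                 :initial-tx string-schema})

(d/delete-database string-cfg)
(d/create-database string-cfg)
(def string-conn (d/connect string-cfg))

(time
 (d/transact string-conn
             (vec (for [i (range 100000)]
                    [:db/add (inc i) :payload (rand-string 300)]))))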