datahike

https://datahike.io/, Join the conversation at https://discord.com/invite/kEBzMvb, history for this channel is available at https://clojurians.zulipchat.com/#narrow/stream/180378-slack-archive/topic/datahike
timo 2021-05-17T08:41:21.056100Z

Hi @willier. I thought there was a datahike backend for DynamoDB... :thinking_face: but it is quite straightforward to write a datahike backend (see https://cljdoc.org/d/io.replikativ/datahike/0.3.6/doc/backend-development). I recently did it for Cassandra: https://github.com/timokramer/datahike-cassandra.

willier 2021-05-17T08:57:45.059500Z

Hi @timo, what does k/new-your-store do? I don't see the source for that; I assume it creates the tables?

refset 2021-05-17T10:02:36.059700Z

@willier I think you need to look at the development branch: https://github.com/TimoKramer/datahike-cassandra/blob/development/src/datahike_cassandra/core.clj

willier 2021-05-17T11:20:24.060700Z

ah thanks! @taylor.jeremydavid

whilo 2021-05-17T18:16:43.064300Z

@brownjoshua490 First of all, the file-system backend got some improvements (basically Java's async NIO) that are not in the official dependencies yet, because we want to provide a seamless migration experience with @konrad.kuehne's work on https://github.com/replikativ/wanderung/tree/8-dh-dh-version-migration, which is almost done. So to get optimal performance you should add [io.replikativ/konserve "0.6.0-alpha3"] first. (The older store had buffer sizes that were too small, so you basically hit Java's FileOutputStreams all the time. Other backends should not be affected by this problem.)
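(For context, a minimal sketch of pinning the newer konserve in a Leiningen project.clj; only the konserve coordinate comes from the message above, the datahike version and project name are assumptions:)

;; project.clj (sketch): declare konserve explicitly so the improved
;; file store wins over datahike's transitive konserve dependency
(defproject sandbox "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.1"]
                 [io.replikativ/konserve "0.6.0-alpha3"] ; explicit override
                 [io.replikativ/datahike "0.3.6"]])      ; assumed version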

whilo 2021-05-17T18:17:00.064600Z

(ns sandbox
  (:require [datahike.api :as d]
            [taoensso.timbre :as t]))

(comment

  (t/set-level! :warn) ; silence datahike's info-level logging

  (def schema [{:db/ident       :age
                :db/cardinality :db.cardinality/one
                :db/valueType   :db.type/long}])

  (def cfg {:store  {:backend :file :path "/tmp/datahike-benchmark"}
            :keep-history? false ;; true
            :schema-flexibility :write
            :initial-tx schema})

  (d/delete-database cfg) ; start from a clean slate

  (d/create-database cfg)

  (def conn (d/connect cfg))

  ;; transact 100k datoms in one bulk transaction
  (time
   (do
     (d/transact conn
                 (vec (for [i (range 100000)]
                        [:db/add (inc i) :age i])))
     nil))

  ;; "Elapsed time: 7387.64224 msecs"
  ;; with history: "Elapsed time: 14087.425566 msecs"

  ;; sanity check: all 100k datoms arrived
  (d/q '[:find (count ?a)
         :in $
         :where [?e :age ?a]]
       @conn) ;; => 100000
  )

whilo 2021-05-17T18:19:00.066Z

This is the behaviour on my machine. It took me around 5 seconds (100k/5s = 20k datoms/sec) to transact this the last time I checked this microbenchmark, 3 months ago, so we might have introduced a slight performance regression in the last releases, or my machine got slower.

whilo 2021-05-17T18:19:42.066700Z

We currently do not write to the history index in parallel; doing that should bring the with-history number closer to the 7 seconds of the non-history run (that is why it is currently approximately double).

whilo 2021-05-17T18:20:46.067800Z

Of course this is just a microbenchmark, and as I mentioned, it measures bulk throughput. We know how to add a buffer to the transactor so it saturates at a similar speed for finer transaction granularity, but we have not done this work yet.

whilo 2021-05-17T18:29:22.069100Z

@brownjoshua490 You should definitely see better performance than that, around 4-5k datoms/sec, even on the old file store though. Maybe your datoms contain a lot of data?

whilo 2021-05-17T18:34:50.069500Z

See also https://github.com/replikativ/datahike/pull/89/files; adding a backend does not require a pull request anymore, because the Datahike multimethods can now be implemented from the outside.
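(A hypothetical sketch of what implementing those multimethods from the outside can look like, following the backend-development doc linked earlier; the multimethod names come from datahike.store, while :my-backend and the declared helpers are stand-ins, and exact arities may differ between versions:)

;; sketch: registering a custom :my-backend store from outside datahike
(ns my-backend.core
  (:require [datahike.store :as store]))

(declare create-my-store connect-my-store
         delete-my-store release-my-store) ; stand-in helpers

;; dispatch happens on the :backend key of the store config,
;; e.g. {:store {:backend :my-backend ...}}
(defmethod store/empty-store :my-backend [config]
  (create-my-store config))

(defmethod store/connect-store :my-backend [config]
  (connect-my-store config))

(defmethod store/delete-store :my-backend [config]
  (delete-my-store config))

(defmethod store/release-store :my-backend [config store]
  (release-my-store store))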

whilo 2021-05-17T18:35:26.069700Z

Yes, konserve is our storage layer and represents an async key-value/document store abstraction.
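(To illustrate, a small konserve sketch along the lines of its README; the API is channel-based, details may differ between versions, and the path is arbitrary:)

(require '[konserve.filestore :refer [new-fs-store]]
         '[konserve.core :as k]
         '[clojure.core.async :refer [<!!]])

;; all konserve operations are async and return core.async channels
(def store (<!! (new-fs-store "/tmp/konserve-demo")))

(<!! (k/assoc-in store ["user"] {:age 42})) ; write a document
(<!! (k/get-in store ["user" :age]))        ;; => 42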

Josh 2021-05-17T19:33:17.071200Z

@whilo Thanks for the example. Our datoms are string-heavy, lots of strings that are ~100-200 chars. I ran that example code a couple of times; here are my results

Josh 2021-05-17T19:34:16.072Z

I’m still seeing times 2-4x slower than yours. Were you on 0.3.3?

whilo 2021-05-17T21:04:59.072600Z

I am on 0.3.6 (current master).

whilo 2021-05-17T21:06:33.073800Z

@brownjoshua490 Maybe the string serialization is costing us. It could also be our crypto hashing (which is optional). Can you provide either a data set or a representative workload that I can test?

whilo 2021-05-17T21:21:33.074700Z

It definitely looks like there is a constant overhead here, because the with-history time is now close to the without-history time.

whilo 2021-05-17T22:11:07.076700Z

I created 100k shuffled strings of length 300 using https://github.com/replikativ/zufall/blob/master/src/zufall/core.clj#L4, randomized the insertion order, and used a different filesystem, and I still see similar performance (7.3 secs to transact). @brownjoshua490 Not sure what to make of this.
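(A hypothetical sketch of that string-heavy variant of the earlier benchmark; rand-str is a stand-in for zufall's generator, conn is the connection from the benchmark above, and the :name attribute would need a :db.type/string entry in the schema:)

;; sketch: 100k shuffled datoms carrying 300-char strings
(defn rand-str [n] ; stand-in for zufall's name generator
  (apply str (repeatedly n #(char (+ (int \a) (rand-int 26))))))

(def string-tx
  (shuffle (vec (for [i (range 100000)]
                  [:db/add (inc i) :name (rand-str 300)]))))

(time (do (d/transact conn string-tx) nil))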