I'm preparing batches of segments to insert into a database that needs large batch inserts to be efficient. Is a reduce task with a window and a trigger the way to do this, or am I missing something obvious? batch-fn doesn't look appropriate, since it requires that the number of segments produced be the same as the number of input segments.
you can configure a batch-size of the output connector
that will make onyx (try to) work in batches of that size
if you combine that with, say, the postgresql copy implementation (part of onyx-sql), you can actually have fairly efficient inserts
Thanks @lmergen -- I was hoping for something like this. Looking at onyx-sql, though, it seems it always inserts each segment as a separate transaction. Maybe I'll need to implement an output plugin myself. The database -- clickhouse -- accepts writes by http request, so jdbc isn't helpful anyway. https://github.com/onyx-platform/onyx-sql/blob/0.13.x/src/onyx/plugin/sql.clj#L168
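A minimal sketch of such an http insert from Clojure (assuming clj-http and cheshire are available; the url, table name, and insert-rows! helper are only illustrative) could look like:

(require '[clj-http.client :as http]
         '[cheshire.core :as json]
         '[clojure.string :as str])

(defn insert-rows!
  "POSTs a batch of row maps to clickhouse's HTTP interface as JSONEachRow."
  [base-url table rows]
  (http/post base-url
             {:query-params {"query" (str "INSERT INTO " table " FORMAT JSONEachRow")}
              :body (str/join "\n" (map json/generate-string rows))}))

;; e.g. (insert-rows! "http://localhost:8123/" "events" [{:col1 "row1-val"} {:col1 "row2-val"}])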
ah right
i implemented copy support for postgresql about a year or so ago
you can take a look at it and see whether you can do something similar for clickhouse
it would be the best way to do it, anyway
see also: https://github.com/onyx-platform/onyx-sql/blob/0.13.x/src/onyx/plugin/pgsql.clj#L50
In the case of the postgresql copy -- would the right approach be to have an upstream task that combined multiple rows into a single segment? Would this upstream step be using windows and triggers?
well, then you would need to make the sql plugin "understand" these batches -- since you would probably be putting vectors inside single segments in this case
i wouldn't go there
you will always want inserts to be as efficient as possible, so it makes the most sense to reuse the actual batches that onyx uses
then you can configure this using :onyx/batch-size
in your task map
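a rough sketch of the output task's catalog entry, for context -- the plugin keyword and numbers here are placeholders, not a real plugin:

{:onyx/name :write-rows
 :onyx/plugin :my.plugin.clickhouse/output ;; hypothetical output plugin
 :onyx/type :output
 :onyx/medium :clickhouse
 :onyx/batch-size 1000    ;; max number of segments handed to write-batch at once
 :onyx/batch-timeout 50   ;; ms to wait before flushing a smaller batch
 :onyx/doc "Writes row segments to clickhouse in batches"}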
I think the sql plugin would accept this segment right? -- {:rows [{:col1 "row1-val"} {:col1 "row2-val"}]}
But I see what you are saying about using the batching mechanism.
oh yeah that's correct
i think a window could work in this case, but.. you will probably want to have logic like "every 3 seconds or every 3000 rows, whichever is reached first"
i have found it difficult to define triggers like that
maybe @lucasbradstreet knows how to do that
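one rough way to approximate it is to attach two triggers to the same window -- a segment trigger and a timer trigger -- though they fire independently and don't reset each other, which is part of what makes it awkward. something like this, where :rows-window and ::flush-rows! are hypothetical:

[{:trigger/window-id :rows-window
  :trigger/id :flush-on-count
  :trigger/on :onyx.triggers/segment
  :trigger/threshold [3000 :elements]
  :trigger/sync ::flush-rows!}
 {:trigger/window-id :rows-window
  :trigger/id :flush-on-time
  :trigger/on :onyx.triggers/timer
  :trigger/period [3 :seconds]
  :trigger/sync ::flush-rows!}]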
I'm aiming more towards batches of 1 million rows and will have enough traffic that I don't need to worry about it taking too much time to accumulate enough rows
well then, luxury problems :)
Thanks for discussing -- the options I'm seeing are:
1. Prepare the batches of rows using a window aggregation, then submit the results using the onyx-http output plugin
2. Roll the batch preparation and database submission into my own output plugin and rely on :onyx/batch-size
if you need help with it, i would be happy to assist
i've been meaning to take a look at clickhouse anyway, been hearing a lot about it the past year :)
@lmergen Do you happen to know why each segment is inserted as a separate database call rather than batching all the segments into a single database call?
i think it was chosen that way initially out of simplicity
"make it work, then improve"
Got it -- so to write the whole batch in a single database call, I would do something like:
(write-batch [this {:keys [onyx.core/write-batch]} replica messenger]
  ;; assuming pool and insert-fn are available from the plugin record / task state
  (jdbc/with-db-transaction [conn pool]
    (insert-fn conn (mapcat :rows write-batch)))
  true)
yep, pretty much
As a side question do you know of any examples using onyx.plugin.protocols/prepare-batch?
i saw this the other day: https://github.com/onyx-platform/onyx-amazon-s3/blob/0.13.x/src/onyx/plugin/s3_output.clj#L74
i think the idea is that prepare-batch should be pure
or rather, can be called multiple times without nasty side effects
but i hardly ever use prepare-batch myself
For clickhouse the rows need to be loaded into a large byte array using some classes provided by clickhouse, so I thought prepare-batch might be the right place to do this.
If you use prepare-batch it's mostly to set up a buffer that you drain with write-batch, since write-batch may be called multiple times. Usually you can get away without it, but it's useful sometimes.
^ sounds like prepare-batch is the right place then for what you want
Right, that could be a good place to put it
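A stripped-down sketch of that buffer pattern (only the two Output methods shown, with p aliased to onyx.plugin.protocols; encode-rows! and flush-buffer! are hypothetical clickhouse helpers, and buffer is an atom the record is built with):

(defrecord ClickhouseOutput [buffer]
  p/Output
  (prepare-batch [this {:keys [onyx.core/write-batch]} replica messenger]
    ;; one-off work per batch: encode the rows into the payload clickhouse expects
    (reset! buffer (encode-rows! (mapcat :rows write-batch)))
    true)
  (write-batch [this event replica messenger]
    ;; may be called more than once, so only drain whatever is still buffered
    (when-let [payload @buffer]
      (flush-buffer! payload)
      (reset! buffer nil))
    true))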