Does anyone print logging data in edn directly? The goal would be to copy paste directly to the repl. Edit: of course only in Dev mode 😃
Yes I was thinking about that but it would require more work
Pedestal at one time logged in edn to some degree
https://gist.github.com/hiredman/64bc7ee3e89dbdb3bb2d92c6bddf1ff6 is a little library for using java util logging to log in edn
this looks super cool. do you have any examples of usage?
https://gist.github.com/hiredman/3443693c5994a8b0bb0a41f068107abd
awesome, thank you!
i almost linked to that. i use it constantly now
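For anyone wondering what this looks like in the small, a dev-only sketch (a hypothetical helper, not the gist's API):
(defn log-edn
  "Print a log event as a single EDN map, ready to paste into a REPL."
  [level msg & {:as data}]
  (prn (merge {:level level :msg msg} data)))

(log-edn :info "cache miss" :key "user-42" :elapsed-ms 3)
;; => {:level :info, :msg "cache miss", :key "user-42", :elapsed-ms 3}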
People get excited about macros writing macros, but what about non-macros writing macros
> non-macros writing macros Oh, I am one! :)
What impresses me the most are those non-macros that write macros that write macros.
You mean functions that emit code as a string / .clj file? Legit.
I didn't mean for this to be enigmatic; if you look back in the main chat there is a gist I posted of some code, and it generates macros by doseq'ing over a list, interning some functions, then calling the setMacro method on the var
Thank you! that's what this code base is actually using, will check the link
A bit magical indeed 😃
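For reference, the trick looks roughly like this (my sketch with made-up macro names, not the gist's actual code):
;; a plain runtime doseq that interns fns and then flips them into macros
(doseq [[nm op] [['my-when 'if] ['my-unless 'if-not]]]
  (let [v (intern *ns* nm
                  ;; a macro fn receives &form and &env before its own args
                  (fn [_form _env test & body]
                    (list op test (cons 'do body))))]
    (.setMacro ^clojure.lang.Var v)))

(my-when true :yes)    ;; => :yes
(my-unless false :yes) ;; => :yes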
I am unsure that's possible.
Is there some kind of pfilter around? Like pmap, with its nice interface similarity with map. The lack of pfilter in the core library makes me think I might not be reasoning correctly about the problem… (Which is to filter a sequence of integers as fast as I possibly can. 😄 )
If I use criterium/quick-bench instead, the transduce and reducers wins are a bit more apparent:
filter
Evaluation count : 30 in 6 samples of 5 calls.
Execution time mean : 23,654346 ms
Execution time std-deviation : 301,776435 µs
Execution time lower quantile : 23,352585 ms ( 2,5%)
Execution time upper quantile : 24,050820 ms (97,5%)
Overhead used : 14,507923 ns
transduce
Evaluation count : 36 in 6 samples of 6 calls.
Execution time mean : 20,129352 ms
Execution time std-deviation : 595,084459 µs
Execution time lower quantile : 19,646010 ms ( 2,5%)
Execution time upper quantile : 21,079716 ms (97,5%)
Overhead used : 14,507923 ns
core.reducers/filter
Evaluation count : 36 in 6 samples of 6 calls.
Execution time mean : 17,903643 ms
Execution time std-deviation : 186,971423 µs
Execution time lower quantile : 17,675913 ms ( 2,5%)
Execution time upper quantile : 18,138291 ms (97,5%)
Overhead used : 14,507923 ns
core.async/pipeline
Evaluation count : 22230 in 6 samples of 3705 calls.
Execution time mean : 27,089463 µs
Execution time std-deviation : 136,898899 ns
Execution time lower quantile : 26,919838 µs ( 2,5%)
Execution time upper quantile : 27,291471 µs (97,5%)
Overhead used : 14,507923 ns
(For some reason, it fails to measure the pipeline code. It doesn’t in my real code.)
Interestingly (to me, at least 😄 ), for performs on par with transduce with this task:
(println "for")
(quick-bench #_time
(count
(for [i every-other
:when (aget ba i)]
i)))
for
Evaluation count : 36 in 6 samples of 6 calls.
Execution time mean : 19,456574 ms
Execution time std-deviation : 100,364503 µs
Execution time lower quantile : 19,370228 ms ( 2,5%)
Execution time upper quantile : 19,618312 ms (97,5%)
Overhead used : 14,507923 ns
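(For context, the non-for variants were shaped roughly like this; my reconstruction, not the exact benchmarked code:)
(require '[clojure.core.reducers :as r])

;; filter
(count (filter #(aget ba %) every-other))
;; transduce
(count (transduce (filter #(aget ba %)) conj [] every-other))
;; core.reducers/filter (fold wants a vector source to actually parallelize)
(count (r/foldcat (r/filter #(aget ba %) every-other)))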
Out of curiosity - what about a plain loop?
@pez I see… Ok, I think the biggest gains from pipeline are to be had when the pipeline transducer is CPU intensive (think parsing HTML into data, file compression, etc); here you have a pretty straightforward xf: (filter #(aget ba %)). Also, I think 1,000,000 samples is not that much really, so (pipeline …) would be suffering from all the channel etc. overhead of passing the data around;
Also, a side note: (time …) is almost never a good benchmark strategy (but quick-bench is); I’ve seen cases where a simple (time …) benchmark would be “slow” but quick-bench would actually show a huge improvement, since the JVM does its JIT magic and code really speeds up after a few iterations in some cases;
I think that’s a good idea @p-himik (loop []…)
That’s probably the fastest thing you can get in terms of raw single thread perf… pretty much Java speed;
BOOM
(println "loop")
(quick-bench #_time
(count
(loop [res []
i 1]
(if (<= i n)
(recur (if (aget ba i)
(conj res i)
res)
(+ i 2))
res))))
loop
Evaluation count : 84 in 6 samples of 14 calls.
Execution time mean : 7,518441 ms
filter
Evaluation count : 30 in 6 samples of 5 calls.
Execution time mean : 23,020098 ms
transduce
Evaluation count : 36 in 6 samples of 6 calls.
Execution time mean : 19,090405 ms
core.reducers/filter
Evaluation count : 42 in 6 samples of 7 calls.
Execution time mean : 16,328693 ms
for
Evaluation count : 36 in 6 samples of 6 calls.
Execution time mean : 19,678977 ms
Yup, loop is the king 🙂
If you really care about perf, I highly recommend YourKit
I bet it will help you gain 50% in no time
I’ve used it, it’s like magic; the gains will come from a place you least expect… some reflection call that’s using 50% of your CPU time
@pez Now try making res a transient. :)
transient, huh? Doin’ it!
Try also unchecked-math 🙂
In my previous adventure with single-threaded high perf, I ended up writing a Java class. :D All my data consisted of integers and Clojure doesn't really like them.
Also, http://clojure-goes-fast.com (various ideas how to go fast)
I’ll be trying YourKit too. Though only out of curiosity really. I don’t have performance tasks often. This is a little toy challenge I have, mainly to learn more about Clojure. I profile it with tufte right now, which is pretty nice.
Seems like I should be able to parallelize the loop, no?
Absolutely, your problem is a textbook map(filter)/reduce problem.
transient shaves off some more of the time, as hinted at 😃
loop
Execution time mean : 7,704050 ms
loop-transient
Execution time mean : 5,017702 ms
filter
Execution time mean : 24,047486 ms
transduce
Execution time mean : 19,687393 ms
core.reducers/filter
Execution time mean : 17,303117 ms
for
Execution time mean : 21,142251 ms
Unchecked math doesn’t seem to make much of a difference for the particular problem.
I think that's because there's only a single math operation there, and its arguments' types are well known by the compiler. If you really want to pursue it further, I would try to get the bytecode for that code and see if there's something fishy going on. I've had some success with https://github.com/gtrak/no.disassemble/ and https://github.com/clojure-goes-fast/clj-java-decompiler before.
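For example, with clj-java-decompiler (decompile is its documented entry point):
(require '[clj-java-decompiler.core :refer [decompile]])

;; prints the Java source that the Clojure compiler effectively generates
(decompile (loop [i 0, s 0]
             (if (< i 100)
               (recur (inc i) (+ s i))
               s)))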
Unchecked doesn’t attract me so much. I would rather figure out how to parallelize it. I can’t immediately see how:
(quick-bench #_time
(count
(loop [res []
i 1]
(if (<= i n)
(recur (if (aget ba i)
(conj res i)
res)
(+ i 2))
res))))
- Split ba into N chunks
- For each chunk, run a thread that creates its own res
- Combine the resulting collection of res vectors in a single vector, preserving the order
Just out of interest - why (+ i 2)? Does ba store something unrelated at even indices?
Yes, I am only interested in the odd indices. ba contains the results of an Eratosthenes sieve, where I have skipped sieving even numbers, b/c we all know there’s only one even prime number. 😃
I was hoping there was some reducer or something that would do all those steps for me.
Oh, is that code just to find prime numbers up to n?
If so, then even constructing the sieve could be made parallel. And I'm 95% certain there's already a Java library that does it. :)
Haha, I’m in this to learn about Clojure. 😃
That code is only to pick out the prime numbers I have found up to n.
Here’s the full thing, using loop and transient:
(defn pez-ba-loop-transient-sieve [^long n]
(let [primes (boolean-array (inc n) true)
sqrt-n (int (Math/ceil (Math/sqrt n)))]
(if (< n 2)
'()
(loop [p 3]
(if (< sqrt-n p)
(loop [res (transient [])
i 3]
(if (<= i n)
(recur (if (aget primes i)
(conj! res i)
res)
(+ i 2))
(concat [2] (persistent! res))))
(do
(when (aget primes p)
(loop [i (* p p)]
(when (<= i n)
(aset primes i false)
(recur (+ i p p)))))
(recur (+ p 2))))))))
I haven’t ventured into how to speed up the sieving (beyond the obvious optimizations) b/c most of the time has been spent in picking out the indices from the sieve.
I’m trying to figure out how to parallelize the work with converting my byte-array to indexes. Parallelizing with filter was so easy that I was surprised by how much things grow when I try to do it with the loop. I have this so far.
(comment
  (import '[java.util.concurrent ExecutorService Executors])
(let [n 1000000
ba (boolean-array n)
prob 0.15
sample-size (long (* n prob))]
(doseq [i (take sample-size
(random-sample prob (range n)))]
(aset ba i true))
(let [^ExecutorService
service (Executors/newFixedThreadPool 6)
^Callable
mk-collector (fn [^long start ^long end]
(fn []
(loop [res (transient [])
i start]
(if (<= i end)
(recur (if (aget ba i)
(conj! res i)
res)
(+ i 2))
(persistent! res)))))
num-slices 10
slice-size (/ n num-slices)]
(doseq [[start end] (partition
2
(interleave
(range 1 (inc n) slice-size)
(range slice-size (inc n) slice-size)))
:let [f (.submit service (mk-collector start end))]]
@f))))
There are two unsolved things here:
1. My future f contains nil even though I know that the collector I create with mk-collector produces the collection I want.
2. I don’t know how to combine my slices in the order I start the threads.
And also: this is slower than my single-thread solution. Not by very much, but anyway. Am I even on the right track?
Things that I notice immediately:
- ^Callable there marks mk-collector and not the result of calling (mk-collector ...). And that, I think, is useless because the compiler already knows that mk-collector is callable. If you want to say "mk-collector returns a callable", you have to tag its arguments list.
- Don't deref within doseq - this way, you start a thread and immediately wait for its completion, then start the second one, and so on. Instead, create a vector of futures and only then deref all of them in order. And that will be the exact order in which you have created them. In fact, you can deref them in order in reduce - it even optimizes it further, albeit not substantially, since all threads have roughly the same amount of work in your case.
- You start 6 threads but create 10 slices - why? Choose the number of threads you want to have and create the same amount of slices, one per thread.
- That (partition ...) form makes my head spin. I have a strong feeling that whatever it does could be rewritten in a much simpler way in the overall context. I might be wrong though.
Thanks. About the partition… I’m sure you are right. I had it hard coded at first and then just translated the way I hard coded it. 😃
Threads vs slices. I tried using 10 for both, but it didn’t make a difference. I have six cores on my machine so went for that, but my partition blows up with 6 slices. Haha.
> partition blows up What exactly does that mean? It was working just fine with 1 huge slice after all.
It makes a difference in the overall code - you won't need any explicit executor; you would be able to just use future. It should also make some performance difference. It might not be noticeable in this context, but in general it should exist.
Blows up means that my start and end indices get out of whack and I get index out of bounds errors. I didn’t want to focus on this before I have the basic infrastructure right.
Ah, it just means that your partition incantation is incorrect. :) It has nothing to do with threads.
Yeah, nothing to do with threads. I just didn’t succeed with this naive partition to create 6 slices for my 6 threads. But it won’t matter if I don’t need the executor service, anyway.
You should end up with something like this:
(let [n-partitions 6
;; Notice how it says `mapv` and not `map` - this is important.
;; You want to be eager to start all the futures right away.
futures (mapv (fn [partition]
(future
%magic%))
(range n-partitions))]
(into []
(mapcat deref)
futures))
Yes! Now it runs about 2X faster than the non-future version. And it even produces the right result.
(let [mk-collector (fn [^long start ^long end]
(fn []
(loop [res (transient [])
i start]
(if (<= i end)
(recur (if (aget ba i)
(conj! res i)
res)
(+ i 2))
(persistent! res)))))
num-slices 10
slice-size (/ n num-slices)
slices (partition
2
(interleave
(range 1 (inc n) slice-size)
(range slice-size (inc n) slice-size)))
futures (mapv (fn [[start end]]
(future
((mk-collector start end))))
slices)]
(into []
(mapcat deref)
futures))
Great! Although I would personally inline mk-collector. Using (( is a hint to that.
And you can do all the partition work inside future, thus making it parallel as well.
The partition work takes zero time though?
Interesting that you suggest inlining mk-collector. I thought the same, but it then started to take 3X more time…
Depends on its inputs. But moving it inside futures will make the code much simpler.
With the above I have Execution time mean : 3,113634 ms
Inlining:
(let [num-slices 10
slice-size (/ n num-slices)
slices (partition
2
(interleave
(range 1 (inc n) slice-size)
(range slice-size (inc n) slice-size)))
futures (mapv (fn [[start end]]
(future
(loop [res (transient [])
i start]
(if (<= i end)
(recur (if (aget ba i)
(conj! res i)
res)
(+ i 2))
(persistent! res)))))
slices)]
(into []
(mapcat deref)
  futures))
Execution time mean : 10,080578 ms
How about this? I haven't tested it, might not even work:
(let [num-slices 10
slice-size (int (/ n num-slices))
offset 2
_ (assert (zero? (mod slice-size offset))
"Dealing with slices that have fractional chunks would be too complicated.")
futures (mapv (fn [slice-idx]
(future
(let [start (* slice-idx slice-size)
end (if (= slice-idx (dec num-slices))
n
(+ start slice-size))]
(loop [res (transient [])
i start]
(if (< i end)
(recur (cond-> res
(aget ba i) (conj! i))
(+ i offset))
(persistent! res))))))
(range num-slices))]
(into []
(mapcat deref)
futures))
The difference in your code is not only inlining but also the lack of type hints. Try adding ^long wherever necessary.
To help further analyze such issues, always do this:
(set! *unchecked-math* :warn-on-boxed)
(set! *warn-on-reflection* true) ;; Doubt it will be useful here, but it's useful in general.
I simplified the ranges, similar to what you suggest here. I did try throwing in type hints, but it didn’t seem to bite. Will look closer at where you suggest they should be…
Unfortunately it doesn’t gain me the slightest with my prime number sieve. 😃 But this was very, very good for me to investigate and get to know a bit about, so I am good and happy. Many thanks for the guidance!
Sure thing. I'm actually quite curious why extracting that fn makes the code faster.
Oh, it didn't in the end. Setting those warning levels helped me find where I lost the time. Using quot instead of / fixed it.
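(In case it helps anyone reading along: / can return a Ratio, so the result is boxed and everything downstream falls out of primitive math, while quot stays in longs.)
(/ 1000001 10)     ;; => 1000001/10, a clojure.lang.Ratio
(quot 1000001 10)  ;; => 100000, a plain long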
I might add that the filter predicate is fast, afaik. So this note on pmap seems to tell me I should be looking for other ways to speed the process up:
> Only useful for computationally intensive functions where the time of f dominates the coordination overhead.
Might this be of use? https://github.com/reborg/parallel
Although it doesn't have a pfilter, it works with transducers so you can supply a filter: https://github.com/reborg/parallel#pfold-pxrf-and-pfolder
If your predicate is fast, why do you need pmap at all?
because the collection is huge?
Thanks @dharrigan! I’ll have a look!
in this case you might be better off with reducers perhaps
Yes, the collection can potentially be huge, and then I want it to go much quicker than it does today.
So, I filter 500K in 20ms and imagine that if all 6 cores of my machine took a slice each it would be done in less than 4ms. 😃
@pez I don't think pmap will buy you anything here. Take a look at clojure.core.reducers
reducers will slice the collection in multiple parts and then do the work on each slice in separate threads and then concat the result
this is not how pmap works
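e.g., a minimal sketch (note that fold needs a vector or a map source to actually run in parallel):
(require '[clojure.core.reducers :as r])

(def v (vec (range 1000000)))

;; fold splits v into chunks, filters each chunk on the fork/join pool,
;; and concatenates the partial results
(count (r/foldcat (r/filter odd? v)))
;; => 500000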
I will. Interestingly, @dharrigan linked to pfold from that parallel lib. 😃
I tend to want pfilter from time to time, but always procrastinate implementing one (that also suits my sensibilities)
my usual workaround is to run the predicate through pmap and then use a vanilla filter identity as the next step (which won't be parallel, but can be assumed to be fast since identity is a simple pred)
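i.e. something like this (a sketch; it assumes the values themselves are never nil/false, since those would be dropped too):
(let [pred odd?]                    ;; stand-in for a slow predicate
  (->> (range 100)
       (pmap #(when (pred %) %))    ;; predicate runs in parallel (chunked)
       (filter identity)))          ;; cheap sequential pass drops the nils
;; => (1 3 5 … 99)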
Also interesting that in that beginner’s guide to Clojure I am writing, yesterday I wrote “I won’t be going into reducers here”. 😃
That approach only helps if the predicate itself is slow
Have you considered the core.async pipeline utils?
My predicate is an index lookup in a boolean array.
ah whoops, didn't read "I might add that the filter predicate is fast"
They are quite powerful and nice to use in my experience https://clojuredocs.org/clojure.core.async/pipeline
If you do a lot of number crunching, perhaps using Neanderthal would be worth it. Map/reduce tutorial section: https://neanderthal.uncomplicate.org/articles/tutorial_native.html#fast-mapping-and-reducing
I hadn’t considered core.async, @raspasov. I started to think about the option to parallelize this some minutes before I asked the question and hadn’t found pfilter in the core library.
I’ll have a look at that. Even if the number crunching is done for the particular task: it takes 0.3 ms, and then filtering out the results takes 20 ms. Very frustrating!
@pez pipeline would allow you to write a transducer like (filter my-fn) and then just give it “n”:
(defonce p1 (pipeline 21 to-ch (filter my-fn) from-ch))
....`does the database dance`...
🙂 From neanderthal 🙂
Do note that Neanderthal is IIRC hundreds of MBs because it requires BLAS and/or MKL.
Then simply start put!-ing elements onto to-ch
Sounds nice!
(pipeline, not hundreds of MBs 😄 )
… and receive the filtered result onto ‘from-ch’
actually…. reverse
start with ‘from-ch’
receive in ‘to-ch’
Hm, if these results are coming from a database, you might be able to do this work inside the database instead (dharrigan's database word triggered that thought)
Hopefully that was clear 🙂
clojure docs has some nice examples of pipeline
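Roughly this shape, if anyone wants a runnable toy (my sketch, using a recent core.async for to-chan!):
(require '[clojure.core.async :as a])

(let [from-ch (a/to-chan! (range 100))  ;; source channel, closed when drained
      to-ch   (a/chan 64)]
  ;; 4 workers apply the transducer in parallel; output order is preserved
  (a/pipeline 4 to-ch (filter odd?) from-ch)
  (a/<!! (a/into [] to-ch)))
;; => [1 3 5 … 99]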
Wondering if using transducers and no parallelization would result in a noticeable speedup (at the very least it tends to be more memory-efficient)
@vemv It depends on what you’re doing… it can be significant but rarely an order of magnitude improvement (just switching from collections to transducers without something like pipeline)
(pipeline …) really shines if you have a big server with many real cores and a bunch of tasks that you need to get done in parallel and they require minimal coordination (for example, web scraping)
I’ve launched a server on AWS with 32+ cores and used pipeline… it’s pretty neat
The easiest one to test was transduce. It gained me 10%. Next thing to try is reducers, I think. But later, my lunch break is over. 😃
so I have this chain of async events where I need to wait on a status condition for each step in order to proceed to the next. struggling a bit with how to structure this code. currently looks like this:
looking for input on how to handle this sort of thing. gets pretty ugly when we're talking about a chain of 8-10 steps.
This is called callback hell. You might be able to structure this better using core async or some monadic library like promesa maybe (never tried it)
The way you write your code right now it seems you're not doing it async btw, it seems like a series of sync operations
Recently someone showed me how he used https://github.com/adambard/failjure to solve this kind of problem
yeah it is. each wait-for hides a loop polling some HTTP API for a specific status
You might also be able to use an async http lib like httpkit or a java 11 based one
the whole thing is essentially one big sync operation in that each step absolutely needs to wait for the next before proceeding
but consisting of async HTTP calls underneath
then handle the error/success in the http callback
you either wait, or you async, there's no waiting + async
unless you are waiting for a promise that gets delivered by an async request for example
hm. the whole point of this was to wait different amounts of time for each step before issuing a timeout to the client, but iirc you can perhaps do something like
the http client (clj-http) returns futures if you tell it to (which I hadn't)
it might be better to set the timeout on the request though, if possible
nah the request doesn't time out. you get a response with current status from the API. so this won't really work either
so it's essentially a daisy chain of polling loops
I feel like I’m having a core.async day 🙂 (already talked about pipeline elsewhere) @restenb https://clojuredocs.org/clojure.core.async/pipeline-async might be helpful for your case
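fwiw, one way to flatten such a chain with core.async: each step becomes a channel-returning poll, and a single go block walks the steps (check-status! and the step names here are stand-ins, a sketch rather than a drop-in):
(require '[clojure.core.async :as a])

(defn check-status!
  "Stand-in for the real blocking HTTP status poll."
  [step]
  :done)

(defn wait-for
  "Polls (check-status! step) every interval-ms until it returns :done,
  giving up after timeout-ms. Returns a channel yielding :ok or :timeout."
  [step timeout-ms interval-ms]
  (a/go-loop [waited 0]
    (cond
      ;; run the blocking poll on a real thread, not the go dispatch pool
      (= :done (a/<! (a/thread (check-status! step)))) :ok
      (>= waited timeout-ms) :timeout
      :else (do (a/<! (a/timeout interval-ms))
                (recur (+ waited interval-ms))))))

;; the daisy chain, flattened: stop at the first step that times out
(a/go
  (loop [steps [:create :provision :deploy]]
    (when-let [step (first steps)]
      (if (= :ok (a/<! (wait-for step 30000 1000)))
        (recur (rest steps))
        (println "timed out waiting for" step)))))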
@raspasov i'll take a look, thanks
lately I got massive hangups when retrieving libs from central, using leiningen 2.9.5 - is that a lein problem or maven?
always hangs on different libs and restarting deps a few times eventually succeeds
The gain from reducers is a tad better, but still nothing major. I don’t quite understand why. The next experiment will be pipeline, but I expect it not to help too much either, because I suspect I have not analyzed the problem correctly.
i'm surprised i've never made this silent but terrible error before: [{:keys [a :as thing]}]. thing here is not the whole object being destructured. my eyes glanced right over it for a while
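for anyone skimming, the difference:
;; what was written: :as inside :keys binds thing to (get m :thing), nil here
(let [{:keys [a :as thing]} {:a 1 :b 2}]
  [a thing])
;; => [1 nil]

;; what was meant: :as as a sibling of :keys binds the whole map
(let [{:keys [a] :as thing} {:a 1 :b 2}]
  [a thing])
;; => [1 {:a 1, :b 2}]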
syntactically, that's all legal :)
but :as in a :keys list seems like something you could lint
calling all @borkdude s
@dpsutton @alexmiller clj-kondo will already kind of make you notice by saying that :as is an unused binding
yeah. i guess i missed it in the font-locking for :as as a keyword
Speaking about keywords, I'd like some input on this proposal for an :invalid-ident linter which will warn about things like :1.10.2: https://github.com/clj-kondo/clj-kondo/issues/1179
The reason :1.10.2 is problematic is that if you take its name, convert it to a symbol, and try to read that back as EDN, it will fail. We ran into this issue when outputting keywords in the analysis.
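e.g.:
(require '[clojure.edn :as edn])

(name :1.10.2)               ;; => "1.10.2"
(edn/read-string "1.10.2")   ;; throws NumberFormatException: Invalid number: 1.10.2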
With pipeline things go about 200 times slower :thinking_face:
I think pipeline might not be suited for parallelizing things that go fast.