clojure

New to Clojure? Try the #beginners channel. Official docs: https://clojure.org/ Searchable message archives: https://clojurians-log.clojureverse.org/
jumar 2020-12-21T07:45:22.277400Z

In Joy of Clojure 2nd ed. (p. 253 - 255) they give the following example of making array mutations safe:

(defn make-safe-array [t sz]
  (let [a (make-array t sz)]
    (reify SafeArray
      (count [_] (clj/count a))
      (seq [_] (clj/seq a))
      ;; is locking really necessary for aget? what could happen?
      (aget [_ i] (locking a
                    (clj/aget a i)))
      (aset [this i f] (locking a
                         (clj/aset a i (f (aget this i))))))))
(full sample here: https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/books/joy_of_clojure/ch10_mutation_and_concurrency.clj#L280-L282) I'm wondering why they lock aget at all? Isn't it enough to lock aset? Why should I block readers while there's a write in progress?

jumar 2020-12-21T08:12:39.278100Z

Hmm, that might be it. But what would that mean? Like observing a half-set value? What would that even be?

p-himik 2020-12-21T08:23:34.278300Z

Maybe because of this: https://docs.oracle.com/javase/tutorial/essential/concurrency/memconsist.html

p-himik 2020-12-21T08:23:52.278600Z

I.e. same reason why we mark variables as volatile.

2020-12-21T08:34:57.278800Z

Exactly, you need something that orders the reads and writes with respect to each other, otherwise the JVM can do things like read the array index once, cache it in a register, and just claim your writes all happened after the read

roklenarcic 2020-12-21T10:07:15.280Z

@jumar locking emits a memory fencing instruction that prevents operation reordering and makes sure your CPU caches are synced. One thread might update a value in its L1 cache (which is per-core) while another thread on another core reads the stale value from its own L1 cache. Typically a memory fence causes the changes to be pushed to the L3 cache, which isn't per-core. Writing a volatile does the same, so for scalar values (int, long, writing a reference) volatile is generally sufficient
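
For illustration, a minimal sketch (my own, not the book's code) of the pattern the book relies on - both the write and the read lock the same monitor, so the lock release/acquire gives the reader a happens-before edge on the writer's update:

(def a (long-array 1))

(defn write! [v]
  (locking a (aset-long a 0 v)))

(defn read-it []
  (locking a (aget a 0)))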

jumar 2020-12-21T11:30:41.285900Z

Ah right, so the read lock is there only to provide a fresh value - otherwise it could get cached. I think it's unlikely to happen here (I increment the array values in 100 concurrent threads, then read them all afterwards), maybe because the cache coherence protocol will actually fetch the proper value when it's modified by the aset operation (even when there's no lock in aget). I definitely couldn't find any consistency issue when removing the aget lock and testing it (https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/books/joy_of_clojure/ch10_mutation_and_concurrency.clj#L286-L290)

2020-12-21T14:07:09.288800Z

Such bugs are notoriously difficult to test for. Sometimes you may catch them with such tests, but there is no guarantee you will

jumar 2020-12-21T14:52:10.289200Z

Yeah, based on my understanding of the JMM and memory consistency properties (https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/util/concurrent/package-summary.html#MemoryVisibility) they do the right thing in the book; in particular:
> Actions prior to "releasing" synchronizer methods such as Lock.unlock, Semaphore.release, and CountDownLatch.countDown happen-before actions subsequent to a successful "acquiring" method such as Lock.lock, Semaphore.acquire, Condition.await, and CountDownLatch.await on the same synchronizer object in another thread.
Here are some good resources on the notion that volatile et al mean "flush to main memory" (the impression I got from reading some Java book a decade ago, which I found much later is likely false when reading about the MESI cache coherence protocol):
• https://stackoverflow.com/questions/1850270/memory-effects-of-synchronization-in-java
• https://mechanical-sympathy.blogspot.com/2013/02/cpu-cache-flushing-fallacy.html
• https://stackoverflow.com/questions/42746793/does-a-memory-barrier-ensure-that-the-cache-coherence-has-been-completed/42750844#42750844

2020-12-21T14:52:53.289500Z

@jumar locks or not, there's a race condition here because the sequence can be constructed before the threads are done mutating. Look at the result of (-> (make-safe-array Integer/TYPE 8) (doto pummel) seq)

jumar 2020-12-21T14:55:12.289700Z

Oh yeah, you're right. I think they basically rely on the reader waiting until the threads are done (which is quick for a human experimenting in the REPL 🙂 ). ... in which case, I think, the read lock basically doesn't matter at all but would be the right thing to do for an operation happening immediately after a previous aset, right?
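
Roughly, the race-free version of the experiment would look like this (a self-contained sketch of the pattern, not the book's code): start the writers as futures and deref all of them before reading.

(let [a    (long-array 8)
      futs (doall (for [i (range 100)]
                    (future (locking a
                              (aset-long a (mod i 8) (inc (aget a (mod i 8))))))))]
  (run! deref futs)        ;; block until every writer is done
  (locking a (vec a)))
;; => [13 13 13 13 12 12 12 12]   (100 increments spread over 8 slots)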

2020-12-21T15:08:40.292900Z

there are many possible reasons why you could see the latest value without explicit synchronization, but in general physical time is not something you should rely on

👍 1
p-himik 2020-12-21T07:58:12.277700Z

Likely because java.lang.reflect.Array/get doesn't say anything about it being thread-safe.

roklenarcic 2020-12-21T09:59:26.279900Z

What’s the default for clojure.compiler.direct-linking and elide-meta jvm options when doing a lein jar or lein uberjar ?

borkdude 2020-12-21T10:18:01.280700Z

@roklenarcic a build tool should not change these options unless the user asks for it

borkdude 2020-12-21T10:18:20.281100Z

code in an uberjar might still rely on non-direct linking or metadata for example

Niklas 2020-12-21T10:30:15.283700Z

Anyone here using vim with conjure in a monorepo? My issue is that I typically open files in multiple projects and it becomes tedious to launch the repl for every file. Is there a way to configure vim to find the projects root path and launch an nrepl-server in that dir?

dharrigan 2020-12-21T10:42:34.284100Z

I believe there is talk around adding that, you can check in the #conjure channel

Olical 2020-12-21T10:55:36.284500Z

I do but I don't start the REPL from nvim, I start a bunch of REPLs using a kinda custom docker-compose wrapper then I set up Conjure to connect to the right REPL depending on what dir I :cd into. Conjure allows you to work on multiple projects at a time by setting the :ConjureClientState [state-key]

Olical 2020-12-21T10:56:35.284700Z

At work, I set up a "cwd changed" autocmd that sets my ConjureClientState to the cwd path. So every time I :cd I get a fresh Conjure state with its own nREPL connection and config.

Olical 2020-12-21T10:58:29.284900Z

You could set up something similar + use something like https://github.com/clojure-vim/vim-jack-in if you really want to start your REPL from within nvim. I still recommend setting up your REPLs outside of nvim with your own script though, ensure you write your .nrepl-port files into each sub-repo directory, then :cd into each module as you work on them and Conjure will auto connect. Then you can set up the autocmd to set the state as you hop around to have multiple concurrent connections.

augroup conjure_set_state_key_on_dir_changed
  autocmd!
  autocmd DirChanged * execute "ConjureClientState " . getcwd()
augroup END

Olical 2020-12-21T10:59:33.285200Z

I have a script that goes through my docker processes and maps the nREPL ports into .nrepl-port files in the correct directories of the mono repo. Making :cding into directories synonymous with connecting to them.

Olical 2020-12-21T11:00:07.285400Z

You can also discuss conjure over at https://conjure.fun/discord if you so wish 🙂

Niklas 2020-12-21T11:15:24.285700Z

I guess I can simply use a script to launch repls for all projects.. I guess it will eat some memory. Anyway, I joined #conjure so I'll ask future questions there.

alexmiller 2020-12-21T15:01:36.290200Z

by default those aren't used at all afaik

alexmiller 2020-12-21T15:01:47.290400Z

so no direct linking, no elide-meta
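
For reference, a rough sketch of how you would opt in yourself (the project name is a placeholder and the elided meta keys are just an example; both settings are plain system properties read by the Clojure compiler):

(defproject my-app "0.1.0"
  :jvm-opts ["-Dclojure.compiler.direct-linking=true"
             "-Dclojure.compiler.elide-meta=[:doc :file :line :added]"])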

murtaza52 2020-12-21T15:07:03.292500Z

> spec generators rely on the Clojure property testing library test.check. However, this dependency is dynamically loaded and you can use the parts of spec other than gen, exercise, and testing without declaring test.check as a runtime dependency.
The above is from the spec guide where it speaks of loading the test.check lib. What does it mean to dynamically load a lib? How does that work?

alexmiller 2020-12-21T15:08:42.293200Z

if you do generator stuff, it will load the clojure.test.check.generators namespace. if you don't, then it won't.

alexmiller 2020-12-21T15:09:07.293700Z

so you can safely include test.check at test/repl time but exclude it at production time
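
To make it concrete, a small sketch (just an illustration): with test.check on the classpath this works; without it, calling the generator fails at runtime when spec tries to dynamically load clojure.test.check.generators.

(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

(gen/sample (s/gen int?) 5)
;; => (0 -1 0 3 -2)   ; for example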

kwladyka 2020-12-21T17:43:24.295500Z

:test-deps {:extra-paths ["test"]
            :extra-deps {org.clojure/test.check {:mvn/version "1.0.0"}
                         peridot/peridot {:mvn/version "0.5.2"}}}
:run-tests {:extra-deps {com.cognitect/test-runner
                         {:git/url "https://github.com/cognitect-labs/test-runner"
                          :sha "209b64504cb3bd3b99ecfec7937b358a879f55c1"}}
            :main-opts ["-m" "cognitect.test-runner"
                        "-d" "test"]}
an example of adding test.check

popeye 2020-12-21T21:19:55.297900Z

(map (fn [k v]
       (println " K " k)
       (println " v " v)
       (if-not (re-matches #"^[a-z]+\*$" (->str v))
         nil
         (->str v)))
     {:id "john"})

popeye 2020-12-21T21:21:01.298800Z

Hello team, I am passing a map to an anonymous function and wanted to validate it. I tried the code above, but it is not working. How can I pass {:id "john"} to the anonymous function?

kwladyka 2020-12-21T21:21:53.299100Z

(fn [k v] …) is for 2 arguments. If you want to have key and value you need (fn [[k v]] …).

kwladyka 2020-12-21T21:23:13.299300Z

(defn foo [x1 x2 x3] ...) is a fn with 3 arguments. (defn foo [x1 [k v] x3] ...) is also a function with 3 arguments, but the second one is destructured into [k v]
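
A tiny illustration of that (my example):

(defn foo [x1 [k v] x3]
  [x1 k v x3])

(foo 1 [:id "john"] 3)
;; => [1 :id "john" 3]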

popeye 2020-12-21T21:24:31.299600Z

if we use reduce-kv then the parameter will be [k v], right? Why so?

kwladyka 2020-12-21T21:24:33.299800Z

so it takes x2, which is [:keyword-foo "value"], and places it under k and v

kwladyka 2020-12-21T21:26:01.300Z

because it is a different fn which gets different parameters - in simple words 😉

kwladyka 2020-12-21T21:26:13.300200Z

it is designed to already get its parameters like that

kwladyka 2020-12-21T21:26:30.300400Z

while in the beginning it can look confusing, later it is very intuitive

👍 1
kwladyka 2020-12-21T21:27:44.300600Z

so it already destructures this value for you
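
A small illustration of the difference (my example, not from the thread): map passes each map entry as a single [k v] pair, while reduce-kv passes the key and value as separate arguments.

(map (fn [[k v]] (str k "=" v)) {:id "john"})
;; => (":id=john")

(reduce-kv (fn [acc k v] (assoc acc k (str v))) {} {:id "john"})
;; => {:id "john"}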

popeye 2020-12-21T21:28:04.300900Z

Thanks @kwladyka

kwladyka 2020-12-21T21:28:09.301100Z

no problem

kwladyka 2020-12-21T21:31:39.301300Z

There was a website with challenging data-transformation tasks that you can try to solve online. Afterwards you can compare your solutions to the best solutions made by other people. This is a really good place to start.

kwladyka 2020-12-21T21:31:43.301500Z

But I forgot the URL

popeye 2020-12-21T21:36:34.302300Z

is that 4Clojure?

kwladyka 2020-12-21T21:37:13.302500Z

indeed!

kwladyka 2020-12-21T21:37:38.302700Z

At least that is how I was learning many years ago

popeye 2020-12-21T21:38:55.302900Z

(map (fn [[k v]]
       (println "===1===k " k)
       (println "===1===v " v)
       (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)))
       (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) true false))
     {:id "john"})

popeye 2020-12-21T21:39:20.303100Z

in this case it is returning (true) or (false) as a list

popeye 2020-12-21T21:39:33.303300Z

how can I convert that to get a boolean?

kwladyka 2020-12-21T21:40:26.303500Z

the issue is you are using map not in the right context

kwladyka 2020-12-21T21:41:16.303700Z

{:id "John"} is a map, but you want to use map functions on collection like [{:id "John"} {:id "Popeye"}]

kwladyka 2020-12-21T21:41:38.303900Z

map returns a list

kwladyka 2020-12-21T21:42:03.304100Z

so it processes each element in the vector and returns the output of your function

kwladyka 2020-12-21T21:42:28.304300Z

if you want to process only one map {:id "John"}, then don't use the map function

popeye 2020-12-21T21:42:53.304500Z

what can we use if we have only 1 key and value?

kwladyka 2020-12-21T21:43:13.304700Z

just remove map from there

kwladyka 2020-12-21T21:43:57.304900Z

ok, this will not be enough 🙂

kwladyka 2020-12-21T21:45:25.305100Z

(map println {:id "john" :foo "bar"})
[:id john]
[:foo bar]
=> (nil nil)
(map println [{:id "john"} {:foo "bar"}])
{:id john}
{:foo bar}
=> (nil nil)

kwladyka 2020-12-21T21:46:35.305300Z

Do you see what I mean?

popeye 2020-12-21T21:47:10.305500Z

there is no side effect?

kwladyka 2020-12-21T21:47:20.305700Z

What do you mean by side effect?

popeye 2020-12-21T21:47:34.305900Z

not returning nil?

kwladyka 2020-12-21T21:47:48.306100Z

println returns nil

popeye 2020-12-21T21:49:35.306400Z

yes

popeye 2020-12-21T21:51:48.306600Z

is the function returning it vice versa? Like if you apply it on a map, does it return each entry as a vector?

kwladyka 2020-12-21T21:53:06.306800Z

I don't understand the question. The logic is: map takes each element from the collection and runs the function with this element. The results are returned as a list.

kwladyka 2020-12-21T21:53:38.307Z

so map gets {:id john} from the vector and runs (println {:id john}), which returns nil, etc.

popeye 2020-12-21T21:56:58.307200Z

yes, I got the functionality of map. In my logic I want to take a key and value, which will be a single map element, do pattern matching, and return true or false

kwladyka 2020-12-21T21:57:59.307800Z

if you want to operate on a single map, then you don't need to use map as a function at all

kwladyka 2020-12-21T21:58:58.308Z

unless you want to operate on each pair of key and value in the map, then map is ok

popeye 2020-12-21T21:59:35.308300Z

yeah, I got an error while running this function

popeye 2020-12-21T21:59:36.308500Z

((fn [[k v]] (println "===1===k " k) (println "===1===v " v) (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v))) (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) true false){:id "john"}))

popeye 2020-12-21T21:59:53.308700Z

is that the way we call it? Sorry, first time I am writing this

kwladyka 2020-12-21T21:59:59.308900Z

[[k v]] is not correct anymore

popeye 2020-12-21T22:00:19.309100Z

oops

popeye 2020-12-21T22:00:47.309300Z

how can we achieve it then? I am passing a single key and value

kwladyka 2020-12-21T22:01:01.309500Z

((fn [m]
   (println m))
 {:foo "bar" :x "y"})
{:foo bar, :x y}
=> nil
(map (fn [m]
       (println m))
     {:foo "bar" :x "y"})
[:foo bar]
[:x y]
=> (nil nil)

kwladyka 2020-12-21T22:01:51.309700Z

((fn [m]
   (println (:foo m)))
 {:foo "bar" :x "y"})
bar
=> nil
if you want to check :id (which is :foo here)

kwladyka 2020-12-21T22:02:24.309900Z

((fn [{:keys [foo] :as m}]
   (println foo))
 {:foo "bar" :x "y"})
bar
or like above

kwladyka 2020-12-21T22:02:32.310100Z

but not everything at once 🙂

kwladyka 2020-12-21T22:05:46.310500Z

In the end you wouldn't write anonymous functions and call them right away like that

kwladyka 2020-12-21T22:06:34.310700Z

(let [f (fn [{:keys [foo] :as m}]
          (println foo))]
  (f {:foo "bar" :x "y"}))
this can be easier to understand

popeye 2020-12-21T22:06:58.310900Z

yeah will explore

👍 1
popeye 2020-12-21T22:22:05.311200Z

how about this

popeye 2020-12-21T22:22:06.311400Z

(fn [v] (println "===1===v " v) (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v))) (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) false true )(map val (:id "john")))

kwladyka 2020-12-21T22:26:10.311600Z

no, this is not how you want to do this 🙂

kwladyka 2020-12-21T22:27:26.311800Z

BTW if you want to get only the values from a map, use vals, e.g. (vals {:foo "bar" :x 1})

kwladyka 2020-12-21T22:27:46.312Z

really hard to talk about how things should be done while we are doing things to learn

kwladyka 2020-12-21T22:27:56.312200Z

you have to experiment and figure out things

phronmophobic 2020-12-21T23:16:39.314200Z

from https://clojure.org/reference/protocols#_extend_via_metadata: > As of Clojure 1.10, protocols can optionally elect to be extended via per-value metadata:

(defprotocol Component
  :extend-via-metadata true
  (start [component]))
Is there a resource that talks about how to decide if a protocol should opt in to extension via metadata?
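
For reference, extension via metadata looks roughly like this (a sketch using the Component protocol above; the map and fn are made up). The metadata keys are fully-qualified symbols naming the protocol methods:

(def db-component
  (with-meta {:name "db"}
    {`start (fn [component] (println "starting" (:name component)))}))

(start db-component)
;; starting db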

2020-12-21T23:20:15.315800Z

Here's a fun little example of why Functional is better than OOP 😛

data = None

if data and "domain" in data:
  domain = data.get("domain").get("name", "foo")
else:
  domain = "bar"
  
print(domain)
Notice that in this code the condition has to be if data and "domain" in data: - we have to check that data is not None, because None does not support in, and otherwise you will see: TypeError: argument of type 'NoneType' is not iterable

2020-12-21T23:21:30.316800Z

If you didn't use methods, and instead used a functional approach, and in was a function, this would not be a problem, because you could easily implement a None check inside that function.

2020-12-21T23:22:05.317400Z

This is also a good example why nil isn't as bad in Clojure as it is in non null-safe OOP languages like Python or Java
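
For comparison, a quick sketch of what that looks like in Clojure - get and get-in treat nil like an empty map, so no nil check is needed:

(get-in nil [:domain :name] "bar")
;; => "bar"
(get-in {:domain {:name "foo"}} [:domain :name] "bar")
;; => "foo"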

kwladyka 2020-12-21T23:24:10.318200Z

cljs.user=> (key nil)
ERROR - No protocol method IMapEntry.-key defined for type null: 
you have to check nil and types in Clojure too 🙂

2020-12-22T20:19:03.335500Z

Yes, sometimes, but now it's just a design choice, not a limitation of the paradigm. Key is just a function implemented with:

(defn key
  "Returns the key of the map entry."
  [map-entry]
  (-key map-entry))
If it wanted, it could handle nil in any way.

phronmophobic 2020-12-21T23:26:00.319800Z

I wouldn't say that's a fair comparison. you typically wouldn't want to accept data as either None or a dict. I think it would be appropriate to only expect a dict. additionally, idiomatic python follows "it's easier to ask for forgiveness than permission". I would expect to just see:

data.get("domain", {}).get("name", "bar") 

phronmophobic 2020-12-22T08:11:03.326Z

the above is a nice addition. I still prefer clojure to python by quite a bit, but python isn't so bad

Tamas 2020-12-22T10:21:22.326500Z

same here! ie. python isn't bad but I prefer clojure

2020-12-22T20:22:40.335800Z

I wasn't specifically singling out Python, more OO vs Functional.

2020-12-22T20:23:24.336Z

My point being, what if you wanted a .get that can handle None or any other type, maybe vector, etc.

2020-12-22T20:24:02.336200Z

In OO, all types would need to agree to share a .get interface, and provide an implementation for it

2020-12-22T20:27:11.336400Z

But also, in this particular case, ya I do find Python's handling of None on .get less than ideal. I think Clojure's handling is much nicer, specifically because I think the above is a common source of bugs.

2020-12-22T20:30:17.336600Z

And notwithstanding all that, I found this example because it came up in our codebase 😅

Tamas 2020-12-23T08:09:23.351900Z

I think we understood and agreed with your point, but we didn't think that the comparison was fair. In practice (at least on the Python codebases I've worked on) that Python code would look like get_in(data, ('domain', 'name'), 'bar') or get_in(data, 'domain.name', 'bar'), which doesn't compare that unfavourably to (get-in data ["domain" "name"] "bar"), unlike your initial example.

2020-12-23T18:15:04.393900Z

It's possible, no one on our team is really a pro at Python, more like learned at university or picked it up here and there. This code is in a script file part of our infra, so it also doesn't get the same level of code review scrutiny and all. I can't seem to find get-in though? Is that from a popular library?

2020-12-23T18:16:36.394300Z

If so, I think it demonstrates my point pretty well, and I'd be curious to look at the implementation. My guess is that get_in is a function people created for this very problem. If, instead of adding a method to dictionaries and None, people found the need to change get from a method to a function, that would be a good example of what I'm talking about.

In Python, you could argue that you want a null error to be thrown: maybe you prefer to fail fast, and if you didn't explicitly handle null, you consider a null appearing to be a bug you'd want to know about. So what to do when data is None can be a design choice. And while I like that Clojure has get handle nil by default, I don't want to say that throwing a null error when get encounters a null is necessarily worse or bad.

But in OOP you actually can't do anything about it if you did want to handle this case the way Clojure does. That's because of how methods work versus functions: if the type is wrong, the method won't exist. All you can do is add the method to more and more types, but even then there's always a chance a type shows up that doesn't have the method, and you get an error again. That's one of the functional advantages in my opinion. You could also do this in Python, since it has functions: you could make get a function and handle it there.

kwladyka 2020-12-21T23:28:07.321300Z

I would say the biggest difference for me is that I can focus on moving from room A to room B instead of on a door object, which is not what I am interested in - I just want to move to B. But this is a very abstract description :)

kwladyka 2020-12-21T23:29:13.321600Z

I am going to sleep, good night

GGfpc 2020-12-21T23:47:36.324Z

I'm working on an app where I'm making several api calls concurrently to fetch data. The number is variable but let's say it's 50 on average. I'm currently using pmap to transform the urls into the response in parallel, but I was wondering if it could be faster since pmap is limited to 2 + num_cpus and the time is mostly spent in I/O wait. Any tips?

2020-12-22T20:54:53.338Z

@jumar I don't think you're correct here. The parallelization level is restricted by the thread pool it uses, chunking won't change that.

2020-12-22T21:44:37.338700Z

the parallelization is controlled by the lag between the launch of new futures and the deref; it uses future, which runs on an expanding, unbounded pool

2020-12-22T21:45:15.338900Z

chunking changes the behavior of (map #(future (f %)) coll) which is what actually creates the threads

2020-12-22T21:46:50.339100Z

so the answer is weird and complicated (another reason I don't like pmap): if the input is chunked, futures are launched a chunk at a time; otherwise the number of futures in flight is controlled by the lag between future generation and future realization (which happens via the blocking deref)

2020-12-22T21:47:10.339300Z

(defn pmap
  "Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead."
  {:added "1.0"
   :static true}
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets))))
  ([f coll & colls]
   (let [step (fn step [cs]
                (lazy-seq
                 (let [ss (map seq cs)]
                   (when (every? identity ss)
                     (cons (map first ss) (step (map rest ss)))))))]
     (pmap #(apply f %) (step (cons coll colls))))))

2020-12-22T21:48:30.339500Z

the (drop n rets) creates the lag between creation of new futures and blocking deref to wait on them

2020-12-22T21:49:00.339700Z

breaking a common piece of advice to not mix lazy calculation with procedural side effects

2020-12-22T21:55:48.339900Z

Oh ya, my bad, I was thinking of agent send

2020-12-22T21:56:17.340100Z

I actually never deep dived the impl of pmap, hum..

2020-12-22T21:57:10.340300Z

Doesn't the implementation of step here unchunk?

2020-12-22T22:03:08.340600Z

;; changes to this atom will be reported via println

(def snitch (atom 0))

(add-watch snitch :logging
           (fn [_ _ old-value new-value]
             (print (str "total goes from " old-value " to " new-value "\n"))))

(defn exercise
  [coll]
  (doall
   (pmap (fn [x]
           (swap! snitch inc)
           (print (str "processing: " x "\n"))
           (swap! snitch dec)
           @snitch)
         coll)))
user=> (exercise (range 10))
total goes from 3 to 4
total goes from 4 to 5
total goes from 2 to 3
total goes from 1 to 2
total goes from 0 to 1
processing: 0
processing: 4
processing: 2
processing: 3
processing: 1
total goes from 5 to 4
total goes from 4 to 3
total goes from 1 to 0
total goes from 2 to 1
total goes from 3 to 2
total goes from 0 to 1
total goes from 1 to 2
processing: 6
processing: 7
total goes from 2 to 3
total goes from 3 to 4
total goes from 5 to 4
total goes from 4 to 5
processing: 8
total goes from 4 to 3
processing: 9
processing: 5
total goes from 3 to 2
total goes from 2 to 1
total goes from 1 to 0
(0 0 0 0 0 0 3 2 0 0)
max parallelism here is 5 - I'm going to try a version where I capture the max and exercise it more aggressively

2020-12-22T22:03:37.340800Z

Cool

2020-12-22T22:03:48.341Z

@didibus I am not good enough with lazy-seqs to read the pmap code and know whether it unchunks, so I'm working empirically

2020-12-22T22:04:14.341200Z

Haha, no one is 😛

2020-12-22T22:06:16.341400Z

yeah, here's my version of exercise that captures the max parallelism:

(defn exercise
  [coll]
  (let [biggest (atom 0)]
    (dorun
     (pmap (fn [x]
             (swap! snitch inc)
             (swap! biggest max @snitch)
             (print (str "processing: " x "\n"))
             (swap! snitch dec)
             @snitch)
           coll))
    @biggest))
(exercise (range 1000)) prints a lot more than I'm going to paste here, and returns 19

2020-12-22T22:06:41.341600Z

lmk if that's flawed, but to my eye that will accurately tell you the max futures spawned concurrently by pmap

2020-12-22T22:07:01.341800Z

(nb range is chunked, which is why I'm using it here)

2020-12-22T22:08:23.342Z

Hum. Ya, looking at the code, it's kind of hard to get a full picture. I think the branch of if-let that uses cons will unchunk, but the other branch would not. And the drop n will also trigger the first chunk.

2020-12-22T22:10:12.342200Z

all the retries on that poor little atom make the output with bigger inputs absurd

2020-12-22T22:10:50.342400Z

or maybe that's caused by the printing contention...

2020-12-22T22:11:15.342600Z

Might be better to use a semaphore? I think a lock instead of the atom's retries maybe would make this more clear?

2020-12-22T22:11:21.342900Z

(the reason all the prints call str is because otherwise the parts of the prints overlap in the output)

2020-12-22T22:11:28.343100Z

hmm

2020-12-22T22:12:18.343300Z

Oh, no I don't think that's what I meant. Whatever the thing that is a locking counter is called

2020-12-22T22:13:30.343500Z

Then again, hum... What if you changed the impl of pmap so that inside the future it incremented and decremented the counter before and after running f ?

2020-12-22T22:14:10.343700Z

that would be the same behavior, with more work to achieve it

2020-12-22T22:17:09.343900Z

hum..

2020-12-22T22:18:21.344100Z

I rewrote to an agent (doesn't retry), the prints are now in intelligible order, the answer is still high (33, 37, 38, 39, 36 ...)

2020-12-22T22:20:38.344300Z

max value in theory is 42 (32 chunk size + 8 processors + 2)

2020-12-22T22:25:43.344700Z

Ya, so that matches my interpretation of the code

2020-12-22T22:26:12.344900Z

The first branch I think unchunks, but the drop is what triggers the first chunk

2020-12-22T22:26:27.345100Z

So instead of getting n parallelization, you get size of first chunk

2020-12-22T22:26:39.345300Z

+n

2020-12-22T22:26:53.345500Z

+n hum..

2020-12-22T22:27:00.345700Z

(when you overlap the next chunk)

2020-12-22T22:28:26.345900Z

Oh boy, that's one confusing little function haha. It does seem like it was written pre-chunking, so I guess chunking just wasn't taken into account. Hum, I wonder if that explains why I see poor performance improvements from it in practice - with chunking, the thread overhead is way too high for the parallelization gained

2020-12-22T22:28:35.346100Z

it launches chunk-size futures, but iterates with an nproc+2 delay between the reader of the input and the reader of the future values; if your input is big enough to have multiple chunks you can have more than a chunk size in flight

2020-12-22T22:29:37.346300Z

that could be - I consider it more like "an example of what you could do to parallelize a specific problem" that happened to make it into the codebase, and it doesn't match most people's problems

2020-12-22T22:30:36.346600Z

reducers are more general, but I haven't used them in anger and haven't seen much usage of them in the wild

2020-12-22T22:31:28.346800Z

Ya, I think having to require their namespace and the fact that only fold is still useful now that we have transducers makes them kind of DOA

2020-12-23T00:01:47.348Z

Well, maybe this chunking behavior is actually a blessing in disguise? Now it means using this re-chunk function:

(defn re-chunk [n xs]
  (lazy-seq
   (when-let [s (seq (take n xs))]
     (let [cb (chunk-buffer n)]
       (doseq [x s] (chunk-append cb x))
       (chunk-cons (chunk cb) (re-chunk n (drop n xs)))))))
Taken from clojuredocs, you can actually control the concurrency level of pmap 😛

2020-12-23T00:02:50.348200Z

(dorun (pmap (fn [_] (Thread/sleep 100)) (re-chunk 1 (range 1000))))   ;; will give you ~2+cores
(dorun (pmap (fn [_] (Thread/sleep 100)) (re-chunk 100 (range 1000)))) ;; will give you ~100

2020-12-23T00:06:06.348500Z

Not sure what to think about this. It would probably just be nice if pmap were rewritten to unchunk and take cores+2 or an optional n.

jumar 2020-12-23T08:47:04.357100Z

I have the same feeling and that’s why I created map-throttled in the repo; but it’s for a very specific use case. In most cases it’s better to use Executors or claypoole

2020-12-21T23:52:32.324200Z

When I had an app that heavily used APIs, the pattern that worked best was to have separate resource pooling per API service. This is because there's usually a per-API limit (either imposed by the API, or by their own resources' ability to serve you)

2020-12-21T23:53:38.324400Z

that pooling could be a thread pool (eg. claypoole which lets you use futures with custom pools) or a queue per service, with a different number of workers dedicated to each queue

2020-12-21T23:55:11.324700Z

if you aren't hitting the limits of the APIs, you can just use future for each call, and skip pmap which is rarely the right answer

2020-12-21T23:56:13.324900Z

if you need to do any coordination (eg. combining results from multiple calls before calling another endpoint) look into core.async (but make sure all the io is inside core.async/thread calls)
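
To make the future-per-call suggestion above concrete, a minimal sketch (fetch here is a stand-in for your blocking HTTP call, not a real library function):

(defn fetch [url]
  ;; placeholder for the real HTTP call
  (Thread/sleep 100)
  {:url url :status 200})

(defn fetch-all [urls]
  (let [futs (mapv #(future (fetch %)) urls)]  ;; fire off every request at once
    (mapv deref futs)))                        ;; then block for all the results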