In Joy of Clojure 2nd ed. (p. 253 - 255) they give a following example of making array mutations safe:
(defn make-safe-array [t sz]
  (let [a (make-array t sz)]
    (reify SafeArray
      (count [_] (clj/count a))
      (seq [_] (clj/seq a))
      ;; is locking really necessary for aget? what could happen?
      (aget [_ i] (locking a
                    (clj/aget a i)))
      (aset [this i f] (locking a
                         (clj/aset a i (f (aget this i))))))))
(full sample here: https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/books/joy_of_clojure/ch10_mutation_and_concurrency.clj#L280-L282)
I'm wondering why they lock aget at all? Isn't it enough to lock aset? Why should I block readers while there's a write in progress?
Hmm, that might be it. But what would that mean? Like observing a half-set value? What would that even be?
Maybe because of this: https://docs.oracle.com/javase/tutorial/essential/concurrency/memconsist.html
I.e. same reason why we mark variables as volatile.
Exactly, you need something that orders reads and writes with respect to each other; otherwise the JVM can do things like read the array slot once, cache it in a register, and just say your writes all happened after the read
@jumar locking emits a memory fence instruction that prevents operation reordering and makes sure your CPU caches are synced. One thread might update a value in L1 cache (which is per-core), then another thread on another core might read the same value from its own L1 cache. Typically a memory fence causes the changes to get pushed to L3 cache, which isn't per-core. Writing a volatile does the same, so generally for scalar values (int, long, writing a reference) volatile is sufficient
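To make the mutual-exclusion-plus-visibility point concrete, here's a minimal sketch (not from the book; `hammer` is a made-up name) mirroring the aget/aset pattern: `locking` both serializes the writes and establishes the happens-before edge that makes them visible to other threads.

```clojure
(defn hammer
  "Increment slot 0 of a long array from 8 threads, 10000 times each,
   taking the lock for both the read and the write of the slot."
  []
  (let [a (long-array 1)
        ts (doall
            (repeatedly 8
                        #(Thread. ^Runnable
                                  (fn []
                                    (dotimes [_ 10000]
                                      (locking a
                                        (aset ^longs a 0 (inc (aget ^longs a 0)))))))))]
    (run! #(.start ^Thread %) ts)
    (run! #(.join ^Thread %) ts)
    ;; lock the read too, so the final value is ordered after all writes
    (locking a (aget ^longs a 0))))

(hammer)
;; => 80000 every run; drop the locking and lost/stale updates become possible
```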
Ah right, so the read lock is there only to provide a fresh value, which could otherwise be cached
I think it's unlikely to happen here (I increment the array values in 100 concurrent threads, then read them all afterwards), maybe because the cache coherence protocol will actually fetch the proper value when it's modified by the aset operation (even when there's no lock in aget)
I definitely couldn't find any consistency issue when removing the aget lock and testing it (https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/books/joy_of_clojure/ch10_mutation_and_concurrency.clj#L286-L290)
Such bugs are notoriously difficult to test for. Sometimes you may catch them with such tests, but there is no guarantee you will
Yeah, based on my understanding of JMM and memory consistency properties (https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/util/concurrent/package-summary.html#MemoryVisibility) they do the right thing in the book; in particular: > Actions prior to "releasing" synchronizer methods such as Lock.unlock, Semaphore.release, and CountDownLatch.countDown happen-before actions subsequent to a successful "acquiring" method such as Lock.lock, Semaphore.acquire, Condition.await, and CountDownLatch.await on the same synchronizer object in another thread. Here are some good resources dealing with more details regarding the notion of volatile et al meaning "flush to main memory" (which was the impression I got from reading some Java book a decade ago but found much later that this is likely false when reading about the MESI cache coherence protocol): • https://stackoverflow.com/questions/1850270/memory-effects-of-synchronization-in-java • https://mechanical-sympathy.blogspot.com/2013/02/cpu-cache-flushing-fallacy.html • https://stackoverflow.com/questions/42746793/does-a-memory-barrier-ensure-that-the-cache-coherence-has-been-completed/42750844#42750844
@jumar locks or not, there's a race condition here because the sequence can be constructed before the threads are done mutating. Look at the result of (-> (make-safe-array Integer/TYPE 8) (doto pummel) seq)
Oh yeah, you're right. I think they basically rely on the reader waiting until the threads are done (which is quick for a human experimenting in the REPL 🙂 ).
... in which case, I think, the read lock basically doesn't matter at all but would be the right thing to do for an operation happening immediately after a previous aset, right?
there's many possible reasons why you could see the latest value without explicit synchronization, but in general physical time is not something you should rely on
Likely because java.lang.reflect.Array/get doesn't say anything about it being thread-safe.
What's the default for the clojure.compiler.direct-linking and elide-meta JVM options when doing a lein jar or lein uberjar?
@roklenarcic a build tool should not change these options unless the user asks for it
code in an uberjar might still rely on non-direct linking or metadata for example
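For illustration only, if a user did want these for an uberjar build, a hypothetical project.clj profile (the project name and elide-meta keys here are just examples) could opt in explicitly via JVM system properties:

```clojure
;; Hypothetical project.clj snippet: direct linking and metadata elision are
;; opt-in system properties, typically set only for the uberjar profile.
(defproject my-app "0.1.0"
  :profiles
  {:uberjar {:aot :all
             :jvm-opts ["-Dclojure.compiler.direct-linking=true"
                        "-Dclojure.compiler.elide-meta=[:doc :file :line :added]"]}})
```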
Anyone here using vim with conjure in a monorepo? My issue is that I typically open files in multiple projects and it becomes tedious to launch the repl for every file. Is there a way to configure vim to find the projects root path and launch an nrepl-server in that dir?
I believe there is talk around adding that, you can check in the #conjure channel
I do but I don't start the REPL from nvim, I start a bunch of REPLs using a kinda custom docker-compose wrapper, then I set up Conjure to connect to the right REPL depending on what dir I :cd into.
Conjure allows you to work on multiple projects at a time by setting the :ConjureClientState [state-key]
At work, I set up a "cwd changed" autocmd that sets my ConjureClientState to the cwd path. So every time I :cd I get a fresh Conjure state with its own nREPL connection and config.
You could set up something similar + use something like https://github.com/clojure-vim/vim-jack-in if you really want to start your REPL from within nvim.
I still recommend setting up your REPLs outside of nvim with your own script though; ensure you write your .nrepl-port files into each sub-repo directory, then :cd into each module as you work on them and Conjure will auto-connect.
Then you can set up the autocmd to set the state as you hop around to have multiple concurrent connections.
augroup conjure_set_state_key_on_dir_changed
autocmd!
autocmd DirChanged * execute "ConjureClientState " . getcwd()
augroup END
I have a script that goes through my docker processes and maps the nREPL ports into .nrepl-port files in the correct directories of the monorepo, making :cd-ing into directories synonymous with connecting to them.
You can also discuss conjure over at https://conjure.fun/discord if you so wish 🙂
I guess I can simply use a script to launch repls for all projects.. I guess it will eat some memory. Anyway, I joined #conjure so I'll ask future questions there.
by default those aren't used at all afaik
so no direct linking, no elide-meta
spec generators rely on the Clojure property testing library test.check. However, this dependency is dynamically loaded and you can use the parts of spec other than gen, exercise, and testing without declaring test.check as a runtime dependency.
The above is from the spec guide, where it speaks of loading the test.check lib. What does it mean to dynamically load a lib? How does that work?
If you do generator stuff, it will load the test.check generators namespace. If you don't, then it won't.
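A small sketch of what that means in practice (assuming spec itself is on the classpath): plain validation never touches test.check, so it works whether or not the dependency is present; only generator-producing calls trigger the load.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::name string?)

;; Validation works with or without test.check on the classpath:
(s/valid? ::name "john")
;; => true

;; Only generator-related calls (s/gen, s/exercise, spec test checking)
;; attempt to load the test.check namespaces, and fail if it is absent.
```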
so you can safely include test.check at test/repl time but exclude it at production time
:test-deps {:extra-paths ["test"]
            :extra-deps {org.clojure/test.check {:mvn/version "1.0.0"}
                         peridot/peridot {:mvn/version "0.5.2"}}}
:run-tests {:extra-deps {com.cognitect/test-runner
                         {:git/url "https://github.com/cognitect-labs/test-runner"
                          :sha "209b64504cb3bd3b99ecfec7937b358a879f55c1"}}
            :main-opts ["-m" "cognitect.test-runner"
                        "-d" "test"]}
an example of adding test.check
(map (fn [k v]
       (println " K " k)
       (println " v " v)
       (if-not (re-matches #"^[a-z]+\*$" (->str v))
         nil
         (->str v)))
     {:id "john"})
Hello team, I am passing a map to an anonymous function and wanted to validate it, and tried this code, but it is not working. How can I pass {:id "john"} to an anonymous function?
(fn [k v] …) is for 2 arguments. If you want to have key and value you need (fn [[k v]] …).
(defn foo [x1 x2 x3] ...)
is the fn with 3 arguments
(defn foo [x1 [k v] x3] ...)
is the function with 3 arguments, but the second one is destructured to [k v]
if we use reduce-kv then the parameter will be [k v] right? why so?
so it takes x2
which is [:keyword-foo "value"] and places it under k and v
because it is a different fn which gets different parameters, in simple words 😉
it is designed to already take the parameters like that
while at the beginning it can look confusing, later it is very intuitive
so it already destructures the value for you
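For comparison, a tiny sketch: reduce-kv hands your function the accumulator, the key, and the value as three separate arguments, so no destructuring is needed there.

```clojure
(reduce-kv (fn [acc k v]
             ;; k and v arrive as separate args, plus the accumulator acc
             (assoc acc k (str v "!")))
           {}
           {:id "john"})
;; => {:id "john!"}
```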
Thanks @kwladyka
no problem
There was a website with challenging data-transformation tasks that you can try to solve online. Afterwards you can compare your solutions with the best solutions made by other people. This is a really good place to start.
But I forgot the URL
is that 4Clojure?
indeed!
At least it is how I was learning many years ago
(map (fn [[k v]]
       (println "===1===k " k)
       (println "===1===v " v)
       (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)))
       (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) true false))
     {:id "john"})
in this case it is returning (true) or (false) as a list
how can I convert that to get a boolean?
the issue is you are using map not in the right context
{:id "John"} is a map, but you want to use the map function on a collection like [{:id "John"} {:id "Popeye"}]
map returns a list
so it processes each element in the vector and returns the output of your function
if you want to process only one map {:id "John"}, then don't use the map function
what can we use if we have only 1 key and value?
just remove map from there
ok, this will be not enough 🙂
(map println {:id "john" :foo "bar"})
[:id john]
[:foo bar]
=> (nil nil)
(map println [{:id "john"} {:foo "bar"}])
{:id john}
{:foo bar}
=> (nil nil)
Do you see what I mean?
there is no side effect?
What do you mean by side effect?
not returning nil?
println returns nil
yes
the function returning vice versa? like if you apply on map it returning as vector?
I don’t understand the question.
The logic is: map takes each element from the collection and runs the function with that element. The result is returned as a list.
so map takes {:id john} from the vector
and runs (println {:id john})
which returns nil
etc.
yes, I got the functionality of map. In my logic I want to take a key and value, which will be a single map element, do pattern matching, and return true or false
if you want to operate on a single map, then you don't need to use map as a function at all
unless you want to operate on each pair of key and value in the map, then map is ok
yeah, I got an error while running this function
((fn [[k v]]
   (println "===1===k " k)
   (println "===1===v " v)
   (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)))
   (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) true false)
   {:id "john"}))
is that the way we call? sorry first time i am writing this
[[k v]]
is not correct anymore
oops
how can we achieve that then? I am passing a single key and value
((fn [m]
   (println m))
 {:foo "bar" :x "y"})
{:foo bar, :x y}
=> nil
(map (fn [m]
       (println m))
     {:foo "bar" :x "y"})
[:foo bar]
[:x y]
=> (nil nil)
((fn [m]
   (println (:foo m)))
 {:foo "bar" :x "y"})
bar
=> nil
if you want to check :id (which is :foo here):
((fn [{:keys [foo] :as m}]
   (println foo))
 {:foo "bar" :x "y"})
bar
or like above
but not everything at once 🙂
On the end you wouldn’t write anonymous function and call them right a way like that
(let [f (fn [{:keys [foo] :as m}]
          (println foo))]
  (f {:foo "bar" :x "y"}))
this can be easier to understand
yeah will explore
how about this
(fn [v] (println "===1===v " v) (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v))) (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) false true )(map val (:id "john")))
no, this is not how you want to do this 🙂
BTW if you want to get all values only from map use vals
so (vals {:foo "bar" :x 1})
it's really hard to talk about how things should be done while you are still learning
you have to experiment and figure out things
from https://clojure.org/reference/protocols#_extend_via_metadata: > As of Clojure 1.10, protocols can optionally elect to be extended via per-value metadata:
(defprotocol Component
:extend-via-metadata true
(start [component]))
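As a concrete sketch of the quoted feature (the `server` value below is made up): any plain value can satisfy the protocol by carrying the implementation in its metadata, keyed by the fully qualified method symbol.

```clojure
(defprotocol Component
  :extend-via-metadata true
  (start [component]))

;; A plain map "implements" Component via metadata; the key is the
;; fully qualified symbol of the protocol method.
(def server
  (with-meta {:name "web"}
    {`start (fn [c] (str "starting " (:name c)))}))

(start server)
;; => "starting web"
```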
Is there a resource that talks about how to decide if a protocol should opt in to extension via metadata?
Here's a fun little example of why Functional is better than OOP 😛
data = None
if data and "domain" in data:
domain = data.get("domain").get("name", "foo")
else:
domain = "bar"
print(domain)
Notice in this code, you need the condition to be if data and "domain" in data:, because we have to check that data is not None; otherwise the type None won't support in and you will see: TypeError: argument of type 'NoneType' is not iterable
If you didn't use methods, and instead used a functional approach where in was a function, this would not be a problem, because you could easily implement a None check inside that function.
This is also a good example of why nil isn't as bad in Clojure as it is in non null-safe OOP languages like Python or Java
cljs.user=> (key nil)
ERROR - No protocol method IMapEntry.-key defined for type null:
you have to check nil and types in Clojure too 🙂
Yes, sometimes, but now it's just a design choice, not a limitation of the paradigm. key is just a function implemented with:
(defn key
  "Returns the key of the map entry."
  [map-entry]
  (-key map-entry))
If it wanted, it could handle nil in any way.
I wouldn't say that's a fair comparison. You typically wouldn't want to accept data as either None or a dict; I think it would be appropriate to only expect a dict. Additionally, idiomatic Python follows "it's easier to ask for forgiveness than permission". I would expect to just see:
data.get("domain", {}).get("name", "bar")
the above is a nice addition. I still prefer clojure to python by quite a bit, but python isn't so bad
same here! ie. python isn't bad but I prefer clojure
I wasn't specifically singling out Python, more OO vs Functional.
My point being, what if you wanted a .get that can handle None or any other type, maybe vector, etc.
In OO, all types would need to agree to share a .get interface, and provide an implementation for it
But also, in this particular case, ya I do find Python's handling of None on .get less than ideal. I think Clojure's handling is much nicer, specifically because I think the above is a common source of bugs.
And notwithstanding, I found this example because it came up in our case 😅
I think we understood and agreed with your point, but we didn't think that the comparison was fair. In practice (at least on python codebases I worked on) that python code would look like:
get-in(data, ('domain', 'name'), 'bar')
or
get-in(data, 'domain.name', 'bar')
which doesn't compare that unfavourably to
(get-in data ["domain" "name"] "bar")
from your initial example.
It's possible, no one on our team is really a pro at Python, more like learned at university or picked it up here and there. This code is in a script file part of our infra, so it also doesn't get the same level of code review scrutiny and all.
I can't seem to find get-in though? Is that from a popular library?
If so, I think it demonstrates my point pretty well, and I'd be curious to look at the implementation. My guess is get-in is a function that people create for this very problem. Instead of adding a method to dictionaries and None, people found the need to change get from a method to a function, which would be a good example of what I'm talking about.
In Python, you could argue that you want a null error to be thrown, maybe you prefer the fail fast, and if you didn't explicitly handle null, maybe you consider a null appearing a bug that you'd want to know about. So that can be a design choice, what do you do with data being None? And while I like that Clojure has get handle nil by default, I don't want to say that throwing a null error if get encounters a null is necessarily worse or bad.
But, in OOP, you actually can't do anything about it if you did want to handle this case the way Clojure does. That's because of how methods work versus functions. If the type is wrong, the methods won't exist. All you can do is add the method to more and more types, but even then, there's always a chance a type shows up that doesn't have the method, and you get an error again. That's one of the Functional advantages in my opinion. Which you could also do in Python, since it has Functions, you could make get a function and do this.
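That difference is easy to see with Clojure's own get: because it's a function rather than a method, there's no receiver that has to exist, and it gets to decide for itself how to treat nil.

```clojure
;; No method dispatch on a receiver, so nil is just another argument:
(get nil "domain")
;; => nil

;; Same for nested lookups, with an explicit default:
(get-in nil ["domain" "name"] "bar")
;; => "bar"
```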
I would say the biggest difference for me is I can focus on moving from room A to B instead of object door which is not what I am interested in to achieve, because I want to move to B - but this is very abstractive description :)
I am going sleep, good night
I'm working on an app where I'm making several api calls concurrently to fetch data. The number is variable but let's say it's 50 on average. I'm currently using pmap to transform the urls into the response in parallel, but I was wondering if it could be faster since pmap is limited to 2 + num_cpus and the time is mostly spent in I/O wait. Any tips?
@jumar I don't think you're correct here. The parallelization level is restricted by the thread pool it uses, chunking won't change that.
the parallelization is controlled by the lag between the launch of new futures and the deref; it uses future, which draws on an expanding, unbounded thread pool
chunking changes the behavior of (map #(future (f %)) coll), which is what actually creates the threads
so the answer is weird and complicated (another reason I don't like pmap): chunking causes futures to be launched a chunk at a time, if the input is chunked; otherwise the number of futures in flight is controlled by the lag between future generation and future realization (which is done via the blocking deref)
(defn pmap
"Like map, except f is applied in parallel. Semi-lazy in that the
parallel computation stays ahead of the consumption, but doesn't
realize the entire result unless required. Only useful for
computationally intensive functions where the time of f dominates
the coordination overhead."
{:added "1.0"
:static true}
([f coll]
(let [n (+ 2 (.. Runtime getRuntime availableProcessors))
rets (map #(future (f %)) coll)
step (fn step [[x & xs :as vs] fs]
(lazy-seq
(if-let [s (seq fs)]
(cons (deref x) (step xs (rest s)))
(map deref vs))))]
(step rets (drop n rets))))
([f coll & colls]
(let [step (fn step [cs]
(lazy-seq
(let [ss (map seq cs)]
(when (every? identity ss)
(cons (map first ss) (step (map rest ss)))))))]
(pmap #(apply f %) (step (cons coll colls))))))
the (drop n rets) creates the lag between the creation of new futures and the blocking deref that waits on them
breaking a common piece of advice to not mix lazy calculation with procedural side effects
Oh ya, my bad, I was thinking of agent send
I actually never deep dived the impl of pmap, hum..
Doesn't the implementation of step here unchunk?
;; changes to this atom will be reported via println
(def snitch (atom 0))

(add-watch snitch :logging
           (fn [_ _ old-value new-value]
             (print (str "total goes from " old-value " to " new-value "\n"))))

(defn exercise
  [coll]
  (doall
   (pmap (fn [x]
           (swap! snitch inc)
           (print (str "processing: " x "\n"))
           (swap! snitch dec)
           @snitch)
         coll)))
user=> (exercise (range 10))
total goes from 3 to 4
total goes from 4 to 5
total goes from 2 to 3
total goes from 1 to 2
total goes from 0 to 1
processing: 0
processing: 4
processing: 2
processing: 3
processing: 1
total goes from 5 to 4
total goes from 4 to 3
total goes from 1 to 0
total goes from 2 to 1
total goes from 3 to 2
total goes from 0 to 1
total goes from 1 to 2
processing: 6
processing: 7
total goes from 2 to 3
total goes from 3 to 4
total goes from 5 to 4
total goes from 4 to 5
processing: 8
total goes from 4 to 3
processing: 9
processing: 5
total goes from 3 to 2
total goes from 2 to 1
total goes from 1 to 0
(0 0 0 0 0 0 3 2 0 0)
max parallelism here is 5 - I'm going to try a version where I capture the max and exercise it more aggressively
Cool
@didibus I am not good enough with lazy-seqs to read the pmap code and know whether it unchunks, so I'm working empirically
Haha, no one is 😛
yeah, here's my version of exercise that captures the max parallelism:
(defn exercise
[coll]
(let [biggest (atom 0)]
(dorun
(pmap (fn [x]
(swap! snitch inc)
(swap! biggest max @snitch)
(print (str "processing: " x "\n"))
(swap! snitch dec)
@snitch)
coll))
@biggest))
(exercise (range 1000))
prints a lot more than I'm going to paste here, and returns 19
lmk if that's flawed, but to my eye that will accurately tell you the max futures spawned concurrently by pmap
(nb range is chunked, which is why I'm using it here)
Hum. Ya, looking at the code, it's kind of hard to get a full picture. I think the branch of the if-let that uses cons will unchunk, but the other branch will not. And the drop n will also trigger the first chunk.
all the retries on that poor little atom make the output with bigger inputs absurd
or maybe that's caused by the printing contention...
Might be better to use a semaphore? I think a lock instead of the atom's retry would maybe make this clearer?
(the reason all the prints call str is that otherwise the parts of the prints overlap in the output)
hmm
Oh, no I don't think that's what I meant. Whatever the thing that is a locking counter is called
Then again, hum... What if you changed the impl of pmap so that inside the future it incremented and decremented the counter before and after running f ?
that would be the same behavior, with more work to achieve it
hum..
I rewrote to an agent (doesn't retry), the prints are now in intelligible order, the answer is still high (33, 37, 38, 39, 36 ...)
max value in theory is 42 (32 chunk size + 8 processors + 2)
Ya, so that matches my interpretation of the code
The first branch I think unchunks, but the drop is what triggers the first chunk
So instead of getting n parallelization, you get the size of the first chunk + n
hum..
(when you overlap the next chunk)
Oh boy, that's one confusing little function haha. It does seem like it was written pre-chunking, so I guess chunking just wasn't taken into account. Hmm, I wonder if that explains why I see poor performance improvements from it in practice: with chunking, the thread overhead is way too high for parallelization
it launches chunk-size futures, but iterates with an nproc+2 delay between the reader of the input and the reader of the future values; if your input is big enough to have multiple chunks you can have more than chunk-size futures in flight
that could be - I consider it more like "an example of what you could do to parallelize a specific problem" that happened to make it into the codebase, and it doesn't match most people's problems
reducers are more general, but I haven't used them in anger and haven't seen much usage of them in the wild
Ya, I think having to require their namespace and the fact that only fold is still useful now that we have transducers makes them kind of DOA
Well, maybe this chunking behavior is actually a blessing in disguise? Now it means using this re-chunk function:
(defn re-chunk [n xs]
(lazy-seq
(when-let [s (seq (take n xs))]
(let [cb (chunk-buffer n)]
(doseq [x s] (chunk-append cb x))
(chunk-cons (chunk cb) (re-chunk n (drop n xs)))))))
Taken from clojuredocs, you can actually control the concurrency level of pmap 😛
(dorun (pmap (fn [_] (Thread/sleep 100)) (re-chunk 1 (range 1000))))
Will give you ~2+cores
(dorun (pmap (fn[_] (Thread/sleep 100)) (re-chunk 100 (range 1000))))
Will give you ~100
Not sure what to think about this. It would probably just be nice if pmap were rewritten to unchunk and take cores + 2 or an optional n.
I have the same feeling and that’s why I created map-throttled in the repo; but it’s for a very specific use case. In most cases it’s better to use Executors or claypoole
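As a sketch of the Executors route (`map-pooled` is a made-up name, not a library function): a fixed-size pool from java.util.concurrent gives you an explicit concurrency bound, unlike pmap's chunk-driven one.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future Callable))

(defn map-pooled
  "Run f over coll on a fixed pool of n threads; returns results in input order."
  [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n)]
    (try
      (->> coll
           ;; submit everything up front; at most n tasks run concurrently
           (mapv (fn [x] (.submit pool ^Callable (fn [] (f x)))))
           ;; then block for each result, preserving input order
           (mapv (fn [^Future fut] (.get fut))))
      (finally
        (.shutdown pool)))))

(map-pooled 4 (fn [x] (* x x)) (range 5))
;; => [0 1 4 9 16]
```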
When I had an app that heavily used APIs, the pattern that worked best was to have separate resource pooling per API service. This is because there's usually a per API limit (either imposed by the API, or their own resources being able to serve you)
that pooling could be a thread pool (eg. claypoole which lets you use futures with custom pools) or a queue per service, with a different number of workers dedicated to each queue
if you aren't hitting the limits of the APIs, you can just use future for each call, and skip pmap, which is rarely the right answer
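A minimal sketch of the "just use future" approach for I/O-bound calls (`fetch-all` and `fetch-url` are made-up names; the sleep stands in for a real API call):

```clojure
(defn fetch-all
  "Launch one future per URL, then wait for all of them in input order."
  [fetch-url urls]
  (->> urls
       (mapv #(future (fetch-url %)))  ;; all calls start immediately
       (mapv deref)))                  ;; block for results, preserving order

(fetch-all (fn [u] (Thread/sleep 100) (str "resp:" u))
           ["a" "b" "c"])
;; => ["resp:a" "resp:b" "resp:c"], after roughly one call's latency
```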
if you need to do any coordination (eg. combining results from multiple calls before calling another endpoint) look into core.async
(but make sure all the io is inside core.async/thread calls)