clojure

New to Clojure? Try the #beginners channel. Official docs: https://clojure.org/ Searchable message archives: https://clojurians-log.clojureverse.org/
2021-01-03T03:43:37.278500Z

Thanks. I see that now.

william 2021-01-03T13:19:43.281300Z

hey, some time ago I asked in this channel for a debugger that shows the intermediate steps of a function call in the buffer (like the clojure one) for clojurescript, and I got pointed to a tool (standalone) that works with both clj and cljs. But then I hit the slack message limit and I can't find it anymore. Do you recall the name?

clyfe 2021-01-03T13:22:09.281400Z

hashp or spyscope

william 2021-01-03T13:28:32.281600Z

these are very interesting tools, but the project I remember had a GUI you connected to

Eamonn Sullivan 2021-01-03T13:53:25.288200Z

Hi all, I have the following (probably common) problem. I have a sequence (count: 100-300) of maps that need to be filtered by a series of predicates and then (on the two dozen or so ones that remain) embellished with a couple of new keys and values. The issue is that most of the predicates (and some of the embellishments) require REST or GraphQL calls, so are blocking. What would an experienced Clojure developer (of which I'm not) reach for first in this situation to make this run quickly and make the best use of cores/threads? The r/reduce r/fold things say they are for computationally intensive stuff, not i/o blocking. Maybe async/pipeline-blocking, I thought, but it doesn't seem to help much, unless I'm using it wrong.

Eamonn Sullivan 2021-01-04T19:46:56.325900Z

In the end, I went with async/pipeline-blocking, with 100 concurrency. This didn't give me much when I first tried it, but I was folding the whole sequence in the blocking channels (and less concurrency). Instead, I now feed one thing (map) at a time through the pipeline and that seems to result in acceptable performance (10-30 seconds for queries, compared with 15-45 seconds). That's good enough for my purposes. Thanks all! I learned quite a bit.
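A minimal sketch of the pipeline-blocking approach described above — slow-pred and the 20 ms sleep are hypothetical stand-ins for the real REST/GraphQL calls, and each input map goes through the pipeline one at a time:

```clojure
(require '[clojure.core.async :as async])

;; Hypothetical stand-in for a blocking REST predicate.
(defn slow-pred [m]
  (Thread/sleep 20)
  (even? (:id m)))

;; Feed one map at a time through pipeline-blocking; up to
;; `concurrency` blocking calls run in parallel, output order
;; matches input order.
(defn filter-blocking [ms concurrency]
  (let [in  (async/to-chan! ms)
        out (async/chan)]
    (async/pipeline-blocking concurrency out
                             (keep #(when (slow-pred %) %))
                             in)
    (async/<!! (async/into [] out))))
```

The embellishment step would slot into the same transducer, e.g. replacing `keep` with a `(comp (filter ...) (map ...))`.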

2021-01-05T03:33:12.365600Z

Agents could also be easy I think for your use case, something like:

(defn api-pred [e]
  (Thread/sleep 100)
  (even? e))

(let [coll-to-process (range 1000)
      concurrency 100
      agents (repeatedly concurrency #(agent []))]
  (doseq [[i agnt] (map vector coll-to-process (cycle agents))]
    (send-off agnt
      #(if (api-pred i)
        (conj % i)
        %)))
  (apply await agents)
  (->> (mapv deref agents)
    (reduce into)))
We spawn `concurrency` agents (so 100 in this example), then round-robin sending them the task of calling the api-pred function for each collection item; if it returns true, we conj the item onto the batch that agent is handling. Then we wait for all of them to be done and reduce over their results.

vemv 2021-01-03T13:57:31.288300Z

You forgot to describe what the actual problem with blocking is (e.g. performance, sth else)

Eamonn Sullivan 2021-01-03T13:58:01.288700Z

Sorry, yes, performance: I want to be quick.

Eamonn Sullivan 2021-01-03T13:58:59.289Z

Have edited.

vemv 2021-01-03T13:59:42.289300Z

is it acceptable for your use case to perform 100-300 requests in parallel? (if not, what's the max)

Eamonn Sullivan 2021-01-03T14:00:18.289500Z

Yes, I have the threads. This is a command-line/batch tool. In Scala, I would probably do something like that: use futures and a big (100-200) threadpool.

vemv 2021-01-03T14:04:13.289700Z

given those requirements I'd simply use pmap in such a way that each item in the 100-300 sequence gets its own thread, with its own filter->embellishment steps happening in each thread. pmap (and future, send-off) are perfectly fine for IO-bound workloads, even if some other options may look fancier

p-himik 2021-01-03T14:06:01.290100Z

But pmap doesn't allow you to specify the concurrency level, does it?

vemv 2021-01-03T14:07:42.290300Z

if you pmap a seq of 300 items you get a parallelism of 300 threads, which is desired in this case ...just make sure to wrap it in (vec (pmap ...)) to force that parallelism, since pmap is lazy

william 2021-01-03T14:10:40.290500Z

thanks @cursork I think it was flow-storm!

Eamonn Sullivan 2021-01-03T14:14:16.290700Z

Thank you. I'll try that.

p-himik 2021-01-03T14:26:54.290900Z

@vemv I don't think your assessment is correct. I just experimented, and I couldn't get more than 30-something threads, with hundreds of items and long sleep times. IIRC it's explained by chunking: pmap derefs its futures in chunks of limited size.

p-himik 2021-01-03T14:32:12.291100Z

@eamonn.sullivan FWIW I just found this in my notes: https://github.com/TheClimateCorporation/claypoole As per noisesmith:
> there's a version of pmap in the claypoole library that's better [than Clojure's pmap] for compute tasks
> the advantage of claypoole over just mapv future / deref is it lets you define a specific parallelism (if coll has enough elements in it, it will grind the jvm or even OS to a crawl as it loses all its resources to thread context switch overhead)

1☝️
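A sketch of what that looks like with claypoole, assuming the library is on the classpath — the 20 ms sleep stands in for the blocking API call:

```clojure
(require '[com.climate.claypoole :as cp])

;; Unlike core's pmap, the parallelism here is exactly the pool
;; size (100), independent of collection size or ncpus.
;; with-shutdown! tears the pool down when the body exits, so
;; results must be realized inside it (hence doall).
(def embellished
  (cp/with-shutdown! [pool (cp/threadpool 100)]
    (doall (cp/pmap pool
                    (fn [m] (Thread/sleep 20) (assoc m :ok true))
                    [{:id 1} {:id 2}]))))
```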
vemv 2021-01-03T14:32:23.291400Z

pmap uses future, which uses a CachedThreadPool. It only grows if needed, which probably explains what you are seeing: if a given pmap step can be performed on a thread that was used and then released by a previous pmap step, it will be.

p-himik 2021-01-03T14:33:23.291600Z

@vemv That's why I mentioned long sleep times. Try pmap with a huge collection and a blocking function, and see how many JVM threads you get. The number will be between 30 and 40.

rutledgepaulv 2021-01-03T14:33:44.291800Z

pmap has some logic in it to not run more than ncpus+2 tasks ahead; it's enforced by the way it realizes the sequence and its futures, not by the thread pool that future spawns into. Look at the source
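A rough experiment to observe this cap (the 50 ms sleep simulates a blocking call; the exact peak depends on your core count, but it stays far below the collection size):

```clojure
;; Count how many pmap tasks are actually in flight at once.
(defn pmap-peak-parallelism [n]
  (let [active (atom 0)
        peak   (atom 0)]
    (doall (pmap (fn [_]
                   (swap! peak max (swap! active inc)) ; record high-water mark
                   (Thread/sleep 50)                   ; simulated blocking IO
                   (swap! active dec))
                 (range n)))
    @peak))
```

On a typical machine `(pmap-peak-parallelism 100)` comes out near ncpus+2, nowhere near 100.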

rutledgepaulv 2021-01-03T14:36:33.292Z

Parallelization capped near ncpus is intended primarily for cpu-bound work, not blocking IO. You can certainly use it for either, but you won't achieve the kind of throughput you could with a larger number of threads if your work is primarily blocking

vemv 2021-01-03T14:37:28.292200Z

that's true @rutledgepaulv, I had only checked out future in the source, but not that logic. future and send-off seem simpler, then (the same CachedThreadPool will be used, but without a cpu-related limit) ...you must be sure that a reasonable number of threads will be spawned, though. 300 is OK; 10000 starts to be dangerous
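The future-per-item idea above can be sketched like this — `pfilter-via-futures` is a hypothetical helper name, and the point is that every blocking call gets its own pool thread with no ncpu-related cap:

```clojure
;; One future per item: the pool behind future/send-off grows on
;; demand, so all blocking calls can run at once.
(defn pfilter-via-futures [pred coll]
  (->> coll
       (mapv (fn [x] (future (when (pred x) x)))) ; start everything first
       (mapv deref)                               ; then wait for each result
       (filterv some?)))
```

The two `mapv` passes matter: starting all futures before derefing any of them is what makes the calls overlap.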

rutledgepaulv 2021-01-03T14:38:55.292400Z

Or just use a fixed executor directly and skip the clojure.core functions

4👍
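A minimal version of that, using java.util.concurrent directly — `pmap-fixed` is a hypothetical helper name:

```clojure
(import '(java.util.concurrent Executors))

;; A fixed-size pool used directly, skipping the clojure.core helpers:
;; parallelism is exactly n-threads, and the pool is shut down when done.
(defn pmap-fixed [n-threads f coll]
  (let [pool (Executors/newFixedThreadPool n-threads)]
    (try
      (->> coll
           (mapv (fn [x] (.submit pool ^Callable (fn [] (f x)))))
           (mapv (fn [fut] (.get fut))))
      (finally (.shutdown pool)))))
```

The `^Callable` hint disambiguates the overloaded `ExecutorService.submit` (Clojure fns implement both Runnable and Callable).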
2021-01-03T15:58:49.293Z

If you're actually concerned about performance, I would recommend reducing roundtrips by batching queries and then merging results in a post-processing step.

1👍
Eamonn Sullivan 2021-01-03T16:00:36.293200Z

Yeah, I don't have control over these APIs (an internal one and Github's REST and GraphQL). I'm batching as much as I can (doing Github searches when filtering on topics, for example), but I have limited leeway on this side.

Eamonn Sullivan 2021-01-03T16:04:31.293400Z

I'm writing a CLI querying tool, to help my teammates find which one of our hundreds of microservices and lambdas are using a particular dependency or runtime environment (e.g., version of node). This involves getting everything from an internal registry (which has things like whether it is lambda or an EC2, or what version of CentOS) and then poking Github to answer queries about particular dependencies, language or topics.

Eamonn Sullivan 2021-01-03T16:07:05.293600Z

My initial attempt (single threaded) took as long as two minutes to get an answer. My second attempt, using pmap and async/pipeline-async, takes 15-45 seconds. I think I can make it faster, given that just about everything is blocking i/o.

Eamonn Sullivan 2021-01-03T16:08:39.293900Z

(currently trying the fixed executor directly, but actually hitting Github API rate limits, so there's probably a ceiling on how much more I can squeeze out of this.)

respatialized 2021-01-03T17:05:09.296200Z

is there a version of in-ns that functions like let? Something that temporarily overrides the namespace for a given expr, like:

(with-ns (symbol "new-ns")
  (do (println "the current namespace is:" *ns*)))

2021-01-05T02:16:20.364Z

True, but you actually have to be careful with that: if in-ns creates the namespace, it won't even refer clojure.core

clyfe 2021-01-03T17:18:10.296300Z

(defmacro with-ns [ns form]
  `(let [nsp# (.name *ns*)
         _# (in-ns ~ns)
         res# ~form]
     (in-ns nsp#)
     res#))

(with-ns 'clojure.core.reducers
  (prn (.name *ns*)))
;; => clojure.core.reducers

1
2021-01-03T17:35:45.296600Z

Since your original question was "what do people go to", I'd add core.async, since in my experience there's always a good solution you can come up with using those building blocks (pipeline for this one?). If you were looking for info on tuning java threadpools, I will be very quiet 🙂