clojure

New to Clojure? Try the #beginners channel. Official docs: https://clojure.org/ Searchable message archives: https://clojurians-log.clojureverse.org/
2021-06-27T00:00:48.028500Z

Is there any way to use a protocol in a macro at macroexpansion time?

No implementation of method: :-patterns of protocol: #'ribelo.munich/IMulti
   found for class: clojure.lang.Symbol

dpsutton 2021-06-27T00:11:59.030300Z

Sure. Just keep track of when you have Clojure forms versus runtime data. Emit code that does the right thing at runtime. You are most likely just dealing with sequences of symbols, which is what you are seeing in your error message.
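
A minimal sketch of that distinction, assuming a protocol named IMulti with a -patterns method as in the error message above; the vector implementation and both macro names are hypothetical. A macro receives unevaluated forms, so calling the protocol inside the macro body dispatches on the symbol itself, while emitting the call defers dispatch to the runtime value.

```clojure
(defprotocol IMulti
  (-patterns [this]))

;; Hypothetical implementation, for illustration only.
(extend-protocol IMulti
  clojure.lang.IPersistentVector
  (-patterns [this] (mapv str this)))

;; Calls the protocol while the macro is expanding: `x` is the raw form.
;; Passing a symbol fails, because Symbol has no IMulti implementation.
(defmacro patterns-at-expansion [x]
  (-patterns x))

;; Emits a call instead, so dispatch happens on the evaluated value.
(defmacro patterns-at-runtime [x]
  `(-patterns ~x))

(def v [:a :b])

(patterns-at-runtime v)
;; => [":a" ":b"]

;; (patterns-at-expansion v) fails during macroexpansion with
;; "No implementation of method: :-patterns ... found for class: clojure.lang.Symbol"
```

The protocol can still be used at expansion time when the argument is a literal form (for example a vector literal), since the macro then sees the vector itself rather than a symbol.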

2021-06-27T02:36:42.032Z

Up to a minute and half? On what hardware?

seancorfield 2021-06-27T02:54:20.033300Z

@jakub.stastny.pt_serv 90 seconds to start up a Clojure app from source is nothing... for a large app... because all the .clj files have to be compiled into memory.

seancorfield 2021-06-27T02:55:02.033500Z

When that overhead is removed -- by AOT'ing code going into the uberjar -- then the services restart in seconds, not minutes.

Nazral 2021-06-27T06:03:45.035600Z

I have a (very large) number of gzipped files that contain one EDN form per line, so I wrote these two functions to process them:

(defn read-gzipped
  [fname]
  (with-open [in (java.util.zip.GZIPInputStream.
                  (io/input-stream fname))]
    (slurp in)))

(defn read-edn-per-line
  [in f]
  (->> in
       str/split-lines
       (map (comp f read-string))))

I would expect these functions to parallelize well, so that I could pmap (or upmap when using claypoole) over the list of files. However, there is no difference in time whether I use pmap or map. I'm not sure why; am I missing something?

dominicm 2021-06-27T06:06:32.036300Z

@archibald.pontier_clo could it be that your producer is slower than your reader? i.e. doing the gunzip and slurp is slower than read-string?

Nazral 2021-06-27T06:08:48.038200Z

I am not sure, but how would that stop calling read-gzipped + read-edn-per-line from having the same speed in a map and in a pmap? Because even if read-gzipped is slow, I should be able to read multiple files at once no?

dominicm 2021-06-27T06:12:25.041200Z

@archibald.pontier_clo Perhaps I misunderstood where your pmap was. I thought it was in place of the map at the end of read-edn-per-line, not that you were pmap'ing your list of files. The other option is that it's so fast the overhead of the parallelism makes it the same speed. pmap does have some footguns due to laziness, and I'm not sure if those might apply here; it depends on how you consume the sequence afterwards.

seancorfield 2021-06-27T06:12:34.041500Z

@archibald.pontier_clo Are you sure you are measuring the complete result? pmap is semi-lazy so unless you are forcing the whole result you may not be getting accurate times?
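
A small sketch of the measurement pitfall, assuming a hypothetical slow-inc function that stands in for the per-file work and a hypothetical elapsed-ms timing helper. Without doall, only the construction of the lazy sequence is timed; with doall, the work itself is included.

```clojure
;; Hypothetical stand-in for an expensive per-item task.
(defn slow-inc [x]
  (Thread/sleep 100)
  (inc x))

;; Hypothetical helper: runs `thunk` and returns elapsed milliseconds.
(defn elapsed-ms [thunk]
  (let [start (System/nanoTime)]
    (thunk)
    (/ (- (System/nanoTime) start) 1e6)))

;; Only builds the lazy seq; no element is realized, so no work is measured.
(def unforced-ms (elapsed-ms #(pmap slow-inc (range 8))))

;; doall realizes every element, so the sleeps are part of the measurement.
(def forced-ms (elapsed-ms #(doall (pmap slow-inc (range 8)))))
```

Comparing unforced-ms with forced-ms at the REPL shows how misleading a timing can be when the result is never consumed.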

Nazral 2021-06-27T06:14:06.042900Z

@seancorfield I do a pmap followed by a mapcat and doall (last call), that should be fine no? @dominicm reading one file takes 20s+ so I don't think the overhead plays a role there

dominicm 2021-06-27T06:17:23.044800Z

@archibald.pontier_clo to confirm, your full code is (pmap #(read-edn-per-line (read-gzipped %) %) ["file-1" "file-2"])?

Nazral 2021-06-27T06:17:59.045600Z

(->> selected-days
     (pmap
      (fn [f]
        (-> (str f "/" ticker ".txt.gz")
            utils/read-gzipped
            ;;(utils/read-edn-per-line parse-line)
            )))
     (mapcat identity)
     doall)

Nazral 2021-06-27T06:18:16.045900Z

I removed read-edn-per-line for the moment

dominicm 2021-06-27T06:18:56.046300Z

How many selected-days are we talking here?

Nazral 2021-06-27T06:19:16.046800Z

10 for the moment (I'm testing on a small subset of files for the moment)

2021-06-27T06:19:47.047600Z

Slurp+read-string is generally horrendous, use read

dominicm 2021-06-27T06:21:49.049800Z

@hiredman I think it's newline-separated files, so it would be a map over .readLine (which isn't there on InputStreams).

Nazral 2021-06-27T06:22:12.050100Z

Yes, one edn per line

2021-06-27T06:22:25.050500Z

Read will handle that fine

dominicm 2021-06-27T06:22:38.051200Z

You're right, just needs repeated calls to read.

2021-06-27T06:22:39.051500Z

That is more or less what a Clojure source file is

dominicm 2021-06-27T06:22:40.051600Z

My bad 🙂

2021-06-27T06:25:11.054800Z

Pmap entangles a lot of things so it is tricky to understand. Pmap limits its parallelism to the number of cores the java runtime reports

Nazral 2021-06-27T06:25:33.055600Z

I need to convert the gzip stream to a stream that read understands though

2021-06-27T06:26:04.056400Z

Yes, java.io.PushbackReader

2021-06-27T06:26:50.057700Z

You may need to wrap in a reader first via clojure.java.io/reader

dominicm 2021-06-27T06:27:20.058200Z

That's where the core limiter is, I knew there must be one around there somewhere. pmap is an interesting beast 😛

dominicm 2021-06-27T06:29:34.060Z

There's also a lot of environment involved here: If you've only got a couple of cores, (I only have 4 for example) then you're not going to get loads of parallelism here. Although I am surprised you're seeing absolutely no speedup. I'd expect it to be less than 200s.

Nazral 2021-06-27T06:30:07.060100Z

class java.io.BufferedReader cannot be cast to class
   java.io.PushbackReader (java.io.BufferedReader and
   java.io.PushbackReader are in module java.base of loader
   'bootstrap')

Nazral 2021-06-27T06:30:23.060300Z

I found a previous slack thread on that topic, doesn't seem straightforward but I'll figure it out
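
A minimal sketch of the wrapping discussed above, under a few assumptions: the function name read-gzipped-edn is hypothetical, and clojure.edn/read is swapped in for clojure.core/read because the files hold EDN data. The cast error goes away when the BufferedReader returned by io/reader is passed to the PushbackReader constructor instead of being used directly.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.edn :as edn])
(import '(java.io PushbackReader)
        '(java.util.zip GZIPInputStream))

(defn read-gzipped-edn
  "Eagerly reads every EDN form in the gzipped file `fname`,
  applies `f` to each form and returns the results as a vector."
  [fname f]
  (with-open [rdr (PushbackReader.
                   (io/reader (GZIPInputStream. (io/input-stream fname))))]
    (let [eof (Object.)] ; unique sentinel marking end of input
      (loop [acc []]
        (let [form (edn/read {:eof eof} rdr)]
          (if (identical? form eof)
            acc
            (recur (conj acc (f form)))))))))
```

The loop is eager on purpose: a lazy sequence would try to read after with-open has already closed the stream. This also avoids holding the whole decompressed file in memory as one string, which slurp does.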

Nazral 2021-06-27T06:31:04.061100Z

some, but nothing

Nazral 2021-06-27T06:32:34.061600Z

https://clojurians-log.clojureverse.org/clojure/2018-04-16 here

dominicm 2021-06-27T06:34:24.063200Z

@archibald.pontier_clo For comparison, how long does this take? (time (doall (pmap #(Thread/sleep (+ 5000 %)) (range 20)))) That should give some idea of parallelism available to you.

Nazral 2021-06-27T06:36:52.065400Z

5s

2021-06-27T06:36:56.065600Z

If you use an ExecutorService and an ExecutorCompletionService instead of pmap, you have a lot more visibility and control.
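
A rough sketch of that approach through Java interop, assuming a hypothetical process-all function and a caller-chosen thread count. The pool size is explicit, and results are collected as each task finishes rather than in input order.

```clojure
(import '(java.util.concurrent Executors ExecutorCompletionService Callable))

(defn process-all
  "Runs `f` on every element of `inputs` using a fixed pool of `n-threads`
  threads. Returns a vector of results in completion order."
  [n-threads f inputs]
  (let [inputs (vec inputs)
        pool   (Executors/newFixedThreadPool n-threads)
        ecs    (ExecutorCompletionService. pool)]
    (try
      ;; Submit one task per input; at most n-threads run at once.
      (doseq [x inputs]
        (.submit ecs ^Callable (fn [] (f x))))
      ;; .take blocks until the next task completes; .get returns its
      ;; value or rethrows its exception wrapped in ExecutionException.
      (vec (repeatedly (count inputs) #(.get (.take ecs))))
      (finally
        (.shutdown pool)))))
```

For example, (process-all 4 some-file-fn files) would process at most four files at a time, where some-file-fn and files are placeholders for the per-file function and the list of paths.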

Nazral 2021-06-27T07:53:07.072700Z

Isn't this what is under the hood in the claypoole library?

2021-06-27T06:39:19.066600Z

Again, pmap is tricky. I forget if the specialized range type implements chunking, but chunking does weird things to pmap's attempts to limit parallelism
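
A small sketch for checking this at the REPL; the started counter and the anonymous task are hypothetical. In current Clojure versions (range 20) does produce a chunked seq, so pmap creates futures a whole chunk at a time instead of staying within its nominal window of available processors plus two.

```clojure
;; The nominal look-ahead window pmap uses on this machine.
(def window (+ 2 (.availableProcessors (Runtime/getRuntime))))

;; true when the input is delivered in chunks rather than one item at a time.
(def chunked-input? (chunked-seq? (seq (range 20))))

;; Counts how many tasks have started running.
(def started (atom 0))

(def results
  (pmap (fn [x]
          (swap! started inc)
          (Thread/sleep 200)
          x)
        (range 20)))

;; Realizing only the first element forces the first chunk of futures.
(first results)

;; Tasks started so far; compare this with `window`.
@started
```

On a machine where window is smaller than 20, seeing all 20 tasks started after realizing one element is the chunking effect described above, and it explains why the Thread/sleep test can finish in about 5 seconds regardless of core count.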

Nazral 2021-06-27T06:40:38.066900Z

Ok thank you I'll look into it

dominicm 2021-06-27T06:43:23.070Z

That's true again. But at least indicates there're enough cores around to be making use of this parallelism.

2021-06-27T06:43:35.070700Z

Your process may just be I/O bound, such that your I/O requests are queuing sequentially somewhere else (OS kernel, disk driver, etc.), so any parallelism in dispatching the requests doesn't result in faster processing

dominicm 2021-06-27T06:44:02.071400Z

I was trying to figure out which profiler or debugging tool would give insight into this, and I wasn't sure.

Nazral 2021-06-27T06:44:09.071500Z

that might be it

Nazral 2021-06-27T07:46:14.072300Z

Out of spite I ran that code on my prod server (significantly more powerful / better ssd than my laptop), and there pmap gives a very nice speed boost

Nazral 2021-06-27T09:21:23.073200Z

And thanks for the help! 🤗

honza 2021-06-27T10:27:34.073900Z

Is lein still the best build system?

borkdude 2021-06-27T10:36:40.074800Z

@honza "best" is subjective, but it's the most complete solution. there is also now deps.edn which is more "decomplected": it does less and you can build tooling around this (which people have done and more to come)

vemv 2021-06-27T11:11:14.079300Z

Lein has a lot of power (from its existing ecosystem to its unquoting and middleware systems), but deps.edn has shown the right way for a number of things (single JVM per task, first-class git dependencies, composable aliases). All of those are technically possible in Lein but not the default... In an ideal world Lein would pick up some insights or even implementation details from deps.edn. In practice it would be quite a lot of work, like so many things in OSS.