clojure-dev

Issues: https://clojure.atlassian.net/browse/CLJ | Guide: https://insideclojure.org/2015/05/01/contributing-clojure/
2019-08-22T03:26:06.317400Z

I have tried an Amazon EC2 Ubuntu 18.04 Linux t2.micro instance running OpenJDK 11 and seen the same behavior, running the tryme.sh script

alexmiller 2019-08-22T03:27:11.318Z

I'm sure it has something to do with opt and deopt of that loop, but you're not running with enough debug stuff on in the jvm to tell

alexmiller 2019-08-22T03:27:48.318500Z

I've seen this before, don't remember what in particular

alexmiller 2019-08-22T03:28:15.318900Z

you need to actually have it print all the asm stuff probably

2019-08-22T03:28:52.319500Z

I can set that up and collect that data, and give a link to it later.

alexmiller 2019-08-22T03:29:13.320Z

well, I don't have time to look at it, but that's what I'd do

2019-08-22T03:29:29.320400Z

From what I saw doing it once earlier today, I had the impression that it was optimizing up to 'level 4' at some point, and that was when it got slower.

alexmiller 2019-08-22T03:29:44.320800Z

that's tier 4, not level 4

2019-08-22T03:29:51.321Z

Sure, not expecting you to dive in to the data, but someone else might see something I don't.

alexmiller 2019-08-22T03:30:39.321600Z

it could be when it does the on-stack replacement it finds something much faster but then deopts at the termination condition or something

alexmiller 2019-08-22T03:31:29.321900Z

just tweet it to https://twitter.com/shipilev, he'll probably tell you in 5 min

2019-08-22T03:36:00.322800Z

Thanks. I may do that once I have something with a JVM byte code listing and JIT logging enabled.

2019-08-22T06:54:24.323200Z

Good news, Alex, I saw your email message to the Clojure Google group, for the first time in a while.

🦜 2
alexmiller 2019-08-22T12:23:51.338500Z

Yes, that was kind of a test. I’ve been messing with my DKIM and SPF records, may have actually made some progress :)

cgrand 2019-08-22T08:19:16.326300Z

@andy.fingerhut your loop is in return position and returns a primitive, the function return type is object, this return type mismatch is the root of the issue: 1/ Repro of your case:

(defn foo2 [n]
  (let [n (int n)]
    (loop [i (int 0)]
      (if (< i n)
        (recur (inc i))
        i))))
#'user/foo2
user=> (time (foo2 100000000))
"Elapsed time: 53.865585 msecs"
100000000
user=> (time (foo2 100000000))
"Elapsed time: 226.535655 msecs"
100000000
2/ return object:
(defn foo2 [n]
  (let [n (int n)]
    (loop [i (int 0)]
      (if (< i n)
        (recur (inc i))
        :foo))))
#'user/foo2
user=> (time (foo2 100000000))
"Elapsed time: 57.383683 msecs"
:foo
user=> (time (foo2 100000000))
"Elapsed time: 56.647265 msecs"
:foo
3/ return prim
(defn ^long foo2 [n]
  (let [n (int n)]
    (loop [i (int 0)]
      (if (< i n)
        (recur (inc i))
        i))))
#'user/foo2
user=> (time (foo2 100000000))
"Elapsed time: 51.049683 msecs"
100000000
user=> (time (foo2 100000000))
"Elapsed time: 47.310656 msecs"
100000000

đź‘Ť 1
reborg 2019-08-22T08:45:33.327300Z

@cgrand any idea why the first time it runs that boxing into object seems to be ignored? Is the var object somewhat ignoring the return type and then save the wrong info somewhere?

cgrand 2019-08-22T08:46:35.327600Z

No clear idea. I believe it has something to do with on stack replacement, and later realizing that the replacement sig does not match the replacee sig 🤷 maybe even pushing some boxing inside the loop. Inspecting assembly could help figure out. But i definitely encountered this in the past.

cgrand 2019-08-22T09:21:31.331700Z

More experiments: • explicit boxing on exit of the loop works great

user=>
(defn foo2 [n]
  (let [n (int n)]
    (loop [i (int 0)]
      (if (< i n)
        (recur (inc i))
        (Integer/valueOf i)))))
#'user/foo2
user=> (time (foo2 100000000))
"Elapsed time: 46.156153 msecs"
100000000
user=> (time (foo2 100000000))
"Elapsed time: 54.608185 msecs"
100000000
• explicit boxing on the loop expression doesn’t
(defn foo2 [n]
  (let [n (int n)]
    (Integer/valueOf
      (loop [i (int 0)]
        (if (< i n)
          (recur (inc i))
          i)))))
#'user/foo2
user=> (time (foo2 100000000))
"Elapsed time: 45.705437 msecs"
100000000
user=> (time (foo2 100000000))
"Elapsed time: 220.503665 msecs"
100000000
I believe it doesn’t work because the loop expression is hoisted into a function that suffers from the same issue as the original foo2.

cgrand 2019-08-22T09:40:51.333200Z

However Long/valueOf instead of Integer/valueOf doesn’t work. Does it have to do with the size of the return type? I tried disabling UseCompressedOops but have seen no difference

reborg 2019-08-22T10:48:18.334500Z

On the plus side, I’m only able to reproduce on a plain REPL, it does not happen on a lein repl or from a main (meaning, prod apps are not affected)

cgrand 2019-08-22T11:02:13.334900Z

On the minus side, I can:

$ clj -m foo2
"Elapsed time: 49.448754 msecs"
"Elapsed time: 240.798172 msecs"

cgrand 2019-08-22T11:03:28.335400Z

where foo2.clj is

(ns foo2)

(defn foo2 [n]
  (let [n (int n)]
    (loop [i (int 0)]
      (if (< i n)
        (recur (inc i))
        i))))

(defn -main []
  (time (foo2 100000000))
  (time (foo2 100000000)))

dominicm 2019-08-22T11:07:13.335600Z

Aot?

cgrand 2019-08-22T11:07:49.335800Z

nope

cgrand 2019-08-22T11:10:00.336500Z

also it’s tied to C2

$ clj -J-XX:TieredStopAtLevel=1 -m foo2
"Elapsed time: 85.075081 msecs"
"Elapsed time: 84.942798 msecs"
$ clj -J-XX:TieredStopAtLevel=4 -m foo2
"Elapsed time: 58.025492 msecs"
"Elapsed time: 222.761918 msecs"

cgrand 2019-08-22T11:20:11.336800Z

@dominicm AOT:

$ java -cp classes:`clj -Spath` foo2
"Elapsed time: 57.144601 msecs"
"Elapsed time: 218.541054 msecs"

dominicm 2019-08-22T11:23:35.337300Z

Ah. Maybe C2 is the difference here then.

alexmiller 2019-08-22T12:27:07.340200Z

Andy’s earlier post looked like it was doing osr in C3, then deopt in C3, then reopt in C4

alexmiller 2019-08-22T12:27:31.340800Z

Which matches cgrand’s hypothesis

cgrand 2019-08-22T12:52:35.342700Z

C2 is what you call C4 (Tier 4), no? To me Tier 0 is interpreter, Tiers 1 to 3 are C1 with increasing profiling, Tier 4 is C2. We need a JDK specialist, @ghadi?

alexmiller 2019-08-22T12:58:47.342900Z

maybe so, dunno

2019-08-22T17:21:07.343800Z

Wow, thanks for all the investigation @cgrand. I will be trying to reproduce your cases and commentary in my repo for possible future reference.

2019-08-22T19:07:14.343900Z

Not sure if you were aware that Leiningen by default adds its own JVM command line options related to Tiered compilation, which make performance experiment results suspect, unless you override that by using something like the following in your project.clj file: :jvm-opts ^:replace ["jvm-opts-I-want"]

2019-08-22T20:23:01.346100Z

OK, still no Twitter message to Aleksey Shipilev, and still no disassembly to JVM byte code nor Intel assembly code, but I have added brief instructions to the README, and attempted to automate what is there working across JDK8 through the latest versions (AdoptOpenJDK, OpenJDK, Azul Zulu, and Amazon Corretto all tested), and added Christophe's variations: https://github.com/jafingerhut/leeuwenhoek

2019-08-22T20:24:18.347200Z

My best guess is that we are seeing a case where some Clojure-compiled JVM byte codes are being optimized to faster machine code at JIT tier 3, vs. slower machine code when compiled at JIT tier 4, depending on exactly how you write that Clojure function.

reborg 2019-08-22T20:34:36.347300Z

Indeed! That’s now clear. It also reproduces on a pure java main, so there is “a problem” also in production code