java

2020-02-27T22:26:16.002300Z

I am using criterium to benchmark some single-threaded Clojure code, and the GarbageCollectorMXBean Java API to sample and report the total time the JVM spends doing GC. That API is often showing 0 time, or very close to 0 time, during tests where when I use Ctrl-\ to show thread dump about 10 seconds apart, it shows 4 or 8 GC threads getting significant CPU time, and both Activity Monitor on macOS, and top/htop on Linux, are showing full use of 2 to 4 CPU cores during those 10 seconds.

2020-02-27T22:26:53.003100Z

Anyone have experience with GarbageCollectorMXBean underreporting GC time of the JVM? Or know a different way to read GC time from within the JVM that would take account of this time better?

seancorfield 2020-02-27T22:37:28.003700Z

If you enable verbose GC logging, do you get better timings in the logs?

seancorfield 2020-02-27T22:38:07.004Z

-Xlog:gc,gc+metaspace,gc+cpu:file=gc.log

seancorfield 2020-02-27T22:38:21.004300Z

(assuming you're on JDK 9+)

2020-02-27T22:43:46.004800Z

it is literally deja vu 🙂

2020-02-27T22:46:23.005900Z

The gc.log file produced by that method does seem to show GC times that appear closer to the CPU time measurements I see. I haven't scrubbed the numbers to see how many decimal places they match up to, but at least probably the first one 🙂

2020-02-27T22:46:27.006100Z

thanks.

2020-02-27T22:47:06.006800Z

I just feel cheated given that I actually bothered to look up a Java API to do this, and it is so far off. Welcome to the real world, I know 🙂

2020-02-27T22:49:09.008100Z

If one wanted this API to either (a) actually work accurately, or (b) be documented in such a way that one expected it to be not so accurate, would starting with a Java-only reproducible test case and asking about it on a StackOverflow question, and/or going straight to a ticket on some OpenJDK bug tracker, be the right approach?

seancorfield 2020-02-27T22:53:36.009200Z

I've never tried to use that MX bean. The docs do say it is estimated and that it can return incorrect values (repeating old values across multiple GC counts if the interval is small).

seancorfield 2020-02-27T22:55:20.009400Z

"For concurrent algorithms that value is perty useless, because it is not STW time and it is not CPU time (algorithm may use multiple threads, but wall clock time would be calculated)."

seancorfield 2020-02-27T22:56:27.010300Z

"You don't get that information from the MXBean. As you can see it's not exactly a very sophisticated tool, giving you 2 numbers for such a complicated operation. I'd say forget about the whole bean."

2020-02-27T22:57:01.011300Z

Thanks for that link. I will look a bit further into it, perhaps trying out the alternate APIs recommended there to see if they give more accurate results.

seancorfield 2020-02-27T22:57:12.011600Z

So I think the consensus is that it's poorly documented and doesn't work well with concurrent GC algorithms 😐

2020-02-27T22:57:49.012300Z

What I am seeing appears to be a case of GarbageCollectorMXBean getCollectionTime returning something like "time spent by main computation thread only in GC, ignoring other GC-specific thread computation time"

2020-02-27T22:58:15.012800Z

but for a case where > 95% of the time spent in GC is in those other GC-specific threads.

2020-02-27T22:59:11.013200Z

I did do a bit of Google searches on my own, but hadn't come across that link on my own. Perhaps a win for Bing there 🙂

2020-02-27T23:04:10.014700Z

I will check, but this seems like a potentially significant blind spot in tools like criterium -- if it reports elapsed time only, and you know that you do not care how many cores GC is using, then OK. If you prefer to know total CPU time, e.g. because that is what you pay for when renting VMs on a cloud service, then it would be seriously underreporting CPU time.