java

orestis 2020-03-05T06:06:11.025800Z

Was surprised to find that my ETL pipeline spends a considerable amount of time in SSL processing (JDK 8, for now). I don’t control the code that handles the connections (MongoDB and JDBC Postgres) - is there a way to speed things up other than throwing hardware at the problem?

aisamu 2020-03-05T12:31:57.047600Z

This is a long shot, but does the flame graph show wait time (e.g. waiting for I/O)? Have you checked the entropy levels on the machine that's running this (especially if it's a remote instance)?

orestis 2020-03-05T13:18:51.047800Z

I think that I/O is built into the various function calls. Not sure what entropy levels are about…

orestis 2020-03-05T13:20:48.048Z

Like this? https://forums.aws.amazon.com/thread.jspa?messageID=249079
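
For anyone wanting to check the entropy level mentioned above, here is a minimal sketch. It assumes a Linux host, where the kernel reports its entropy estimate in `/proc/sys/kernel/random/entropy_avail`; the class and method names (`EntropyCheck`, `availableEntropy`, `parseEntropy`) are made up for illustration. It only reads the number; whether a low value actually slows down the JVM's SSL code depends on the JDK's `SecureRandom` configuration, which this does not inspect.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper: reads the kernel's entropy estimate on Linux.
class EntropyCheck {
    // Linux-specific path; assumed to exist on the host being checked.
    static final Path ENTROPY_AVAIL =
            Paths.get("/proc/sys/kernel/random/entropy_avail");

    // Parses the first line of the file content as an integer, if possible.
    static OptionalInt parseEntropy(List<String> lines) {
        if (lines.isEmpty()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(lines.get(0).trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Returns the reported entropy, or empty if the file is missing or
    // unreadable (e.g. on a non-Linux system).
    static OptionalInt availableEntropy() {
        try {
            return parseEntropy(Files.readAllLines(ENTROPY_AVAIL));
        } catch (IOException | SecurityException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt bits = availableEntropy();
        System.out.println(bits.isPresent()
                ? "entropy_avail = " + bits.getAsInt()
                : "entropy_avail not readable on this system");
    }
}
```

Running it a few times while the pipeline is under load gives a rough idea of whether the pool is being drained; it uses only Java 8 APIs, so it should work on the JDK mentioned in this thread.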

orestis 2020-03-05T13:41:03.048200Z

It might be that I’m also saturating some I/O channel. Need to dig deeper into AWS to see the metrics.

seancorfield 2020-03-05T06:33:53.026700Z

@orestis That sounds like your code is creating and tearing down a lot of connections -- perhaps look at ways to have longer-lived connections and more connection pooling?

orestis 2020-03-05T06:46:08.027800Z

Sorry, I meant that the JVM is spending quite a lot of time doing SSL operations. I’ll paste the relevant part of the flame graph here

orestis 2020-03-05T06:48:15.028200Z

seancorfield 2020-03-05T06:48:44.029200Z

Sorry, I don't know how to read that.

orestis 2020-03-05T06:48:53.029500Z

This is the mongo-specific part — sun.security.ssl.InputRecord.decrypt is taking half the time.

orestis 2020-03-05T06:49:15.030Z

Width is time spent in a specific function, bottom layers include the top.

orestis 2020-03-05T06:49:39.030300Z

(Super cool tool: http://clojure-goes-fast.com/blog/profiling-tool-async-profiler/)

orestis 2020-03-05T06:50:24.031100Z

Effectively this looks like when I’m loading documents from mongo, 55% of the CPU is spent on doing the BSON decoding, and 45% on the SSL decoding.

seancorfield 2020-03-05T06:51:08.031600Z

(this feels like yet more support for our decision to stop using MongoDB 🙂 )

orestis 2020-03-05T06:51:26.031700Z

Heh, you’d think, but here’s the relevant part for JDBC 😄

orestis 2020-03-05T06:51:57.032400Z

again, sun.security.ssl.AppOutputStream.write is taking like 40% of the time

seancorfield 2020-03-05T06:52:44.032800Z

Security is expensive 🙂

orestis 2020-03-05T06:52:52.033100Z

(I’m loading documents from Mongo, doing light massage and dumping them into Postgres)

orestis 2020-03-05T06:53:38.034400Z

Good thing I decided to profile because I naively thought that the massaging time would be my bottleneck, but turns out it isn’t.

seancorfield 2020-03-05T06:53:39.034500Z

I guess I'd ask "Is the process fast enough?" rather than "Where is it spending its time?"

orestis 2020-03-05T06:54:06.035Z

Nope, it’s not 😄

seancorfield 2020-03-05T06:54:35.035900Z

Normally, the big speed ups come from algorithmic changes -- and it doesn't seem like you've got much hope of those.

seancorfield 2020-03-05T06:54:51.036500Z

So... hardware? Maybe that is your only option?

orestis 2020-03-05T06:55:12.037Z

There might be “free” performance gains by bumping to JDK 11, which I’ll try next. Googling about JVM SSL performance shows that it is a common issue (Netty says use OpenSSL)

seancorfield 2020-03-05T06:56:03.037600Z

We moved from 8 to 11 a while back for nearly all our processes but I can't say we noticed much speedup.

seancorfield 2020-03-05T06:56:21.038100Z

(I suspect we're bottlenecked on other stuff than SSL tho')

orestis 2020-03-05T06:57:35.038800Z

Yeah this is a special case in that it syncs Mongo to Postgres and has to pretty much pipe the entire database. It wouldn’t be noticeable with normal day-to-day operations.

orestis 2020-03-05T06:57:53.039Z

Hm, this seems relevant: https://github.com/google/conscrypt/

orestis 2020-03-05T07:02:26.039300Z

I’ll try throwing hardware at it and see what changes

orestis 2020-03-05T07:59:53.040500Z

Hardware does make a difference, and this is a nicely parallel problem so I think I can live with this for now.

jumar 2020-03-05T08:11:26.041800Z

This is interesting. Given that the SSL stuff seems to take at most 50% (or less), I wouldn't hope to get a huge performance gain from optimizing it (Amdahl's law). Did you try to run it without SSL? Does it make a huge difference?

orestis 2020-03-05T08:53:17.043100Z

Running without SSL is not an option, so I didn’t even try :) I'm trying to run this on production configurations, so disabling SSL is not possible.

orestis 2020-03-05T08:54:24.044900Z

The thing is that people claim that OpenSSL is 1 or 2 orders of magnitude faster so essentially I would get twice the speed if SSL becomes insignificant.

jumar 2020-03-05T09:07:34.047Z

Assuming 45% spent on SSL and a 10x performance speedup (which would be really interesting to verify if that's possible) you get at most a 1.7x speedup: https://www.google.com/search?q=1+%2F+(+(1+-+0.45)+%2B+0.45%2F10)+)&rlz=1C5CHFA_enCZ836CZ837&oq=1+%2F+(+(1+-+0.45)+%2B+0.45%2F10)+)&aqs=chrome..69i57j6j69i64l2.11187j0j7&sourceid=chrome&ie=UTF-8 (https://en.wikipedia.org/wiki/Amdahl%27s_law) So it depends on whether that is enough - I assume if you noticed it is slow, this speedup might not be enough 🙂
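
The arithmetic above can be sketched in a few lines of Java. This is only an illustration of Amdahl's law using the numbers from this thread (45% of the time in SSL, a hoped-for 10x faster SSL layer); the class and method names (`AmdahlSketch`, `speedup`) are made up, and the 100x and "free" cases are extra data points rather than measured results.

```java
import java.util.Locale;

// Hypothetical helper illustrating Amdahl's law:
// overall = 1 / ((1 - p) + p / s)
// where p is the fraction of time in the part being optimized
// and s is how much faster that part becomes.
class AmdahlSketch {
    static double speedup(double fraction, double factor) {
        if (fraction < 0.0 || fraction > 1.0 || factor <= 0.0) {
            throw new IllegalArgumentException(
                    "fraction must be in [0, 1] and factor must be > 0");
        }
        return 1.0 / ((1.0 - fraction) + fraction / factor);
    }

    public static void main(String[] args) {
        double sslFraction = 0.45; // share of time in SSL, per the flame graph

        // SSL made 10x faster: prints 1.68
        System.out.println(String.format(Locale.ROOT, "%.2f",
                speedup(sslFraction, 10.0)));

        // SSL made 100x faster: prints 1.80
        System.out.println(String.format(Locale.ROOT, "%.2f",
                speedup(sslFraction, 100.0)));

        // SSL cost removed entirely (upper bound): prints 1.82
        System.out.println(String.format(Locale.ROOT, "%.2f",
                speedup(sslFraction, Double.POSITIVE_INFINITY)));
    }
}
```

So even if the SSL cost disappeared completely, the upper bound under these assumptions is about 1.8x, which is close to the "twice the speed" estimate earlier in the thread.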