sql

All things SQL and JDBC...
Jean 2020-03-29T15:26:16.058500Z

Hello 🙂 I’m playing with next.jdbc (really a big fan of the library). I was wondering: when I use (nav row :relation value) on a vector of rows, each nav calls the database, right? So with a large amount of data, isn’t it better to fetch all the related rows by id in one request?

seancorfield 2020-03-29T17:49:25.060200Z

@jean.boudet11 Can you provide a bit more context around your question? I'm not sure what you're really asking.

Jean 2020-03-29T18:04:25.060300Z

I have the relation A.B_id = B.id. When I execute select * from A, I want to inject B into each row. I have two options:
• Get the B ids -> (map :B_id rows) and make one request for all the B rows (then inject B into each row), or
• For each row -> (map #(nav % :B_id (:B_id %)) rows).
Which is the better choice for a large number of rows? Maybe the first one, if nav opens a database connection each time.
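
A rough sketch of that first option with next.jdbc might look like this (assuming a datasource ds and next.jdbc's default qualified column keys; the IN-clause construction is only illustrative):

;; sketch of option 1: one query for all the related B rows, then inject them
;; (column keywords like :A/B_id and :B/id assume the default qualified-map builder)
(require '[clojure.string :as str])

(let [a-rows (jdbc/execute! ds ["select * from A"])
      b-ids  (distinct (map :A/B_id a-rows))
      in-sql (str "select * from B where id in ("
                  (str/join "," (repeat (count b-ids) "?")) ")")
      b-rows (jdbc/execute! ds (into [in-sql] b-ids))
      by-id  (into {} (map (juxt :B/id identity)) b-rows)]
  (map #(assoc % :B (by-id (:A/B_id %))) a-rows))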

seancorfield 2020-03-29T18:07:08.060600Z

datafy/`nav` are really intended for use by tools, not by "users". Besides, in order to use nav you should first call datafy on a thing.
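
For reference, a minimal sketch of that pattern, assuming the :schema option is set up to relate A.B_id to B.id (see the next.jdbc datafy/nav docs for the exact shape):

(require '[clojure.datafy :refer [datafy nav]])

;; assumption: :schema tells nav how to follow the foreign key
(def rows (jdbc/execute! ds ["select * from A"] {:schema {:A/B_id :B/id}}))

;; datafy the row first, then nav on the foreign-key column;
;; each nav issues its own query against the database
(let [row (datafy (first rows))]
  (nav row :A/B_id (:A/B_id row)))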

seancorfield 2020-03-29T18:07:22.060800Z

Why not just use a SQL JOIN?
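
For example (a sketch, assuming the same A/B tables and a datasource ds):

;; one round trip: join B onto A; next.jdbc's default builder qualifies
;; columns by table (:A/..., :B/...) so the two tables' columns stay apart
(jdbc/execute! ds ["select A.*, B.* from A join B on A.B_id = B.id"])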

johnj 2020-03-29T18:13:40.062Z

Why would (time (jdbc/execute! ds ["select * from products"])) take 20ms in the REPL but print 300ms from the CLI?

seancorfield 2020-03-29T18:51:27.063600Z

@lockdown- What is ds? Perhaps in the REPL you had already opened a connection to the DB prior to timing this, but in the CLI your timing includes opening the connection (which is likely the expensive part).

johnj 2020-03-29T19:07:22.064300Z

ds is (def ds (jdbc/get-datasource db))

johnj 2020-03-29T19:08:10.065400Z

In both cases I'm just timing the execute! expression, as pasted before

seancorfield 2020-03-29T19:17:01.066900Z

Hard to say, without a lot more information/context. Something has to be different about the code you're running, or the way you're timing it, or something in the environment.

johnj 2020-03-29T19:18:06.067600Z

this is literally all the code

johnj 2020-03-29T19:18:43.068100Z

(require '[next.jdbc :as jdbc])

(def db {:dbtype "postgres"
         :dbname "xx"
         :user "xx"
         :password "xx"})

(def ds (jdbc/get-datasource db))

(time (jdbc/execute! ds ["select * from products"]))

seancorfield 2020-03-29T19:20:58.068800Z

Duplicate that last line so it runs twice. See if the timings are different on repeated execution. Run it via the CLI again.

johnj 2020-03-29T19:25:44.069600Z

@seancorfield yep, the second line is now 30ms, what is getting cached there?

johnj 2020-03-29T19:26:27.070Z

the JVM warming up the Clojure code?

seancorfield 2020-03-29T19:27:16.070600Z

More likely to be the initial JDBC driver setup of the connection to the DB.

johnj 2020-03-29T19:29:41.073600Z

Anyway, ~30-50ms still seems very high; in psql it takes <1ms. Do you know if this overhead is normal? Is it the driver or somewhere else?

seancorfield 2020-03-29T19:30:40.074300Z

Well, you're still standing up a connection on each call to execute!.

seancorfield 2020-03-29T19:31:46.075500Z

The initial code path through JDBC involves the driver manager loading and initializing the specific JDBC classes needed, and setting up everything needed for connections to the database -- that's the big 300ms vs 30ms chunk of time.

seancorfield 2020-03-29T19:32:08.076Z

But you're not using a connection pool so each execute! is causing a new connection to be created and closed.

seancorfield 2020-03-29T19:33:29.077500Z

In real code, you'd want to use HikariCP or c3p0 (that's all described in the next.jdbc docs). Once the first few connections are set up, subsequent operations become a lot faster because they can reuse previously created connections, just taking them out of the pool and putting them back in.
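
A minimal sketch of the HikariCP route, assuming the db spec from the snippet above (the next.jdbc "Connection Pooling" docs are the authoritative version):

(require '[next.jdbc :as jdbc]
         '[next.jdbc.connection :as connection])
(import 'com.zaxxer.hikari.HikariDataSource)

;; note: depending on the next.jdbc version, HikariCP may want :username
;; rather than :user in the db spec
(def pooled-ds (connection/->pool HikariDataSource db))

;; connections are now borrowed from (and returned to) the pool
(jdbc/execute! pooled-ds ["select * from products"])

;; close the pool when the application shuts down
(.close ^HikariDataSource pooled-ds)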

johnj 2020-03-29T19:38:54.078Z

gonna give it a try

seancorfield 2020-03-29T19:42:37.079900Z

The other thing is that execute! realizes the whole ResultSet and builds Clojure data structures for it. That can be quite an overhead for large result sets. That's why plan exists, so you can sidestep all that realization and data construction.
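
As a rough illustration (reusing the products table from the snippet above), plan is reducible, so you can pull out just the columns you need without building full row maps:

;; reduce over the rows as they stream from the ResultSet;
;; :name is a hypothetical column, read straight off the ResultSet
(into []
      (map :name)
      (jdbc/plan ds ["select * from products"]))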

johnj 2020-03-29T19:45:50.081400Z

That may be it. In the code above, is the object that (jdbc/get-datasource db) creates created anew every time execute! is called?

johnj 2020-03-29T19:46:34.081600Z

can it be cached?

seancorfield 2020-03-29T19:49:10.082200Z

You are "caching" the data source object -- it is computed once and stored in the ds Var.

seancorfield 2020-03-29T19:49:45.083Z

But execute! will call .getConnection on that datasource object and that is what sets up a new connection to the database (and then it is closed at the end of the operation).

johnj 2020-03-29T19:56:37.084400Z

I see, thanks!

seancorfield 2020-03-29T20:03:14.085300Z

If you don't want to go all the way to connection pooling, but you want to run multiple statements without the overhead each time, look at the section of the docs on reusing connections.
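
Roughly, that looks like this (a sketch using the same ds and table as above):

;; open one connection, run several statements on it, then let with-open close it
(with-open [conn (jdbc/get-connection ds)]
  (jdbc/execute! conn ["select * from products"])
  (jdbc/execute! conn ["select count(*) from products"]))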

seancorfield 2020-03-29T20:05:05.086200Z

(Connection pooling is described just a bit further on in that Getting Started page)

johnj 2020-03-29T20:12:28.087300Z

Tried both HikariCP and reusing the connection with next.jdbc/get-connection, and now I'm getting ~1ms when timing.

seancorfield 2020-03-29T20:16:26.087900Z

That sounds about right.

johnj 2020-03-29T20:18:56.088100Z

nice