sql

All things SQL and JDBC...
nikp 2020-12-10T13:09:47.462900Z

Ah thanks Sean! I did indeed misunderstand. Is there any way of realizing the result set / running the builder-fn in parallel? It may be that there's no benefit to that, but I'm just curious at this point

seancorfield 2020-12-10T17:52:29.467100Z

@nikp There would be no benefit since you can't build rows until after you've read them in and you can't start concatenating chunks of the result set until you've read at least two chunks. Because of resource management, the processing of result sets has to be eager (in Clojure) even if the actual reading of rows is "lazy" against the database. If you're using plan for "lazy" reading (i.e., result set streaming) because you can't fit it all in memory, then your reducing function must eagerly produce smaller data structures (or at least send those chunks elsewhere for processing). If streaming results doesn't matter (i.e., the full result set will fit in memory), plan is still faster when you can avoid realizing rows as Clojure data structures.
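
(A minimal sketch of that pattern; the db spec, table, and column names here are invented for illustration: a reducing function over plan that works through the result set but only accumulates a small aggregate.)

(require '[next.jdbc :as jdbc])

;; hypothetical datasource; the spec is a placeholder
(def ds (jdbc/get-datasource {:dbtype "postgresql" :dbname "example"}))

;; reduce over the result set a row at a time, keeping only a running total;
;; the rows are never realized as Clojure hash maps
(reduce (fn [total row] (+ total (:amount row)))
        0
        (jdbc/plan ds ["select amount from orders"]))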

nikp 2020-12-10T17:54:16.468100Z

I see, so result set processing has to be eager. That clarifies things, thanks

nikp 2020-12-10T17:55:14.468700Z

plan makes a lot of sense for row-by-row processing / large data that can't fit in memory, or for parallel processing

seancorfield 2020-12-10T17:56:04.469600Z

plan makes sense in any situation where you don't actually need the rows as realized Clojure hash maps, i.e., where you can process the columns without needing to create an actual "row".
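
(For example, a sketch with made-up table/column names, reusing the hypothetical ds from above: pulling a single column out of plan's row abstraction without ever building row hash maps.)

;; collect just one column into a set; only that column is read from
;; each ResultSet row, and no hash maps are created
(into #{}
      (map :email)
      (jdbc/plan ds ["select email from users"]))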

seancorfield 2020-12-10T17:56:49.470900Z

execute! and execute-one! are eager as well. There's no way to produce a lazy sequence of data in Clojure from any SQL query in next.jdbc (because of connection management).
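
(That eagerness is what makes connection management safe: the data is fully realized before the connection goes away. A small sketch, again with the hypothetical ds.)

(with-open [conn (jdbc/get-connection ds)]
  ;; execute! realizes the whole result set as a vector of hash maps,
  ;; so the returned value can safely be used after the connection is closed
  (jdbc/execute! conn ["select * from invoice"]))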

nikp 2020-12-10T17:56:59.471100Z

Okay

seancorfield 2020-12-10T17:57:33.471900Z

We use plan a lot at work because we tend to set up a lot of our data processing in pipelines via transduce.
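
(A typical shape of such a pipeline, as a hedged sketch; the xform steps and column names are invented, and ds is the hypothetical datasource from above.)

;; a transducing pipeline straight over plan: filter and transform
;; columns off the result set, collecting only what is needed
(transduce (comp (filter #(pos? (:total %)))
                 (map :customer_id))
           conj
           []
           (jdbc/plan ds ["select customer_id, total from invoice"]))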

nikp 2020-12-10T17:58:20.472900Z

Definitely can see a case for that. We have a flow right now where we load the entire dataset into memory and use it in multiple pipelines instead of having multiple queries for the dataset. Was trying to see if there was a way to speed up the query time to load the dataset into memory

seancorfield 2020-12-10T18:02:10.476200Z

That would depend on what the bottleneck is. Is it really the actual Java ResultSet processing in JDBC, or is it in the Clojure code that builds data structures from it? If you use execute! you pay the cost of realizing the entire ResultSet into memory as a vector of hash maps in Clojure. If you use plan, you can usually avoid all of that overhead -- and you have the possibility of overlapping at least some of the data processing. But it usually involves turning your data processing "inside out" so it all runs inside a reducing pipeline / transducing pipeline, rather than the (Clojure) lazy way of reading data and then piping it through multiple (lazy) processing steps.
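
(To make the "inside out" point concrete, a before/after sketch with invented names, reusing the hypothetical ds.)

;; lazy style: realize everything with execute!, then pipe it through
;; lazy seq functions (note execute!'s default table-qualified column keys)
(->> (jdbc/execute! ds ["select * from orders"])
     (filter #(= "shipped" (:orders/status %)))
     (map :orders/amount)
     (reduce +))

;; "inside out": the same work as a single reduction over plan,
;; so the vector of hash maps is never built
(transduce (comp (filter #(= "shipped" (:status %)))
                 (map :amount))
           +
           0
           (jdbc/plan ds ["select status, amount from orders"]))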

nikp 2020-12-10T18:05:08.477100Z

It's the Clojure code that creates the hash maps, from the look of things. I'll double check again later with a profiler

seancorfield 2020-12-10T18:24:13.477800Z

Yeah, it can be surprising just how much of an overhead that can be for large result sets...