sql

All things SQL and JDBC...
2020-05-22T19:48:17.030800Z

@seancorfield jdbc/plan ... {:builder-fn rs/as-arrays} does not return the column names as the first item, the way (execute! ... as-arrays) does. Is that intended?

2020-05-22T19:49:32.032100Z

I noticed this because I was generating Excel files with execute!. When I made the code more resilient to low memory by switching to plan, I noticed the column names were missing.

seancorfield 2020-05-22T19:53:44.033100Z

@geraldodev jdbc/plan only uses the RowBuilder half of a result set builder -- it doesn't know anything about the result set as a whole.
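A small sketch of the difference being described here, assuming a datasource `ds` and an `address` table (names and the sample data are illustrative, not from the original conversation):

```clojure
(require '[next.jdbc :as jdbc]
         '[next.jdbc.result-set :as rs])

;; execute! uses the full result set builder, so the first element
;; is the row of column names:
(jdbc/execute! ds ["select id, name from address"]
               {:builder-fn rs/as-arrays})
;; => [[:address/id :address/name] [1 "Alice"] [2 "Bob"]]  (illustrative)

;; plan only uses the row-building half of the builder, so reducing
;; over it yields data rows with no header row:
(into []
      (map #(into [] %))
      (jdbc/plan ds ["select id, name from address"]
                 {:builder-fn rs/as-arrays}))
;; => [[1 "Alice"] [2 "Bob"]]  (illustrative)
```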

seancorfield 2020-05-22T19:58:40.035200Z

The docs should probably point that out (but I hadn't even thought about someone using plan with array builders to be honest). It's "intentional" that you don't get the column names but I can see how it would be convenient to be able to get that information... Hmm, once as-arrays has actually been called you can get the column names from the object it returns... give me a second...

2020-05-22T20:04:52.038900Z

I'm a recent convert to as-arrays for building spreadsheets; before that I used juxt. juxt is unnecessary if the query already reflects what you want to output and you use as-arrays. The argument for inclusion I'd make is that this information is already included when the result is a map, and it would match the behaviour of the as-arrays flag in other places.

seancorfield 2020-05-22T21:21:08.042100Z

OK, I think I can see how to expose the column names in the reduce over an array builder with plan... I'll have to write some tests and verify how it all works now. I'll most likely add a new protocol to make this cleaner. TL;DR: a result set builder (record) has fields for the result set, the result set metadata, and the column names but there's currently no easy way to access that part of the builder via the "mapified" result set.

🦜 1
seancorfield 2020-05-22T21:45:52.043700Z

@geraldodev Please add comments to https://github.com/seancorfield/next-jdbc/issues/110 describing what you're doing with arrays and spreadsheets and plan so I have more context on how and where you would use the vector of column names (labels) if you had access to them.

seancorfield 2020-05-22T22:46:39.046400Z

Thanks for the additional context @geraldodev -- that's an interesting use case. One of the problems here is that you're actually "losing" the whole builder context by realizing the full row (that first (map #(into [] %))), so by the second step of your pipeline you're dealing with pure data: you no longer have access to the result set or its metadata, so whatever solution I come up with here is going to need a different shape.

seancorfield 2020-05-22T22:53:15.051200Z

One thing I can do is to have the mapified-result-set implement clojure.lang.Indexed because columns can be accessed via index (essentially nth) but I'm not sure how much it helps... What is your idx-v-fn?

2020-05-22T22:53:45.051500Z

The #(into [] %) was necessary to get rid of a "statement not available" error. I could move the map into plan, but even with that I'd have to compute the order of the fields at the start of the pipeline and store it in an atom.
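A hedged sketch of the workaround described above -- capturing the column order once, on the first row, in an atom, then reusing it for every subsequent row (`ds`, the query, and the atom name are illustrative):

```clojure
(require '[next.jdbc :as jdbc])

(def col-order (atom nil))

(run! (fn [row]
        ;; Realize the key order once, from the first row only:
        (when (nil? @col-order)
          (reset! col-order (vec (keys row))))
        (let [cells (mapv #(get row %) @col-order)]
          ;; ... write `cells` out as one spreadsheet row ...
          ))
      (jdbc/plan ds ["select * from address"]))
```

This works because plan is reducible (run! reduces over it), but note that calling keys on a row realizes it, which is exactly the "slow path" discussed below.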

seancorfield 2020-05-22T22:53:58.051800Z

Right, you're missing the point.

2020-05-22T22:54:34.052600Z

(defn idx-v-fn [idx v] [idx v])

seancorfield 2020-05-22T22:55:04.053300Z

The row passed into the reduction (`transduce` in your case) is an abstraction. By calling seq on it -- implied by calling into -- you've blown away that abstraction and that's put you on the slow path.
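A sketch of the fast path versus the slow path being contrasted here (`ds`, the table, and column names are illustrative):

```clojure
(require '[next.jdbc :as jdbc])

;; Fast path: each row in the reduction is a thin wrapper over the live
;; ResultSet; keyword access reads a single column directly, without
;; building a Clojure hash map for the row:
(reduce (fn [acc row] (+ acc (:invoice/amount row)))
        0
        (jdbc/plan ds ["select amount from invoice"]))

;; Slow path: calling seq on the row -- implied by into, printing, or
;; treating it as a collection -- realizes the entire row as a full
;; Clojure data structure:
(into []
      (map #(into [] %))
      (jdbc/plan ds ["select * from invoice"]))
```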

seancorfield 2020-05-22T22:55:36.053700Z

Ah, so idx-v-fn is just vector

seancorfield 2020-05-22T22:55:52.054400Z

(map-indexed vector) would do the same thing.
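For reference, `(map-indexed vector)` pairs each element with its 0-based index:

```clojure
(into [] (map-indexed vector) ["a" "b" "c"])
;; => [[0 "a"] [1 "b"] [2 "c"]]
```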

👍 1
2020-05-22T22:56:09.054800Z

Yes, I need indexes to craft the Excel spreadsheet.

seancorfield 2020-05-22T22:57:00.055600Z

Yeah, you're really working against the whole idea behind plan at this point.

2020-05-22T22:57:47.056400Z

I need plan because I don't know the size of the table in advance. I need it to fetch the data in chunks.

seancorfield 2020-05-22T22:58:55.057200Z

Right, I understand that you need streaming and you need to reduce on that so you can have chunking.

seancorfield 2020-05-22T23:01:18.059600Z

I'll have to think about this but it would be fairly easy for me to extend the "API" that mapified-result-set exposes (which is currently just a hash map that is also a DatafiableRow) to make it possible to get the row index (the underlying result set already knows that) as well as making each row Indexed (again, because the result set can be accessed by label or by index) and maybe a few other things that will make this a lot easier for you.

seancorfield 2020-05-22T23:02:31.060200Z

The vector of column names is also available "under the hood" but it's just not exposed anywhere.

seancorfield 2020-05-22T23:03:19.061800Z

In order to take advantage of that, you'd need a way to build chunked sequences of vectors that stayed inside the abstraction which is going to be the hard problem to solve here.

2020-05-22T23:03:30.062Z

Remember that Oracle is not Datafiable-friendly 🙂 But, as you said, the column names are already available.

seancorfield 2020-05-22T23:04:38.062600Z

I'm not sure what you mean about "Oracle is not Datafiable friendly"?

seancorfield 2020-05-22T23:05:21.063100Z

Oracle has nothing to do with the abstraction I'm talking about.

seancorfield 2020-05-22T23:06:31.064200Z

If you're using an array builder, datafiable-row will produce a vector of column values -- no column names involved (and no table names either -- I know Oracle doesn't give you the table names but that's not relevant here).

2020-05-22T23:07:41.065100Z

Yes, they lack getTableName. Not related to the issue; please disregard the comment.

seancorfield 2020-05-22T23:10:53.066200Z

(I just checked using REBL and the whole datafy / nav machinery works just fine on result sets built as arrays so that basic stuff is fine -- I had not tested that before)

seancorfield 2020-05-22T23:22:34.069600Z

Hmm, working through this locally, partition-all is the real problem here because it requires that the rows be realized (because partition-all is a stateful transducer and the reduction is already happening over a stateful object -- the ResultSet). So what you need is a way to produce rows (vectors of column values) efficiently that would also give you the row number and the column names -- I'm thinking as metadata -- and then that should "survive" the trip through partition-all.

2020-05-22T23:29:54.070700Z

That would be very nice to have

seancorfield 2020-05-22T23:36:35.071600Z

I have part of it working locally. I need to shift gears and move that p.o.c. into a proper test context so I can play with it in more depth.

seancorfield 2020-05-22T23:40:51.073100Z

The TL;DR is that I'll switch MapifiedResultSet from an interface to a protocol and have it support row-number and column-names and implement those via metadata on the result of datafiable-row.

seancorfield 2020-05-22T23:43:44.075900Z

Your code would become:

(transduce
  (comp
    (map #(rs/datafiable-row % conn {}))
    (partition-all 100))
  (completing
    (fn [_ lote]
      (doseq [v lote]
        (row-number v) ; returns 1-based result set row number
        (column-names v) ; returns vector of column names in the same order as the data values in v
        ...)))
  (jdbc/plan conn [...] {:builder-fn rs/as-arrays}))

seancorfield 2020-05-22T23:44:22.076600Z

You wouldn't need map-indexed since you could get the row number from the (realized) row anyway.

2020-05-22T23:45:42.076800Z

Very good abstraction.

seancorfield 2020-05-22T23:51:34.079200Z

There are some interesting DB-specific restrictions in play with this, I've just discovered. Apache Derby only allows .getRow when scroll cursors are used. And introducing the dereferencing of the builder in datafiable-row seems to throw exceptions if you use plan incorrectly (I have tests for several types of plan misuse and they now blow up instead of giving you something more predictable, but that's easy to fix 🙂 )