sql

All things SQL and JDBC...
kulminaator 2020-08-22T05:22:27.014100Z

gave a test run to cockroachdb yesterday with java & the postgresql jdbc driver ... worked like a charm ... and since clojure's sql support is built on top of jdbc it should work fine with clojure too
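
A minimal sketch of what that looks like with next.jdbc; CockroachDB speaks the PostgreSQL wire protocol, so the stock pgjdbc driver works. Host, port, database name, and user below are placeholders for a local insecure cluster:

```clojure
(require '[next.jdbc :as jdbc])

;; 26257 is CockroachDB's default port; "defaultdb" and "root" are the
;; defaults for a local insecure cluster.
(def ds (jdbc/get-datasource {:dbtype "postgresql"
                              :host   "localhost"
                              :port   26257
                              :dbname "defaultdb"
                              :user   "root"}))

(jdbc/execute-one! ds ["SELECT version()"])
```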

seancorfield 2020-08-22T05:23:12.014300Z

Nice!

kulminaator 2020-08-22T05:23:42.015100Z

i wish the postgresql folks would take note of the features these guys are providing and pimp up their db

seancorfield 2020-08-22T05:23:49.015300Z

If you hit any weirdness with next.jdbc, LMK (did you see the Snowflake discussions above?)

seancorfield 2020-08-22T05:26:57.016100Z

Does CockroachDB support multiple result sets? That seems to be something that the PG maintainers are dragging their heels on implementing. @kulminaator

kulminaator 2020-08-22T05:27:39.016700Z

i'm more after their multi-master setup than the other exotic features

kulminaator 2020-08-22T05:27:58.017100Z

they have a very nicely designed cluster architecture

kulminaator 2020-08-22T05:28:55.018Z

you can upgrade the db node by node in the cluster, you can add new nodes on the fly, decommission old nodes

kulminaator 2020-08-22T05:29:00.018300Z

like a 21st century db

seancorfield 2020-08-22T05:29:15.018600Z

We use Percona's fork of MySQL, partly for the improved ProxySQL/replication/primary/secondary machinery.

seancorfield 2020-08-22T05:30:03.019600Z

It's interesting how poor the base open source versions are in this area.

kulminaator 2020-08-22T05:30:09.019800Z

i suggest you check out both, cockroachdb and yugabytedb ... they both seem to be on the same track ... just to see what's out there

kulminaator 2020-08-22T05:30:24.020400Z

both have free open source variants of their db available with a somewhat reduced feature set

seancorfield 2020-08-22T05:30:33.020600Z

We're not switching. We have too much invested in Percona at this point.

kulminaator 2020-08-22T05:31:00.021300Z

well at least you can think about what features to ask the percona people for next

seancorfield 2020-08-22T05:31:14.021700Z

We have hundreds of millions of rows of data 🙂

seancorfield 2020-08-22T05:32:41.022400Z

As maintainer of clojure.java.jdbc (c.j.j) for nine years and now next.jdbc, nearly all of my pain has come from Postgres users 🙂

seancorfield 2020-08-22T05:33:40.023400Z

Never from MySQL/Percona users. Some pain from Oracle users (I do feel sorry for them -- I was certified as an Oracle DBA back when I worked at Macromedia)

kulminaator 2020-08-22T05:35:45.023700Z

i have been on postgres for 13 years by now

kulminaator 2020-08-22T05:35:50.023900Z

but i feel its time has come

kulminaator 2020-08-22T05:36:17.024400Z

the single-node db instance design just doesn't cut it for my needs anymore

kulminaator 2020-08-22T05:36:45.025Z

yes one can shard the data and build exotic master-slave setups for failovers etc.

kulminaator 2020-08-22T05:36:49.025200Z

but it's all patches on patches

kulminaator 2020-08-22T05:36:53.025500Z

instead of cluster by design

seancorfield 2020-08-22T05:42:33.027200Z

We migrated all our data from MS SQL Server to MySQL a decade ago and it's still a painful memory so migrating to a new DB is... not something we look forward to. But a MySQL-compatible cloud DB would probably be our first choice.

kulminaator 2020-08-22T05:42:42.027600Z

luckily i have been avoiding oracle for 15 years, i don't miss it 😄

kulminaator 2020-08-22T05:43:03.028300Z

failures with messages like ORA-43432 just send a cold shiver down one's spine

seancorfield 2020-08-22T05:43:09.028500Z

We also went all-in on MongoDB for a while... and that was... well... let's just say we were happy to migrate back to MySQL 🙂

kulminaator 2020-08-22T05:43:48.029100Z

i loved the thread on the db side of stack overflow titled "i upgraded my mongodb, where is my data"

kulminaator 2020-08-22T05:43:49.029400Z

😄

kulminaator 2020-08-22T05:44:27.030Z

i think sql as a design is overall sane, it has a civilizing, limiting feature set

kulminaator 2020-08-22T05:45:08.030600Z

and at least for a fintech company anything without proper transactions should be a no-no

kulminaator 2020-08-22T05:45:28.031100Z

developers are extremely bad at building their own simulations of transactions

kulminaator 2020-08-22T05:45:52.031400Z

so the nosql crowd advising people to do so is just asking for trouble

seancorfield 2020-08-22T05:51:29.032800Z

I got into MongoDB fairly early when they were all "We're not Oracle!" and then once they started getting popular they were like "Hey! We've added all these Enterprise features! You could migrate to us from Oracle!" and I was like "WTF?"

kulminaator 2020-08-22T05:53:08.033300Z

but by giving up mongodb you gave up the chance to reinvent a lot of bicycles

seancorfield 2020-08-22T05:54:21.033900Z

I've gotten pretty comfortable with my MySQL-cycle.

seancorfield 2020-08-22T05:55:11.034900Z

I know how to get it to perform well at high data volumes, I know how to modify large tables with zero downtime... I'm pretty happy with it in production, to be honest.

seancorfield 2020-08-22T05:56:02.035500Z

MongoDB promised a lot of ease of use but robustness and scalability were both hard and expensive 😞

kulminaator 2020-08-22T05:56:37.036Z

how does your percona mysql cluster handle the situation where you just yank a power cable from one of the nodes

kulminaator 2020-08-22T05:57:20.036500Z

is it up to you to start some kind of failover or does it have some kind of quorum system built in to make such decisions?

kulminaator 2020-08-22T05:58:30.038Z

with the standard postgres setup & replication it's up to the admins to do such things ... and it's partly one of the reasons i feel it's time to move away

seancorfield 2020-08-22T05:59:10.038800Z

I don't remember exactly how our DB folks have ProxySQL set up. I know it can be automatic, but I don't remember the details of failover and how replication lag affects that.

seancorfield 2020-08-22T05:59:58.040300Z

(I know from experience that yanking the power cord on our secondary DB doesn't affect anything, and replication catches up pretty fast once it is powered back on 🙂 )

kulminaator 2020-08-22T06:00:05.040500Z

with mongodb my experience is limited, i just remember doing one disaster-prevention exercise (machines running out of disk, spinning up new ones with bigger disks, cleaning the data, moving back to the old ones) .... that was not a great weekend

kulminaator 2020-08-22T06:01:26.041700Z

but from what i saw that weekend i don't really want anything to do with it

seancorfield 2020-08-22T06:01:36.041900Z

Some of the cloud services are getting pretty impressive at that sort of scaling/migration/upgrading these days. We use Elastic Cloud for our Elasticsearch instances and their ops stuff is amazingly painless.

kulminaator 2020-08-22T06:02:08.042700Z

well that's the thing

seancorfield 2020-08-22T06:02:16.043200Z

Back when we used MongoDB, we used MongoLab and they were fairly good. A lot of stuff was automated, but not as smooth as the Elastic Cloud stuff.

kulminaator 2020-08-22T06:02:22.043400Z

cockroachdb and yugabyte both feel like elastic in that respect

kulminaator 2020-08-22T06:02:34.044Z

each acts like a proper cluster and manages itself

seancorfield 2020-08-22T06:02:37.044100Z

I wonder how Datomic feels in that sort of area?

kulminaator 2020-08-22T06:02:45.044500Z

and i feel databases should get to the same level

seancorfield 2020-08-22T06:02:47.044700Z

(Datomic Cloud, specifically)

kulminaator 2020-08-22T06:03:17.045Z

the compromise i don't want to make is ditching sql

seancorfield 2020-08-22T06:05:25.046Z

We talked to Cognitect about our use cases. Some of what we do would work well on Datomic but they said some of it isn't a match (and some of our data volumes are bigger than Datomic was designed for).

kulminaator 2020-08-22T06:06:53.047600Z

things start to fall apart surprisingly fast once you reach terabytes of data

seancorfield 2020-08-22T06:06:55.047800Z

I think for our member profile data and our billing/transactional data, Datomic would work well, but we have a lot of high-write data and a lot of append-only data with high volumes that they said would not be a good fit. And we don't want to run a multi-DB solution.

kulminaator 2020-08-22T06:06:58.047900Z

even with postgres and mysql

seancorfield 2020-08-22T06:07:17.048300Z

We have several tables with well over 100M rows at this point.

kulminaator 2020-08-22T06:07:28.048600Z

usually it's the developers' fault though 😄

kulminaator 2020-08-22T06:08:07.049200Z

but yeah, funny stuff happens ... initially nobody expects to become a billionaire so they create tables with int32 primary keys ...

kulminaator 2020-08-22T06:08:30.049900Z

then they finally understand that hey, we are getting a lot of records here ... so tables migrate to bigint primary keys
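
The migration itself is a one-liner, though on a big table it rewrites every row; a hedged sketch reusing the `ds` datasource from the earlier snippet, with hypothetical table and column names:

```clojure
;; Postgres: rewrites the table under a heavy lock, so on a large table
;; this is usually driven through an online-migration tool instead.
(jdbc/execute! ds ["ALTER TABLE users ALTER COLUMN id TYPE bigint"])

;; MySQL spelling of the same change:
;; ALTER TABLE users MODIFY COLUMN id BIGINT NOT NULL AUTO_INCREMENT;
```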

kulminaator 2020-08-22T06:08:52.050500Z

and then you start seeing tables that store blocked users, and those have bigint primary keys

kulminaator 2020-08-22T06:09:03.050900Z

and you go like ... why would we block more than a third of the planet?

kulminaator 2020-08-22T06:09:35.051500Z

we have crossed the int32 line with multiple tables in our systems ...

seancorfield 2020-08-22T06:10:01.052400Z

Yup, us too. Can be a giant pain when you suddenly have to rebuild a table on the fly without downtime 🙂

kulminaator 2020-08-22T06:10:14.052700Z

and there are no good decisions the sql planners can make in those situations unless you only do lookups by exact matches on indexed fields 😄

seancorfield 2020-08-22T06:11:07.053800Z

Recently, we had a lookaside table that assumed a 16-bit unsigned max index and it was fine for years, but then someone else decided to change the strings that could go into that table and BOOM! 65535!
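
A cheap guard against that class of surprise is watching how close a key is to its type's ceiling; a hypothetical sketch (table name and 80% threshold invented, 65535 being the SMALLINT UNSIGNED ceiling in MySQL):

```clojure
;; Warn when the lookaside key approaches the column type's maximum.
(let [mx (:mx (jdbc/execute-one! ds ["SELECT MAX(id) AS mx FROM lookaside"]))]
  (when (and mx (> mx (* 0.8 65535)))
    (println "lookaside key at" mx "of 65535 -- time to widen the column")))
```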

kulminaator 2020-08-22T06:11:19.054200Z

but developers are always surprising too ...

seancorfield 2020-08-22T06:11:26.054600Z

Took six weeks to completely dig out from under that one.

kulminaator 2020-08-22T06:11:45.055200Z

in the sense that you have a table that has 500m records ... and then there's a type field on the records ... and it has 3-4 different values ....

kulminaator 2020-08-22T06:11:50.055400Z

and developers put an index on that

kulminaator 2020-08-22T06:12:03.055700Z

and then you just sit down and cry a little bit
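
The problem: with 3-4 distinct values over 500m rows, each index key still matches ~125m rows, so the planner (rightly) tends to ignore the index. A hedged sketch of checking and fixing that, with hypothetical table and column names:

```clojure
;; EXPLAIN will typically show a full scan despite the index on `type`:
(jdbc/execute! ds ["EXPLAIN SELECT * FROM events WHERE type = 'click'"])

;; A composite index that leads with a selective column gives queries
;; that also filter on that column something worth using:
(jdbc/execute! ds ["CREATE INDEX idx_events_user_type ON events (user_id, type)"])
```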

seancorfield 2020-08-22T06:12:16.056200Z

Ah yes, index design is a fine "black art" 🙂

kulminaator 2020-08-22T06:12:18.056300Z

especially if they come asking why the index isn't being used to make their magnificent query faster

seancorfield 2020-08-22T06:13:23.057200Z

Or you have a query for which two or more different indices are a "reasonable" match but the planner picks the slightly less good one... fortunately you can provide a USE INDEX hint in the query!
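
In MySQL that looks like this (table and index names hypothetical; FORCE INDEX is the stronger variant):

```clojure
(jdbc/execute! ds ["SELECT * FROM orders USE INDEX (idx_orders_member)
                    WHERE member_id = ? AND status = ?"
                   42 "open"])
```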

seancorfield 2020-08-22T06:13:57.057600Z

Sometimes I'm just amazed that any of this stuff works "at scale"...

kulminaator 2020-08-22T06:14:13.057900Z

well recovering from backups kind of doesn't anymore

kulminaator 2020-08-22T06:14:24.058200Z

would be unthinkable without failover databases

kulminaator 2020-08-22T06:15:09.058800Z

recovering a 2 terabyte db from a dump is way too slow, by any measure ...

kulminaator 2020-08-22T06:15:35.059300Z

people are creative though, some use zfs snapshots of the database filesystem instead

kulminaator 2020-08-22T06:15:54.059800Z

so you don't have to import a dump, you just mount the image and start the db
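
Roughly like this, shelling out from Clojure purely for illustration (dataset names are hypothetical, and the database must be quiesced or at least crash-consistent when the snapshot is taken):

```clojure
(require '[clojure.java.shell :refer [sh]])

;; Take a snapshot of the dataset the db's data directory lives on:
(sh "zfs" "snapshot" "tank/pgdata@nightly")

;; To "restore": clone the snapshot to a writable dataset, point the
;; db's data directory at the clone, and start it up:
(sh "zfs" "clone" "tank/pgdata@nightly" "tank/pgdata-restore")
```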

seancorfield 2020-08-22T06:16:34.060200Z

(I'm out for the night)

kulminaator 2020-08-22T06:16:55.060400Z

night 🙂

orestis 2020-08-22T17:14:37.062700Z

Any resources for designing real-world schemas in sql? Things like blocked users, saving the line-item price to preserve historical records, and all the other messy bits that are needed.

kulminaator 2020-08-23T13:54:59.063600Z

for blocked user detection you may want to use a bloom filter & caching, so you don't penalize "good users" with unnecessary lookups
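
A minimal sketch with Guava's BloomFilter (assumes com.google.guava/guava on the classpath; the sizing numbers are illustrative):

```clojure
(import '(com.google.common.hash BloomFilter Funnels)
        '(java.nio.charset StandardCharsets))

;; Sized for ~1m blocked users at a 1% false-positive rate.
(def blocked
  (BloomFilter/create (Funnels/stringFunnel StandardCharsets/UTF_8)
                      1000000 0.01))

(defn block! [user-id]
  (.put ^BloomFilter blocked user-id))

;; false means definitely not blocked -- no db round trip needed;
;; true only means *maybe* blocked, so confirm against the real table.
(defn maybe-blocked? [user-id]
  (.mightContain ^BloomFilter blocked user-id))
```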

kulminaator 2020-08-23T13:56:06.063800Z

not sure i know any good books on the overall topic though, most of it is domain specific