gave a test run to cockroachdb yesterday with java & the postgresql jdbc driver ... worked like a charm ... and since clojure's sql support is built on top of jdbc it should work fine with clojure too
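for reference, a minimal next.jdbc sketch of that -- host/port/dbname here are assumptions for a local insecure cockroachdb node (26257 is its default sql port, defaultdb its default database), with the stock postgresql jdbc driver on the classpath:
```clojure
(require '[next.jdbc :as jdbc])

;; assumptions: a local insecure cockroachdb node on its default sql port
;; 26257, default database "defaultdb", user "root", regular postgresql
;; jdbc driver on the classpath
(def ds
  (jdbc/get-datasource {:dbtype "postgresql"
                        :host   "localhost"
                        :port   26257
                        :dbname "defaultdb"
                        :user   "root"}))

;; smoke test -- should come back as [{:ok 1}]
(jdbc/execute! ds ["SELECT 1 AS ok"])
```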
Nice!
i wish postgresql boys would take a note from the features that these guys are providing and pimp up their db
If you hit any weirdness with next.jdbc, LMK (did you see the Snowflake discussions above?)
Does CockroachDB support multiple result sets? That seems to be something that the PG maintainers are dragging their heels on implementing. @kulminaator
i'm more after their multimaster setup than other exotic features
they have a very nicely designed cluster architecture
you can upgrade the db node by node in the cluster, you can add new nodes on the fly, decommission old nodes
like a 21st century db
We use Percona's fork of MySQL, partly for the improved ProxySQL/replication/primary/secondary machinery.
It's interesting how poor the base open source versions are in this area.
i suggest you check out both, cockroachdb and yugabytedb ... they both seem to be on the same track ... just to see what's out there
both have free open source variants of their db available with a somewhat reduced feature set
We're not switching. We have too much invested in Percona at this point.
well at least you can think of what you can ask from percona people as next features
We have hundreds of millions of rows of data 🙂
As maintainer of c.j.j for nine years and now next.jdbc, nearly all of my pain has come from Postgres users 🙂
Never from MySQL/Percona users. Some pain from Oracle users (I do feel sorry for them -- I was certified as an Oracle DBA back when I worked at Macromedia)
i have been on postgres for 13 years by now
but i feel its time has come
the single node db instance design just doesn't cut it for my needs anymore
yes one can shard the data and build exotic master-slave setups for failovers etc.
but it's all patches on patches
instead of cluster by design
We migrated all our data from MS SQL Server to MySQL a decade ago and it's still a painful memory so migrating to a new DB is... not something we look forward to. But a MySQL-compatible cloud DB would probably be our first choice.
luckily i have been avoiding oracle for 15 years, i don't miss it 😄
failures with messages like ORA-43432 just send a cold shiver down one's back
We also went all-in on MongoDB for a while... and that was... well... let's just say we were happy to migrate back to MySQL 🙂
i loved the thread on the db side of stack overflow with "i upgraded my mongodb, where is my data"
😄
i think sql as a design is overall sane, it has a civilizing, limiting feature set
and at least for a fintech company anything without proper transactions should be a no-no
developers are extremely bad at building their own simulations of transactions
so the nosql crowd advising people to do so is just asking for trouble
I got into MongoDB fairly early when they were all "We're not Oracle!" and then once they started getting popular they were like "Hey! We've added all these Enterprise features! You could migrate to us from Oracle!" and I was like "WTF?"
but by giving up mongodb you gave up the chance to reinvent a lot of bicycles
I've gotten pretty comfortable with my MySQL-cycle.
I know how to get it to perform well at high data volumes, I know how to modify large tables with zero downtime... I'm pretty happy with it in production, to be honest.
MongoDB promised a lot of ease of use but robustness and scalability were both hard and expensive 😞
how does your percona mysql cluster handle the situation where you just yank a power cable from one of the nodes
is it up to you to start some kind of failover or does it have some kind of quorum system built in to make such decisions ?
with the standard postgres setup & replication it's up to the admins to do such things ... and it's partly one of the reasons i feel it's time to move away
I don't remember exactly how our DB folks have ProxySQL set up. I know it can be automatic, but I don't remember the details of failover and how replication lag affects that.
(I know from experience that yanking the power cord on our secondary DB doesn't affect anything, and replication catches up pretty fast once it is powered back on 🙂 )
with mongodb my experience is limited , i just remember doing one disaster prevention (machines running out of disk, spinning up new ones with bigger disks, cleaning the data, moving back to old ones) .... that was not a great weekend
but from what i saw on that weekend i don't really want to do anything with it
Some of the cloud services are getting pretty impressive at that sort of scaling/migration/upgrading these days. We use Elastic Cloud for our Elastic Search instances and their ops stuff is amazingly painless.
well that's the thing
Back when we used MongoDB, we used MongoLab and they were fairly good. A lot of stuff was automated, but not as smooth as the Elastic Cloud stuff.
cockroachdb and yugabyte both feel like elastic in that respect
they act like a proper cluster and manage themselves
I wonder how Datomic feels in that sort of area?
and i feel databases should get to the same level
(Datomic Cloud, specifically)
the compromise i don't want to do is ditching sql
We talked to Cognitect about our use cases. Some of what we do would work well on Datomic but they said some of it isn't a match (and some of our data volumes are bigger than Datomic was designed for).
things start to fall apart surprisingly fast once you reach terabytes of data
I think for our member profile data and our billing/transactional data, Datomic would work well, but we have a lot of high-write data and a lot of append-only data with high volumes that they said would not be a good fit. And we don't want to run a multi-DB solution.
even with postgres and mysql
We have several tables with well over 100M rows at this point.
usually it's the developers' fault though 😄
but yeah, funny stuff happens ... initially nobody wants to become a billionaire so they create tables with int32 primary keys ...
then they finally understand that hey, we are getting many records here ... so tables migrate to bigint primary keys
and then you start seeing tables where you have stored blocked users and those have bigint primary keys
and you go like ... why would we block more than a third of the planet ?
we have crossed the int32 line with multiple tables in our systems ...
Yup, us too. Can be a giant pain when you suddenly have to rebuild a table on the fly without downtime 🙂
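the alter itself is a one-liner -- rough sketch below via next.jdbc against a mysql datasource (table/column names are made up); the painful part is that on a big table it copies everything and holds locks, so in practice you'd run it through an online schema-change tool like pt-online-schema-change or gh-ost rather than straight in production:
```clojure
(require '[next.jdbc :as jdbc])

;; ds = a next.jdbc datasource for the mysql instance (assumed);
;; table and column names are made up for illustration
(jdbc/execute! ds
  ["ALTER TABLE orders
    MODIFY COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT"])
```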
and there's no good decision a sql planner can make in those situations unless you only do lookups by exact matches on indexed fields 😄
Recently, we had a lookaside table that assumed a 16-bit unsigned max index and it was fine for years, but then someone else decided to change the strings that could go into that table and BOOM! 65535!
but developers are always surprising too ...
Took six weeks to completely dig out from under that one.
in the sense that you have a table that has 500m records ... and then there's a type field on the records ... and it has 3-4 different values ....
and developers put an index on that
and then you just sit down and cry a little bit
Ah yes, index design is a fine "black art" 🙂
especially if they come asking why isn't the index used to make their magnificent query faster
Or you have a query for which two or more different indices are a "reasonable" match but the planner picks the slightly less good one... fortunately you can provide a USE INDEX hint in the query!
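e.g. something like this from next.jdbc (table and index names are made up; FORCE INDEX is the heavier hammer if the planner still disagrees):
```clojure
(require '[next.jdbc :as jdbc])

;; ds = a next.jdbc datasource for the mysql instance (assumed);
;; table and index names are made up for illustration
(jdbc/execute! ds
  ["SELECT id, status, created_at
      FROM orders USE INDEX (idx_orders_status_created)
     WHERE status = ? AND created_at >= ?"
   "shipped" "2020-01-01"])
```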
Sometimes I'm just amazed that any of this stuff works "at scale"...
well recovering from backups kind of doesn't anymore
would be unthinkable without failover databases
recovering a 2 terabyte db from a dump is way too slow, no matter how you do it ...
people are creative though, some use zfs snapshots of the database filesystem instead
so you don't have to import a dump, you just mount the image and start the db
(I'm out for the night)
night 🙂
Any resources for designing real-world schemas in sql? Things like blocked users, saving line item price to maintain historical records and all the other messy bits that are needed.
for blocked user detection you may want to use a bloom filter & caching, so you don't penalize "good users" with unnecessary lookups
not sure if i know any good books on the overall topic though, most of it is domain specific
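a rough sketch of that bloom filter idea, using guava's BloomFilter from clojure -- the sizing numbers and the lookup-in-db function are made up:
```clojure
(import '(com.google.common.hash BloomFilter Funnels)
        '(java.nio.charset StandardCharsets))

;; assumptions: ~1M blocked ids expected, ~1% false-positive rate acceptable
(def blocked-filter
  (BloomFilter/create (Funnels/stringFunnel StandardCharsets/UTF_8)
                      1000000
                      0.01))

;; populate once from the blocked_users table (ids as strings)
(defn load-blocked! [user-ids]
  (doseq [id user-ids]
    (.put blocked-filter (str id))))

;; a bloom filter can false-positive but never false-negative, so a "no"
;; means definitely not blocked and we skip the db hit entirely; only a
;; "maybe" falls through to the real lookup (lookup-in-db is hypothetical)
(defn blocked? [lookup-in-db user-id]
  (and (.mightContain blocked-filter (str user-id))
       (boolean (lookup-in-db user-id))))
```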