The pure, unadulterated, presumptuous impudence of NoSQL. Engines like MongoDB recklessly discard concepts like ACID in some futile quest to achieve “web scale”, and end up accomplishing neither. RDBMS systems have literally decades of history to draw upon, and have long since conquered the pitfalls NoSQL platforms are just now encountering. There may be something to a couple of them, but by and large, they’re nothing we really need.
At least, that’s something I might have said a couple of weeks ago.
One way the Postgres project is subtly misleading, is that it becomes easy to forget that not all other projects are nearly as well managed. This becomes more relevant when delving into niches that lack sufficient visibility to expose the more obvious deficiencies. As much as we like Postgres, it’s not quite as popular as it could be. This makes some of the side projects infrequently used, and as a direct consequence, they can often resemble jerky automatons cobbled together out of spit and bailing wire.
Now that we’ve decided to really start embracing horizontal scaling builds, there is a critically important engine-agnostic element we need to examine. Given an existing table, how exactly should we split up the contents across our various nodes during the…
Let’s just get the obvious out of the way early: dealing with multiple Terabytes or Petabytes in a database context is something of a nightmare. Distributing it, retrieving it, processing it, aggregating and reporting on it, are all complicated—and perhaps worst of all—non-intuitive. Everything from tooling and maintenance, to usage and input, are either ad-hoc or obfuscated by several special-purpose APIs and wrappers.
One of the reasons a self-scaling database is such a killer app, derives from the failure rate from having so many moving parts. A proxy here, a shard API there, a few design modifications to accommodate implementation quirks, and suddenly we have a fragile, janky golem. A lumbering monstrosity that our applications and data depend on, where modifying or replacing any part of it will mean redistributing all the data at the very least.
Having run into a [intlink id='pg-phriday-growing-pains']bit of a snag[/intlink] with Postgres-XL, and not wanting to be dead in the water with our project, I went on a bit of a knowledge quest. Database scaling is hard, so I expected a bunch of either abandoned or proprietary approaches. In addition, as a huge fans of Postgres, compatibility or outright use of the Postgres core was a strict prerequisite.
So, what options are out there? Is there even anything worth further investigation? Maybe more importantly, what do you do when you’re under a bit of a scheduling constraint? Projects need to move forward after all, and regardless of preferences, sometimes concessions are necessary. The first step was obviously the list of databases derived from Postgres.