The pure, unadulterated, presumptuous impudence of NoSQL. Engines like MongoDB recklessly discard concepts like ACID in some futile quest to achieve “web scale”, and end up accomplishing neither. RDBMS systems have literally decades of history to draw upon, and have long since conquered the pitfalls NoSQL platforms are just now encountering. There may be something to a couple of them, but by and large, they’re nothing we really need.
At least, that’s something I might have said a couple of weeks ago.
One way the Postgres project is subtly misleading, is that it becomes easy to forget that not all other projects are nearly as well managed. This becomes more relevant when delving into niches that lack sufficient visibility to expose the more obvious deficiencies. As much as we like Postgres, it’s not quite as popular as it could be. This makes some of the side projects infrequently used, and as a direct consequence, they can often resemble jerky automatons cobbled together out of spit and bailing wire.
Now that we’ve decided to really start embracing horizontal scaling builds, there is a critically important engine-agnostic element we need to examine. Given an existing table, how exactly should we split up the contents across our various nodes during the conversion process? Generally this is done by selecting a specific column and applying some kind of hash or custom distribution mechanism to ensure all node contents are reasonably balanced. But how do we go about figuring that out?
Let’s just get the obvious out of the way early: dealing with multiple Terabytes or Petabytes in a database context is something of a nightmare. Distributing it, retrieving it, processing it, aggregating and reporting on it, are all complicated—and perhaps worst of all—non-intuitive. Everything from tooling and maintenance, to usage and input, are either ad-hoc or obfuscated by several special-purpose APIs and wrappers.
One of the reasons a self-scaling database is such a killer app, derives from the failure rate from having so many moving parts.
Having run into a bit of a snag with Postgres-XL, and not wanting to be dead in the water with our project, I went on a bit of a knowledge quest. Database scaling is hard, so I expected a bunch of either abandoned or proprietary approaches. In addition, as a huge fans of Postgres, compatibility or outright use of the Postgres core was a strict prerequisite.
So, what options are out there?