PG Phriday: The Audacity of NoSQL

Friday July 22, 2016

Page content

The pure, unadulterated, presumptuous impudence of NoSQL. Engines like MongoDB recklessly discard concepts like ACID in some futile quest to achieve “web scale”, and end up accomplishing neither. RDBMS systems have literally decades of history to draw upon, and have long since conquered the pitfalls NoSQL platforms are just now encountering. There may be something to a couple of them, but by and large, they’re nothing we really need.

At least, that’s something I might have said a couple of weeks ago.

Now, I’ve never really ascribed to the supposition that NoSQL is the strict antithesis of traditional SQL-driven systems. The intro paragraph is something of a hyperbolic exaggeration of the general disdain that seems to permeate the NoSQL vs SQL debate. Most DBAs I’ve met basically say “meh” and move on with their day, optimizing queries, tweaking storage metrics, and what not. Usually NoSQL is on our RADAR, but we have other stuff to worry about.

Last week, I was forced to reconsider due to an existing internal project that needed my input. That analysis really showed me why devs and data guys from all walks of life are seduced to the Dark Side of data. After another week slogging through documentation, experimenting with self-healing replication sets, self-balancing sharding, and taking copious notes on everything, something finally clicked. The Dark Side, it seems, has some merit beyond simplifying data structure.

darkside_fullpic

So what can Postgres potentially learn from its eternal adversaries? Beyond stealing their encoding mechanisms via JSON and JSONB datatypes, that is. Though my analysis thus far has focused mostly on MongoDB, it alone incorporates several fundamental concepts that I’ve always felt Postgres lacked. For now, I’ll focus on the two that have—in my opinion, anyway—really prevented Postgres from reaching its full potential.

Break Me

Postgres has replication, but no meaningful application beyond having a hot standby or a few read slaves. Why no master election consensus system like RAFT? Last year, I came across an excellent article that discussed implementing it with HAProxy, Governor, and etcd instead of the more historic (and infinitely more complicated) Pacemaker stack. Someone even forked Governer into Patroni to incorporate alternative consensus approaches. Still, that’s a lot of external dependencies.

Postgres has all of the parts to handle this itself, it just doesn’t. It has the replication stream. It knows among all replica nodes which has the highest transaction id. It has a process for cloning an existing instance in pg_basebackup. If an old primary node needs to become a replica, it can either do it directly, or use pg_rewind if necessary. Yet these fragments must be combined manually. Some use scripts, others use Pacemaker or some variant of the Governer approach above, but it’s much easier for a new project to simply chose another platform.

The specific issue at hand, is that Postgres elects to consider itself an autonomous process. In this view, a database is a self-contained data management structure, where everything goes into a single extremely high integrity bucket. As a consequence, Postgres has no concept of a cluster at all, except perhaps a slight awareness of currently streaming replicas.

MongoDB takes itself much less seriously. In the MongoDB world, a mongod instance is a bucket that will accept any and all content, and do a passable job of protecting it. As such, it’s much less concerned with spillage, giving it greater leverage for passing that bucket around. The authors of MongoDB clearly wanted to focus on data availability rather than integrity. To get that, they settled on a consensus system that some might argue is inadequate when combined with their data writing strategies.

Despite that, it works. MongoDB is a concept beyond mere data storage. Where Postgres is becoming more of a storage middleware through additions like extensions and foreign data wrappers, MongoDB is “where your data goes.” That’s a very alluring attitude, and a subtle shift in focus few (if any) mainstream RDBMS have adopted. First and foremost, engines like MongoDB are a cluster, incorporating multiple nodes that interchangeably share roles but present a united face to the world. Postgres has no equivalent.

But it could! Imagine a universe where we have the integrity guarantee of an ACID database, with a Mongo-like front-end that manages the nodes such that our bulletproof data is always available. Speaking of front ends…

Into Pieces

Shards. I have 50TB of data, and I have no Earthly idea what to do with it. I’d love to blow it up into a trillion little bits, but Postgres gives me no easy way to do that. Here’s the second place MongoDB trumps Postgres in its current incarnation. It’s not exactly a cake-walk, considering I drew up this chart while compiling my notes:

Mongo Cluster

But all the tools are there. Anyone with a dozen servers or VMs can build a scalable data store without writing their own distribution implementation. That’s hard to ignore.

For Postgres, it really boils down to the citus extension and Postgres-XL. Unfortunately citus has severe limitations with joins including non-distributed tables. Likewise, Postgres-XL has been trying to merge the scaling code into the 9.5 branch for a year now. Despite the increased visibility and presumed quality of 2ndQuadrant’s work, there are a couple of orders of magnitude fewer eyes on that code. Even assuming they manage to merge everything before 9.6 drops, will they be able to keep up with core afterwards? Citus has its limitations, but in the end, it is an extension anyone can install. Anything that isn’t an extension risks falling woefully behind or being abandoned.

Is there another way? If we treated Postgres backends like dumb containers the way MongoDB does, things suddenly change quite a bit. MongoDB has a process called mongos which is basically just a glorified proxy that manages shard metadata and forks out a balancing job that ushers data around to ensure shard content isn’t lopsided. That’s the “Manager / Balancer” in my chart. What if it also included a transaction manager?

Postgres has supported background workers since 9.3. In theory, it could adopt management of shard metadata, chunk migration, or global transaction roles where appropriate. This is essentially what Postgres-XL (and similar) is doing. Where MongoDB wrote a glue process to bring everything together, Postgres-XL opted to directly patch the Postgres source, and all the pain that entails.

I’m in danger of being too dismissive of implementation details, of course. Yet there are massive potential benefits to taking a cluster approach by default. Such a distribution proxy could be database agnostic, compatible with any engine that speaks SQL. Consider too that Postgres foreign data wrappers imply it could be the communication layer.

Last Thoughts

Postgres is tantalizingly close to being a cluster system simply through the communication protocols it provides. It just needs that final piece. I’m fully convinced that someone with the right skillset (definitely not me) could whip up a proof of concept for this in a few days. This is, after all, how most application-level sharding approaches work anyway. In the end, that’s all MongoDB, and several other NoSQL solutions ultimately provide. Take one part integrated RAFT process to always present a writable primary, and mix in a data distribution and balancing proxy. Done.

If they so desired, the MongoDB developers could theoretically even use Postgres as a back-end storage mechanism. In fact, I highly encourage them to do so! Given a natural shard split and extensions like cstore_fdw, suddenly NoSQL becomes a scalable column-store.

Barring that, corporate use of Postgres will be necessarily limited. There’s only so much you can do in a single instance, and without integral scaling features, it becomes migration time. I don’t want to tell our devs to convert all of their ORMs and queries to JSON, give up easy joins, or forgo NULL handling, but I will. I have to. This 50TB is only growing every day, and without a straight-forward and reliable migration path, we need to retool.

I disagree with too many of the data compromises MongoDB makes to use it for important data, but the availability approach is fundamentally sound. Mongo’s implementation is refreshing and innovative, and one which many NoSQL engines appear to share. Is it any wonder there has been a silent transition to these systems, when everything has several billion rows, and applications fully expect to reach well beyond Terabytes of storage into Petabyte territory?

No single instance can absorb that kind of volume, and not everyone has time or inclination to work around that limitation. We need a common middle-ground with the NoSQL world. We have the better storage engine, and they have the better distribution and availability mechanisms. Can’t there be a place where both of those things work in tandem?

I, for one, welcome our cluster overlords.

Break Me

Into Pieces

Last Thoughts

See Also