Let’s just get the obvious out of the way early: dealing with multiple Terabytes or Petabytes in a database context is something of a nightmare. Distributing it, retrieving it, processing it, aggregating and reporting on it, are all complicated—and perhaps worst of all—non-intuitive. Everything from tooling and maintenance, to usage and input, are either ad-hoc or obfuscated by several special-purpose APIs and wrappers.
One of the reasons a self-scaling database is such a killer app, derives from the failure rate from having so many moving parts.
Having run into a bit of a snag with Postgres-XL, and not wanting to be dead in the water with our project, I went on a bit of a knowledge quest. Database scaling is hard, so I expected a bunch of either abandoned or proprietary approaches. In addition, as a huge fans of Postgres, compatibility or outright use of the Postgres core was a strict prerequisite.
So, what options are out there?
Postgres is a great tool for most databases. Larger installations however, pretty much require horizontal scaling; addressing multi-TB tables relies on multiple parallel storage streams thanks to the laws of physics. It’s how all immense data stores work, and for a long time, Postgres really had no equivalent that wasn’t a home-grown shard management wrapper. To that end, we’ve been considering Postgres-XL as a way to fill that role. At first, everything was going well.
Connecting to a Postgres database can be a headache for end users and DBAs alike. Not because of the work involved, but the general irritation of managing passwords—an issue common to any system that requires authentication controls.
The user wants to say, “Who am I? None of your business!”
While the database is more comfortable with, “Go away, before I taunt you a second time!”
Well, there’s some middle ground everyone can enjoy, and a few Postgres-specific caveats which add a bit of flavor to the whole interaction.
I’ve been going on and on about esoteric Postgres features for so long, sometimes I forget my own neophyte beginnings. So let’s back way off and talk about how JOINs work. What are they? What do they do? Why would you use one? What can go wrong? It’s easier than you might imagine, but there are definitely some subtleties to account for.
To make this more relatable, let’s start with a club of pet owners that likes to keep track of everything: