It’s about that time of year for performance reviews. While I’m always afraid I’ll only narrowly avoid being fired for gross incompetence, that’s not usually how it goes. But that meeting did remind me of a bit of restructuring I plan to impose for 2017 that should vastly improve database availability across our organization. Many of the techniques involved, while implemented with Postgres tools in our case, are not Postgres-specific concepts.
At a high level, Postgres has a bevy of libraries, interfaces, and clients for accessing a database instance. From language APIs to GUIs like pgAdmin, to SaaS entries like JackDB, every flavor of interaction is covered. And yet that’s only a small part of the story. For those who dare to tread into the watery depths, there’s also the world of dark incantations that is the command line.
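Most of those incantations boil down to psql, the command-line client bundled with Postgres. Here is a minimal sketch of a session, assuming a local server and a database named postgres; the sensor_log table is invented purely for illustration:

```sql
-- Launch the client from a shell with something like:
--   psql -U postgres -h localhost postgres

-- Backslash meta-commands handle introspection:
\dt                 -- list tables visible on the search path
\d sensor_log       -- describe a hypothetical table

-- And of course, plain SQL works as expected:
SELECT version();
```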
Postgres is a great tool for most database needs. Larger installations, however, pretty much require horizontal scaling; addressing multi-TB tables relies on multiple parallel storage streams, thanks to the laws of physics. It’s how all immense data stores work, and for a long time, Postgres really had no equivalent that wasn’t a home-grown shard-management wrapper. To that end, we’ve been considering Postgres-XL as a way to fill that role. At first, everything was going well: performance tests showed huge improvements, initial deployments uncovered no outright incompatibilities, and the conversion was underway.
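For context, the conversion itself is mostly a matter of declaring how each table should be distributed across the data nodes. A rough sketch using Postgres-XL’s DISTRIBUTE BY clause follows; the table and column names are invented for illustration:

```sql
-- Spread a large fact table across all data nodes by hashing on its
-- primary key, so reads and writes use parallel storage streams.
CREATE TABLE sensor_log (
  id            BIGINT PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
)
DISTRIBUTE BY HASH (id);

-- Small, frequently joined lookup tables can instead be copied to
-- every node, so joins against them never cross the network.
CREATE TABLE sensor_location (
  location  VARCHAR PRIMARY KEY,
  region    VARCHAR NOT NULL
)
DISTRIBUTE BY REPLICATION;
```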
Then we started to use it.
I’ve been a Postgres DBA since 2005. After all that time, I’ve come to a conclusion that I’m embarrassed I didn’t reach much earlier: Postgres is awful. This isn’t a “straw that broke the camel’s back” kind of situation; there is a litany of ridiculous idiocy in the project that’s, frankly, more than enough to drive off any DBA, end user, or developer. But I’ll limit my list to five, because clickbait.
Postgres has been lacking something for quite a while, and more than a few people have attempted to supply the missing functionality over the years. I’m speaking, of course, about parallel queries. There are several reasons for this, among them the various distribution and sharding needs of large data sets. When tables start to reach hundreds of millions, or even billions, of rows, even high-cardinality indexes produce results very slowly.
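Core Postgres did eventually grow rudimentary parallel query in 9.6, though it ships disabled. A minimal sketch of turning it on for a single session, reusing the invented sensor_log table from earlier:

```sql
-- Parallel query is off by default in 9.6; allow up to four background
-- workers per Gather node for this session only.
SET max_parallel_workers_per_gather = 4;

-- On a sufficiently large table, the planner can now choose a parallel
-- sequential scan; a Gather node in the plan confirms it did.
EXPLAIN
SELECT count(*)
  FROM sensor_log
 WHERE reading > 100;
```

Even then, that only parallelizes work within a single server; it does nothing for the storage-stream problem that drives people toward distribution and sharding in the first place.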