This PG Phriday is going to be a bit different. During my trip to Postgres Open this year, I attended a talk I had originally written off as “some Red Hat stuff.” But I saw the word “containers” in the PostgreSQL in Containers at Scale talk and became intrigued. A few days later, I had something of an epiphany: I’ve been wrong about servers for years; we all have.
That’s a pretty bold claim, so it needs some background.
Well, the bell has tolled, the day is over, and at the end of it all, Postgres Open has ended its fifth year in service of the community. I will say it was certainly an honor to speak again this year, though now that it’s not conveniently in Chicago, I’ll have to work harder to justify hauling myself across the country next year. Of course at this point, I’d feel guilty if I didn’t at least try, assuming any of my submissions are accepted.
Most Postgres operators and informed users are aware that it uses MVCC for storage. One of the main drawbacks of this versioning mechanism is related to tuple reuse. In order to reuse the space, VACUUM must complete a cycle on the table. Unfortunately this isn’t always possible to “optimize” for larger tables. How so?
If a large table needs to have a calculated column added, or some other bulk query updates a large portion of its content, a large fragment of the table is now empty space.
One of the cool things I like most about Postgres, is that it’s probably the most inclusive database software I’ve ever encountered. It’s so full of features and functionality these days, it’s practically middleware. Almost anything plugs into it, and if it doesn’t, there’s usually a way to make it happen.
Want a demonstration?
SciDB is often used for large analytical data warehouses. They even use Postgres for metadata storage. Despite this, they still haven’t written a foreign data wrapper for back-and-forth interaction.
This week we’ll be covering another method of Postgres partitioning. This is a technique I personally prefer and try to use and advocate at every opportunity. It’s designed to straddle the line between traditional partitioning and standard monolithic table structure by using table inheritance as a convenience factor. The assumption here is that end-user applications either:
Know that partitioning is in use. Only load “current” data and don’t care about partitions.