I’ve been talking about partitions a lot recently, and I’ve painted them in a very positive light. Postgres partitions are a great way to distribute data along a logical grouping, and they work best when data is addressed in a fairly isolated manner. But what happens if we direct a basic query at a partitioned table in such a way that we ignore the allocation scheme? Well, what happens isn’t pretty. Let’s explore in more detail.
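As a rough illustration of what “ignoring the allocation scheme” can look like, here is a minimal sketch using inheritance-based partitions. The table and column names (sensor_log, reading_date, location) are hypothetical and chosen only for this example; the sketch also assumes the default constraint_exclusion setting of partition.

```sql
-- Hypothetical parent table; every name here is illustrative.
CREATE TABLE sensor_log (
    sensor_id    int       NOT NULL,
    location     text      NOT NULL,
    reading      int       NOT NULL,
    reading_date timestamp NOT NULL
);

-- One child per year. The CHECK constraint is what allows the planner
-- to skip a child that cannot contain matching rows.
CREATE TABLE sensor_log_2014 (
    CHECK (reading_date >= '2014-01-01' AND reading_date < '2015-01-01')
) INHERITS (sensor_log);

CREATE TABLE sensor_log_2015 (
    CHECK (reading_date >= '2015-01-01' AND reading_date < '2016-01-01')
) INHERITS (sensor_log);

-- Filters on the partition key: the plan should leave out sensor_log_2014.
EXPLAIN
SELECT * FROM sensor_log
 WHERE reading_date >= '2015-06-01' AND reading_date < '2015-07-01';

-- Ignores the partition key: no child can be excluded, so all are scanned.
EXPLAIN
SELECT * FROM sensor_log
 WHERE location = 'A-12';
```

With only two children the difference is small, but the second plan grows with every partition added, which is the situation the question above is pointing at.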
This PG Phriday is going to be a bit different. During my trip to Postgres Open this year, I attended a talk I had originally written off as “some Red Hat stuff.” But I saw the word “containers” in the PostgreSQL in Containers at Scale talk and became intrigued. A few days later, I had something of an epiphany: I’ve been wrong about servers for years; we all have.
That’s a pretty bold claim, so it needs some background.
Most Postgres operators and informed users are aware that it uses MVCC for storage. One of the main drawbacks of this versioning mechanism is related to tuple reuse: before the space held by old row versions can be reused, VACUUM must complete a cycle on the table. Unfortunately, this isn’t always something we can “optimize” for larger tables. How so?
If a large table needs to have a calculated column added, or some other bulk query updates a large portion of its content, every updated row leaves its old version behind. Once VACUUM has processed those dead versions, a large fraction of the table is empty space.
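The effect is easy to observe on a scratch table. This is only a sketch: bloat_demo and its columns are made-up names, and the exact sizes will vary with version and settings, so no specific numbers are claimed here.

```sql
-- Hypothetical scratch table with one million narrow rows.
CREATE TABLE bloat_demo AS
SELECT g AS id
  FROM generate_series(1, 1000000) AS g;

SELECT pg_size_pretty(pg_relation_size('bloat_demo')) AS size_before;

-- Add a calculated column and fill it in with one bulk UPDATE.
-- Every row gets a new version; the old versions stay in the file.
ALTER TABLE bloat_demo ADD COLUMN doubled int;
UPDATE bloat_demo SET doubled = id * 2;

-- The table should now be roughly twice its original size.
SELECT pg_size_pretty(pg_relation_size('bloat_demo')) AS size_after_update;

-- VACUUM marks the dead versions as reusable, but usually cannot hand the
-- space back to the operating system, because the live rows now sit at
-- the end of the file.
VACUUM bloat_demo;

SELECT pg_size_pretty(pg_relation_size('bloat_demo')) AS size_after_vacuum;
```

Comparing the three sizes shows the reclaimed space staying inside the table, available for future rows but still occupying disk.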
One of the cool things I like most about Postgres is that it’s probably the most inclusive database software I’ve ever encountered. It’s so full of features and functionality these days, it’s practically middleware. Almost anything plugs into it, and if it doesn’t, there’s usually a way to make it happen.
Want a demonstration?
SciDB is often used for large analytical data warehouses. It even uses Postgres for metadata storage. Despite this, its developers still haven’t written a foreign data wrapper for back-and-forth interaction.
This week we’ll be covering another method of Postgres partitioning. This is a technique I personally prefer and try to use and advocate at every opportunity. It’s designed to straddle the line between traditional partitioning and standard monolithic table structure by using table inheritance as a convenience factor. The assumption here is that end-user applications either know that partitioning is in use, or only load “current” data and don’t care about partitions.
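The text above doesn’t spell out the table layout, so the following is only one plausible reading, sketched under loud assumptions: current rows live directly in the parent table, a maintenance job moves older rows into inherited children, and applications that only want current data read the parent with ONLY. The names event_log, event_log_2014 and created_at are hypothetical.

```sql
-- Hypothetical parent table; applications write here directly,
-- with no trigger or rule redirecting rows.
CREATE TABLE event_log (
    event_id   bigint    NOT NULL,
    payload    text,
    created_at timestamp NOT NULL DEFAULT now()
);

-- Hypothetical archive child holding one year of older rows.
CREATE TABLE event_log_2014 (
    CHECK (created_at >= '2014-01-01' AND created_at < '2015-01-01')
) INHERITS (event_log);

-- Assumed maintenance step: move old rows out of the parent into the child.
WITH moved AS (
    DELETE FROM ONLY event_log
     WHERE created_at >= '2014-01-01' AND created_at < '2015-01-01'
    RETURNING *
)
INSERT INTO event_log_2014
SELECT * FROM moved;

-- An application that only cares about current data skips the children.
SELECT count(*) FROM ONLY event_log;

-- An application that knows partitioning is in use can read everything,
-- or address a child such as event_log_2014 by name.
SELECT count(*) FROM event_log;
```

The appeal of a layout like this is that inserts stay as cheap as on a monolithic table, while the inheritance link still lets a single query reach the archived rows when needed.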