Most Postgres operators and informed users are aware that it uses MVCC for storage. One of the main drawbacks of this versioning mechanism is related to tuple reuse. In order to reuse the space, VACUUM must complete a cycle on the table. Unfortunately this isn’t always possible to “optimize” for larger tables. How so?
If a large table needs to have a calculated column added, or some other bulk query updates a large portion of its content, a large fragment of the table is now empty space.
One of the cool things I like most about Postgres, is that it’s probably the most inclusive database software I’ve ever encountered. It’s so full of features and functionality these days, it’s practically middleware. Almost anything plugs into it, and if it doesn’t, there’s usually a way to make it happen.
Want a demonstration?
SciDB is often used for large analytical data warehouses. They even use Postgres for metadata storage. Despite this, they still haven’t written a foreign data wrapper for back-and-forth interaction.
This week we’ll be covering another method of Postgres partitioning. This is a technique I personally prefer and try to use and advocate at every opportunity. It’s designed to straddle the line between traditional partitioning and standard monolithic table structure by using table inheritance as a convenience factor. The assumption here is that end-user applications either:
Know that partitioning is in use. Only load “current” data and don’t care about partitions.
Most Postgres (PostgreSQL) users who are familiar with partitioning use the method described in the partitioning documentation. This architecture comes in a fairly standard stack:
One empty base table for structure. At least one child table that inherits the base design. A trigger to redirect inserts based on the partitioning scheme. A constraint on each child table to enforce the partition scheme, and help the planner exclude child partitions from inapplicable queries.
What’s a good table to partition? It’s not always a question with an obvious answer. Most often, size and volume determine whether or not a table should be broken into several chunks. However, there’s also cases where business or architecture considerations might use partitioning to preserve a shared table structure, or drive aggregate analysis over a common core. In Postgres (PostgreSQL), this is even more relevant because of how partitioning is handled through inheritance.