Partitioning tables in Postgres can be an extremely risky endeavor. Unfortunately, on many larger systems, it’s also essentially a requirement; the maximum size of a Postgres table is 32TB. The risk isn’t just that converting an existing table to a series of partitions is expensive or time-consuming. We must also consider how the query planner will react to the partitioned version of a table. And there’s the very real risk that we will implement (or already have implemented) flaws in the trigger or constraint logic.
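To make that last risk concrete, here’s a minimal sketch of the sort of trigger-based routing a pre-declarative partition setup relies on; the sensor_log table and its reading_date column are hypothetical. A single misnamed child table or missed date boundary in logic like this silently misroutes or loses rows.

```sql
-- Route each incoming row to the child table for its month.
CREATE OR REPLACE FUNCTION sensor_log_insert()
RETURNS TRIGGER AS $$
BEGIN
  EXECUTE format(
    'INSERT INTO %I SELECT ($1).*',
    'sensor_log_' || to_char(NEW.reading_date, 'YYYYMM')
  ) USING NEW;
  RETURN NULL;  -- Suppress the insert into the parent table.
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_sensor_log_insert
BEFORE INSERT ON sensor_log
FOR EACH ROW EXECUTE PROCEDURE sensor_log_insert();
```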
Tools like pg_partman or pg_pathman are great for new systems, but existing deployments require retrofitting. So what happens if we have a large warehouse that’s already using a slightly flawed partition engine? And what other concerns remain even after achieving a pristine setup?
One of the things Postgres has been “missing” for a while is logical replication based on activity replay. Until fairly recently, in order to replicate single tables from one database to another, we had to encumber the table with performance-robbing triggers coupled to a third-party daemon to manage transport. Those days might finally be behind us thanks to pglogical.
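As a rough sketch of why this beats the trigger days: with the pglogical extension installed on both ends (and wal_level set to logical), replicating a single table amounts to a handful of function calls. The connection strings and table name below are hypothetical.

```sql
-- On the provider (source) database:
SELECT pglogical.create_node(
    node_name := 'provider',
    dsn := 'host=source-db dbname=warehouse'
);
SELECT pglogical.replication_set_add_table(
    set_name := 'default',
    relation := 'public.sensor_log',
    synchronize_data := true
);

-- On the subscriber (target) database:
SELECT pglogical.create_node(
    node_name := 'subscriber',
    dsn := 'host=target-db dbname=warehouse'
);
SELECT pglogical.create_subscription(
    subscription_name := 'sensor_sub',
    provider_dsn := 'host=source-db dbname=warehouse'
);
```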
For all of those warehouse queries that never seem to complete before the heat death of the universe, there’s often a faster version. Sometimes this is due to a fundamental misunderstanding of how queries work, or how Postgres specifically functions. The trick is knowing when to back away slowly from an ugly but efficient query, and when to inject a flurry of predicates to fully illustrate the original intent of the query so the planner makes better decisions. When partitions are involved, this can be quite an adventure.
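For instance, consider a hypothetical sensor_log table partitioned by reading_date, joined to a similarly partitioned archive. The planner will propagate equality conditions through a join, but not range conditions, so repeating the date predicate on both sides is what lets it prune partitions in each table:

```sql
SELECT s1.sensor_id, s1.reading, s2.reading AS archived_reading
  FROM sensor_log s1
  JOIN sensor_log_archive s2 ON (
       s2.sensor_id = s1.sensor_id
   AND s2.reading_date = s1.reading_date
  )
 WHERE s1.reading_date >= '2016-01-01'
   AND s2.reading_date >= '2016-01-01';  -- Implied by the join, but the
                                         -- planner won't derive it; stating
                                         -- it prunes s2's partitions too.
```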
I always advocate breaking up large Postgres tables for a few reasons. Beyond query performance concerns, maintaining one monolithic structure is always more time-consuming and consequently more dangerous. The time required to create a dozen small indexes may be slightly longer than for a single large one, but we can treat the smaller indexes as incremental. If we want to rebuild, add more indexes, or fix any corruption, why accept an all-or-nothing proposition? Deleting from one large table will be positively glacial compared to simply dropping an entire expired partition. The list just goes on and on.
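The difference is easy to picture with hypothetical monthly partitions. Removing an expired month from a monolithic table means touching every doomed row and vacuuming the wreckage afterward; dropping the corresponding partition is a simple catalog operation:

```sql
-- Monolithic table: every expired row is located and deleted, and the
-- table later needs a VACUUM to reclaim the space.
DELETE FROM sensor_log
 WHERE reading_date < '2016-01-01';

-- Partitioned table: discard the expired child in one stroke.
DROP TABLE sensor_log_201512;
```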
I wasn’t able to write an article last week due to an unexpected complication regarding tests I was running to verify its contents. So this week, it’s going to be extra special! Also long.
What’s the fastest way to load a Postgres table? If you believe the documentation, the COPY command is the best way to unceremoniously heave data into a table. Fortunately, after all of our talk about partitions, our minds are primed and ready to think in chunks. Why restrict ourselves to one COPY when we can summon an army?
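As a sketch of the idea, assuming the source data was pre-split into one file per partition (the file and table names here are hypothetical): each COPY runs in its own session, so the partitions load in parallel instead of funneling through a single backend.

```sql
-- Session 1:
COPY sensor_log_201601 FROM '/tmp/sensor_log.201601.csv' WITH (FORMAT csv);

-- Session 2:
COPY sensor_log_201602 FROM '/tmp/sensor_log.201602.csv' WITH (FORMAT csv);

-- Session 3:
COPY sensor_log_201603 FROM '/tmp/sensor_log.201603.csv' WITH (FORMAT csv);
```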