Recently a coworker asked me this question:
Should I expect variance between minutes and hours for the same query?
And I was forced to give him this answer:
Potentially, but not commonly. Query planning is an inexact science, and regardless of the query being the “same query,” the data is not the “same data.” This isn’t generally the case, but on occasion, changes in data can affect the query execution path. Usually this is a good thing, as the database accounts for new value distributions.
For example, if there are a million distinct values in one column, but 90% of them are the same, certain values should trigger an index scan instead of a sequential scan. Those values will change over time, and if the stats don’t account for that, queries will have non-dependable performance. Of course, this introduces potential correlation assumptions that aren’t correct in some cases, and that also causes unreliable query performance. I guess the question is: which would you rather have?
The Postgres query planner is house of cards built upon the ever-shifting sand of our data. It has the utterly impossible mission of converting our ridiculous and inane requests into a logical series of fetch, filter, sort, join, and other instructions. Then the resulting steps must be ruthlessly efficient or the execution phase could very well saturate every hardware resource available; Set Theory isn’t very forgiving.
Forewarned is forearmed is very apt when applied to database query planners. Without proper statistics, they are reduced to assumptions that make adequate first approximations. But scale is the utter enemy of imprecision, as multiplicative effects quickly overwhelm reality. This allows seemingly simple report scripts to fall endlessly into a pit of smoldering system resources.
Partitioning tables in Postgres can be an extremely risky endeavor. Unfortunately on many larger systems, it’s also essentially a requirement; the maximum size of a Postgres table is 32TB. This isn’t just because converting an existing table to a series of partitions is expensive or time consuming. We must consider how the query planner will react to the partitioned version of a table. There’s also the very real risk we will (or already have) implement flaws in the trigger or constraint logic.
Tools like pg_partman or pg_pathman are great for new systems, but existing deployments require retrofitting. So what happens if we have a large warehouse that’s already using a slightly flawed partition engine? And what other concerns remain even after achieving a pristine setup?
For all of those warehouse queries that never seem to complete before the heat death of the universe, there’s often a faster version. Sometimes this is due to a fundamental misunderstanding of how queries work, or how Postgres specifically functions. The trick is knowing when to back away slowly from an ugly but efficient query, and when to inject a flurry of predicates to fully illustrate the original intent of the query so the planner makes better decisions. When partitions are involved, this can be quite an adventure.
Like any programming language, the PL/pgSQL Postgres procedural language has plenty of handy control structures. Among those thankfully modern accoutrements is the humble
EXCEPTION block. While not the more prevalent try/catch methodology, we can use
BEGIN anywhere to start an embedded section for the same effect.
Knowing this is the case, what are the costs of actually using them? Postgres is fairly streamlined, and as such, can take several shortcuts when there are no exception blocks within a chunk of PL/pgSQL. Thus, it’s not uncommon for members of mailing lists to mutter about ‘performance costs’ and suggest that the worst performance hit is felt by even breathing the word ‘exception’ in the presence of a Postgres instance. The implication is that further leverage will be incremental by comparison.