Postgres is great, but it can’t run itself in all cases. Things come up. Queries go awry. Hardware fails, and users leave transactions open for interminable lengths of time. What happens if one of these things occur while the DBA themselves has a hardware fault? While they’re down for maintenance, someone still has to keep an eye on things. For the last PG Phriday of the year completely unrelated to my upcoming surgery, let’s talk about what happens when your DBA becomes inoperative due to medical complications.
This is Fake Postgres DBA 101!
Recently a coworker asked me this question:
Should I expect variance between minutes and hours for the same query?
And I was forced to give him this answer:
Potentially, but not commonly. Query planning is an inexact science, and regardless of the query being the “same query,” the data is not the “same data.” This isn’t generally the case, but on occasion, changes in data can affect the query execution path. Usually this is a good thing, as the database accounts for new value distributions.
For example, if there are a million distinct values in one column, but 90% of them are the same, certain values should trigger an index scan instead of a sequential scan. Those values will change over time, and if the stats don’t account for that, queries will have non-dependable performance. Of course, this introduces potential correlation assumptions that aren’t correct in some cases, and that also causes unreliable query performance. I guess the question is: which would you rather have?
It’s about the time for year-end performance reviews. While I’m always afraid I’ll narrowly avoid being fired for gross incompetence, that’s not usually how it goes. But that meeting did remind me about a bit of restructuring I plan to impose for 2017 that should vastly improve database availability across our organization. Many of the techniques to accomplish that—while Postgres tools in our case—are not Postgres-specific concepts.
One of the best features Postgres boasts is the ability to adapt. Any schmo off the street can write an extension and bolt it onto Postgres with nary a second glance. As proof, I’m going to whip one up really quick. That should be enough to convince anyone that it takes no skill at all to add functionality to Postgres.
The Postgres query planner is house of cards built upon the ever-shifting sand of our data. It has the utterly impossible mission of converting our ridiculous and inane requests into a logical series of fetch, filter, sort, join, and other instructions. Then the resulting steps must be ruthlessly efficient or the execution phase could very well saturate every hardware resource available; Set Theory isn’t very forgiving.
Forewarned is forearmed is very apt when applied to database query planners. Without proper statistics, they are reduced to assumptions that make adequate first approximations. But scale is the utter enemy of imprecision, as multiplicative effects quickly overwhelm reality. This allows seemingly simple report scripts to fall endlessly into a pit of smoldering system resources.