Functions are great. Having cut my teeth on a database that didn’t even provide the ability to define functions, I’ve come to almost take them for granted in Postgres (PostgreSQL). However, with this kind of ubiquity, sometimes they can be overused in ways that don’t seem to be part of the common programmer lexicon. In this week’s PG Phriday series on performance-killing missteps, I’m going to talk a bit about set theory, and how a certain amount of familiarity is necessary to properly interact with a database.
There are a lot of database engines out there. As such, a developer or DBA will naturally have varying levels of experience with each, and some of this might conflict with how Postgres (PostgreSQL) operates. These kinds of embedded misunderstandings can cause potential issues by themselves, but in this particular case, corrective action is fairly simple.
So this week, I’d like to talk about indexes. Many people treat them as a “make query faster” button, and this often results in egregious misuse.
I recently got the opportunity to take a look at Learning Heroku Postgres, a new book by Patrick Espake that seems intended to help new PostgreSQL database administrators get their data into the cloud. The chapters are short, concise, and the questionnaires at the end are a nice touch. But does it hit the mark? Almost.
Before I get too far into this review, I should point out that Heroku is a proprietary service that presents a modular deployment system for various programming languages, applications, administration, monitoring, and other related services.
Unlike a lot of programming languages that have loose typing, databases are extremely serious about data types. So serious in fact, many patently refuse to perform automatic type conversions in case the data being compared is not exactly equivalent afterwards. This is the case with Postgres (PostgreSQL) and surprisingly often, queries will suffer as a result. Fortunately this is something we can address!
There are easier ways to demonstrate this, but in the interests of making this more of an adventure, let’s use a join between two tables:
When working with a database, sometimes performance problems are both far more subtle, and much worse than a query itself might suggest. The topic of this week’s Postgres (PostgreSQL) performance killers article concerns the use of the IN clause, and how misusing it can catastrophically obliterate the database in mysterious ways.
To that end, we’ll use a slightly revised single-table test case since it’s served us pretty well so far: