PG Phriday: 10 Ways to Ruin Performance: Functionally Bankrupt

June 19th, 2015 | Published in Database, Tech Talk


Functions are great. Having cut my teeth on a database that didn’t even provide the ability to define functions, I’ve come to almost take them for granted in PGDB (PostgreSQL). However, with this kind of ubiquity, sometimes they can be overused in ways that don’t seem to be part of the common programmer lexicon. In this week’s PG Phriday series on performance-killing missteps, I’m going to talk a bit about set theory, and how a certain amount of familiarity is necessary to properly interact with a database.

Borrowing loosely from set theory: if f(x) = y, then x is not necessarily equivalent to y. That is, a function applied to some value produces a result that may not be equivalent to the original. Put another way, a spayed pet does not have the same reproductive capacity as a regular pet, and is thus not equal to the original. In the context of a database, this is extremely relevant because equivalence is what makes calculations and restrictions possible.

Given this, consider what a database index does. When creating an index, the database takes the value of one or more columns and essentially builds a pointer tree to quickly locate a record in logarithmic time. This is much faster than reading every record in a table and applying a filter to remove unwanted records. That’s something even novice developers tend to know. What’s less well known is how functions change this scenario.

Given the very simplified introduction to set theory, applying a function to the column value means the database must discard equality. Remember: x !~ y. The database has indexed x, not y. So when a function is used in a WHERE clause, any index on x will be ignored. This makes perfect sense, considering there are an infinite number of possible functions, and the output of each is indeterminate. It’s not possible to predict the end result of every possible function.

To further illustrate this, we’ll reuse one of our tried-and-true examples:

CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
);
 
INSERT INTO sys_order (product_id, item_count, order_dt, valid_dt)
SELECT (a.id % 100000) + 1, (a.id % 100) + 1,
       now() - (id % 1000 || 'd')::INTERVAL,
       CASE WHEN a.id % 499 = 0
            THEN NULL
            ELSE now() - (id % 999 || 'd')::INTERVAL
       END
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_order ADD CONSTRAINT pk_order_id
      PRIMARY KEY (order_id);
 
CREATE INDEX idx_order_order_dt
    ON sys_order (order_dt DESC);

Pay special attention to the index on order_dt. I’ve applied a modifier that builds the index in descending order, as it’s very common to retrieve the newest records of a table. This doesn’t actually affect the values being indexed, and is a very handy feature of PGDB. We’re also going to try and use this index, because we don’t want to search through one-million records any time we want to get a few recent data points.

Here’s a query that will count the number of orders placed yesterday:

EXPLAIN ANALYZE
SELECT COUNT(1)
  FROM sys_order
 WHERE date_trunc('day', order_dt) = CURRENT_DATE - INTERVAL '1 day';
 
                             QUERY PLAN
--------------------------------------------------------------------
 Aggregate  (cost=29865.50..29865.51 rows=1 width=0)
            (actual time=989.773..989.774 rows=1 loops=1)
   ->  Seq Scan on sys_order
         (cost=0.00..29853.00 rows=5000 width=0)
         (actual time=0.021..988.407 rows=1000 loops=1)
         Filter: (date_trunc('day'::text, order_dt) = 
                 (('now'::cstring)::date - '1 day'::interval))
         Rows Removed by Filter: 999000
 
 Planning time: 0.120 ms
 Execution time: 989.810 ms

From this output, we can see that PGDB ignored the index on order_dt and did exactly what we didn’t want. Instead of using an index to jump to the relevant values, it scanned the entire table and filtered out the values that didn’t apply. Yet the implications are actually much worse than that. The date_trunc function, even though it’s written in C, is not a free operation. So not only have we unnecessarily read a million rows, we’ve also applied a function to each and every row, just to see if the result falls within the boundary we specified. And my example fits in memory; the situation degrades considerably when that isn’t possible.

That’s insanely expensive, and our relatively small million-row table illustrates that well enough. Imagine the same operation on a table with 100M rows or more. So we lose twice: disk resources are wasted retrieving unwanted rows, and CPU time is consumed performing unnecessary function executions. Given how many times I’ve encountered this at several unrelated companies, it’s a pretty serious problem. Here’s how the query should look:

EXPLAIN ANALYZE
SELECT COUNT(1)
  FROM sys_order
 WHERE order_dt >= CURRENT_DATE - INTERVAL '1 day'
   AND order_dt < CURRENT_DATE;
 
                             QUERY PLAN
--------------------------------------------------------------------
 Aggregate  (cost=2564.11..2564.12 rows=1 width=0)
 
 [ snip ]
 
 ->  Bitmap Index Scan on idx_order_order_dt
       (cost=0.00..21.38 rows=894 width=0)
       (actual time=0.381..0.381 rows=1000 loops=1)
       Index Cond: ((order_dt >= 
                     (('now'::cstring)::date - '1 day'::interval))
                 AND (order_dt < ('now'::cstring)::date))
 
 Planning time: 0.211 ms
 Execution time: 5.855 ms

The overall execution plan is slightly more complicated since an index is involved now, but note the execution time: it’s nearly 170 times faster than the original. All we did was modify the query to use a range that includes every possible date and time combination for yesterday. We needed a range for the same reason we’d tried date_trunc previously, but the end result is the same. The only difference is that this allows the database to perform a range scan on the index and obtain all matching rows immediately.

If you’ve fallen into this trap, don’t feel bad. I’ve seen everyone do this at least once. From newbies right out of college, to highly seasoned technical leads, and even including other DBAs, there doesn’t seem to be any discernible pattern. It’s too easy to frame a query without considering the underlying mechanisms that make everything work. I also want to point out that since PGDB supports functional indexes, it’s also possible to do something like this:

-- date_trunc() on a TIMESTAMPTZ isn't immutable, so truncate at a fixed zone
CREATE INDEX idx_order_order_dt_day
    ON sys_order (date_trunc('day', order_dt AT TIME ZONE 'UTC'));

In this case, we’re simply indexing the resulting value of f(x), so as long as the function call is the same in any WHERE clauses, the index will be used. To PGDB, it’s all the same. If 99% of the development staff, the application itself, and stray dogs are all using a function instead of doing it the “right” way, it’s actually the DBA that is going against the tide.
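
For example, the count query from earlier should be able to use this functional index, as long as the WHERE clause repeats the indexed expression verbatim, AT TIME ZONE and all. A quick sketch, assuming the index above is in place:

EXPLAIN ANALYZE
SELECT COUNT(1)
  FROM sys_order
 WHERE date_trunc('day', order_dt AT TIME ZONE 'UTC') =
       CURRENT_DATE - INTERVAL '1 day';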

The only reason I don’t tend to recommend this pattern is that the functional index is generally too restrictive. What if we wanted to search for a four-hour window during yesterday? Well, now we can’t, because the index is only relevant for the date information. The simple index case is applicable to far more potential scenarios, and in the database world, index versatility is extremely important. I try to ensure developers I work with are cognizant of the pitfalls of arbitrarily using functions to filter results.
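
To make that concrete, here’s a sketch of such a window query covering the first four hours of yesterday. The plain index on order_dt can service this range directly, while the date-truncated version cannot:

SELECT COUNT(1)
  FROM sys_order
 WHERE order_dt >= CURRENT_DATE - INTERVAL '1 day'
   AND order_dt < CURRENT_DATE - INTERVAL '20 hours';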

After all, it’s better to learn these things preemptively!



PG Phriday: 10 Ways to Ruin Performance: Out Of Order

June 12th, 2015 | Published in Database, Tech Talk


There are a lot of database engines out there. As such, a developer or DBA will naturally have varying levels of experience with each, and some of this might conflict with how PGDB (PostgreSQL) operates. These kinds of embedded misunderstandings can cause potential issues by themselves, but in this particular case, corrective action is fairly simple.

So this week, I’d like to talk about indexes. Many people treat them as a “make query faster” button, and this often results in egregious misuse. Indexes take space, require CPU resources to maintain, and bring overhead that adversely affects INSERT and UPDATE performance. Most know these drawbacks and generally acknowledge that too many indexes can be detrimental. However, with some databases, even column order can make a drastic difference. PGDB is among these.

To illustrate the point, let’s make a simple test case of one-million records in an inventory tracking table of one thousand products over the course of about three years.

CREATE TABLE sys_inventory_snapshot
(
    product_id   SERIAL       NOT NULL,
    record_dt    TIMESTAMPTZ  NOT NULL,
    item_count   INT          NOT NULL,
    order_count  INT          NOT NULL
);
 
INSERT INTO sys_inventory_snapshot (
       product_id, item_count, order_count, record_dt
)
SELECT b.id, a.id, a.id,
       now() - (a.id || 'd')::INTERVAL
  FROM generate_series(1, 1000) a(id),
       generate_series(1, 1000) b(id);
 
ALTER TABLE sys_inventory_snapshot ADD CONSTRAINT pk_inventory_snapshot
      PRIMARY KEY (product_id, record_dt);

At first glance, this looks pretty good. If we want information based on date or product, it’s right there. Things are even better if we have both values available! The query plan is also encouraging:

EXPLAIN ANALYZE
SELECT *
  FROM sys_inventory_snapshot
 WHERE record_dt >= CURRENT_DATE - INTERVAL '1 day';
 
                            QUERY PLAN                             
--------------------------------------------------------------------
 Bitmap Heap Scan on sys_inventory_snapshot
        (cost=22912.60..24880.07 rows=674 width=20)
        (actual time=106.051..106.530 rows=1000 loops=1)
   Recheck Cond: (record_dt >= (('now'::cstring)::date - '1 day'::interval))
   Heap Blocks: exact=7
   ->  Bitmap Index Scan on pk_inventory_snapshot
         (cost=0.00..22912.43 rows=674 width=0)
         (actual time=106.027..106.027 rows=1000 loops=1)
         Index Cond: (record_dt >= (('now'::cstring)::date - '1 day'::interval))
 
 Planning time: 0.115 ms
 Execution time: 106.988 ms

Look at that! It’s using our date constraint to grab the most recent rows using the primary key for the table. That’s good, right?

Well, no. What’s actually happened here is somewhat esoteric, and the only real clue we have that something isn’t right, is the execution time. We just spent 100ms fetching 1000 records, which is downright offensive to an experienced DBA. Imagine this query in an application that’s executing it hundreds of times per second from multiple threads. In the database world, 100ms is basically forever. This table and query do not warrant such inexcusably bad performance.

So what happened? Buckets. Indexes take one value and point it to a bucket of other values. In the case of a composite index, one bucket value points to another bucket, which contains the values which satisfy both constraints. Now consider that record_dt is the second bucket. In order to ensure we get the most recent data for all products, we have to visit all of the top-level buckets before we can continue to the most recent date for each.

See the problem yet? Instead of simply jumping straight to the first applicable date value and retrieving the entire bucket, we have to visit 1000 buckets and then continue to 1000 smaller buckets. This would be a fine access pattern if we had both product_id and record_dt in the query, since that jumps straight to the values we want. But for everything else, it’s a non-ideal solution. It might be faster than a sequential scan if we’re lucky, but that’s not a guarantee. In some scenarios, it’s actually much slower.

PGDB veterans tend to solve this one of two ways:

  1. Reverse the column order of the index. If it turns out that 90% of queries use both columns, and 10% only use the second column, it’s better to flip them (sketched just after this list). The primary key in this case still fills its role, but without impacting performance of date searches.
  2. Add another index to represent the second column. This happens when the first column is actually used in exclusion or combination most of the time, but queries using only the second column are frequent enough to cause concern.
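
For completeness, here’s a rough sketch of the first approach. It assumes we’re willing to rebuild the primary key with the columns reversed:

ALTER TABLE sys_inventory_snapshot
      DROP CONSTRAINT pk_inventory_snapshot;
 
ALTER TABLE sys_inventory_snapshot ADD CONSTRAINT pk_inventory_snapshot
      PRIMARY KEY (record_dt, product_id);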

Because I’m lazy and don’t want to redefine the primary key, let’s take the second approach:

CREATE INDEX idx_inventory_snapshot_record_dt
    ON sys_inventory_snapshot (record_dt);
 
EXPLAIN ANALYZE
SELECT *
  FROM sys_inventory_snapshot
 WHERE record_dt >= CURRENT_DATE - INTERVAL '1 day';
 
                            QUERY PLAN                             
--------------------------------------------------------------------
 Index Scan using idx_inventory_snapshot_record_dt
    on sys_inventory_snapshot
       (cost=0.43..28.24 rows=674 width=20)
       (actual time=0.038..0.663 rows=1000 loops=1)
   Index Cond: (record_dt >= (('now'::cstring)::date - '1 day'::interval))
 
 Planning time: 0.238 ms
 Execution time: 1.087 ms

That’s… quite a difference. If you were unfamiliar with databases or PGDB, you might be surprised by this result. But the reality is quite clear: simply listing a column in an index does not magically improve performance of queries that reference it. Order matters. In addition, the further a column strays from the principal position, the worse the degradation becomes.

I’ve seen indexes with six columns, and then a baffled developer asks why his query is slow. Well, his query only used the fourth column in the list. At that point, reading the entire 30-million row table and filtering for desired results would have been faster. The planner tries to account for this, but sometimes statistics or settings lead it astray, and it performs an index scan and dutifully follows all those buckets down the chain to the fourth column. It happens.

Now that you know why, you’re armed to avoid contributing to the problem. Think about indexes before adding them. Consider existing indexes and queries against their parent table, and ensure that column order faithfully represents the real access pattern. It’s an important—and often missed—optimization technique.
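
As a starting point, it never hurts to check what’s already defined before piling on more indexes. A quick sketch using the pg_indexes system view:

SELECT indexname, indexdef
  FROM pg_indexes
 WHERE tablename = 'sys_inventory_snapshot';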

As a side note, this is why serious companies have at least one DBA on staff. Designing table structures and optimizing queries are hard enough, but invisible killers like these are more prevalent than one might realize. An experienced database-driven application developer might be familiar with these kinds of minutiae, but that’s a lot to expect. Not everyone reads database performance articles, or has encountered and addressed related problems.

Until next week!



Review: Learning Heroku Postgres

June 6th, 2015 | Published in Book, Review


I recently got the opportunity to take a look at Learning Heroku Postgres, a new book by Patrick Espake that seems intended to help new PostgreSQL database administrators get their data into the cloud. The chapters are short, concise, and the questionnaires at the end are a nice touch. But does it hit the mark? Almost.

Before I get too far into this review, I should point out that Heroku is a proprietary service that presents a modular deployment system for various programming languages, applications, administration, monitoring, and other related services. Though there are free hobby-level instances for most modules, it is a commercial platform which provides SaaS (Software as a Service) across multiple geographic locations, so I recommend the hobby-level instances only for experimentation.

That said, this is very much a book that benefits the PostgreSQL community. Small and large businesses often have trouble distributing data and applications in a high availability environment, and as such, Heroku is a potential solution for quick-and-dirty scalability at a reasonable cost.

The book itself is essentially broken down into three major parts: Tooling, Basics, and Extras. Though this is not explicitly defined by the chapter overview, this is the way it reads. This is somewhat important, because it allows a bit of skipping around for users who are already familiar with PostgreSQL, Heroku, or both.

The adventure begins with a couple short chapters on how Heroku itself is organized, and acquiring Heroku command-line tools for managing account features. Here, Espake presents a good bird’s-eye view of Heroku’s deployment infrastructure and configuration, and spends time discussing just how everything is decoupled and bound together by queues so the elastic infrastructure accurately represents the intention of the user. This is critical, as understanding the underlying landscape can (and should) directly influence development cycles, since readers must account for Heroku’s quirks while organizing their application and associated data.

From here, the discussion naturally moves to PostgreSQL itself in chapter three. Espake makes it clear that PostgreSQL is managed as a fully automated solution, behind a thick wall of tools, interfaces, and somewhat limited management commands. This is one of the most important chapters, as it effectively lays out all of the necessary commands for synchronizing data and accessing the database itself for more direct manipulation with SQL clients, languages, and drivers. Afterwards in chapter four, he addresses the topic of backups and how to secure and obtain them. Both of these chapters combine to give a reader control of how Heroku represents their data, and securing it from loss.

Chapter five is something of an oddity. Espake introduces Heroku dataclips as a method for sharing data without talking about the reality of what they are: versioned views with an exposure API. This is the first time I got the impression that this book is more of a usage manual than a true learning resource. Yes it is important to show how data can be shared or downloaded with this feature, but after the introduction in chapter one regarding Heroku’s operation, I found this omission particularly odd. Given how dataclips work, they could be combined with views for easier overall data management, and yet this option is never presented.

Chapter six moves on to instance management. By this, I mean various uses for database replicas, such as forking, failover, and replacing the current database with a previous version. All the necessary commands and GUI options are here to make juggling multiple copies of the database easier. But again I see wasted opportunity. Heroku considers ‘rollback’ the act of replacing the primary instance with a previous backup instance. The fact that this directly conflicts with the concept of a transaction rollback is never discussed. Nor are database followers equated with PostgreSQL streaming replication, the mechanism that’s probably behind the feature. I wish Espake spent more time explaining how things work, instead of just providing instructions. After all, that kind of information is probably available in Heroku’s documentation; this book should provide a deeper understanding the user can leverage toward a better PostgreSQL cluster.

The last two chapters tie up most of the remaining loose ends by covering logs and various PostgreSQL-specific extensions available on the Heroku platform. Chapter eight in particular is a laundry list of PostgreSQL extensions generally available within the contribution libraries commonly distributed with the PostgreSQL code or binaries. It’s a good resource for users unfamiliar with this functionality, and further links are provided where necessary so the reader can explore, should that feature be relevant. While not really a feature of Heroku, or even especially relevant since most PostgreSQL distributions include them anyway, extensions are part of what make PostgreSQL so powerful, so I’ll allow it.

In the end, the book adequately covers numerous Heroku commands and interface elements. I wish the author had spent more time talking about how some of Heroku’s terminology conflicts with common database concepts. For example, Heroku’s idea of ‘promote’ isn’t quite what a seasoned database administrator would recognize. Allowing a new user to absorb this interpretation without caveat could lead to conceptual issues in the future. Unfortunately this happens often, as I’d already mentioned regarding rollback. From chapter four onward, the book is organized like a manual, as if it were written by an employee of Heroku, treating PostgreSQL as a mere Heroku module that needed a checklist of feature documentation. There’s a reason this book is so short!

Still, it’s a good way to bootstrap a Heroku deployment of PostgreSQL. If there aren’t more comprehensive books on integrating the two yet, there probably will be in the near future; wait for those if you really want to delve deep into a Heroku deployment. For a newbie, though, you can’t go wrong here.



PG Phriday: 10 Ways to Ruin Performance: Cast Away

June 5th, 2015 | Published in Database, Tech Talk


Unlike a lot of programming languages that have loose typing, databases are extremely serious about data types. So serious, in fact, that many patently refuse to perform automatic type conversions, in case the data being compared would not be exactly equivalent afterwards. This is the case with PGDB (PostgreSQL), and surprisingly often, queries will suffer as a result. Fortunately this is something we can address!

There are easier ways to demonstrate this, but in the interests of making this more of an adventure, let’s use a join between two tables:

CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
);
 
INSERT INTO sys_order (product_id, item_count, order_dt, valid_dt)
SELECT (a.id % 100000) + 1, (a.id % 100) + 1,
       now() - (id % 1000 || 'd')::INTERVAL,
       CASE WHEN a.id % 499 = 0
            THEN NULL
            ELSE now() - (id % 999 || 'd')::INTERVAL
       END
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_order ADD CONSTRAINT pk_order_id
      PRIMARY KEY (order_id);
 
CREATE TABLE sys_exchange_map
(
    exchange_map_id  SERIAL       NOT NULL,
    vendor_id        INT          NOT NULL,
    ext_order_code   TEXT         NOT NULL,
    map_message      TEXT         NOT NULL,
    xmit_dt          TIMESTAMPTZ  NOT NULL DEFAULT now()
);
 
INSERT INTO sys_exchange_map (
         exchange_map_id, vendor_id, ext_order_code,
         map_message, xmit_dt
)
SELECT a.id, (a.id % 10) + 1,
       ((a.id % 10000) + 1) || ':' || SUBSTRING(random()::TEXT, 3),
       'This is a bunch of junk from an external vendor queue!',
       now() - (id % 30 || 'd')::INTERVAL
  FROM generate_series(1, 100000) a(id);
 
ALTER TABLE sys_exchange_map ADD CONSTRAINT pk_exchange_map_id
      PRIMARY KEY (exchange_map_id);
 
CREATE INDEX idx_exchange_map_xmit_dt
    ON sys_exchange_map (xmit_dt);
 
ANALYZE sys_order;
ANALYZE sys_exchange_map;

This structure simulates something I’ve seen somewhat often in real systems. In this case, our order table is normal, but there’s also an exchange mapping table that maps our orders to external vendor orders. Imagine this table is part of the queue engine running the application, so the various codes and message columns are describing the interactions within the platform.

Given this structure, it’s not uncommon to encounter a query that manually parses one or more of these codes and tries to find matching orders. This is what that might look like:

SELECT o.*
  FROM sys_exchange_map m
  JOIN sys_order o ON (
         o.order_id = SUBSTRING(m.ext_order_code, '(.*):')::NUMERIC
       )
 WHERE m.xmit_dt > CURRENT_DATE;

Unfortunately on my test VM, this query takes almost four seconds! Experienced developers and DBAs probably already know what the problem is, but let’s take a gander at two segments of the EXPLAIN ANALYZE output and see if there are any obvious clues.

 Merge Join  (cost=142800.24..568726.04 rows=16935000 width=28)
             (actual time=4844.280..4875.651 rows=3333 loops=1)
 
 [ snip ]
 
 ->  Seq Scan on sys_order o
       (cost=0.00..17353.00 rows=1000000 width=28)
       (actual time=0.036..884.354 rows=1000000 loops=1)
 
 Planning time: 0.240 ms
 Execution time: 3626.783 ms

We should be able to immediately see two potential problems. Thinking back on our discussions regarding estimates matching actuals, notice that the database thinks it will be returning almost 17M rows, when there are really only around 3300. Further, it’s scanning the entire sys_order table instead of using the primary key. What happened here?

This may or may not come as a surprise, but NUMERIC is not compatible with standard integer types in PGDB. For those who have worked with other databases such as Oracle, it may seem natural to use NUMERIC as a catch-all for any number. This is especially true for developers who work with different data types in a myriad of languages, some of which feature multiple compatible numeric types.

With PGDB, casting is controlled through explicit declaration of type compatibility. We can look at NUMERIC and see which types it will convert to with this query:

SELECT s.typname AS source_type,
       t.typname AS target_type,
       CASE c.castcontext
         WHEN 'a' THEN 'Assignment'
         WHEN 'i' THEN 'Implicit'
         WHEN 'e' THEN 'Explicit'
       END AS context
  FROM pg_cast c
  JOIN pg_type s ON (s.oid = c.castsource)
  JOIN pg_type t ON (t.oid = c.casttarget)
 WHERE s.typname = 'numeric';
 
 source_type | target_type |  context   
-------------+-------------+------------
 numeric     | int8        | Assignment
 numeric     | int2        | Assignment
 numeric     | int4        | Assignment
 numeric     | float4      | Implicit
 numeric     | float8      | Implicit
 numeric     | money       | Assignment
 numeric     | numeric     | Implicit

From this chart, it would appear that INT and NUMERIC are compatible, but only through assignment. This means we can use them interchangeably in calculations, for example, or save an integer to a numeric column or the reverse. This does not mean they’ll be automatically converted or treated as compatible for equality tests. Since the types are not implicitly compatible, PGDB will not use the INT index for the NUMERIC cast.
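
Here’s a quick sketch of what the assignment context permits, using a hypothetical throwaway table:

CREATE TEMP TABLE cast_check (some_int INT);
 
-- The NUMERIC value passes through the assignment cast and is rounded.
INSERT INTO cast_check VALUES (42.7::NUMERIC);
 
SELECT some_int FROM cast_check;  -- returns 43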

Unable to use the proper index, the planner does its best to come up with a working plan. Unfortunately the type incompatibility wreaks havoc during that phase as well. What we end up with is something of a mess, with completely incorrect row estimates, an unnecessary sequential scan, and an expensive in-memory merge. And all it would take to fix all of this is one tiny change to our query:

SELECT o.*
  FROM sys_exchange_map m
  JOIN sys_order o ON (
         o.order_id = SUBSTRING(m.ext_order_code, '(.*):')::INT
       )
 WHERE m.xmit_dt > CURRENT_DATE;
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Nested Loop  (cost=66.98..22648.71 rows=3387 width=28)
              (actual time=1.058..42.084 rows=3333 loops=1)
 
 [ snip ]
 
 ->  Index Scan using pk_order_id on sys_order o 
       (cost=0.43..6.19 rows=1 width=28)
       (actual time=0.003..0.003 rows=1 loops=3333)
 
 Planning time: 1.176 ms
 Execution time: 43.635 ms

That’s a pretty massive difference! In this particular case, it’s 80 times faster, but I’ve run into situations where a single miscast caused the query to be almost 80,000 times slower. It all depends on the scale of the data involved, and how badly everything compounds at each step of the execution process.

The message here is to pay particular attention to data types and casting. If a query is performing badly when it should be much faster, make sure types are compatible across the board. Some languages are particularly strict with regard to data type usage, so consider PGDB among them.
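
When debugging these situations, the pg_typeof function is a handy way to confirm what an expression actually resolves to. A minimal sketch against our tables:

SELECT pg_typeof(o.order_id) AS column_type,
       pg_typeof(SUBSTRING(m.ext_order_code, '(.*):')::NUMERIC) AS cast_type
  FROM sys_order o, sys_exchange_map m
 LIMIT 1;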



PG Phriday: 10 Ways to Ruin Performance: IN-Sanity

May 29th, 2015 | Published in Database, Tech Talk


When working with a database, sometimes performance problems are both far more subtle, and much worse than a query itself might suggest. The topic of this week’s PGDB (PostgreSQL) performance killers article concerns the use of the IN clause, and how misusing it can catastrophically obliterate the database in mysterious ways.

To that end, we’ll use a slightly revised single-table test case since it’s served us pretty well so far:

DROP TABLE sys_order;
CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
);
 
INSERT INTO sys_order (product_id, item_count, order_dt)
SELECT (a.id % 100) + 1, (a.id % 100) + 1,
       now() - (id % 1000 || 'd')::INTERVAL
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_order ADD CONSTRAINT pk_order_order_id
      PRIMARY KEY (order_id);
 
CREATE INDEX idx_order_product_id
    ON sys_order (product_id);
 
CREATE INDEX idx_order_order_dt
    ON sys_order (order_dt);
 
ANALYZE sys_order;

As usual, my test system is a simple dual-CPU VM with 16GB of RAM and some mysterious storage allocation from a SAN. All settings are default, and the version of PostgreSQL is the latest release of the 9.4 branch. I always recommend using the latest version of PGDB when possible, otherwise there’s a risk of missing important planner improvements.

As it turns out in this particular story, the IN clause is actually pretty well known to most developers I’ve worked with. This isn’t some obtuse syntax that only experts have encountered, and it’s used regularly in applications and ORMs throughout the industry. It’s ubiquitous, and consequently, extremely dangerous.

Why dangerous? Let’s examine a sanitized query I encountered in an actual running production system. Now, our test case is scaled down by a couple orders of magnitude, so the results won’t be as drastic as what I encountered. Still, the query below performs much worse than anything we’ve discussed so far:

EXPLAIN ANALYZE
SELECT * FROM sys_order
 WHERE order_id IN (
        SELECT DISTINCT order_id
          FROM sys_order
         WHERE product_id = 10
       )
 ORDER BY order_dt DESC
 LIMIT 30;
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Limit  (cost=27768.47..27768.55 rows=30 width=20)
        (actual time=1362.794..1362.840 rows=30 loops=1)
   ->  Sort  (cost=27768.47..27791.97 rows=9400 width=20)
             (actual time=1362.785..1362.801 rows=30 loops=1)
 
 [ Horrible ugly mess redacted ]
 
 Planning time: 0.699 ms
 Execution time: 1363.219 ms

What we’re looking at here is the planner ripping itself to shreds trying to optimize a query with several problems:

  • A subquery containing the same table with no aggregates.
  • Use of DISTINCT on a primary-key column.
  • Ordering the results on the outside query.
  • Using the LIMIT on the outer query.

Taking these in order, it should be obvious that the subquery is pointless in this example. The inside query is essentially the same as the outside query, minus the ordering and result limit. There are a number of reasons this might happen. The IN clause is primarily used as a type of glue. Often, a developer or ORM will take a working query and embed it as a subquery unchanged. The justification is fairly simple: I’m interested in how these records are related, and I already have this working query.

In most cases, IN can be simplified into some kind of JOIN, since that’s how databases tend to combine related data. By using IN and a subquery, the planner has to perform numerous unnecessary optimization steps in an attempt to reach the best plan. As the amount of complexity increases, so does the number of potential execution paths. How many elements from the inner query, for instance, can be collapsed into the outer one? What is the resource cost for each variant of doing so?
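
As a rough illustration of that equivalence, the IN form above could be expressed as a self-join on order_id instead. This is only a sketch; as we’ll see in a moment, the redundant inner query shouldn’t exist at all:

SELECT o.*
  FROM sys_order o
  JOIN sys_order s ON (s.order_id = o.order_id)
 WHERE s.product_id = 10
 ORDER BY o.order_dt DESC
 LIMIT 30;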

Then there’s the DISTINCT clause within the inner query. That sys_order table is not joined with anything, and there are no row-multiplying functions. It’s not possible for the same primary key to appear more than once in the results. Yet there it is, making the planner do more work.

The last two are closely related. Since the outer query doesn’t add any new WHERE clauses, applying an order and limiting the results at that point is simply inefficient. The database must first execute the inner query to find the relevant order_id values, and afterwards, throw away all but the top 30 results. The planner could have used the index on order_dt and read it backwards. Or it could have used the index on product_id and then ordered the results afterward, depending on which was more efficient based on statistics. Instead, it has to produce, and then subsequently discard, all data that matched the subquery.

Here’s what the query should have been:

EXPLAIN ANALYZE
SELECT *
  FROM sys_order
 WHERE product_id = 10
 ORDER BY order_dt DESC
 LIMIT 30;
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Limit  (cost=0.42..172.64 rows=30 width=20)
        (actual time=5.049..5.101 rows=30 loops=1)
   ->  Index Scan Backward using idx_order_order_dt on sys_order
           (cost=0.42..53960.42 rows=9400 width=20)
           (actual time=5.046..5.073 rows=30 loops=1)
         Filter: (product_id = 10)
         Rows Removed by Filter: 9000
 Planning time: 0.099 ms
 Execution time: 5.137 ms

Oh, look! There’s that backward index scan I mentioned. The row estimates are a bit off, and we’ll probably want to increase statistics and analyze to produce better values, but this is a speed improvement of over 250x. In a production system, even a few milliseconds can be a huge problem with enough throughput. Multiply that by 250, and the issue is upgraded to a catastrophe.
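
If we wanted to chase down those estimates, the usual remedy is to raise the statistics target on the relevant column and re-analyze. A sketch, where 1000 is just an illustrative value rather than a recommendation:

ALTER TABLE sys_order ALTER COLUMN product_id SET STATISTICS 1000;
ANALYZE sys_order;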

So how did this happen?

In this particular instance, it was the fault of Java Hibernate. An object for the inner query was passed to another object to flesh out the order detail, and the result almost crashed a production system. The fix was to make smarter use of Hibernate capabilities so it didn’t generate such a terrible query. Indeed, code refactors are probably something we should all consider doing more often in order to reduce accumulated technical debt.

Unfortunately, human-generated queries aren’t free from fault, either. It’s far too tempting to smash two queries together rather than rewrite them as a merged version using proper JOIN syntax. I’ve done it myself when in a hurry to check something. The difference is that I know better than to commit such a hack to a permanent location in our code tree. And the reason I don’t do that is because I know the potential havoc such a query can wreak.

And now, so do you. By itself, IN is a wonderful tool. But its allure can bewitch and ultimately betray when it becomes a crutch. PGDB has a lot of powerful syntax potential, and it would be a shame to limit ourselves to the basics out of familiarity.


