Yesterday I gave the developers at my company what I call a DBA Chat. It’s something I try to do every month to keep them apprised on various features, caveats, performance considerations, and so on. I find that educating the folks who regularly work with the database does wonders for application performance and my sanity. The benefit of this long format is that I can go over more information than a time constrained set of slides.

This month we went over one of my slightly older talks, and even though the material was three years old from my perspective, it was all new to them. And why not? Developers have a job to do, and while they do work with the database, it generally isn’t their responsibility to research the voluminous minutia and quirks commonly associated with a specific platform. That’s my job.

So here is the first of a ten part series on anti-patterns that can annihilate PGDB (PostgreSQL) performance, some of which are outright insidious. This week we’ll start slow with a short discussion on the importance of EXPLAIN. This is PGDB’s first line of defense against bad queries, and it’s overlooked more than you might expect, as well as somewhat difficult to interpret for the uninformed. I’ll probably have a more in-depth version of this article in the future for truly interpreting EXPLAIN output, but this time we’ll be focusing on using it in general.

First, we need to set up a use case to illustrate a couple of variants. Two tables with a close relationship should do it:

CREATE TABLE sys_product
    product_id   SERIAL  PRIMARY KEY,
    prod_name    TEXT    NOT NULL,
    descr        TEXT    NOT NULL DEFAULT now()
INSERT INTO sys_product (prod_name, descr)
SELECT 'Product ' ||, 'It does stuff.'
  FROM generate_series(1, 1000) a(id);
CREATE TABLE sys_order
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
INSERT INTO sys_order (product_id, item_count)
SELECT ( % 100) + 1, (random()*10)::INT + 1
  FROM generate_series(1, 1000000) a(id);
ALTER TABLE sys_order ADD CONSTRAINT pk_order_order_id
      PRIMARY KEY (order_id);
CREATE INDEX idx_order_product_id
    ON sys_order (product_id);

Now, EXPLAIN comes in many variants, but there are two that directly concern us: EXPLAIN and EXPLAIN ANALYZE. The first will list the estimated cost of the query, which indexes will be used, how many rows are estimated to match, and so on. The second will physically execute the query and obtain the actual match counts along with elapsed time for each step. Here’s what that looks like:

                             QUERY PLAN                             
 Seq Scan ON sys_order  (cost=0.00..16370.00 ROWS=1000000 width=20)
                             QUERY PLAN
 Seq Scan ON sys_order  (cost=0.00..16370.00 ROWS=1000000 width=20)
     (actual TIME=0.006..465.956 ROWS=1000000 loops=1)
 Planning TIME: 0.041 ms
 Execution TIME: 886.766 ms

I’ve reformatted the results a bit, but the difference in output should be readily apparent. Interpreting this isn’t even very difficult at a high level. Begin by examining the first set of parenthesis in both results, and ignore the cost for now; it’s essentially meaningless noise intended primarily for the query planner. What concerns us is the row count, which is one-million in this case. It’s how many rows PGDB estimated it would return.

The second set of parenthesis in the ANALYZE version gives the actual values obtained by executing the query. In this case, the query matched exactly one million rows, and took 465ms to do so. If you don’t know already, a Seq Scan is short for Sequence Scan, and means the database read the entire contents of the table. That should be expected since we didn’t provide any WHERE clause to restrict the results.

Pay special attention to the rows section, because it can be a major clue to ferret out performance problems. PGDB keeps statistics on table contents, but these are necessarily sparse and in aggregate form. There’s also currently no way to correlate statistics between tables, so a JOIN can drastically affect row estimates, and since it’s how cost is calculated, can result in wildly different execution plans than the optimal case.

So we know that a Sequence Scan will read the entire table. Hopefully that point by itself is an obvious indication that doing so is generally undesirable. If we are querying a table with 300-million rows, reading its entire contents is almost never the intended approach, nor should it be.

What other variants should we look for? How about an Index Scan:

 SELECT * FROM sys_order
  WHERE order_id BETWEEN 5 AND 10;
                             QUERY PLAN                             
 INDEX Scan USING pk_order_order_id ON sys_order 
       (cost=0.42..8.53 ROWS=5 width=20)
       (actual TIME=0.008..0.012 ROWS=6 loops=1)
   INDEX Cond: ((order_id >= 5) AND (order_id <= 10))
 Planning TIME: 0.112 ms
 Execution TIME: 0.041 ms

We can see a few things from these results:

  • Estimated row count is not the same as how many actually matched.
  • The rows were found with an Index Scan.
  • Execution time was much faster than the Sequence Scan case.

It’s not critical for estimated and actual row counts to match, but they should be within an order of magnitude of each other. If this isn’t the case, it’s likely the planner has insufficient information about the data, or statistics are wrong, or data is more closely correlated than it assumed, resulting in less eliminations than it expected, and so on. This is one of the first ways to see that there could be a problem with a query’s execution plan.

Regarding the index scan, matching five or six values against a million isn’t a big deal, but there are limits. Indexes in PGDB work, in most cases, by fetching very quickly from the index file and then consulting the base table file to retrieve the matching row. This results in two random seeks to the underlying storage subsystem, and if you don’t know already, random reads are usually slower than a sequential read by an order of magnitude or more. This doesn’t apply for SSD devices, but most databases don’t reside entirely on such expensive hardware just yet.

That’s why the rows element of the EXPLAIN is important in respect to index scans; it’s entirely possible that reading the entire table would be faster than performing several hundred thousand index and table seeks. For the most part, the planner knows this, but calculating this relationship is still more of an art than a science. Consequentially, sometimes the planner guesses wrong, and query performance will suffer. By using EXPLAIN, we can see what happened, and find a way around the problem.

Let’s turn this into a JOIN to make it a little more fun:

   FROM sys_order o
   JOIN sys_product p USING (product_id)
  WHERE p.product_id IN (5, 10, 20);
                             QUERY PLAN                             
 Nested Loop  (cost=188.87..15715.29 ROWS=3614 width=20)
              (actual TIME=2.317..66.377 ROWS=30000 loops=1)
   ->  INDEX ONLY Scan USING sys_product_pkey ON sys_product p
             (cost=0.28..24.88 ROWS=3 width=4)
             (actual TIME=0.033..0.056 ROWS=3 loops=1)
         INDEX Cond: (product_id = ANY ('{5,10,20}'::INTEGER[]))
         Heap Fetches: 3
   ->  Bitmap Heap Scan ON sys_order o
              (cost=188.59..5130.14 ROWS=10000 width=20)
              (actual TIME=2.105..13.037 ROWS=10000 loops=3)
         Recheck Cond: (product_id = p.product_id)
         Heap Blocks: exact=19107
         ->  Bitmap INDEX Scan ON idx_order_product_id
               (cost=0.00..186.09 ROWS=10000 width=0)
               (actual TIME=1.170..1.170 ROWS=10000 loops=3)
               INDEX Cond: (product_id = p.product_id)
 Planning TIME: 0.274 ms
 Execution TIME: 78.834 ms

Wow! Luckily we can ignore most of this output by focusing on the Nested Loop at the top. This is really just a fancy way of saying “For Loop”. This is also the outermost level of the query plan, as EXPLAIN output is somewhat inverted. Each nesting level is a prerequisite step to construct the layer above it, somewhat like an onion.

If we look at the loops section in the actual execution information for each of these blocks, we can see that the loop really ran three times (the nested entries show loops=3), even though it says it only ran once. The distinction here is that the outermost loop ran once, and it is somewhat misleading.

But basically PGDB fetched three rows from the product table, and looped through memory using those values to find corresponding data in the order table. And as you might imagine, this is fine for small numbers of loops. There are occasions where PostgreSQL will underestimate row counts and opt for a Nested Loop, when really there are hundreds of thousands or millions of potential loop candidates, and it will use the same process as we saw above.

In such cases, execution time is extremely adversely impacted. Why? Nested loops and other types of loop are computationally expensive. As far as the database is concerned, it’s almost always faster to create an in-memory object of some kind and perform a bulk operation on all values at once, such as a comparison or a transformation. Nested loops make this impossible, and impose a series of calculations and function callbacks on each individual row set in O(n*m) scale. For small loop counts, this is often cheaper than building those afore-mentioned memory representations, but scale is vastly important.

Here’s an example of what I mean, and we don’t even need the whole EXPLAIN output to see the problem:

   FROM sys_order o
   JOIN sys_product p USING (product_id)
  WHERE p.product_id BETWEEN 6 AND 10;
                             QUERY PLAN                             
 Nested Loop  (cost=188.20..17588.55 ROWS=4000 width=20)
              (actual TIME=2.542..105.528 ROWS=50000 loops=1)

This should be ringing alarm bells. Note that PGDB expected 4000 rows, and actually matched 50,000. At this scale that isn’t really a major issue, but expand these tables by a couple orders of magnitude, and that can quickly change. I’ve seen several queries where the estimated row count is five orders of magnitude less than the actual results. Disk fetches are exceptionally expensive, and multiplying random read times by a few thousand adds up. If a query is using a nested loop is being slow, always check the statistics based on EXPLAIN output; you may need to revise your query to compensate.

Those are the Big Three in a nutshell. Even if you know nothing else about PGDB, how it plans or executes queries, or even databases in general, try to examine the execution path of your queries whenever possible. This is especially true if you have a query that’s unexpectedly slow. Being cognizant of potential issues prevents malformed queries or structures from worming their way permanently into a production application, and that’s something we can all celebrate.

Happy querying.

PG Phriday: 10 Ways to Ruin Performance: Forgetting to EXPLAIN
Tagged on:                 

6 thoughts on “PG Phriday: 10 Ways to Ruin Performance: Forgetting to EXPLAIN

  • Nice article. Just a question, in the first join example, is not a “warning” that the expected rows are 3614 and the actual rows are 30000.

    It’s not a similar case to the second join example? Thanks

  • “Execution time was much faster than the Sequence Scan case”

    This is somewhat misleading. You were only retrieving six rows in the Index Scan case whereas PG had to fetch a million rows in the Sequence Scan. The former overall run-time is naturally going to be smaller. The per-row time favors the sequential scan, but it doesn’t really make sense to compare execution times between these two.

    1. That’s actually irrelevant. The sequence scan will read the entire table, regardless of what values you are trying to match. That’s the only way it can know it’s returning all matches contained within the table. In fact, it’ll even be slightly slower with a WHERE clause, since it has to apply a post-filter instead of just fetching all results.

Comments are closed.