PG Phriday: 10 Ways to Ruin Performance: Cast Away

June 5th, 2015 | Published in Database, Tech Talk


Unlike a lot of programming languages with loose typing, databases are extremely serious about data types. So serious, in fact, that many flatly refuse to perform automatic type conversions if the values being compared would not be exactly equivalent afterwards. This is the case with PGDB (PostgreSQL), and surprisingly often, queries suffer as a result. Fortunately, this is something we can address!

There are easier ways to demonstrate this, but in the interests of making this more of an adventure, let’s use a join between two tables:

CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
);
 
INSERT INTO sys_order (product_id, item_count, order_dt, valid_dt)
SELECT (a.id % 100000) + 1, (a.id % 100) + 1,
       now() - (id % 1000 || 'd')::INTERVAL,
       CASE WHEN a.id % 499 = 0
            THEN NULL
            ELSE now() - (id % 999 || 'd')::INTERVAL
       END
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_order ADD CONSTRAINT pk_order_id
      PRIMARY KEY (order_id);
 
CREATE TABLE sys_exchange_map
(
    exchange_map_id  SERIAL       NOT NULL,
    vendor_id        INT          NOT NULL,
    ext_order_code   TEXT         NOT NULL,
    map_message      TEXT         NOT NULL,
    xmit_dt          TIMESTAMPTZ  NOT NULL DEFAULT now()
);
 
INSERT INTO sys_exchange_map (
         exchange_map_id, vendor_id, ext_order_code,
         map_message, xmit_dt
)
SELECT a.id, (a.id % 10) + 1,
       ((a.id % 10000) + 1) || ':' || SUBSTRING(random()::TEXT, 3),
       'This is a bunch of junk from an external vendor queue!',
       now() - (id % 30 || 'd')::INTERVAL
  FROM generate_series(1, 100000) a(id);
 
ALTER TABLE sys_exchange_map ADD CONSTRAINT pk_exchange_map_id
      PRIMARY KEY (exchange_map_id);
 
CREATE INDEX idx_exchange_map_xmit_dt
    ON sys_exchange_map (xmit_dt);
 
ANALYZE sys_order;
ANALYZE sys_exchange_map;

This structure simulates something I’ve seen somewhat often in real systems. In this case, our order table is normal, but there’s also an exchange mapping table that maps our orders to external vendor orders. Imagine this table is part of the queue engine running the application, so the various codes and message columns are describing the interactions within the platform.

Given this structure, it’s not uncommon to encounter a query that manually parses one or more of these codes and tries to find matching orders. This is what that might look like:

SELECT o.*
  FROM sys_exchange_map m
  JOIN sys_order o ON (
         o.order_id = SUBSTRING(m.ext_order_code, '(.*):')::NUMERIC
       )
 WHERE m.xmit_dt > CURRENT_DATE;

Unfortunately on my test VM, this query takes almost four seconds! Experienced developers and DBAs probably already know what the problem is, but let’s take a gander at two segments of the EXPLAIN ANALYZE output and see if there are any obvious clues.

 Merge Join  (cost=142800.24..568726.04 rows=16935000 width=28)
             (actual time=4844.280..4875.651 rows=3333 loops=1)
 
 [ SNIP ]
 
 ->  Seq Scan on sys_order o
       (cost=0.00..17353.00 rows=1000000 width=28)
       (actual time=0.036..884.354 rows=1000000 loops=1)
 
 Planning time: 0.240 ms
 Execution time: 3626.783 ms

We should be able to immediately see two potential problems. Thinking back on our previous discussions about estimates matching actual results, notice that the database thinks it will be returning almost 17M rows, when there are really only around 3300. Further, it’s scanning the entire sys_order table instead of using the primary key. What happened here?

This may or may not come as a surprise, but NUMERIC is not compatible with standard integer types in PGDB. For those who have worked with other databases such as Oracle, it may seem natural to use NUMERIC as a catch-all for any number. This is especially true for developers who work with different data types in a myriad of languages, some of which feature multiple compatible numeric types.

With PGDB, casting is controlled through explicit declaration of type compatibility. We can look at NUMERIC and see which types it will convert to with this query:

SELECT s.typname AS source_type,
       t.typname AS target_type,
       CASE c.castcontext
         WHEN 'a' THEN 'Assignment'
         WHEN 'i' THEN 'Implicit'
         WHEN 'e' THEN 'Explicit'
       END AS context
  FROM pg_cast c
  JOIN pg_type s ON (s.oid = c.castsource)
  JOIN pg_type t ON (t.oid = c.casttarget)
 WHERE s.typname = 'numeric';
 
 source_type | target_type |  context   
-------------+-------------+------------
 numeric     | int8        | Assignment
 numeric     | int2        | Assignment
 numeric     | int4        | Assignment
 numeric     | float4      | Implicit
 numeric     | float8      | Implicit
 numeric     | money       | Assignment
 numeric     | numeric     | Implicit

From this chart, it would appear that INT and NUMERIC are compatible, but only through assignment. This means we can use them interchangeably in calculations, for example, or save an integer to a numeric column or the reverse. It does not mean they’ll be automatically converted or treated as compatible for equality tests. Since the types are not implicitly compatible, PGDB will not use the INT index when the comparison value is NUMERIC; instead, it has to cast the order_id column of every row to NUMERIC before it can check for a match.
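To see the effect in isolation, here’s a minimal sketch against the sys_order table above (plan output abbreviated and approximate). A plain integer comparison can use the primary key, while comparing against a NUMERIC value forces the indexed column itself to be cast, so the index is abandoned:

-- Integer literal matches the column type, so the primary key is used.
EXPLAIN SELECT * FROM sys_order WHERE order_id = 42;
--   Index Scan using pk_order_id on sys_order ...
 
-- NUMERIC on the right side makes PGDB cast order_id instead.
EXPLAIN SELECT * FROM sys_order WHERE order_id = 42::NUMERIC;
--   Seq Scan on sys_order
--     Filter: ((order_id)::numeric = 42::numeric)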

Unable to use the proper index, the planner does its best to come up with a working plan. Unfortunately the type incompatibility wreaks havoc during that phase as well. What we end up with is something of a mess, with completely incorrect row estimates, an unnecessary sequential scan, and an expensive in-memory merge. And all it takes to fix this is one tiny change to our query:

SELECT o.*
  FROM sys_exchange_map m
  JOIN sys_order o ON (
         o.order_id = SUBSTRING(m.ext_order_code, '(.*):')::INT
       )
 WHERE m.xmit_dt > CURRENT_DATE;
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Nested Loop  (cost=66.98..22648.71 rows=3387 width=28)
              (actual time=1.058..42.084 rows=3333 loops=1)
 
 [ SNIP ]
 
 ->  Index Scan using pk_order_id on sys_order o 
       (cost=0.43..6.19 rows=1 width=28)
       (actual time=0.003..0.003 rows=1 loops=3333)
 
 Planning time: 1.176 ms
 Execution time: 43.635 ms

That’s a pretty massive difference! In this particular case, it’s 80 times faster, but I’ve run into situations where a single miscast caused the query to be almost 80,000 times slower. It all depends on the scale of the data involved, and how badly everything compounds at each step of the execution process.

The message here is to pay particular attention to data types and casting. If a query is performing badly when it should be much faster, make sure types are compatible across the board. Some languages are particularly strict with regard to data type usage, so consider PGDB among them.



PG Phriday: 10 Ways to Ruin Performance: IN-Sanity

May 29th, 2015 | Published in Database, Tech Talk


When working with a database, performance problems are sometimes both far more subtle and much worse than the query itself might suggest. The topic of this week’s PGDB (PostgreSQL) performance killers article concerns the use of the IN clause, and how misusing it can catastrophically obliterate the database in mysterious ways.

To that end, we’ll use a slightly revised single-table test case since it’s served us pretty well so far:

DROP TABLE sys_order;
CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
);
 
INSERT INTO sys_order (product_id, item_count, order_dt)
SELECT (a.id % 100) + 1, (a.id % 100) + 1,
       now() - (id % 1000 || 'd')::INTERVAL
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_order ADD CONSTRAINT pk_order_order_id
      PRIMARY KEY (order_id);
 
CREATE INDEX idx_order_product_id
    ON sys_order (product_id);
 
CREATE INDEX idx_order_order_dt
    ON sys_order (order_dt);
 
ANALYZE sys_order;

As usual, my test system is a simple dual-CPU VM with 16GB of RAM and some mysterious storage allocation from a SAN. All settings are default, and the version of PostgreSQL is the latest release of the 9.4 branch. I always recommend using the latest version of PGDB when possible, otherwise there’s a risk of missing important planner improvements.

As it turns out in this particular story, the IN clause is actually pretty well known to most developers I’ve worked with. This isn’t some obtuse syntax that only experts have encountered, and it’s used regularly in applications and ORMs throughout the industry. It’s ubiquitous, and consequently, extremely dangerous.

Why dangerous? Let’s examine a sanitized query I encountered in an actual running production system. Now, our test case is scaled down by a couple orders of magnitude, so the results won’t be as drastic as what I encountered. Still, the query below performs much worse than anything we’ve discussed so far:

EXPLAIN ANALYZE
SELECT * FROM sys_order
 WHERE order_id IN (
        SELECT DISTINCT order_id
          FROM sys_order
         WHERE product_id = 10
       )
 ORDER BY order_dt DESC
 LIMIT 30;
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Limit  (cost=27768.47..27768.55 rows=30 width=20)
        (actual time=1362.794..1362.840 rows=30 loops=1)
   ->  Sort  (cost=27768.47..27791.97 rows=9400 width=20)
             (actual time=1362.785..1362.801 rows=30 loops=1)
 
 [ Horrible ugly mess redacted ]
 
 Planning time: 0.699 ms
 Execution time: 1363.219 ms

What we’re looking at here is the planner ripping itself to shreds trying to optimize a query with several problems:

  • A subquery containing the same table with no aggregates.
  • Use of DISTINCT on a primary-key column.
  • Ordering the results on the outside query.
  • Using the LIMIT on the outer query.

Taking these in order, it should be obvious that the subquery is pointless in this example. The inside query is essentially the same as the outside query, minus the ordering and result limit. There are a number of reasons this might happen. The IN clause is primarily used as a type of glue. Often, a developer or ORM will take a working query and embed it as a subquery unchanged. The justification is fairly simple: I’m interested in how these records are related, and I already have this working query.

In most cases, IN can be simplified into some kind of JOIN, since that’s how databases tend to combine related data. By using IN and a subquery, the planner has to perform numerous unnecessary optimization steps in an attempt to reach the best plan. As the amount of complexity increases, so does the number of potential execution paths. How many elements from the inner query, for instance, can be collapsed into the outer one? What is the resource cost for each variant of doing so?
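To make that transformation concrete, here’s a hedged sketch of the earlier query with the IN subquery rewritten as an explicit self-join. Since order_id is the primary key, the join cannot multiply rows, and as the corrected query further below shows, even this join turns out to be unnecessary:

SELECT o.*
  FROM sys_order o
  JOIN sys_order sub USING (order_id)
 WHERE sub.product_id = 10
 ORDER BY o.order_dt DESC
 LIMIT 30;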

Then there’s the DISTINCT clause within the inner query. That sys_order table is not joined with anything, and there are no row-multiplying functions. It’s not possible for the same primary key to appear more than once in the results. Yet there it is, making the planner do more work.

The last two are closely related. Since the outer query doesn’t add any new WHERE clauses, applying an order and limiting the results at that point is simply inefficient. The database must first execute the inner query to find the relevant order_id values, and afterwards, throw away all but the top 30 results. The planner could have used the index on order_dt and read it backwards. Or it could have used the index on product_id and then ordered the results afterward, depending on which was more efficient based on statistics. Instead, it has to produce, and then subsequently discard, all data that matched the subquery.

Here’s what the query should have been:

EXPLAIN ANALYZE
SELECT *
  FROM sys_order
 WHERE product_id = 10
 ORDER BY order_dt DESC
 LIMIT 30;
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Limit  (cost=0.42..172.64 rows=30 width=20)
        (actual time=5.049..5.101 rows=30 loops=1)
   ->  Index Scan Backward using idx_order_order_dt on sys_order
           (cost=0.42..53960.42 rows=9400 width=20)
           (actual time=5.046..5.073 rows=30 loops=1)
         Filter: (product_id = 10)
         Rows Removed by Filter: 9000
 Planning time: 0.099 ms
 Execution time: 5.137 ms

Oh, look! There’s that backward index scan I mentioned. The row estimates are a bit off, and we’ll probably want to increase statistics and analyze to produce better values, but this is a speed improvement of over 250x. In a production system, even a few milliseconds can be a huge problem with enough throughput. Multiply that by 250, and the issue is upgraded to a catastrophe.
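If those estimates bother you, one low-risk way to give the planner better information is to raise the statistics target for the relevant column and re-analyze. This is only a sketch; the value 1000 is purely illustrative, and the default in 9.4 is 100:

-- Gather more detailed statistics for product_id, then refresh them.
ALTER TABLE sys_order ALTER COLUMN product_id SET STATISTICS 1000;
ANALYZE sys_order;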

So how did this happen?

In this particular instance, it was the fault of Java Hibernate. An object for the inner query was passed to another object to flesh out the order detail, and the result almost crashed a production system. The fix was to make smarter use of Hibernate capabilities so it didn’t generate such a terrible query. Indeed, code refactors are probably something we should all consider doing more often in order to reduce accumulated technical debt.

Unfortunately, human-generated queries aren’t free from fault, either. It’s far too tempting to smash two queries together rather than rewrite them as a merged version using proper JOIN syntax. I’ve done it myself when in a hurry to check something. The difference is that I know better than to commit such a hack to a permanent location in our code tree. And the reason I don’t is that I know the potential havoc such a query can wreak.

And now, so do you. By itself, IN is a wonderful tool. But its allure can bewitch and ultimately betray when it becomes a crutch. PGDB has a lot of powerful syntax potential, and it would be a shame to limit ourselves to the basics out of familiarity.



PG Phriday: 10 Ways to Ruin Performance: Forgetting it EXISTS

May 22nd, 2015 | Published in Database, Tech Talk


For the second of my ten-part series on hidden PGDB (PostgreSQL) performance killers, I’m going to talk about something called a semi-join. It’s not a well-known approach outside of the database world, but due to how it works, it can impart almost magical plan revisions that drastically improve query performance in the right scenario. Developers can add it to a growing bag of tricks when working on database-driven content, since it comes in handy more often than you might expect.

Let’s build a test-case, shall we?

CREATE TABLE sys_product
(
    product_id   SERIAL  PRIMARY KEY,
    prod_name    TEXT    NOT NULL,
    quality      INT     NOT NULL,
    descr        TEXT    NOT NULL DEFAULT now()
);
 
INSERT INTO sys_product (prod_name, quality, descr)
SELECT 'Product ' || a.id::TEXT,
       log((a.id % 100) + 1)::INT,
       'It does stuff.'
  FROM generate_series(1, 100000) a(id);
 
CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
);
 
INSERT INTO sys_order (product_id, item_count, order_dt, valid_dt)
SELECT (a.id % 100000) + 1, (a.id % 100) + 1,
       now() - (id % 1000 || 'd')::INTERVAL,
       CASE WHEN a.id % 499 = 0
            THEN NULL
            ELSE now() - (id % 999 || 'd')::INTERVAL
       END
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_order ADD CONSTRAINT pk_order_order_id
      PRIMARY KEY (order_id);
 
CREATE INDEX idx_order_product_id
    ON sys_order (product_id);
 
CREATE INDEX idx_order_valid_dt
    ON sys_order (valid_dt);
 
ANALYZE sys_product;
ANALYZE sys_order;

This is a very basic product and order table structure, and we’ve used it before in the last installment of this series. The only columns we’ve added since last time are the quality column in sys_product and the valid_dt column in sys_order. This way, we can introduce some variability into the query plan due to data correlation effects. We’ve tried to distribute the data such that new orders have not been validated, and there are still some non-validated orders in the past. This distribution is more reliable than using random().

Now let’s build a query that retrieves all orders on products with a quality of two using a simple JOIN:

EXPLAIN ANALYZE
SELECT o.*
  FROM sys_order o
  JOIN sys_product p USING (product_id)
 WHERE o.valid_dt IS NULL
   AND p.quality = 2;
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Hash Join  (cost=2985.66..7406.44 rows=1303 width=28)
            (actual time=80.438..90.406 rows=1383 loops=1)
   Hash Cond: (o.product_id = p.product_id)
   ->  Bitmap Heap Scan on sys_order o
              (cost=39.41..4423.00 rows=1933 width=28)
              (actual time=0.600..6.879 rows=2004 loops=1)
         Recheck Cond: (valid_dt IS NULL)
         Heap Blocks: exact=2004
         ->  Bitmap Index Scan on idx_order_valid_dt
                    (cost=0.00..38.92 rows=1933 width=0)
                    (actual time=0.327..0.327 rows=2004 loops=1)
               Index Cond: (valid_dt IS NULL)
   ->  Hash  (cost=2084.00..2084.00 rows=68980 width=4)
             (actual time=79.773..79.773 rows=69000 loops=1)
         Buckets: 8192  Batches: 1  Memory Usage: 2426
         ->  Seq Scan on sys_product p
                 (cost=0.00..2084.00 rows=68980 width=4)
                 (actual time=0.013..41.150 rows=69000 loops=1)
               Filter: (quality = 2)
               Rows Removed by Filter: 31000
 Planning time: 0.732 ms
 Execution time: 91.875 ms

We can actually infer quite a bit from this execution plan.

  • The sys_order table was checked with the index on valid_dt as desired.
  • The sys_product table was scanned in its entirety.
  • The planner estimates all match up almost exactly.
  • Final result was obtained by hashing the two row sets together.

From this, we could say that this is the ideal query plan. The index we wanted was used, and it reduced one million rows to about 2000. Since we don’t have an index on product quality, we might as well fetch them all and filter out the wrong quality before executing the hash. Once the hash between both is computed, we get about 1300 rows as a result.

Of course, this approach makes a huge mistake. Even though the order and product tables are within an order of magnitude of each other, we’re only interested in a relatively small fraction of the order table. Thus, fetching all products at this scale is probably a losing proposition. However, since the planner doesn’t have data regarding how closely these tables are correlated, it can’t make that assumption. Indeed, is it fetching 2000 unique product IDs, or 200? Since it can’t fetch the related products at this point, it does the best it can.

But can we do better? As it turns out, we can do a whole lot better by telling the planner what we really wanted. Note that we didn’t actually use the results from the product table in the SELECT portion of the query. Further, we know that we are only interested in a maximum of about 1300 products, so why check all of them? Is there a way to force the planner to fetch the matching orders first, and then check to see which products correspond as a second step?

We already know from last time that this is what a nested loop does, and one way to encourage the planner to pick a nested loop is by using EXISTS to break apart our JOIN condition. This accomplishes two things:

  1. Don’t join against a table to fetch columns that will never be returned.
  2. Favor some kind of looping construct to avoid excessive record comparisons.

This is also what is known as a semi-join, and PGDB performs very well when guided in that direction. Here’s what our query and plan look like using this concept:

EXPLAIN ANALYZE
SELECT o.*
  FROM sys_order o
 WHERE o.valid_dt IS NULL
   AND EXISTS (
         SELECT 1
           FROM sys_product p
          WHERE p.product_id = o.product_id
            AND p.quality = 2
          LIMIT 1
       );
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Bitmap Heap Scan on sys_order o 
             (cost=39.16..20490.82 rows=967 width=28)
             (actual time=0.565..11.462 rows=1383 loops=1)
   Recheck Cond: (valid_dt IS NULL)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 621
   Heap Blocks: exact=2004
   ->  Bitmap Index Scan on idx_order_valid_dt
              (cost=0.00..38.92 rows=1933 width=0)
              (actual time=0.269..0.269 rows=2004 loops=1)
         Index Cond: (valid_dt IS NULL)
   SubPlan 1
     ->  Limit  (cost=0.29..8.31 rows=1 width=0)
                (actual time=0.003..0.003 rows=1 loops=2004)
           ->  Index Scan using sys_product_pkey on sys_product p
                     (cost=0.29..8.31 rows=1 width=0)
                     (actual time=0.002..0.002 rows=1 loops=2004)
                 Index Cond: (product_id = o.product_id)
                 Filter: (quality = 2)
                 Rows Removed by Filter: 0
 Planning time: 0.126 ms
 Execution time: 12.154 ms

This time, something drastically different happened:

  • The executor took our matching rows from the valid_dt index and stashed them in the memory heap.
  • Using those same rows, it executed about 2000 index lookups into the product table.
  • The results are produced almost 8x faster.

Before anyone takes this result and starts rewriting all queries everywhere to utilize semi-joins, keep in mind that they’re extremely situational. The example here was built based on an actual application query at a company I used to work with. In their case, execution went from 2500ms to 280ms following the rewrite. But in building this test-case to simulate their data, I ended up creating a dozen data distributions that were slightly slower using this approach.

In addition, take a look at the LIMIT 1 section of the subquery. We’re explicitly telling the planner that only one product can match per seek, which is what the word EXISTS should imply. Yet if we remove it, the execution plan actually reverts to the same approach used by the more naive JOIN. Ouch!

But that’s OK because queries are, by their nature, more of a linguistic construct than a computational one. Having a larger vocabulary can help convey more meaning and sometimes produce different results, but our audience is perfectly within their rights to summarize or ignore irrelevant prattle. The goal in this case is adding to our lexicon, and trying to frame applicable scenarios in a different way in hopes that the planner will catch on.

For data distributions that apply, semi-joins are a very powerful technique. If nothing else, they’re worth a try when other approaches fall flat. The only way to know for sure is experimentation! They’re just a tool, and like any tool, should only be used at opportune instances.

So remember the semi-join: it’s a great tool for that rare hex-socket screw you occasionally stumble across.



PG Phriday: 10 Ways to Ruin Performance: Forgetting to EXPLAIN

May 15th, 2015 | Published in Database, Tech Talk


Yesterday I gave the developers at my company what I call a DBA Chat. It’s something I try to do every month to keep them apprised of various features, caveats, performance considerations, and so on. I find that educating the folks who regularly work with the database does wonders for application performance and my sanity. The benefit of this long format is that I can go over more information than a time-constrained set of slides would allow.

This month we went over one of my slightly older talks, and even though the material was three years old from my perspective, it was all new to them. And why not? Developers have a job to do, and while they do work with the database, it generally isn’t their responsibility to research the voluminous minutiae and quirks commonly associated with a specific platform. That’s my job.

So here is the first of a ten part series on anti-patterns that can annihilate PGDB (PostgreSQL) performance, some of which are outright insidious. This week we’ll start slow with a short discussion on the importance of EXPLAIN. This is PGDB’s first line of defense against bad queries, and it’s overlooked more than you might expect, as well as somewhat difficult to interpret for the uninformed. I’ll probably have a more in-depth version of this article in the future for truly interpreting EXPLAIN output, but this time we’ll be focusing on using it in general.

First, we need to set up a use case to illustrate a couple of variants. Two tables with a close relationship should do it:

CREATE TABLE sys_product
(
    product_id   SERIAL  PRIMARY KEY,
    prod_name    TEXT    NOT NULL,
    descr        TEXT    NOT NULL DEFAULT now()
);
 
INSERT INTO sys_product (prod_name, descr)
SELECT 'Product ' || a.id::TEXT, 'It does stuff.'
  FROM generate_series(1, 1000) a(id);
 
CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
);
 
INSERT INTO sys_order (product_id, item_count)
SELECT (a.id % 100) + 1, (random()*10)::INT + 1
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_order ADD CONSTRAINT pk_order_order_id
      PRIMARY KEY (order_id);
 
CREATE INDEX idx_order_product_id
    ON sys_order (product_id);

Now, EXPLAIN comes in many variants, but there are two that directly concern us: EXPLAIN and EXPLAIN ANALYZE. The first will list the estimated cost of the query, which indexes will be used, how many rows are estimated to match, and so on. The second will physically execute the query and obtain the actual match counts along with elapsed time for each step. Here’s what that looks like:

EXPLAIN SELECT * FROM sys_order;
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Seq Scan on sys_order  (cost=0.00..16370.00 rows=1000000 width=20)
 
EXPLAIN ANALYZE SELECT * FROM sys_order;
 
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on sys_order  (cost=0.00..16370.00 rows=1000000 width=20)
     (actual time=0.006..465.956 rows=1000000 loops=1)
 Planning time: 0.041 ms
 Execution time: 886.766 ms

I’ve reformatted the results a bit, but the difference in output should be readily apparent. Interpreting this isn’t even very difficult at a high level. Begin by examining the first set of parentheses in both results, and ignore the cost for now; it’s essentially meaningless noise intended primarily for the query planner. What concerns us is the row count, which is one million in this case. It’s how many rows PGDB estimated it would return.

The second set of parentheses in the ANALYZE version gives the actual values obtained by executing the query. In this case, the query matched exactly one million rows, and took 465ms to do so. If you don’t know already, a Seq Scan is short for Sequential Scan, and means the database read the entire contents of the table. That should be expected since we didn’t provide any WHERE clause to restrict the results.

Pay special attention to the rows section, because it can be a major clue to ferret out performance problems. PGDB keeps statistics on table contents, but these are necessarily sparse and in aggregate form. There’s also currently no way to correlate statistics between tables, so a JOIN can drastically affect row estimates, and since row estimates drive the cost calculations, that can result in wildly different execution plans than the optimal case.

So we know that a Sequential Scan will read the entire table. Hopefully that point by itself is an obvious indication that doing so is generally undesirable. If we are querying a table with 300 million rows, reading its entire contents is almost never the intended approach, nor should it be.
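If you’re curious which tables are being read end-to-end on a live system, the statistics collector already tracks this. A quick, purely illustrative way to list the worst offenders:

-- Tables ranked by how many rows sequential scans have had to read.
SELECT relname, seq_scan, seq_tup_read, idx_scan
  FROM pg_stat_user_tables
 ORDER BY seq_tup_read DESC
 LIMIT 10;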

What other variants should we look for? How about an Index Scan:

EXPLAIN ANALYZE
 SELECT * FROM sys_order
  WHERE order_id BETWEEN 5 AND 10;
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Index Scan using pk_order_order_id on sys_order 
       (cost=0.42..8.53 rows=5 width=20)
       (actual time=0.008..0.012 rows=6 loops=1)
   Index Cond: ((order_id >= 5) AND (order_id <= 10))
 Planning time: 0.112 ms
 Execution time: 0.041 ms

We can see a few things from these results:

  • Estimated row count is not the same as how many actually matched.
  • The rows were found with an Index Scan.
  • Execution time was much faster than the Sequential Scan case.

It’s not critical for estimated and actual row counts to match, but they should be within an order of magnitude of each other. If this isn’t the case, it’s likely the planner has insufficient information about the data, or statistics are wrong, or data is more closely correlated than it assumed, resulting in fewer eliminations than it expected, and so on. This is one of the first ways to see that there could be a problem with a query’s execution plan.

Regarding the index scan, matching five or six values against a million isn’t a big deal, but there are limits. Indexes in PGDB work, in most cases, by fetching very quickly from the index file and then consulting the base table file to retrieve the matching row. This results in two random seeks to the underlying storage subsystem, and if you don’t know already, random reads are usually slower than sequential reads by an order of magnitude or more. This doesn’t apply to SSD devices, but most databases don’t reside entirely on such expensive hardware just yet.

That’s why the rows element of the EXPLAIN is important in respect to index scans; it’s entirely possible that reading the entire table would be faster than performing several hundred thousand index and table seeks. For the most part, the planner knows this, but calculating this relationship is still more of an art than a science. Consequently, sometimes the planner guesses wrong, and query performance will suffer. By using EXPLAIN, we can see what happened, and find a way around the problem.
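One purely diagnostic trick, for a single session only and never for production code, is to temporarily discourage sequential scans and compare the plan PGDB falls back to. For example:

-- Discourage (not forbid) sequential scans in this session only,
-- just to see what the alternative plan would cost.
SET enable_seqscan = off;
 
EXPLAIN ANALYZE
 SELECT * FROM sys_order
  WHERE product_id = 10;
 
RESET enable_seqscan;

If the forced plan turns out dramatically faster, the row estimates or cost settings probably deserve a closer look; the setting itself is not a fix.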

Let’s turn this into a JOIN to make it a little more fun:

EXPLAIN ANALYZE
 SELECT o.*
   FROM sys_order o
   JOIN sys_product p USING (product_id)
  WHERE p.product_id IN (5, 10, 20);
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Nested Loop  (cost=188.87..15715.29 rows=3614 width=20)
              (actual time=2.317..66.377 rows=30000 loops=1)
   ->  Index Only Scan using sys_product_pkey on sys_product p
             (cost=0.28..24.88 rows=3 width=4)
             (actual time=0.033..0.056 rows=3 loops=1)
         Index Cond: (product_id = ANY ('{5,10,20}'::integer[]))
         Heap Fetches: 3
   ->  Bitmap Heap Scan on sys_order o
              (cost=188.59..5130.14 rows=10000 width=20)
              (actual time=2.105..13.037 rows=10000 loops=3)
         Recheck Cond: (product_id = p.product_id)
         Heap Blocks: exact=19107
         ->  Bitmap Index Scan on idx_order_product_id
               (cost=0.00..186.09 rows=10000 width=0)
               (actual time=1.170..1.170 rows=10000 loops=3)
               Index Cond: (product_id = p.product_id)
 Planning time: 0.274 ms
 Execution time: 78.834 ms

Wow! Luckily we can ignore most of this output by focusing on the Nested Loop at the top. This is really just a fancy way of saying “For Loop”. This is also the outermost level of the query plan, as EXPLAIN output is somewhat inverted. Each nesting level is a prerequisite step to construct the layer above it, somewhat like an onion.

If we look at the loops section in the actual execution information for each of these blocks, we can see that the loop body really ran three times (the nested entries show loops=3), even though the Nested Loop node itself says it only ran once. The distinction here is that the outermost node runs once, while its inner nodes are executed once per row supplied by the other side of the loop, which can be somewhat misleading.

But basically PGDB fetched three rows from the product table, and looped through memory using those values to find corresponding data in the order table. And as you might imagine, this is fine for small numbers of loops. There are occasions where PostgreSQL will underestimate row counts and opt for a Nested Loop, when really there are hundreds of thousands or millions of potential loop candidates, and it will use the same process as we saw above.

In such cases, execution time is extremely adversely impacted. Why? Nested loops and other looping constructs are computationally expensive. As far as the database is concerned, it’s almost always faster to create an in-memory object of some kind and perform a bulk operation on all values at once, such as a comparison or a transformation. Nested loops make this impossible, and impose a series of calculations and function callbacks on each individual row set at O(n*m) scale. For small loop counts, this is often cheaper than building those aforementioned memory representations, but scale is vastly important.

Here’s an example of what I mean, and we don’t even need the whole EXPLAIN output to see the problem:

EXPLAIN ANALYZE
 SELECT o.*
   FROM sys_order o
   JOIN sys_product p USING (product_id)
  WHERE p.product_id BETWEEN 6 AND 10;
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Nested Loop  (cost=188.20..17588.55 rows=4000 width=20)
              (actual time=2.542..105.528 rows=50000 loops=1)

This should be ringing alarm bells. Note that PGDB expected 4000 rows, and actually matched 50,000. At this scale that isn’t really a major issue, but expand these tables by a couple orders of magnitude, and that can quickly change. I’ve seen several queries where the estimated row count is five orders of magnitude less than the actual results. Disk fetches are exceptionally expensive, and multiplying random read times by a few thousand adds up. If a query using a nested loop is slow, always check the estimates in the EXPLAIN output against the actual values; you may need to revise your query to compensate.
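Checking what the planner believes is straightforward; the pg_stats view exposes the per-column statistics behind those estimates. A small example:

-- What does the planner know about product_id on sys_order?
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'sys_order'
   AND attname = 'product_id';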

Those are the Big Three in a nutshell. Even if you know nothing else about PGDB, how it plans or executes queries, or even databases in general, try to examine the execution path of your queries whenever possible. This is especially true if you have a query that’s unexpectedly slow. Being cognizant of potential issues prevents malformed queries or structures from worming their way permanently into a production application, and that’s something we can all celebrate.

Happy querying.



PG Phriday: Functions and Performance Attributes

May 8th, 2015 | Published in Database, Tech Talk


Users both old and new are likely aware that PGDB has functions. Some lucky readers may have even written one or two. For those without much experience in this area, or who are thinking about contributing a few functions to a database for the first time, there are a couple of things you should know. This week, I want to cover a few of the function declaration decorators. If you have no idea what I’m talking about, then this is probably something you’ll want to read regardless of your experience level.

Let’s begin with a very basic function that, as usual, does nothing:

CREATE OR REPLACE FUNCTION f_do_nothing(myvar INT)
RETURNS INT AS
$$
BEGIN
    RETURN myvar;
END;
$$ LANGUAGE plpgsql;
 
\timing ON
 
SELECT f_do_nothing(1)
  FROM generate_series(1, 1000000) AS s (id);

Now, hopefully we can agree that a function with one parameter, which returns the contents of that parameter and does no work, is pretty simple. Calling this function one million times in a test VM required about two seconds. Some astute readers might point out that I could have used the SQL language instead of PL/pgSQL, and they’d be absolutely right. Doing so would reduce run time to about 600ms for this example.
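For reference, here’s roughly what the SQL-language version would look like. The _sql suffix is only there so it doesn’t replace the function used throughout the rest of this article:

CREATE OR REPLACE FUNCTION f_do_nothing_sql(myvar INT)
RETURNS INT AS
$$
    SELECT myvar;
$$ LANGUAGE sql;
 
SELECT f_do_nothing_sql(1)
  FROM generate_series(1, 1000000) AS s (id);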

Fortunately, the decorators I’ll be discussing are much more powerful than even switching to a simpler and consequently less versatile language. As of PostgreSQL 9.4, there are four that will actually impact performance: VOLATILE, STABLE, STRICT, and IMMUTABLE.

The VOLATILE hint isn’t strictly necessary, as it’s the default. Effectively it means that the function can’t be optimized at all, as its effects can vary wildly based on table contents and parameters, even during execution. This causes PostgreSQL to invoke the function each time it’s used, regardless of what the function is actually doing. For our function, that’s clearly a detrimental approach. So what other hints do we have available?

The STABLE attribute is a bit more stringent. The implication in this case is that the function cannot modify the contents of the database in any way. Within the context of a transaction, this means the function can be semi-optimized by preventing extra calls because the data should not change. For this particular example, the STABLE keyword reduced run time by about ten percent, to about 1.7 seconds on average. Here’s what that looks like:

CREATE OR REPLACE FUNCTION f_do_nothing(myvar INT)
RETURNS INT AS
$$
BEGIN
    RETURN myvar;
END;
$$ LANGUAGE plpgsql STABLE;
 
\timing ON
 
SELECT f_do_nothing(1)
  FROM generate_series(1, 1000000) AS s (id);

After this comes the IMMUTABLE keyword, which takes the optimization one step further. This tells the planner that the function will always return the same result for the same parameters. Due to that guarantee, function calls can tacitly be replaced by the cached results where that benefits the planner. One way to look at this is by examining the performance:

CREATE OR REPLACE FUNCTION f_do_nothing(myvar INT)
RETURNS INT AS
$$
BEGIN
    RETURN myvar;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
 
\timing ON
 
SELECT f_do_nothing(1)
  FROM generate_series(1, 1000000) AS s (id)

The IMMUTABLE decorator reduced run time to 400ms in this case, but we don’t really know how many times the function was actually called. Just because a function call can be replaced by its results, doesn’t mean that will happen. Let’s modify the situation slightly by adding an explicit pause in the code:

CREATE OR REPLACE FUNCTION f_do_nothing(myvar INT)
RETURNS INT AS
$$
BEGIN
    PERFORM pg_sleep(1);
    RETURN myvar;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
 
\timing ON
 
SELECT f_do_nothing(1), f_do_nothing(1);

If your system is like mine, this example ran for two seconds, despite the fact that PostgreSQL could have replaced the second call with the results of the first. This seems counter-intuitive, but the real power of the IMMUTABLE keyword is not simply in the replacement structure, but in the parameter/result equivalence. It’s a fundamental misunderstanding that the database must prevent excess function calls, though that would be ideal. Let’s modify the code again to make the situation more explicit:

CREATE OR REPLACE FUNCTION f_do_nothing_imm(myvar INT)
RETURNS INT AS
$$
BEGIN
    RAISE NOTICE 'IMMUTABLE RAN!';
    RETURN myvar;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
 
CREATE OR REPLACE FUNCTION f_do_nothing_vol(myvar INT)
RETURNS INT AS
$$
BEGIN
    RAISE NOTICE 'VOLATILE RAN!';
    RETURN myvar;
END;
$$ LANGUAGE plpgsql VOLATILE;
 
SELECT 1
  FROM generate_series(1, 5) AS s (id)
 WHERE f_do_nothing_imm(10) = 10
   AND f_do_nothing_vol(10) = 10;

And the NOTICE output in this case is very telling:

NOTICE:  IMMUTABLE RAN!
NOTICE:  VOLATILE RAN!
NOTICE:  VOLATILE RAN!
NOTICE:  VOLATILE RAN!
NOTICE:  VOLATILE RAN!
NOTICE:  VOLATILE RAN!

While optimization of function calls in the SELECT list seems somewhat arbitrary, the WHERE portion appears to have been reduced appropriately. It all comes down to how the planner decides to simplify the execution plan of the query, and the conditionals in the WHERE clause are much more likely to experience substitution.

Finally comes the STRICT decorator. Unlike the previous keywords, it isn’t a volatility level at all: it declares that the function will always return NULL if any of its parameters are NULL. Again, this is an optimization where the planner can completely substitute a function call with NULL, thereby not only removing the function execution itself, but avoiding the requisite function parameter and return value mapping. Its nature also means it can be combined with the other keywords, like so:

CREATE OR REPLACE FUNCTION f_do_nothing(myvar INT)
RETURNS INT AS
$$
BEGIN
    RETURN myvar;
END;
$$ LANGUAGE plpgsql VOLATILE STRICT;
 
\timing ON
 
SELECT f_do_nothing(NULL)
  FROM generate_series(1, 1000000) AS s (id);

This time, even though the function is clearly marked VOLATILE, it doesn’t execute even once. The run time is merely the time required to generate and return one million values.

Now, regardless of power these keywords provide, there are some important caveats that apply. When we used notices to view the call path of our functions, the planner did what we expected, and prevented extra calls to the immutable version. So let’s try that example one more time with a variable in place and examine those notices again:

SELECT 1
  FROM generate_series(1, 5) AS s (id)
 WHERE f_do_nothing_imm(s.id % 2) = s.id % 2
   AND f_do_nothing_vol(s.id % 2) = s.id % 2;
 
NOTICE:  VOLATILE RAN!
NOTICE:  IMMUTABLE RAN!
NOTICE:  VOLATILE RAN!
NOTICE:  IMMUTABLE RAN!
NOTICE:  VOLATILE RAN!
NOTICE:  IMMUTABLE RAN!
NOTICE:  VOLATILE RAN!
NOTICE:  IMMUTABLE RAN!
NOTICE:  VOLATILE RAN!
NOTICE:  IMMUTABLE RAN!

Doh! Though the immutable function should have only run twice, we see that it ran just as often as the volatile version. It would appear that the function declaration documentation clears this up (emphasis mine):

If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.

The problem in this case is that the PostgreSQL planner automatically assumes the worst-case scenario for variables: any of an effectively unlimited set of values could be substituted there. Even though we know only two distinct values were ever passed to the function, PostgreSQL does not know that until execution time. Since the planner is determining how IMMUTABLE affects execution, that keyword is ignored for dynamic parameters.

With that said, the IMMUTABLE keyword exists beyond the context of optimization. In Set Theory, functions must be deterministic. Thus for indexes to operate as expected, functions must always return the same values when given the same inputs. Since PostgreSQL supports functional indexes, this means that only IMMUTABLE functions are candidates for providing indexable values.
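As a quick illustration of that restriction, reusing tables from the earlier posts with hypothetical index names: an expression index over an IMMUTABLE function like lower() is accepted, while one over date_trunc() on a TIMESTAMPTZ is rejected, because that function depends on the session timezone and is only STABLE:

-- Allowed: lower(text) is IMMUTABLE.
CREATE INDEX idx_product_name_lower
    ON sys_product (lower(prod_name));
 
-- Rejected: date_trunc() on TIMESTAMPTZ is only STABLE, so this fails
-- with "functions in index expression must be marked IMMUTABLE".
CREATE INDEX idx_order_dt_day
    ON sys_order (date_trunc('day', order_dt));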

This is also why the substitution process used by the planner seems arbitrary from an outside perspective; we’re not privy to the entire decision tree that generates the final execution plan. In many cases, either multiple function executions can not be reduced, or the cost and complexity of doing so would be computationally prohibitive.

In the end, it comes down to opportunity cost. By using these decorators, we give PostgreSQL as much information as possible to make decisions that affect the query execution plan. Consequently, not only may we reduce query run time, we also increase the number of functions that are candidates for table indexing. This last part is especially important, because such indexes do reduce the computational overhead of executing the function on the indexed value.

As the query planner improves over successive generations, these decorators only get more powerful. Try to use them where appropriate, and you can only improve the situation.


