Not a lot of people remember what Postgres was like before version 8.4. In many ways, this was the first “modern” release of the database engine. CTEs, Window Functions, column level permissions, in-place upgrade compatible with subsequent versions, collation support, continuous query statistic collection; it was just a smorgasbord of functionality.

Of these, CTEs or Common Table Expressions, probably enjoy the most user-level exposure; for good reason. Before this, there was no way to perform a recursive query in Postgres, which really hurts in certain situations. Want to display all related child threads in an online discussion? How about fetching the components of an organization chart by following management assignments? Better get ready for a lot of queries in a loop.

In addition to that, complicated queries were difficult to logically simplify. Reporting queries are especially prone to frequent sequences of aggregates and subqueries. It’s not uncommon to build a query that’s several pages long in this kind of context. Optimizing such an unwieldy beast is often difficult or even impossible simply due to all of the components and confusing nesting.

CTEs changed these things for the better and in the eyes of many, finally brought Postgres to parity with Oracle and its long-established recursive query support. So let’s explore what CTEs really deliver, and how they can improve our Postgres experience—caveats and all.

Let’s start with a trivial table and some data:

CREATE TABLE employee 
(
  employee_id  SERIAL PRIMARY KEY,
  full_name    VARCHAR NOT NULL,
  manager_id   INT REFERENCES employee
);
 
INSERT INTO employee (full_name, manager_id) VALUES
  ('King Randor', NULL),
  ('Prince Adam', 1),
  ('Teela', 2),
  ('Man-at-Arms', 2),
  ('Skeletor', NULL),
  ('Evil-Lyn', 5),
  ('Trap Jaw', 5),
  ('Clawful', 6);

It’s easy enough to display the management relationships. Here’s how our cartoon cohorts look with a basic JOIN:

SELECT m.full_name AS boss, e.full_name AS goon
  FROM employee e
  JOIN employee m ON (m.employee_id = e.manager_id)
 ORDER BY e.manager_id;
 
    boss     |    goon     
-------------+-------------
 King Randor | Prince Adam
 Prince Adam | Teela
 Prince Adam | Man-at-Arms
 Skeletor    | Evil-Lyn
 Skeletor    | Trap Jaw
 Evil-Lyn    | Clawful

In this trivial example, we can visually follow the results and understand that Clawful is ultimately a minion of Skeletor. We could also leverage our knowledge that the organization chart is only three levels deep and employ a third join to fully represent all relationships. But such a shallow corporate hierarchy is exceedingly rare, so let’s use a CTE to flush out the table instead.

WITH RECURSIVE org_tree AS (
    SELECT NULL::VARCHAR AS boss, *, 0 AS level,
           employee_id AS end_boss
      FROM employee
     WHERE manager_id IS NULL
    UNION ALL
    SELECT t.full_name AS boss, e.*, t.level + 1 AS level,
           t.end_boss
      FROM employee e
      JOIN org_tree t ON (t.employee_id = e.manager_id)
)
SELECT repeat(' ', level * 5) || full_name AS relationship
  FROM org_tree
 ORDER BY end_boss, level;
 
     relationship      
-----------------------
 King Randor
      Prince Adam
           Teela
           Man-at-Arms
 Skeletor
      Trap Jaw
      Evil-Lyn
           Clawful

Well that’s quite an improvement! But how does it work?

Our initial clue is the first query within the CTE. Other databases may do this differently, but Postgres creates a temporary in-memory table to act as a holding area to represent the CTE contents as they’re constructed. When we specify the RECURSIVE decorator, we gain the ability to bootstrap that temporary data with one query. The second query can then refer to the cumulative result in each iteration of the recursion.

The result is one query that loops in on itself three times in our example. We took advantage of this by adding a new column to track how deep the recursion is so we can visualize this more easily. Here’s what the contents of the “tree” table look like for each phase:

WITH RECURSIVE tree AS (
    SELECT NULL::VARCHAR AS boss, *, 0 AS level
      FROM employee
     WHERE manager_id IS NULL
    UNION ALL
    SELECT t.full_name AS boss, e.*, t.level + 1 AS level
      FROM employee e
      JOIN tree t ON (t.employee_id = e.manager_id)
)
SELECT * FROM tree;
 
    boss     | employee_id |  full_name  | manager_id | level 
-------------+-------------+-------------+------------+-------
             |           1 | King Randor |            |     0
             |           5 | Skeletor    |            |     0
 King Randor |           2 | Prince Adam |          1 |     1
 Skeletor    |           6 | Evil-Lyn    |          5 |     1
 Skeletor    |           7 | Trap Jaw    |          5 |     1
 Prince Adam |           3 | Teela       |          2 |     2
 Prince Adam |           4 | Man-at-Arms |          2 |     2
 Evil-Lyn    |           8 | Clawful     |          6 |     2

Each “level” here represents one dive into the employee table to fetch employees of the employees already listed. This loop naturally terminates once every boss is listed in the results. But there’s one flaw in this particular construction: what if we wanted to choose any grunt and see the whole chain of command from that point? To do that, we need to modify the CTE slightly to incorporate our desired predicate in the CTE portion itself so we can follow the relationship properly.

Here’s how that looks:

WITH RECURSIVE tree AS (
    SELECT *, 0 AS level
      FROM employee
     WHERE full_name = 'Clawful'
    UNION ALL
    SELECT e.*, t.level + 1 AS level
      FROM tree t
      JOIN employee e ON (e.employee_id = t.manager_id)
)
SELECT full_name
  FROM tree
 ORDER BY level DESC;
 
 full_name 
-----------
 Skeletor
 Evil-Lyn
 Clawful

Not bad, eh? We had to flip the JOIN because we started with a specific minion instead of the list of all executives. Then we followed the chain backwards, adding one middle-management peon per iteration until we reached the End Boss. We could combine this kind of trickery by writing a CTE that refers to another CTE and produce a query that would output the entire organization given any member in the hierarchy. We won’t, because that’s a gigantic and rather ugly query, but the capability is there.

What we can do, is demonstrate using CTEs to logically separate query fragments of a larger whole. In the past, a reporting query might consist of an imposing bulk of awkward subqueries to produce necessary aggregates and decode or label various summaries. In the worst cases, such queries might meander for dozens of pages. It’s often a miracle the end result executes at all, and debugging it is equally problematic.

Here’s how we might use CTEs to solve that conundrum:

WITH RECURSIVE org_tree AS (
    SELECT NULL::VARCHAR AS boss, *, 0 AS level,
           employee_id AS end_boss
      FROM employee
     WHERE manager_id IS NULL
    UNION ALL
    SELECT t.full_name AS boss, e.*, t.level + 1 AS level,
           t.end_boss
      FROM employee e
      JOIN org_tree t ON (t.employee_id = e.manager_id)
),
org_stats AS (
  SELECT m.full_name AS ceo, COUNT(*)-1 AS minions,
         MAX(level) AS cruelty
    FROM org_tree org
    JOIN employee m ON (m.employee_id = org.end_boss)
   GROUP BY m.full_name
),
org_attributes AS (
  SELECT m.full_name AS ceo,
         SUM(1) FILTER (WHERE org.full_name ILIKE '%evil%') AS evil,
         SUM(1) FILTER (WHERE org.full_name ILIKE '%prince%' OR
                              org.full_name ILIKE '%king%') AS royalty
    FROM org_tree org
    JOIN employee m ON (m.employee_id = org.end_boss)
   GROUP BY m.full_name
)
SELECT st.*, atr.evil, atr.royalty
  FROM org_stats st
  JOIN org_attributes atr USING (ceo);
 
     ceo     | minions | cruelty | evil | royalty 
-------------+---------+---------+------+---------
 King Randor |       3 |       2 |      |       2
 Skeletor    |       3 |       2 |    1 |

The first portion of the query is just our previous recursive attempt to flatten the organization chart and see how everything is related. The second summarizes basic statistics like employee count and maximum abstraction through middle-management. The third is just a bunch of miscellaneous attributes that might be interesting in a report. All of our examples are trivial, but in a real report, each of these may reflect much more comprehensive aggregates and formulas. Yet despite query complexity, we can determine the end goal of a fragment at a glance. Combine this with SQL comments, and we have a very user-friendly report.

Of course, CTEs are not all sunshine and roses. Remember when we said a CTE is built in a temporary memory location to facilitate recursive functionality and allow CTEs to reference each other? A consequence is that every CTE acts as what we call an optimization fence.

Normally before a query is executed, it is broken down into its component parts and the planner translates those elements into execution instructions. This might mean collapsing certain conditionals, simplifying or substituting a subquery, pushing predicates down into a stack for better row elimination, and so on.

When the planner encounters a CTE however, it can go no further. It will optimize the CTE query itself, but it does so as an encapsulated black box. Even if a WHERE clause from the referring query could greatly reduce matched rows during the CTE execution, that optimization cannot be applied. The CTE executes as written as if we had done this instead:

CREATE TEMP TABLE my_cte_chunk AS
SELECT ...

This applies to every CTE in a query. It’s better to think of each CTE as a virtual temporary table. While that allows each CTE to refer to the entire contents of another CTE, it also means we may lose several opportunities to optimize a query. It’s not uncommon to unroll a CTE and receive a much faster query in return. Query planners are complex beasts, and like any software compiler, may simplify necessary instructions by eliminating entire branches from the execution tree due to redundancy or empty result paths. Using a CTE reduces the planner’s ability to do that.

On the other hand, an experienced user can leverage this knowledge to their benefit. Since the query planner cannot penetrate optimization fences, it means we can override its decision tree. When the data or statistics indicate the planner will improperly prefer a highly inefficient plan, we can force it along an improved path. In these cases, we’re actively trading the potential for future planner improvements for immediate advantage.

The primary argument here is that the planner improvements we need may not arrive for years, or at all. Can we justify suffering bad performance for an undetermined length of time until some nebulous future planner addresses our obscure data edge case? Often the answer to this question is ‘no’. In the rare instances where this justification applies, leveraging optimization fences is probably a safe bet. At least we have the option!

In the end, Postgres improved its reputation among power users, and we gained a versatile tool that enabled the previously impossible. New recursion, simplification, and optimization options, all from a single feature? Yes, please!

PG Phriday: Everything in Common
Tagged on:                 

5 thoughts on “PG Phriday: Everything in Common

  • Users need to remember that CTEs are not just an optimization fence, they also force materialization of every CTE expression. That’s quite different from what other fences (such as OFFSET 0) do. If you have a query producing a large number of rows, a CTE can be MUCH more expensive that using OFFSET 0.

    That’s why I always recommend using subselects in the FROM clause for organizing query code. It’s not as pretty, but gives the optimizer the best chance of producing a good plane. If you do need to over-ride something, you can then insert OFFSET 0 fences as necessary.

    1. Very true. CTEs aren’t just an optimization fence, they’re an optimization Great Wall. I said it like three times in the article, but the fact that CTEs exist and refer to each others’ contents means they must be materialized.

      Of course, that’s sometimes the desired result. Though I’ll admit that I prefer larger CTE chains to be converted to temp or unlogged tables instead. At least that way, each individual step can be indexed or analyzed where necessary to make the entire chain more efficient.

  • Hi Shaun

    Per your publisher, the book “PostgreSQL High Availability Cookbook – Second Edition” is due Jan 2017. Do you know when it will be available.

    regards.

    1. Not sure. I just got the semi-final proofs back on Monday, and I just sent back my corrections last night. I’m more inclined to believe that it’ll come out early February, but they might rush it to make the deadline.

Comments are closed.