PG Phriday: In the Window

I’ll be the first to admit that I found Postgres window functions fantastically confusing when I first encountered them. They’re a powerful and versatile tool for building reports and summaries, but that functionality hides behind a fairly steep learning curve. One of the ways to combat their inherent complexity is to fully explore how they work, instead of just trying to wrap our heads around the expected results.

Window doggies have gotten decidedly smug

Window doggies have gotten decidedly smug

To that end, let’s set up a quick set of data in the customary fashion:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE + INTERVAL '1d' - ((s.id * 10) || 'm')::INTERVAL
  FROM generate_series(1, 5000) s(id);

CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);

ANALYZE sensor_log;

Yes, it’s the trusty sensor_log table once again. This time around, we only really care about demonstration in lieu of volume, so we’ve elected for five thousand rows in place of the usual five million. Our data represents one thousand sensors sequentially taking readings every minute. This should provide enough overlap to easily demonstrate what’s going on behind the scenes.

Let’s start with probably the easiest window function of the lot: row_number. All it does is number rows in the result set so we have a kind of counter that’s useful in a lot of different contexts. How do the first ten rows for today look?

SELECT location, reading, row_number() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;

 location | reading | row_number 
----------+---------+------------
 143      |      43 |          1
 142      |      42 |          2
 141      |      41 |          3
 140      |      40 |          4
 139      |      39 |          5
 138      |      38 |          6
 137      |      37 |          7
 136      |      36 |          8
 135      |      35 |          9
 134      |      34 |         10

Window functions must be called on some kind of data window. An empty set of () represents the entire data set, with no ordering, groups, or other shenanigans involved. We’re just numbering the results, and the output would have been no different if we removed the window function. This kind of use is very similar to Oracle’s ROWNUM pseudo-column.

Yet a row number by itself isn’t that interesting. Let’s number the rows in order of sensor reading, and fetch the first ten rows of those results:

SELECT location, reading, row_number() OVER (ORDER BY reading)
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;

 location | reading | row_number 
----------+---------+------------
 100      |       0 |          1
 101      |       1 |          2
 1        |       1 |          3
 102      |       2 |          4
 2        |       2 |          5
 103      |       3 |          6
 3        |       3 |          7
 4        |       4 |          8
 104      |       4 |          9
 105      |       5 |         10

If () was the window representing all rows, then (ORDER BY reading) is that same content after being sorted by the reading column. Not only did Postgres sort our results, but it numbered them in the post-sorted order. This is a very fine distinction! Consider what happens when we move the ORDER BY clause into the query proper.

SELECT location, reading, row_number() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 ORDER BY reading
 LIMIT 10;

 location | reading | row_number 
----------+---------+------------
 100      |       0 |         44
 1        |       1 |        143
 101      |       1 |         43
 2        |       2 |        142
 102      |       2 |         42
 3        |       3 |        141
 103      |       3 |         41
 104      |       4 |         40
 4        |       4 |        140
 105      |       5 |         39

What the heck happened here? The data looks exactly the same, but the artificial row numbers are seemingly arbitrary. Indeed they are! By design. This is part of the reason window functions are so difficult to explain and comprehend. The fact of the matter is that each window is a virtual and separate manifestation of the plain query results.

Anyone who has struggled with pointers in C or C++ know that abstracted structures introduce certain pitfalls into obtaining desired results.

Window functions and you!

Window functions and you!

To help unravel the mystery a bit, let’s look at the natural state of the results without any window function nonsense. Of course we must also shift the data by 40 rows so we can see some of the same information the window received.

SELECT location, reading
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10 OFFSET 40;

 location | reading 
----------+---------
 103      |       3
 102      |       2
 101      |       1
 100      |       0
 99       |      99
 98       |      98
 97       |      97
 96       |      96
 95       |      95
 94       |      94

This output represents the table rows as sorted by the index we created on reading_date. Since these results are not artificially sorted in any way, this what the window function is actually seeing without its own specific sort operation. We shifted the results by 40 rows and as expected, row 44 has the value of 0. The window function gave us exactly what we requested, but it numbered the rows before Postgres sorted them.

We can actually watch this in action by looking at the two query plans:

EXPLAIN
SELECT location, reading, row_number() OVER (ORDER BY reading)
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;

                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=15.91..16.08 rows=10 width=11)
   ->  WindowAgg  (cost=15.91..18.41 rows=143 width=11)
         ->  Sort  (cost=15.91..16.27 rows=143 width=11)
               Sort Key: reading
               ->  Index Scan using idx_sensor_log_date on sensor_log
                     (cost=0.29..10.79 rows=143 width=11)
                   Index Cond: (reading_date > ('now'::cstring)::date)

EXPLAIN
SELECT location, reading, row_number() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 ORDER BY reading
 LIMIT 10;

                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=15.67..15.69 rows=10 width=11)
   ->  Sort  (cost=15.67..16.03 rows=143 width=11)
         Sort Key: reading
         ->  WindowAgg  (cost=0.29..12.58 rows=143 width=11)
               ->  Index Scan using idx_sensor_log_date on sensor_log
                     (cost=0.29..10.79 rows=143 width=11)
                   Index Cond: (reading_date > ('now'::cstring)::date)

Note that the WindowAgg step occurs at the end of query execution in the first example, taking place directly after a sort operation. The second query sorts after the WindowAgg, indicating the window only has access to unsorted rows. The key detail is that window functions only have access to rows within the query as if it had executed without them. This also happens after other aggregates are applied, meaning it’s a bad idea (or even impossible) to mix regular aggregates with window functions.

The easiest way to comprehend how a window function works is to run the query without them. That’s the data the window has access to, regardless of how we slice and dice them within the window itself. It also explains why we’re unable to refer to window function elements in other query clauses. They’re unusable in predicates and we can’t leverage their calculations to group or limit results.

So imagine for a moment we don’t have the standard Postgres LIMIT clause. This is how we would snag the top ten results of our location readings:

SELECT *
  FROM (SELECT location, reading,
               row_number() OVER (ORDER BY reading)
          FROM sensor_log
         WHERE reading_date > CURRENT_DATE
       ) sub
 WHERE row_number <= 10;

 location | reading | row_number 
----------+---------+------------
 100      |       0 |          1
 101      |       1 |          2
 1        |       1 |          3
 102      |       2 |          4
 2        |       2 |          5
 103      |       3 |          6
 3        |       3 |          7
 4        |       4 |          8
 104      |       4 |          9
 105      |       5 |         10

This is clearly silly when using row_number, but the trick works the same with other window functions. Here’s how we’d obtain the 10th ranked readings for today’s data:

SELECT *
  FROM (SELECT location, reading,
               dense_rank() OVER (ORDER BY reading)
          FROM sensor_log
         WHERE reading_date > CURRENT_DATE
       ) sub
 WHERE dense_rank = 10;

 location | reading | dense_rank 
----------+---------+------------
 109      |       9 |         10
 9        |       9 |         10

Keep in mind that Postgres must fetch the full results internally to materialize them for the window functions. This is true whether there are 143 rows as with our example, or 143-million.

The more advanced use cases for window functions are a topic for another day. Consider this a very high-level introduction to how they work and their inherent limitations instead of a comprehensive guide. There’s a lot of material here that deserves closer inspection, so there’s no need to rush. Either way, don’t let window functions confuse you more than necessary. Like any independent agent, you just need to know what they’re doing behind the scenes.

Like this, but with slightly fewer bombs

Like this, but with slightly fewer bombs