PG Phriday: Stuck in the Middle with Postgres

Earlier this year, I implied Postgres was some kind of super middleware for dragging data out of every external resource it could locate. But that example only used the Postgres foreign data wrapper to contact another Postgres server. Why be so unimaginative? The future is as unlimited as it is terrifying.

Meet the new Postgres mascot

Let’s start with a few prerequisites. We want to see the combined functionality of a few different Postgres capabilities, so let’s try to combine data from MySQL, Postgres, and JavaScript via the V8 engine. To do that, we need a couple of easily obtained resources. Debian-based systems can do something like this:

sudo apt-get install postgresql-9.6-plv8 postgresql-9.6-mysql-fdw

Otherwise, packages are available on the Postgres site, possibly under other names in your favorite distribution. All we need for this is some kind of MySQL install, plus the MySQL and V8 Postgres extensions. More ambitious readers can try applying these techniques to SQL Server as well.
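
For the SQL Server case, the tds_fdw extension can play the same role. What follows is only a rough sketch with placeholder host, database, and credentials; double-check the option names against the tds_fdw documentation for your version:

CREATE EXTENSION tds_fdw;
 
CREATE SERVER mssql_market
  FOREIGN DATA WRAPPER tds_fdw
  OPTIONS (servername 'mssql.example.com', port '1433', database 'market');
 
CREATE USER MAPPING FOR postgres
  SERVER mssql_market
  OPTIONS (username 'market_reader', password 'changeme');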

Now imagine we have a marketing infrastructure with data strewn all over the world, in a vast assortment of incompatible or legacy apps, each with its own portion of the data we want to see. Not an ideal situation really, and one normally solved by ETL to pull all of the data into a single location for perusal. But transformation takes time, and we’re exceptionally lazy. Instead of all that extract, transform, and load action, let’s just query the data directly.

We can get started by creating a basic market MySQL table with two measly rows for demonstration purposes:

CREATE TABLE market_data (
  ext_root   VARCHAR(10) PRIMARY KEY,
  trader     VARCHAR(20) NOT NULL,
  stock      VARCHAR(10) NOT NULL,
  action     CHAR(1) NOT NULL,
  price      NUMERIC(10,4) NOT NULL,
  xact_date  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
INSERT INTO market_data (ext_root, trader, stock, action, price)
       VALUES ('IZ724JJ5:3', 'bsmith', 'GOOG', 'b', 800);
INSERT INTO market_data (ext_root, trader, stock, action, price)
       VALUES ('IZ724JJ5:4', 'bsmith', 'GOOG', 's', 842);

Easy, right? The rows should automatically have a timestamp applied as if they’re part of a rapidly accumulating feed. This data isn’t normalized because that would greatly complicate the data model. For the purposes of this activity, that’s not really necessary. If it helps, imagine this is just a view on top of all of the relevant tables and flattens the data to something an external resource can easily consume.
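
If that mental model helps, the flattening view on the MySQL side might look something like this sketch, where the trades and traders tables are purely hypothetical stand-ins for whatever the normalized schema actually contains:

CREATE VIEW market_data AS
SELECT t.ext_root, tr.username AS trader, t.stock,
       t.action, t.price, t.xact_date
  FROM trades t
  JOIN traders tr ON tr.trader_id = t.trader_id;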

The next thing we need is some data in a Postgres database, because why not? Consider this on some server in a far-away land filled with unicorns and butterflies cavorting with innocent children in a field of daisies. It’s definitely not in our datacenter!

CREATE TABLE trade_calc (
  ext_root  VARCHAR(10) PRIMARY KEY,
  stdev     NUMERIC NOT NULL,
  mean      NUMERIC NOT NULL,
  delta     NUMERIC NOT NULL
);
 
INSERT INTO trade_calc VALUES ('IZ724JJ5:3', 13.7, 825, 11.3);
INSERT INTO trade_calc VALUES ('IZ724JJ5:4', 8.5, 832, 1.5);

In this case, the table contains supplementary information regarding the trade. Maybe an application is generating various calculations derived from the trade and storing them in a local resource for later consumption. Regardless, we’ve retained the external root identifier used to tie the rows together, which is a common feature of market data.

The next step is to introduce some magical binding substance of some kind.

Why not?

We’ll be installing all of these wonderful resources in a Postgres database dedicated specifically for that purpose. One with user accounts and grants in a centralized location where everyone and his pet fish can access and aggregate data stored there. The MySQL portion of our tentacle would look and function something like this:

CREATE EXTENSION mysql_fdw;
 
CREATE SERVER mysql_market
  FOREIGN DATA WRAPPER mysql_fdw
  OPTIONS (host 'localhost');
 
CREATE USER MAPPING FOR postgres
  SERVER mysql_market
  OPTIONS (username 'root');
 
CREATE SCHEMA mysql;
 
IMPORT FOREIGN SCHEMA market LIMIT TO (market_data)
  FROM SERVER mysql_market
  INTO mysql;
 
SELECT * FROM mysql.market_data;
 
  ext_root  | trader | stock | action |  price   |      xact_date      
------------+--------+-------+--------+----------+---------------------
 IZ724JJ5:3 | bsmith | GOOG  | b      | 800.0000 | 2017-03-10 15:09:32
 IZ724JJ5:4 | bsmith | GOOG  | s      | 842.0000 | 2017-03-10 15:09:45

One foreign MySQL resource allocated and tested. Check!

Without a readily available node.js or REST interface to contact, we can simulate polling some kind of JSON resource with the PLV8 language. This is a good way to demonstrate tying elements to Postgres using something other than SQL. It also gives us a way to grab data for situations where no foreign data wrapper exists and we’re not confident enough to create one.

In this case, we’re calling a V8 function which gets the data through some dubious means and sends it back to us:

CREATE EXTENSION plv8;
 
CREATE SCHEMA js;
SET search_path TO js;
 
CREATE TYPE clearing_data AS (
  ext_root   VARCHAR(10),
  market     VARCHAR(10),
  bank       VARCHAR(10),
  is_funded  BOOLEAN
);
 
CREATE OR REPLACE FUNCTION get_clearing_info()
RETURNS SETOF clearing_data AS
$$
  plv8.return_next({
    "ext_root": "IZ724JJ5:3", "market": "NASDAQ",
    "bank": "Chase", "is_funded": false
  });
  plv8.return_next({
    "ext_root": "IZ724JJ5:4", "market": "NASDAQ",
    "bank": "Citi", "is_funded": true
  });
$$ LANGUAGE plv8;
 
CREATE VIEW all_clearing_data AS
SELECT * FROM get_clearing_info();
 
SELECT * FROM js.all_clearing_data;
 
  ext_root  | market | bank  | is_funded 
------------+--------+-------+-----------
 IZ724JJ5:3 | NASDAQ | Chase | f
 IZ724JJ5:4 | NASDAQ | Citi  | t

Success! Hard-coded data isn’t exactly robust, but demos are open season, and hey look over there! In a real scenario, we’d probably have a function that accepted parameters and performed a pull from some external resource. The view is just a way of hiding the function call from users whose heads might explode upon learning they can select rows from a function.
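
A parameterized version might resemble the sketch below. It still returns canned rows, since reaching a real REST endpoint would take something like the http extension or an untrusted procedural language rather than plain PL/V8, so the filter merely stands in for that remote lookup:

CREATE OR REPLACE FUNCTION js.get_clearing_info(p_root VARCHAR(10))
RETURNS SETOF js.clearing_data AS
$$
  // Stand-in for a remote call; filter the same canned rows by root ID.
  var rows = [
    {"ext_root": "IZ724JJ5:3", "market": "NASDAQ",
     "bank": "Chase", "is_funded": false},
    {"ext_root": "IZ724JJ5:4", "market": "NASDAQ",
     "bank": "Citi", "is_funded": true}
  ];
 
  rows.filter(function (r) { return r.ext_root == p_root; })
      .forEach(function (r) { plv8.return_next(r); });
$$ LANGUAGE plv8;
 
SELECT * FROM js.get_clearing_info('IZ724JJ5:4');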

Lastly we need to contact that Postgres server out in the magical field of unicorns, butterflies, and candy canes. What, I didn’t mention candy last time? Well there’s candy now. Deal with it.

CREATE EXTENSION postgres_fdw;
 
CREATE SERVER pgsql_market
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (dbname 'market', host 'localhost');
 
CREATE USER MAPPING FOR postgres
  SERVER pgsql_market
  OPTIONS (USER 'postgres');
 
CREATE SCHEMA pgsql;
 
IMPORT FOREIGN SCHEMA PUBLIC LIMIT TO (trade_calc)
  FROM SERVER pgsql_market
  INTO pgsql;
 
SELECT * FROM pgsql.trade_calc;
 
  ext_root  | stdev | mean | delta 
------------+-------+------+-------
 IZ724JJ5:3 |  13.7 |  825 |  11.3
 IZ724JJ5:4 |   8.5 |  832 |   1.5

That was essentially the same process we used with MySQL, just with the Postgres foreign data wrapper this time. Either way, it works and returns data exactly as expected.

The final step is to slap on the duct tape and invoke everything at once:

SELECT m.*, t.*
  FROM mysql.market_data m
  JOIN pgsql.trade_calc t USING (ext_root)
  JOIN js.all_clearing_data c USING (ext_root);
 
  ext_root  | trader | stock | action |  price   |      xact_date      |  ext_root  | stdev | mean | delta 
------------+--------+-------+--------+----------+---------------------+------------+-------+------+-------
 IZ724JJ5:3 | bsmith | GOOG  | b      | 800.0000 | 2017-03-10 15:09:32 | IZ724JJ5:3 |  13.7 |  825 |  11.3
 IZ724JJ5:4 | bsmith | GOOG  | s      | 842.0000 | 2017-03-10 15:09:45 | IZ724JJ5:4 |   8.5 |  832 |   1.5

The real power here is in combining this with the external resource gathering techniques we explored in the previously mentioned article. Is the external REST call way too slow? Slap on a materialized view and index the crap out of it. Have a super stable API for an internally developed resource? Write a foreign data wrapper so Postgres can invoke it directly. Then add a materialized view for good measure. Add a refresh schedule for any resources that adversely affect performance for direct inquiries.
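
As a minimal sketch of that caching pattern applied to our simulated clearing feed (the view and index names here are arbitrary):

CREATE MATERIALIZED VIEW js.cached_clearing_data AS
SELECT * FROM js.get_clearing_info();
 
CREATE UNIQUE INDEX udx_cached_clearing_root
    ON js.cached_clearing_data (ext_root);
 
-- Re-run the underlying function on whatever schedule suits the feed.
REFRESH MATERIALIZED VIEW CONCURRENTLY js.cached_clearing_data;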

What we end up with is a container database that represents 0% of its own data. It’s a resource we can truncate at will, may crash with no consequences, and might exist anywhere since it’s nothing but an interface layer. A Postgres instance designed this way becomes a SQL API to access assorted data feeds through a standardized execution model. Heck, everyone can have their very own.

Were we to attempt this with our favorite language or application stack, we would probably need to write our own data joining techniques, implement set theory, incorporate hashing and sorting algorithms, and so on. In the end, we’d have written a tiny database engine to handle data combination and aggregation, without most capabilities of even a mediocre out-of-the-box solution. In this scenario, Postgres bridges that gap for us.

If Postgres has already done all the hard work, why replicate it? And isn’t that what being lazy is all about?

No coding, only nap

PG Phriday: Through the Window

Now that we know how Postgres window functions work, why not play with them a bit to get a better understanding of their capabilities? So long as we understand window functions are applied after data gathering and aggregation steps, much of their mystery and complexity is defanged. Let’s start actually using them for stuff!

Captain Murphy is tired of your nonsense

(Note: I’m a bit under the weather today, so this Phriday will probably be a bit truncated and potentially incoherent thanks to the drugs. Apologies in advance.)

Let’s start off with the same example data as last week with one tiny alteration. To properly illustrate some of these concepts, we need some actual data variance, so we’ll be using random numbers for readings instead of modulo math.

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, round(random() * 100),
       CURRENT_DATE + INTERVAL '1d' - ((s.id * 10) || 'm')::INTERVAL
  FROM generate_series(1, 5000) s(id);
 
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
 
ANALYZE sensor_log;

First things first: ordering matters. Window functions are either applied over the whole data-set or some partition of it. They’re also cumulative in nature, meaning we can change the aggregation results by altering the window itself. To better see what this means, consider these two sums:

SELECT location, reading, SUM(reading) OVER ()
  FROM sensor_log
 WHERE reading_date >= CURRENT_DATE
 ORDER BY location
 LIMIT 10;
 
 location | reading | sum  
----------+---------+------
 1        |      12 | 7058
 10       |      26 | 7058
 100      |      98 | 7058
 101      |      99 | 7058
 102      |      46 | 7058
 103      |      84 | 7058
 104      |      60 | 7058
 105      |      35 | 7058
 106      |      58 | 7058
 107      |       6 | 7058
 
SELECT location, reading, SUM(reading) OVER (ORDER BY location)
  FROM sensor_log
 WHERE reading_date >= CURRENT_DATE
 ORDER BY location
 LIMIT 10;
 
 location | reading | sum 
----------+---------+-----
 1        |      12 |  12
 10       |      26 |  38
 100      |      98 | 136
 101      |      99 | 235
 102      |      46 | 281
 103      |      84 | 365
 104      |      60 | 425
 105      |      35 | 460
 106      |      58 | 518
 107      |       6 | 524

Interestingly, we actually learned two things here. First, window functions are evaluated before the LIMIT is applied, so that sum reflects every reading in the table for today, not just the ten rows we displayed. Second, ordering by the sensor location turned the sum into a running total, recalculated at each row. This applies to all of the available window functions: a maximum would reflect the current maximum at each row until a higher value replaces it, an average becomes a cumulative average as values are processed, and so on.
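
For instance, swapping the sum for a max over the same ordered window produces a running maximum that only changes when a larger reading appears. Since our readings are random, the actual values will differ on every run, but the third column can only ever increase down the result set:

SELECT location, reading, MAX(reading) OVER (ORDER BY location)
  FROM sensor_log
 WHERE reading_date >= CURRENT_DATE
 ORDER BY location
 LIMIT 10;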

This isn’t particularly useful for sums, but what if we wanted to watch our data converge upon the overall average?

SELECT location, reading,
       round(avg(reading) OVER (ORDER BY location), 2) AS running_avg,
       round(reading - 
           avg(reading) OVER (ORDER BY location), 2) AS variance
  FROM sensor_log
 WHERE reading_date >= CURRENT_DATE
 ORDER BY location
 LIMIT 10;
 
 location | reading | running_avg | variance 
----------+---------+-------------+----------
 1        |      12 |       12.00 |     0.00
 10       |      26 |       19.00 |     7.00
 100      |      98 |       45.33 |    52.67
 101      |      99 |       58.75 |    40.25
 102      |      46 |       56.20 |   -10.20
 103      |      84 |       60.83 |    23.17
 104      |      60 |       60.71 |    -0.71
 105      |      35 |       57.50 |   -22.50
 106      |      58 |       57.56 |     0.44
 107      |       6 |       52.40 |   -46.40

If we monitored those kinds of results second by second, we could flag any kind of appreciable jitter from the average as currently represented by the data. It may be completely expected for values to increase through the day, so the total average for all data is meaningless to us. Ordering allows us to control the aggregate’s construction in a way that isn’t really possible using other methods, barring some kind of recursive CTE magic.

Postgres’ new slogan

With ordering out of the way, there’s the small detail of data partitioning as well. The manual goes into much more detail about this, but it’s just a fancy way of saying GROUP BY within the window itself.

SELECT location, reading,
       SUM(reading) OVER (PARTITION BY location) AS all_readings,
       rank() OVER (PARTITION BY location ORDER BY reading)
  FROM sensor_log
 WHERE location::INT <= 10
 LIMIT 10;
 
 location | reading | all_readings | rank 
----------+---------+--------------+------
 0        |      32 |          260 |    1
 0        |      35 |          260 |    2
 0        |      37 |          260 |    3
 0        |      71 |          260 |    4
 0        |      85 |          260 |    5
 1        |      12 |          173 |    1
 1        |      14 |          173 |    2
 1        |      17 |          173 |    3
 1        |      44 |          173 |    4
 1        |      86 |          173 |    5

Again, we can learn a few different things from these results. First is that the window results are restricted to the partition we declared. We set the partition to limit sums, ranks, and other window functions to the domain of the location. Postgres will apply window aggregates specifically to each location as it appears in the query output.

Next consider that we partitioned the sum, but did not order it. In case it wasn’t obvious already, this tells us that PARTITION and ORDER BY clauses are independent, as are the window definitions. In this case, we want the reading total for each location, followed by the rank of each reading within that group. If we had ordered the sum window, we would have gotten a running total instead of an overall value. On the other hand, we want the rank to behave in an ordered manner.

Separate windows, separate effects. Of course, we may not want separate windows every time. If we end up using the same window over and over again, it doesn’t make sense to re-declare it for each column. Thankfully Postgres has a shorthand for that:

SELECT location, reading,
       SUM(reading) OVER (PARTITION BY location) AS all_readings,
       SUM(reading) OVER locs AS running_total,
       rank() OVER locs
  FROM sensor_log
 WHERE location::INT <= 10
WINDOW locs AS (PARTITION BY location ORDER BY reading)
 LIMIT 10;
 
 location | reading | all_readings | running_total | rank 
----------+---------+--------------+---------------+------
 0        |      32 |          260 |            32 |    1
 0        |      35 |          260 |            67 |    2
 0        |      37 |          260 |           104 |    3
 0        |      71 |          260 |           175 |    4
 0        |      85 |          260 |           260 |    5
 1        |      12 |          173 |            12 |    1
 1        |      14 |          173 |            26 |    2
 1        |      17 |          173 |            43 |    3
 1        |      44 |          173 |            87 |    4
 1        |      86 |          173 |           173 |    5

Perfect! Not only did we use the same window twice, but we were able to mix it with another direct declaration within the same query. That isn’t exactly ground-breaking territory, but it’s reassuring when things work as expected.

Keep in mind that the same window function limitations apply to this syntax as well. Since windows are evaluated after WHERE, GROUP BY, or HAVING clauses, the WINDOW portion of a query must appear after them as well.
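
For example, ranking grouped averages means the named window has to follow the GROUP BY and HAVING clauses, and it can freely reference the aggregates because they’ve already been computed by that point:

SELECT location, round(avg(reading), 2) AS avg_reading,
       rank() OVER daily AS avg_rank
  FROM sensor_log
 WHERE reading_date >= CURRENT_DATE
 GROUP BY location
HAVING avg(reading) > 0
WINDOW daily AS (ORDER BY avg(reading) DESC)
 ORDER BY avg_rank
 LIMIT 10;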

Now go forth and leverage window functions to produce reports without awkward subqueries and CTEs muddying up the works! The only better way to use a window is to cool pies.

Mmmmm… pie!

PG Phriday: In the Window

I’ll be the first to admit that I found Postgres window functions fantastically confusing when I first encountered them. They’re a powerful and versatile tool for building reports and summaries, but that functionality hides behind a fairly steep learning curve. One of the ways to combat their inherent complexity is to fully explore how they work, instead of just trying to wrap our heads around the expected results.

Window doggies have gotten decidedly smug

To that end, let’s set up a quick set of data in the customary fashion:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE + INTERVAL '1d' - ((s.id * 10) || 'm')::INTERVAL
  FROM generate_series(1, 5000) s(id);
 
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
 
ANALYZE sensor_log;

Yes, it’s the trusty sensor_log table once again. This time around, we only really care about demonstration in lieu of volume, so we’ve elected for five thousand rows in place of the usual five million. Our data represents one thousand sensors taking readings in sequence at ten-minute intervals. This should provide enough overlap to easily demonstrate what’s going on behind the scenes.

Let’s start with probably the easiest window function of the lot: row_number. All it does is number rows in the result set so we have a kind of counter that’s useful in a lot of different contexts. How do the first ten rows for today look?

SELECT location, reading, ROW_NUMBER() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;
 
 location | reading | row_number 
----------+---------+------------
 143      |      43 |          1
 142      |      42 |          2
 141      |      41 |          3
 140      |      40 |          4
 139      |      39 |          5
 138      |      38 |          6
 137      |      37 |          7
 136      |      36 |          8
 135      |      35 |          9
 134      |      34 |         10

Window functions must be called on some kind of data window. An empty set of () represents the entire data set, with no ordering, groups, or other shenanigans involved. We’re just numbering the results, and the output would have been no different if we removed the window function. This kind of use is very similar to Oracle’s ROWNUM pseudo-column.

Yet a row number by itself isn’t that interesting. Let’s number the rows in order of sensor reading, and fetch the first ten rows of those results:

SELECT location, reading, ROW_NUMBER() OVER (ORDER BY reading)
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;
 
 location | reading | row_number 
----------+---------+------------
 100      |       0 |          1
 101      |       1 |          2
 1        |       1 |          3
 102      |       2 |          4
 2        |       2 |          5
 103      |       3 |          6
 3        |       3 |          7
 4        |       4 |          8
 104      |       4 |          9
 105      |       5 |         10

If () was the window representing all rows, then (ORDER BY reading) is that same content after being sorted by the reading column. Not only did Postgres sort our results, but it numbered them in the post-sorted order. This is a very fine distinction! Consider what happens when we move the ORDER BY clause into the query proper.

SELECT location, reading, ROW_NUMBER() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 ORDER BY reading
 LIMIT 10;
 
 location | reading | row_number 
----------+---------+------------
 100      |       0 |         44
 1        |       1 |        143
 101      |       1 |         43
 2        |       2 |        142
 102      |       2 |         42
 3        |       3 |        141
 103      |       3 |         41
 104      |       4 |         40
 4        |       4 |        140
 105      |       5 |         39

What the heck happened here? The data looks exactly the same, but the artificial row numbers are seemingly arbitrary. Indeed they are! By design. This is part of the reason window functions are so difficult to explain and comprehend. The fact of the matter is that each window is a virtual and separate manifestation of the plain query results.

Anyone who has struggled with pointers in C or C++ knows that abstracted structures introduce certain pitfalls into obtaining the desired results.

Window functions and you!

To help unravel the mystery a bit, let’s look at the natural state of the results without any window function nonsense. Of course we must also shift the data by 40 rows so we can see some of the same information the window received.

SELECT location, reading
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10 OFFSET 40;
 
 location | reading 
----------+---------
 103      |       3
 102      |       2
 101      |       1
 100      |       0
 99       |      99
 98       |      98
 97       |      97
 96       |      96
 95       |      95
 94       |      94

This output represents the table rows as sorted by the index we created on reading_date. Since these results are not artificially sorted in any way, this is what the window function is actually seeing without its own specific sort operation. We shifted the results by 40 rows and, as expected, row 44 has a reading of 0. The window function gave us exactly what we requested, but it numbered the rows before Postgres sorted them.

We can actually watch this in action by looking at the two query plans:

EXPLAIN
SELECT location, reading, ROW_NUMBER() OVER (ORDER BY reading)
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;
 
                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=15.91..16.08 rows=10 width=11)
   ->  WindowAgg  (cost=15.91..18.41 rows=143 width=11)
         ->  Sort  (cost=15.91..16.27 rows=143 width=11)
               Sort Key: reading
               ->  Index Scan using idx_sensor_log_date on sensor_log
                     (cost=0.29..10.79 rows=143 width=11)
                   Index Cond: (reading_date > ('now'::cstring)::date)
 
EXPLAIN
SELECT location, reading, ROW_NUMBER() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 ORDER BY reading
 LIMIT 10;
 
                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=15.67..15.69 rows=10 width=11)
   ->  Sort  (cost=15.67..16.03 rows=143 width=11)
         Sort Key: reading
         ->  WindowAgg  (cost=0.29..12.58 rows=143 width=11)
               ->  Index Scan using idx_sensor_log_date on sensor_log
                     (cost=0.29..10.79 rows=143 width=11)
                   Index Cond: (reading_date > ('now'::cstring)::date)

Note that the WindowAgg step occurs at the end of query execution in the first example, taking place directly after a sort operation. The second query sorts after the WindowAgg, indicating the window only had access to unsorted rows. The key detail is that window functions only see the rows the query would have produced without them. This also happens after other aggregates are applied, so when grouping is involved, a window can only operate on the already-aggregated rows; any column it references must itself be grouped or wrapped in an aggregate.

The easiest way to comprehend how a window function works is to run the query without them. That’s the data the window has access to, regardless of how we slice and dice them within the window itself. It also explains why we’re unable to refer to window function elements in other query clauses. They’re unusable in predicates and we can’t leverage their calculations to group or limit results.

So imagine for a moment we don’t have the standard Postgres LIMIT clause. This is how we would snag the top ten results of our location readings:

SELECT *
  FROM (SELECT location, reading,
               ROW_NUMBER() OVER (ORDER BY reading)
          FROM sensor_log
         WHERE reading_date > CURRENT_DATE
       ) sub
 WHERE ROW_NUMBER <= 10;
 
 location | reading | row_number 
----------+---------+------------
 100      |       0 |          1
 101      |       1 |          2
 1        |       1 |          3
 102      |       2 |          4
 2        |       2 |          5
 103      |       3 |          6
 3        |       3 |          7
 4        |       4 |          8
 104      |       4 |          9
 105      |       5 |         10

This is clearly silly when using row_number, but the trick works the same with other window functions. Here’s how we’d obtain the 10th ranked readings for today’s data:

SELECT *
  FROM (SELECT location, reading,
               dense_rank() OVER (ORDER BY reading)
          FROM sensor_log
         WHERE reading_date > CURRENT_DATE
       ) sub
 WHERE dense_rank = 10;
 
 location | reading | dense_rank 
----------+---------+------------
 109      |       9 |         10
 9        |       9 |         10

Keep in mind that Postgres must fetch the full results internally to materialize them for the window functions. This is true whether there are 143 rows as with our example, or 143-million.

The more advanced use cases for window functions are a topic for another day. Consider this a very high-level introduction to how they work and their inherent limitations instead of a comprehensive guide. There’s a lot of material here that deserves closer inspection, so there’s no need to rush. Either way, don’t let window functions confuse you more than necessary. Like any independent agent, you just need to know what they’re doing behind the scenes.

Like this, but with slightly fewer bombs

PG Phriday: Getting Assertive

There are a lot of languages available for authoring Postgres functions, but there’s nothing quite like the classic PL/pgSQL. It’s SQL! It’s not SQL! It’s a kind of horrifying mutant crossbreed suitable only for terrifying small children and generating complex reports from a cavalcade of dubious sources! And deep within its twisted entrails is an often overlooked feature usually only available in far more mature entities.

Just wait until they add sharding

That’s right, it’s obvious we’re referring to the ASSERT statement.

When we say “often overlooked”, we’re not kidding. The pertinent documentation offers a meager overview following a far more comprehensive summary of the RAISE statement. We thought it deserved better.

Let’s begin with a simple table example that can form the basis of a more complex function:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE - ((s.id * 10) || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);
 
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
 
ANALYZE sensor_log;

What can we do with five million rows of simulated data from 1000 sensors that collect data every ten seconds? A good place to start is a summary or report table of some kind. We’re only doing cursory analysis, so we should begin with aggregating the daily average, minimum, and maximum. If we get adventurous later, we can add standard deviations, jitter, deltas, and other fancy logic report owners might find useful.

Since this type of data is easy to tabulate, we require one single table and a function to manage its contents.

CREATE TABLE sensor_report (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading_date  DATE NOT NULL,
  avg_reading   BIGINT NOT NULL,
  min_reading   BIGINT NOT NULL,
  max_reading   BIGINT NOT NULL
);
 
CREATE INDEX idx_sensor_report_date ON sensor_report (reading_date);
 
CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.
 
  SELECT INTO last_update MAX(reading_date)
    FROM sensor_report;
 
  -- Bootstrap the report table if it's currently empty.
 
  IF last_update IS NULL THEN
    SELECT INTO last_update MIN(reading_date)
      FROM sensor_log;
  END IF;
 
  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.
 
  DELETE FROM sensor_report
   WHERE reading_date >= last_update;
 
  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.
 
  WITH capture AS (
    INSERT INTO sensor_report (
             location, reading_date, avg_reading,
             min_reading, max_reading
           )
    SELECT location, reading_date::DATE,
           avg(reading), MIN(reading), MAX(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
     RETURNING *
  )
  SELECT COUNT(*) INTO insert_count
    FROM capture;
 
   RETURN insert_count;
END;
$$ LANGUAGE plpgsql;
 
SELECT sp_trickle_report();
 
 sp_trickle_report 
-------------------
            579000
 
ANALYZE sensor_report;

Thankfully this is a very simple (and somewhat naive) reporting function. Imagine this is our first iteration and we want to debug some of its activity. If we were using Python, for example, we might pepper the code with assert statements or craft specific tests to accomplish that same goal. Well, PL/pgSQL allows us to do just that.

For the second iteration, let’s assume we want to avoid bootstrapping the report table if it’s empty. After all, bootstrapping only needs to be done once, so why pollute the function with that edge case? Besides, it’s entirely possible we might require a separate procedure to initialize the reporting table, and isolated functionality is generally less buggy.

If the table isn’t bootstrapped, it’s probably because we’re in a testing environment that is missing data. In this case, it makes sense to complain so the missing data is loaded, even though it’s not strictly an erroneous scenario. So let’s replace the bootstrap with an assertion. While we’re at it, let’s say a new requirement from the upstream code suggests a minimum of 2000 sensors are required for meaningful output. In such a case, we might need to revise our fixture data in the development and QA environments.

Here’s how that might look:

CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.
 
  SELECT INTO last_update MAX(reading_date)
    FROM sensor_report;
 
  ASSERT last_update IS NOT NULL, 'Need to bootstrap report table.';
 
  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.
 
  DELETE FROM sensor_report
   WHERE reading_date >= last_update;
 
  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.
 
  WITH capture AS (
    INSERT INTO sensor_report (
             location, reading_date, avg_reading,
             min_reading, max_reading
           )
    SELECT location, reading_date::DATE,
           avg(reading), MIN(reading), MAX(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
     RETURNING *
  )
  SELECT COUNT(*) INTO insert_count
    FROM capture;
 
  ASSERT insert_count >= 2000, 'Not enough current sensor activity!';
 
  RETURN insert_count;
END;
$$ LANGUAGE plpgsql;

The function itself isn’t that much different; the important part is the result of our changes. The only truly meaningful modification is that the function no longer initializes the report table if it was empty. Ignoring that, we can treat the function exactly as before, as no output will change, and no new failure conditions are triggered.

This is due to the fact that assertions are ignored by default. Remember, they’re primarily intended for debugging purposes, so they must be explicitly enabled. We can do this by making an entry in postgresql.conf, but we can also activate them within our current session.
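
The postgresql.conf route is just a single line, followed by a configuration reload:

# In postgresql.conf: enable ASSERT statements for every session.
plpgsql.check_asserts = on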

Let’s use our new assertions:

SET plpgsql.check_asserts = TRUE;
 
TRUNCATE TABLE sensor_report;
 
SELECT sp_trickle_report();
 
ERROR:  Need to bootstrap report table.
CONTEXT:  PL/pgSQL function sp_trickle_report() line 12 at ASSERT
 
INSERT INTO sensor_report (
         location, reading_date, avg_reading,
         min_reading, max_reading
       )
SELECT location, reading_date::DATE,
       avg(reading), MIN(reading), MAX(reading)
  FROM sensor_log
 GROUP BY location, reading_date::DATE;
 
ANALYZE sensor_report;
 
SELECT sp_trickle_report();
 
ERROR:  Not enough current sensor activity!
CONTEXT:  PL/pgSQL function sp_trickle_report() line 38 at ASSERT
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 5000, s.id % 100,
       CURRENT_DATE - ((s.id * 1) || 's')::INTERVAL
  FROM generate_series(1, 50000) s(id);
 
SELECT sp_trickle_report();
 
 sp_trickle_report 
-------------------
              5000

We begin by truncating the report table so we can witness the first assertion. As we can see, Postgres correctly complains that the table is empty when it shouldn’t be. Since this is a true ERROR, execution immediately halts where the assertion failed. We see that error and use some alternative method for filling the report table, and try the function again.

Stop! Assert time!

The second execution fails because the sensor_log table only contains data based on 1000 sensors, yet we expect 2000. Besides telling us to update our fixture data to account for more sensors, we also learn what the function expects. So we decide to insert some fake data from 5000 sensors and try again. The last attempt works as expected, and we’re free to move on with our lives.

It seems odd initially that a formerly ignored statement gets escalated all the way to a fatal error, doesn’t it? But how else should assertions work? If we just needed information, we could have simply used RAISE NOTICE with the relevant details. The power of ASSERT is that we don’t need to remove the statements to deactivate them.

The reason assertions are fatal is because they raise an ASSERT_FAILURE exception. We can actually take advantage of that in debugging. We can catch that exception just like any other, meaning we can ignore it or escalate as necessary. Consider this example:

CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.
 
  SELECT INTO last_update MAX(reading_date)
    FROM sensor_report;
 
  ASSERT last_update IS NOT NULL, 'Need to bootstrap report table.';
 
  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.
 
  DELETE FROM sensor_report
   WHERE reading_date >= last_update;
 
  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.
 
  WITH capture AS (
    INSERT INTO sensor_report (
             location, reading_date, avg_reading,
             min_reading, max_reading
           )
    SELECT location, reading_date::DATE,
           avg(reading), MIN(reading), MAX(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
     RETURNING *
  )
  SELECT COUNT(*) INTO insert_count
    FROM capture;
 
  ASSERT insert_count >= 2000, 'Not enough current sensor activity!';
 
  RETURN insert_count;
 
EXCEPTION
  WHEN ASSERT_FAILURE THEN
    RAISE NOTICE 'Tripped over debugging code: %', SQLERRM;
    RETURN 0;
END;
$$ LANGUAGE plpgsql;

The only thing we added was a single exception block to catch any assertion failures raised within the function. Then we toned down the error slightly while maintaining the original assertion message, and returned 0 from the function since it didn’t process any rows due to the exception.

We can see this for ourselves by truncating the report table again:

SET plpgsql.check_asserts = TRUE;
 
TRUNCATE TABLE sensor_report;
 
SELECT sp_trickle_report();
 
NOTICE:  Tripped over debugging code: Need to bootstrap report table.
 sp_trickle_report 
-------------------
                 0

Now the function itself doesn’t fail or introduce an error into any transaction contexts we might have open, but we can still see something is wrong. The function claims to have inserted no rows, and we see a notice specifically from our own debugging system. In the production environment, this code would be completely silent and the function would operate without the overly oppressive assertions.

Is ASSERT overlooked because there are more suitable approaches to solve these problems? Or is it because the feature was added in 9.5, and it’s extremely easy to miss in the deluge of other headline-grabbing advancements? Honestly it’s probably a combination of those two and a myriad of other causes. The important thing is that we know about it now.

And as we are all well aware…

G.I. Joe!

PG Phriday: Down in the Dumps

These days with multiple Postgres database environments a commonality, it’s not unheard of to copy data from one to another. Perhaps a production extract is necessary to properly vet data in a staging environment. Maybe the development environment needs to update its data to reflect an especially pernicious and intractable edge case someone observed. In any of these scenarios, we are likely to extract data from multiple tables to import it elsewhere. The question is: how?

Last week I flatly stated that pg_dump is not an adequate backup tool. Despite that, it’s perfect for situations like these, where we only need a few tables out of an entire database. However, tables frequently reference other tables in foreign keys. While such integrity concerns do not affect data export, they greatly complicate import efforts.

The more deeply nested the referential integrity rules become, the more hair we’ll pull out trying to track all of them back to their requisite tables. Even after we’ve found them all, we still need to unravel the same horrors again in the target cluster.

A typical table dependency chart

To better illustrate what we mean by “horrors”, take this relatively simple table hierarchy:

CREATE TABLE tab_a (id INT PRIMARY KEY);
CREATE TABLE tab_a1 (id INT PRIMARY KEY REFERENCES tab_a);
CREATE TABLE tab_b (id INT PRIMARY KEY REFERENCES tab_a);
CREATE TABLE tab_b1 (id INT PRIMARY KEY REFERENCES tab_b);
CREATE TABLE tab_c (id INT PRIMARY KEY REFERENCES tab_b);
CREATE TABLE tab_c1 (id INT PRIMARY KEY REFERENCES tab_c);
 
INSERT INTO tab_a VALUES (1);
INSERT INTO tab_a1 VALUES (1);
INSERT INTO tab_b VALUES (1);
INSERT INTO tab_b1 VALUES (1);
INSERT INTO tab_c VALUES (1);
INSERT INTO tab_c1 VALUES (1);

Imagine these are six tables out of potentially hundreds. From an outside perspective, we may only know that tab_a needs to be copied from one system to another. No matter what tool we use for this, finding the other tables is somewhat irritating. An ORM will display the relationships in a physical chart, and the psql command line client or pgAdmin will report tables that immediately depend on a table we examine. None of these give us a great way to isolate them all at once.

Fortunately we have the Postgres system catalog. The pg_constraint system table tracks all of those table relationships, and we can recursively fetch the entire tree using a CTE. Remember how awesome CTEs are? Let’s mix some chocolate with our peanut butter by using one to mine the catalog.

Starting with tab_a, here’s how we might find all of the other tables in the list:

WITH RECURSIVE deptree AS (
  SELECT conrelid, conrelid::REGCLASS::TEXT AS TABLE_NAME,
         confrelid, confrelid::REGCLASS::TEXT AS dep_table
    FROM pg_constraint
   WHERE contype IN ('f', 'p')
     AND (confrelid::REGCLASS::TEXT = 'tab_a' OR
          conrelid::REGCLASS::TEXT = 'tab_a')
   UNION
  SELECT c.conrelid, c.conrelid::REGCLASS::TEXT AS TABLE_NAME,
         c.confrelid, c.confrelid::REGCLASS::TEXT AS dep_table
    FROM pg_constraint c
    JOIN deptree d ON (d.conrelid = c.confrelid)
   WHERE c.contype = 'f'
)
SELECT *
  FROM deptree;
 
 conrelid | table_name | confrelid | dep_table 
----------+------------+-----------+-----------
    69327 | tab_a      |         0 | -
    69332 | tab_a1     |     69327 | tab_a
    69342 | tab_b      |     69327 | tab_a
    69352 | tab_b1     |     69342 | tab_b
    69362 | tab_c      |     69342 | tab_b
    69372 | tab_c1     |     69362 | tab_c

As complicated as this query appears, it’s not actually that bad. We simply bootstrap the CTE with any tables that depend on tab_a, and repeat the process with each successive loop. Even if there are dozens or hundreds of incestuous relationships, we’ll capture all of them.

In any case, we have all the data we need to construct a pg_dump command to export all of these tables. To keep this as speedy as possible, let’s perform a parallel dump of the indicated tables into a directory we can transmit to another server.

pg_dump -t tab_a -t tab_a1 -t tab_b -t tab_b1 -t tab_c -t tab_c1 \
        -a -Fd -j 8 -f /tmp/tab_exports postgres

Though we can list each table individually, it would be nice if we could supply a file of desired tables instead. A dependency graph of sufficient depth could produce a command line too long for the shell to accept, which means we may need to execute multiple dumps or rely on wildcards and hope there’s enough name overlap.
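
One possible workaround, assuming we’ve saved the dependency list to /tmp/tablist.txt with one table name per line, is to build the -t switches straight from that file. This sketch is still subject to the same command-line length limits:

pg_dump $(sed 's/^/-t /' /tmp/tablist.txt) \
        -a -Fd -j 8 -f /tmp/tab_exports postgres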

Given that caveat, maybe we want to combine pg_dump with xargs and its parallel capabilities instead:

mkdir /tmp/tab_exports
 
cat /tmp/tablist.txt | xargs -I{} -P 8 \
    pg_dump -t {} -a -Fc -f /tmp/tab_exports/{}.pgr postgres

Either way, we now have a data-only representation of every table in the dependency tree of the table we’re actually importing. Now we just have to insert these values into the target system. And this is usually where we first encounter a major roadblock: existing contents.

First of all, that data is in our way. Second, the most efficient manner of ridding ourselves of it is the TRUNCATE command. TRUNCATE is DDL and won’t quietly bypass foreign keys, but it does have a CASCADE decorator which will empty the entire dependency chain. This actually helps us, since we have data to rebuild anything that might be affected by a cascading truncate.

Let’s give it a try on the recipient database:

TRUNCATE TABLE tab_a;
 
ERROR:  cannot truncate a table referenced in a foreign key constraint
DETAIL:  Table "tab_a1" references "tab_a".
HINT:  Truncate table "tab_a1" at the same time, or use TRUNCATE ... CASCADE.
 
TRUNCATE TABLE tab_a CASCADE;
 
NOTICE:  truncate cascades to table "tab_a1"
NOTICE:  truncate cascades to table "tab_b"
NOTICE:  truncate cascades to table "tab_b1"
NOTICE:  truncate cascades to table "tab_c"
NOTICE:  truncate cascades to table "tab_c1"

Well that was terrifying. Also note that the error only reported one of the dependencies. If we relied only on this information, we might attempt to drop the foreign key constraint thinking it was the only one. Every subsequent attempt to truncate the table would result in another error.

We’ve avoided all of that, but imagine if we had attempted to restrict our efforts to only the data in tab_a. We’d be left with no benefits of cascade, manually tracing the errors until success, inability to reestablish the constraints due to likely missing keys, and so on. No thanks.

Though truncate and the foreign keys aided us in emptying the tables on the recipient, restoring from our dump is somewhat problematic. We want to do it in parallel, but foreign keys directly prevent this by enforcing key integrity at each level. So we’re stuck either importing each table in order of the dependency tree, or we need to do a bit of preparation.

We can leverage that CTE again and build a couple of SQL scripts to help us out:

CREATE TEMP TABLE tab_deps AS
WITH RECURSIVE deptree AS (
  SELECT conrelid, conrelid::REGCLASS::TEXT AS TABLE_NAME,
         confrelid, confrelid::REGCLASS::TEXT AS dep_table
    FROM pg_constraint
   WHERE contype IN ('f', 'p')
     AND (confrelid::REGCLASS::TEXT = 'tab_a' OR
          conrelid::REGCLASS::TEXT = 'tab_a')
   UNION
  SELECT c.conrelid, c.conrelid::REGCLASS::TEXT AS TABLE_NAME,
         c.confrelid, c.confrelid::REGCLASS::TEXT AS dep_table
    FROM pg_constraint c
    JOIN deptree d ON (d.conrelid = c.confrelid)
   WHERE c.contype = 'f'
)
SELECT *
  FROM deptree;
 
COPY (
  SELECT 'ALTER TABLE ' || TABLE_NAME || ' DISABLE TRIGGER ALL;'
    FROM tab_deps
) TO '/tmp/no_trigs.sql';
 
COPY (
  SELECT 'ALTER TABLE ' || TABLE_NAME || ' ENABLE TRIGGER ALL;'
    FROM tab_deps
) TO '/tmp/fix_trigs.sql';

One thing that’s interesting about Postgres is that foreign keys are actually enforced by hidden system-level triggers. Given that information, we can simply turn them off for the duration of our import and re-enable them afterwards. Note that disabling system triggers this way requires superuser privileges. Now we have all the parts we need to restore the donor data into the environment of our choice.

It should be cautioned that all of the import steps should be performed on the target system itself, even if that means synchronizing the dump directory to it first. We don’t want to fat-finger these commands in production or another equally important system.

In any case, this is how that might go:

psql -f /tmp/no_trigs.sql postgres
psql -c 'TRUNCATE TABLE tab_a CASCADE;' postgres
pg_restore -d postgres -j 8 /tmp/tab_exports
psql -f /tmp/fix_trigs.sql postgres

Or if we used xargs, we’ll want this procedure instead:

psql -f /tmp/no_trigs.sql postgres
psql -c 'TRUNCATE TABLE tab_a CASCADE;' postgres
 
cat /tmp/tablist.txt | xargs -I{} -P 8 \
    pg_restore -d postgres /tmp/tab_exports/{}.pgr
 
psql -f /tmp/fix_trigs.sql postgres

And voila, we’re saved!

Of course, table structures between the environments may not match. In that case, we would simply drop all of the involved tables instead of truncating them and stunning their constraint triggers. As such, we would want to remove the -a data-only flag from the pg_dump commands we used so pg_restore can recreate the tables before loading them. Ultimately, we’ve broken the paradox of satisfying constraints while simultaneously circumventing them.
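
For that drop-and-recreate route, a rough sketch reusing the tab_deps temp table from earlier could generate the DROP statements for us:

COPY (
  SELECT 'DROP TABLE IF EXISTS ' || table_name || ' CASCADE;'
    FROM tab_deps
) TO '/tmp/drop_tabs.sql';

The dump then loses its -a flag so the table definitions ride along, and the restore replaces the truncate step:

pg_dump -t tab_a -t tab_a1 -t tab_b -t tab_b1 -t tab_c -t tab_c1 \
        -Fd -j 8 -f /tmp/tab_exports postgres
 
psql -f /tmp/drop_tabs.sql postgres
pg_restore -d postgres -j 8 /tmp/tab_exports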

It’s not an ideal situation, but hey, whatever works.