PG Phriday: Getting Assertive

There are a lot of languages available for authoring Postgres functions, but there’s nothing quite like the classic PL/pgSQL. It’s SQL! It’s not SQL! It’s a kind of horrifying mutant crossbreed suitable only for terrifying small children and generating complex reports from a cavalcade of dubious sources! And deep within its twisted entrails is an often overlooked feature usually only available in far more mature entities.

Just wait until they add sharding

That’s right, it’s obvious we’re referring to the ASSERT statement.

When we say “often overlooked”, we’re not kidding. The pertinent documentation offers a meager overview following a far more comprehensive summary of the RAISE statement. We thought it deserved better.
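For the record, the entire grammar fits on one line: ASSERT condition [, message]. It works anywhere in a PL/pgSQL body, including anonymous DO blocks. Here's a contrived sketch of both forms:

DO $$
BEGIN
  -- A bare assertion reports the default "assertion failed" message.
  ASSERT 2 + 2 = 4;

  -- The optional second argument supplies a custom message instead.
  ASSERT now() >= '2000-01-01', 'Someone has been meddling with the clock!';
END;
$$;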

Let’s begin with a simple table example that can form the basis of a more complex function:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE - ((s.id * 10) || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);

CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);

ANALYZE sensor_log;

What can we do with five million rows of simulated data from 1000 sensors that collect data every ten seconds? A good place to start is a summary or report table of some kind. We’re only doing cursory analysis, so we should begin with aggregating the daily average, minimum, and maximum. If we get adventurous later, we can add standard deviations, jitter, deltas, and other fancy logic report owners might find useful.
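Before building any report machinery, a quick ad-hoc aggregate shows the shape of the output we’re after (the one-week cutoff and the LIMIT are arbitrary choices for eyeballing):

SELECT location, reading_date::DATE,
       avg(reading), min(reading), max(reading)
  FROM sensor_log
 WHERE reading_date >= CURRENT_DATE - INTERVAL '1 week'
 GROUP BY location, reading_date::DATE
 ORDER BY location, reading_date::DATE
 LIMIT 10;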

Since this type of data is easy to tabulate, a single report table and a function to manage its contents will suffice.

CREATE TABLE sensor_report (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading_date  DATE NOT NULL,
  avg_reading   BIGINT NOT NULL,
  min_reading   BIGINT NOT NULL,
  max_reading   BIGINT NOT NULL
);

CREATE INDEX idx_sensor_report_date ON sensor_report (reading_date);

CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.

  SELECT INTO last_update max(reading_date)
    FROM sensor_report;

  -- Bootstrap the report table if it's currently empty.

  IF last_update IS NULL THEN
    SELECT INTO last_update min(reading_date)
      FROM sensor_log;
  END IF;

  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.

  DELETE FROM sensor_report
   WHERE reading_date >= last_update;

  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.

  WITH capture AS (
    INSERT INTO sensor_report (
             location, reading_date, avg_reading,
             min_reading, max_reading
           )
    SELECT location, reading_date::DATE,
           avg(reading), min(reading), max(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
     RETURNING *
  )
  SELECT count(*) INTO insert_count
    FROM capture;

  RETURN insert_count;
END;
$$ LANGUAGE plpgsql;

SELECT sp_trickle_report();

 sp_trickle_report 
-------------------
            579000

ANALYZE sensor_report;

Thankfully this is a very simple (and somewhat naive) reporting function. Imagine it’s our first iteration and we want to debug some of its activity. If we were using Python, for example, we might pepper the code with assert statements or craft specific tests to accomplish that same goal. Well, PL/pgSQL allows us to do just that.

For the second iteration, let’s assume we want to avoid bootstrapping the report table if it’s empty. After all, bootstrapping only needs to be done once, so why pollute the function with that edge case? Besides, it’s entirely possible we might require a separate procedure to initialize the reporting table, and isolated functionality is generally less buggy.
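Such a procedure might look something like this hypothetical sp_bootstrap_report() (a minimal sketch; the function name and its guard clause are our own invention):

CREATE OR REPLACE FUNCTION sp_bootstrap_report()
RETURNS BIGINT AS
$$
DECLARE
  insert_count BIGINT;
BEGIN
  -- Refuse to clobber a report table that already has contents.
  IF EXISTS (SELECT 1 FROM sensor_report) THEN
    RAISE EXCEPTION 'Report table is already bootstrapped!';
  END IF;

  -- Aggregate the full history in one pass, capturing the row count.
  WITH capture AS (
    INSERT INTO sensor_report (
             location, reading_date, avg_reading,
             min_reading, max_reading
           )
    SELECT location, reading_date::DATE,
           avg(reading), min(reading), max(reading)
      FROM sensor_log
     GROUP BY location, reading_date::DATE
     RETURNING *
  )
  SELECT count(*) INTO insert_count
    FROM capture;

  RETURN insert_count;
END;
$$ LANGUAGE plpgsql;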

If the table isn’t bootstrapped, it’s probably because we’re in a testing environment that is missing data. In that case it makes sense to complain so the missing data gets loaded, even though it’s not strictly an erroneous scenario. So let’s replace the bootstrap with an assertion. While we’re at it, let’s say the upstream code has a new requirement: a minimum of 2000 sensors for meaningful output. Falling short of that means revising our fixture data in the development and QA environments.

Here’s how that might look:

CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.

  SELECT INTO last_update max(reading_date)
    FROM sensor_report;

  ASSERT last_update IS NOT NULL, 'Need to bootstrap report table.';

  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.

  DELETE FROM sensor_report
   WHERE reading_date >= last_update;

  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.

  WITH capture AS (
    INSERT INTO sensor_report (
             location, reading_date, avg_reading,
             min_reading, max_reading
           )
    SELECT location, reading_date::DATE,
           avg(reading), min(reading), max(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
     RETURNING *
  )
  SELECT count(*) INTO insert_count
    FROM capture;

  ASSERT insert_count >= 2000, 'Not enough current sensor activity!';

  RETURN insert_count;
END;
$$ LANGUAGE plpgsql;

The function itself isn’t much different; the important part is the result of our changes. The only truly meaningful modification is that the function no longer initializes the report table when it’s empty. Beyond that, and so long as assertion checking is switched off, we can treat the function exactly as before: no output changes, and no new failure conditions are triggered.

That’s because assertion checking can be disabled wholesale. Assertions are primarily intended for debugging, so production systems routinely switch them off through the plpgsql.check_asserts configuration parameter (which, according to the documentation, defaults to on). We can manage it with an entry in postgresql.conf, but we can also toggle it within our current session.
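For example, a production cluster might disable assertions globally or per database. Both approaches below assume appropriate privileges, and the database name is purely hypothetical:

-- In postgresql.conf (applied on reload):
--   plpgsql.check_asserts = off

-- Or as a per-database override:
ALTER DATABASE sensors_prod SET plpgsql.check_asserts = off;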

Let’s explicitly enable assertions in our session and put them to use:

SET plpgsql.check_asserts = TRUE;

TRUNCATE TABLE sensor_report;

SELECT sp_trickle_report();

ERROR:  Need to bootstrap report table.
CONTEXT:  PL/pgSQL function sp_trickle_report() line 12 at ASSERT

INSERT INTO sensor_report (
         location, reading_date, avg_reading,
         min_reading, max_reading
       )
SELECT location, reading_date::DATE,
       avg(reading), min(reading), max(reading)
  FROM sensor_log
 GROUP BY location, reading_date::DATE;

ANALYZE sensor_report;

SELECT sp_trickle_report();

ERROR:  Not enough current sensor activity!
CONTEXT:  PL/pgSQL function sp_trickle_report() line 38 at ASSERT

INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 5000, s.id % 100,
       CURRENT_DATE - ((s.id * 1) || 's')::INTERVAL
  FROM generate_series(1, 50000) s(id);

SELECT sp_trickle_report();

 sp_trickle_report 
-------------------
              5000

We begin by truncating the report table so we can witness the first assertion. As we can see, Postgres correctly complains that the table is empty when it shouldn’t be. Since this is a true ERROR, execution halts immediately at the failed assertion. We see the error, fall back to some alternative method of filling the report table, and try the function again.

Stop! Assert time!

The second execution fails because the sensor_log table only contains data based on 1000 sensors, yet we expect 2000. Besides telling us to update our fixture data to account for more sensors, we also learn what the function expects. So we decide to insert some fake data from 5000 sensors and try again. The last attempt works as expected, and we’re free to move on with our lives.

It seems odd initially that an ignorable debugging statement can escalate all the way to a full-blown error, doesn’t it? But how else should assertions work? If we just needed information, we could simply use RAISE NOTICE with the relevant details. The power of ASSERT is that we don’t need to remove the statements to deactivate them.
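Deactivating them is just a matter of flipping the same parameter. With checking off and an empty report table, for example, we’d expect the function to quietly return zero, since comparing against a NULL last_update matches no rows:

SET plpgsql.check_asserts = FALSE;

TRUNCATE TABLE sensor_report;

SELECT sp_trickle_report();

 sp_trickle_report 
-------------------
                 0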

Assertion failures halt execution because they raise an ASSERT_FAILURE exception. We can actually take advantage of that while debugging: the exception can be caught just like any other, meaning we can ignore it or escalate it as necessary. Consider this example:

CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.

  SELECT INTO last_update max(reading_date)
    FROM sensor_report;

  ASSERT last_update IS NOT NULL, 'Need to bootstrap report table.';

  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.

  DELETE FROM sensor_report
   WHERE reading_date >= last_update;

  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.

  WITH capture AS (
    INSERT INTO sensor_report (
             location, reading_date, avg_reading,
             min_reading, max_reading
           )
    SELECT location, reading_date::DATE,
           avg(reading), min(reading), max(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
     RETURNING *
  )
  SELECT count(*) INTO insert_count
    FROM capture;

  ASSERT insert_count >= 2000, 'Not enough current sensor activity!';

  RETURN insert_count;

EXCEPTION
  WHEN ASSERT_FAILURE THEN
    RAISE NOTICE 'Tripped over debugging code: %', SQLERRM;
    RETURN 0;
END;
$$ LANGUAGE plpgsql;

The only thing we added was a single exception block to catch assertion failures thrown within the function. There we toned the error down to a notice while preserving the original assertion message, and return 0, since the exception prevented any rows from being processed.

We can see this for ourselves by truncating the report table again:

SET plpgsql.check_asserts = TRUE;

TRUNCATE TABLE sensor_report;

SELECT sp_trickle_report();

NOTICE:  Tripped over debugging code: Need to bootstrap report table.
 sp_trickle_report 
-------------------
                 0

Now the function itself doesn’t fail or introduce an error into any open transaction contexts, but we can still see something is wrong: the function claims to have inserted no rows, and we see a notice specifically from our own debugging scaffold. In a production environment with plpgsql.check_asserts disabled, this code would be completely silent, and the function would operate without the overly oppressive assertions.

Is ASSERT overlooked because there are more suitable approaches to these problems? Or because the feature arrived in 9.5, where it was easy to miss in the deluge of other headline-grabbing advancements? Honestly, it’s probably a combination of the two, plus a myriad of other causes. The important thing is that we know about it now.

And as we are all well aware…

G.I. Joe!