PG Phriday: Cult of Functionality

Friday September 02, 2016

It’s no surprise that Postgres does more than merely store data; you can’t do everything with SQL. It’s often more beneficial to process data locally without transmitting it to a client interface. Local manipulation can save hours in network traffic alone, to say nothing of client-side allocation and per-row processing. Databases like Postgres are designed specifically for bulk data operations, so why not take advantage?

But doing that requires functions—anonymous or otherwise—and a Turing-complete language to write them with. For data-hungry jobs that benefit from local evaluation, Postgres has a veritable smorgasbord of procedural languages ready to sate almost any coding appetite. But where do we start?

Tightly integrating Postgres with a project is getting easier with each release. So easy, in fact, that we’re liable to forget that Postgres has its own fairly capable language in PL/pgSQL. Let’s work with Postgres’ built-in language a bit and explore its functionality, shall we?

Just so we have something to manipulate, let’s start by defining a basic table and associated data:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 100, s.id % 100,
       CURRENT_DATE + INTERVAL '8h' 
                    - (s.id || 's')::INTERVAL
  FROM generate_series(1, 500) s(id);

ANALYZE sensor_log;

With that out of the way, let’s start with a short history lesson. It’s hard to ignore the blocky nature of SQL statement organization. This is almost a natural consequence of a language saddled with such high verbosity and numerous ubiquitous keywords. Each SQL statement becomes a series of high-level directives led by grouped operations, making contextual break-points for each. So we end up seeing SELECT ... FROM ... WHERE ... GROUP BY ... operational blocks.

Some language nerds might recognize this cadence. To make it a bit more obvious, here’s a very basic Postgres PL/pgSQL function:

CREATE OR REPLACE FUNCTION f_add_nums(x INT, y INT)
RETURNS INT 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    ret_val INT;
BEGIN
    ret_val := x + y;
    RETURN ret_val;
END;
$BODY$;

It’s no coincidence that this strongly resembles Oracle’s PL/SQL. PL/SQL was in turn inspired by Ada, a language developed for the US Department of Defense, and Ada itself is heavily influenced by Pascal. With this kind of pedigree and a very similar block-style approach, it shouldn’t be surprising that so many database languages feel similar.

With that in mind, Postgres does take advantage of many of its own syntax choices within this realm, either to simplify statements or to provide better integration with the core. For example, there are essentially two ways to loop through the results of a SQL statement within a function:

-- First, using an explicit cursor:

CREATE OR REPLACE FUNCTION f_process_log()
RETURNS INT 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    ret_val BIGINT := 0;
    log_record RECORD;

    log_results CURSOR FOR
        SELECT reading
          FROM sensor_log;
BEGIN
    FOR log_record IN log_results LOOP
      ret_val := ret_val + log_record.reading;
    END LOOP;

    RETURN ret_val;
END;
$BODY$;

-- Next, with an implicit cursor as a shortcut:

CREATE OR REPLACE FUNCTION f_process_log()
RETURNS INT 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    ret_val BIGINT := 0;
    read_value BIGINT;
BEGIN
    FOR read_value IN
        SELECT reading
          FROM sensor_log
    LOOP
      ret_val := ret_val + read_value;
    END LOOP;

    RETURN ret_val;
END;
$BODY$;

Of these two approaches, the second is likely preferred by actual human beings. The cursor is still there, but exists ephemerally, doing its magic in the background. With the explicit cursor, we have to ensure it isn’t already open when the loop starts, and addressing the results through a record is a bit more cumbersome.
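
To see what the implicit loop is saving us, here is a rough sketch of fully manual cursor handling with OPEN, FETCH, and CLOSE. The function name f_process_log_manual is made up for this illustration, and it assumes the sensor_log table defined earlier. The final query compares the result against a plain set-based SUM, which is what we would normally write for a job this simple.

```sql
-- Hypothetical name; not part of the original examples.
CREATE OR REPLACE FUNCTION f_process_log_manual()
RETURNS BIGINT
LANGUAGE plpgsql AS
$BODY$
DECLARE
    ret_val BIGINT := 0;
    read_value BIGINT;

    log_results CURSOR FOR
        SELECT reading
          FROM sensor_log;
BEGIN
    -- Opening a cursor that is already open raises an error,
    -- which is the bookkeeping the FOR loop handles for us.
    OPEN log_results;

    LOOP
        FETCH log_results INTO read_value;
        EXIT WHEN NOT FOUND;  -- FETCH sets FOUND to false at the end
        ret_val := ret_val + read_value;
    END LOOP;

    CLOSE log_results;

    RETURN ret_val;
END;
$BODY$;

-- Both columns should show the same total.
SELECT f_process_log_manual() AS manual_total,
       (SELECT sum(reading) FROM sensor_log) AS set_based_total;
```

All three versions do the same work; the only difference is how much cursor management is spelled out by hand.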

What about conditionals based on query results? Again, there are both hard and easy ways of going about this:

-- First, using a variable as an outright result:

CREATE OR REPLACE FUNCTION f_check_sensor(sensor_loc TEXT)
RETURNS BOOLEAN
LANGUAGE plpgsql AS
$BODY$
DECLARE
    ret_val BIGINT;
BEGIN
    ret_val := (SELECT id
                  FROM sensor_log
                 WHERE location = sensor_loc
                 LIMIT 1);

    IF ret_val IS NULL THEN
        RETURN FALSE;
    END IF;

    RETURN TRUE;
END;
$BODY$;

-- Next, using built-in syntax:

CREATE OR REPLACE FUNCTION f_check_sensor(sensor_loc TEXT)
RETURNS BOOLEAN 
LANGUAGE plpgsql AS
$BODY$
BEGIN
    PERFORM id
       FROM sensor_log
      WHERE location = sensor_loc
      LIMIT 1;

    IF NOT FOUND THEN
        RETURN FALSE;
    END IF;

    RETURN TRUE;
END;
$BODY$;

This time we used the PERFORM syntax instead of SELECT. It’s one of many conveniences PL/pgSQL provides, meant for times when we need to execute a query but don’t explicitly require its results. PERFORM still sets the special FOUND variable, which is true when the query produced at least one row. Once again, we leveraged implicit-style syntax to simplify the code (and our thought process).
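
For a pure existence check, another option is to skip FOUND entirely and return the result of an EXISTS expression. This is only a sketch: f_sensor_exists is a hypothetical name, and it assumes the sensor_log table from earlier, where locations are the strings '0' through '99'.

```sql
-- Hypothetical name; not part of the original examples.
CREATE OR REPLACE FUNCTION f_sensor_exists(sensor_loc TEXT)
RETURNS BOOLEAN
LANGUAGE plpgsql AS
$BODY$
BEGIN
    -- EXISTS stops at the first matching row, so no LIMIT is needed.
    RETURN EXISTS (
        SELECT 1
          FROM sensor_log
         WHERE location = sensor_loc
    );
END;
$BODY$;

-- With the sample data, '42' is a known location and 'nowhere' is not.
SELECT f_sensor_exists('42') AS known_sensor,
       f_sensor_exists('nowhere') AS unknown_sensor;
```

PERFORM remains the better fit when the query is run for its side effects and the row check is incidental.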

This also extends into exception handling. We’ve already covered exception performance considerations, but there are also cool syntax tricks available. Imagine we want to process all log readings for the current day, raising a warning when certain boundaries are violated.

Watch what happens if we just use a naked exception:

CREATE OR REPLACE FUNCTION f_process_current_log()
RETURNS VOID 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    log_rec RECORD;
BEGIN
    FOR log_rec IN
        SELECT location, reading, reading_date
          FROM sensor_log
         WHERE reading_date >= CURRENT_DATE
    LOOP
        -- Imagine this is some complicated procedure. Maybe we
        -- call a function, for instance, and consider a bad
        -- result or a passed exception as fatal.

        IF log_rec.reading > 98 THEN
            RAISE EXCEPTION 'location % reading % invalid at %!',
                  log_rec.location, log_rec.reading,
                  log_rec.reading_date;
        END IF;
    END LOOP;
END;
$BODY$;

SELECT f_process_current_log();

ERROR:  location 99 reading 99 invalid at 2016-09-02 07:58:21!

Well, we built our fake data, so we know for a fact that there should be five errors returned, but processing stopped at the first one. We could reduce the exception to a warning to prevent this, yet we can’t always count on downstream functions being so forgiving. If this function invokes several other sub-functions, we’ll eventually get a fatal exception that will derail the entire processing loop. How can we prevent this?

It’s perhaps not so obvious, but PL/pgSQL blocks can be nested pretty much anywhere. Here’s the same function without its fatal processing issue:

CREATE OR REPLACE FUNCTION f_process_current_log()
RETURNS VOID 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    log_rec RECORD;
BEGIN
    FOR log_rec IN
        SELECT location, reading, reading_date
          FROM sensor_log
         WHERE reading_date >= CURRENT_DATE
    LOOP
        -- Imagine this is some complicated procedure. Maybe we
        -- call a function, for instance, and consider a bad
        -- result or a passed exception as fatal.

        BEGIN
            IF log_rec.reading > 98 THEN
                RAISE EXCEPTION 'location % reading % invalid at %!',
                      log_rec.location, log_rec.reading,
                      log_rec.reading_date;
            END IF;
        EXCEPTION
            WHEN OTHERS THEN
                RAISE NOTICE '%', SQLERRM;
        END;
    END LOOP;
END;
$BODY$;

SELECT f_process_current_log();

NOTICE:  location 99 reading 99 invalid at 2016-09-02 07:58:21!
NOTICE:  location 99 reading 99 invalid at 2016-09-02 07:56:41!
NOTICE:  location 99 reading 99 invalid at 2016-09-02 07:55:01!
NOTICE:  location 99 reading 99 invalid at 2016-09-02 07:53:21!
NOTICE:  location 99 reading 99 invalid at 2016-09-02 07:51:41!

By wrapping the “dangerous” external call in another block, we were able to trap and override the exception it raised. This is effectively the same as a try/catch block in other languages, but may be confusing because of its similarity to the outer block syntax. Regardless, PL/pgSQL has most of the trappings of a modern language, even if some appear a bit antiquated by current standards.
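
WHEN OTHERS is a very wide net. As a small sketch of a narrower approach, the block below traps only the division_by_zero condition and lets any other error propagate as usual. The function f_safe_divide is a hypothetical helper invented for this example; SQLERRM and SQLSTATE are the standard variables available inside an exception handler.

```sql
-- Hypothetical helper; not part of the original examples.
CREATE OR REPLACE FUNCTION f_safe_divide(numerator INT, divisor INT)
RETURNS INT
LANGUAGE plpgsql AS
$BODY$
BEGIN
    -- The inner block plays the role of "try".
    BEGIN
        RETURN numerator / divisor;
    EXCEPTION
        -- Only this named condition is trapped; anything else
        -- still aborts the function.
        WHEN division_by_zero THEN
            RAISE NOTICE 'trapped: % (SQLSTATE %)', SQLERRM, SQLSTATE;
    END;

    -- Reached only when the handler above fired.
    RETURN NULL;
END;
$BODY$;

SELECT f_safe_divide(10, 2) AS ok_result,
       f_safe_divide(10, 0) AS trapped_result;
```

The first column should come back as 5 and the second as NULL, accompanied by a notice describing the trapped error.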

The idea with PL/pgSQL is that it highly resembles the SQL language itself, and allows directly embedding SQL statements. Within a database context, this is a massive time saver for native SQL speakers. Consider the mechanism for enabling a function to return row results:

CREATE OR REPLACE FUNCTION f_generate_and_return(new_count INT)
RETURNS SETOF sensor_log 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    n_count INT := 0;
    new_row RECORD;
BEGIN
    FOR n_count IN 1..new_count
    LOOP
        INSERT INTO sensor_log (location, reading, reading_date)
        SELECT n_count, n_count % 100, now()
        RETURNING * INTO new_row;

        RETURN NEXT new_row;
    END LOOP;
END;
$BODY$;

SELECT * FROM f_generate_and_return(5);

 id  | location | reading |        reading_date        
-----+----------+---------+----------------------------
 501 | 1        |       1 | 2016-09-02 13:39:33.566244
 502 | 2        |       2 | 2016-09-02 13:39:33.566244
 503 | 3        |       3 | 2016-09-02 13:39:33.566244
 504 | 4        |       4 | 2016-09-02 13:39:33.566244
 505 | 5        |       5 | 2016-09-02 13:39:33.566244

How about that? Not only were we able to insert new rows, but we fetched them to immediately verify their presence. By using the SETOF modifier, we told Postgres that the function would return a set of rows of the declared type. We could do this for any existing table or view within the database, or we could define our own type for special cases. Each RETURN NEXT appends a row to the result without ending the function, so it behaves much like a generator in other languages. Keep in mind that PL/pgSQL currently accumulates the entire result set before handing it back to the caller, which can hurt performance for very large results. Yet everything still resembles SQL, not some other language that happens to support SQL through function calls or other convoluted contortions.

Now, this is an extremely trivial application of this capability. In fact, we highly recommend against looping inserts, but some complicated business logic or external function calls might demand this, or a similar level of expensive processing. Try to consider this as a demonstration, rather than a bible for best practices.
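
For comparison, here is one way the same result could be produced without a loop, using RETURN QUERY around a single INSERT ... RETURNING statement. Treat it as a sketch under a few assumptions: f_generate_and_return_bulk is a made-up name, and it relies on the sensor_log table defined at the start.

```sql
-- Hypothetical name; a set-based take on f_generate_and_return.
CREATE OR REPLACE FUNCTION f_generate_and_return_bulk(new_count INT)
RETURNS SETOF sensor_log
LANGUAGE plpgsql AS
$BODY$
BEGIN
    -- One statement inserts every row and hands them all back.
    RETURN QUERY
    INSERT INTO sensor_log (location, reading, reading_date)
    SELECT s.id::VARCHAR, s.id % 100, now()
      FROM generate_series(1, new_count) s(id)
    RETURNING *;
END;
$BODY$;

SELECT * FROM f_generate_and_return_bulk(5);
```

This should return five freshly inserted rows, just like the looped version, but with a single INSERT instead of one per iteration.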

In any case, we hope PL/pgSQL becomes part of your everyday lexicon!
