PG Phriday: Cult of Functionality
It’s no surprise Postgres does more than merely store data; you can’t do everything with SQL. Often, it’s often more beneficial to process data locally without transmitting it to a client interface. Local manipulation can save hours in network traffic alone, let alone client-side allocation and per-row processing. Databases like Postgres are specifically for bulk data operations, so why not take advantage?
But doing that requires functions—anonymous or otherwise—and a Turing-complete language to write them with. For data-hungry jobs that benefit from local evaluation, Postgres has a veritable smorgasbord of procedural languages ready to sate almost any coding appetite. But where do we start?
Tightly integrating Postgres with a project is getting easier with each release. So reasonable in fact, we’re liable to forget that Postgres has its own fairly capable language in PL/pgSQL. Let’s work with Postgres’ free language a bit and explore its functionality, shall we?
Just so we have something to manipulate, let’s start by defining a basic table and associated data:
CREATE TABLE sensor_log ( id SERIAL PRIMARY KEY NOT NULL, location VARCHAR NOT NULL, reading BIGINT NOT NULL, reading_date TIMESTAMP NOT NULL ); INSERT INTO sensor_log (location, reading, reading_date) SELECT s.id % 100, s.id % 100, CURRENT_DATE + INTERVAL '8h' - (s.id || 's')::INTERVAL FROM generate_series(1, 500) s(id); ANALYZE sensor_log;
With that out of the way, let’s start with a short history lesson. It’s hard to ignore the blocky nature of SQL statement organization. This is almost a natural consequence of a language saddled with such high verbosity and numerous ubiquitous keywords. Each SQL statement becomes a series of high-level directives led by grouped operations, making contextual break-points for each. So we end up seeing
SELECT ... FROM ... WHERE ... GROUP BY ... operational blocks.
Some language nerds might recognize this cadence. To make it a bit more obvious, here’s a very basic Postgres PL/pgSQL function:
CREATE OR REPLACE FUNCTION f_add_nums(x INT, y INT) RETURNS INT LANGUAGE plpgsql AS $BODY$ DECLARE ret_val INT; BEGIN ret_val := x + y; RETURN ret_val; END; $BODY$;
There’s no coincidence that this highly resembles Oracle’s PL/SQL. But Oracle was inspired by Ada, a language developed for use by the US government. Ada itself is highly influenced by Pascal. With this kind of pedigree and a very similar block-style approach, it shouldn’t be surprising so many database languages feel similar.
With that in mind, Postgres does take advantage of many of its own syntax choices within this realm, either to simplify statements or provide better integration with the core. For example, there are essentially two ways to loop through a SQL statement within a procedure:
-- First, using an explicit cursor: CREATE OR REPLACE FUNCTION f_process_log() RETURNS INT LANGUAGE plpgsql AS $BODY$ DECLARE ret_val BIGINT := 0; log_record RECORD; log_results CURSOR FOR SELECT reading FROM sensor_log; BEGIN FOR log_record IN log_results LOOP ret_val := ret_val + log_record.reading; END LOOP; RETURN ret_val; END; $BODY$; -- Next, with an implicit cursor as a shortcut: CREATE OR REPLACE FUNCTION f_process_log() RETURNS INT LANGUAGE plpgsql AS $BODY$ DECLARE ret_val BIGINT := 0; read_value BIGINT; BEGIN FOR read_value IN SELECT reading FROM sensor_log LOOP ret_val := ret_val + read_value; END LOOP; RETURN ret_val; END; $BODY$;
Of these two approaches, the second is likely preferred by actual human beings. The cursor is still there, but exists ephemerally, doing its magic in the background. In using the explicit cursor, we have to ensure we haven’t opened it previously in the function, and addressing the results is a bit more cumbersome.
What about conditionals based on query results? Again, there are both hard and easy ways of going about this:
-- First, using a variable as an outright result: CREATE OR REPLACE FUNCTION f_check_sensor(sensor_loc TEXT) RETURNS INT LANGUAGE plpgsql AS $BODY$ DECLARE ret_val BIGINT; BEGIN ret_val := (SELECT id FROM sensor_log WHERE location = sensor_loc LIMIT 1); IF ret_val IS NULL THEN RETURN FALSE; END IF; RETURN TRUE; END; $BODY$; -- Next, using built-in syntax: CREATE OR REPLACE FUNCTION f_check_sensor(sensor_loc TEXT) RETURNS BOOLEAN LANGUAGE plpgsql AS $BODY$ BEGIN PERFORM id FROM sensor_log WHERE location = sensor_loc LIMIT 1; IF NOT FOUND THEN RETURN FALSE; END IF; RETURN TRUE; END; $BODY$;
This time we used the
PERFORM syntax instead of
SELECT. It’s one of many available while relying on PL/pgSQL. It’s meant for times when we don’t explicitly require query results, but still need to execute a query. Once again, we leveraged implicit-style syntax to simplify the code (and our thought process).
This also extends into exception handling. We’ve already covered exception performance considerations, but there are also cool syntax tricks available. Imagine we want to process all log readings for the current day, raising a warning when certain boundaries are violated.
Watch what happens if we just use a naked exception:
CREATE OR REPLACE FUNCTION f_process_current_log() RETURNS VOID LANGUAGE plpgsql AS $BODY$ DECLARE log_rec RECORD; BEGIN FOR log_rec IN SELECT location, reading, reading_date FROM sensor_log WHERE reading_date >= CURRENT_DATE LOOP -- Imagine this is some complicated procedure. Maybe we -- call a function, for instance, and consider a bad -- result or a passed exception as fatal. IF log_rec.reading > 98 THEN RAISE EXCEPTION 'location % reading % invalid at %!', log_rec.location, log_rec.reading, log_rec.reading_date; END IF; END LOOP; END; $BODY$; SELECT f_process_current_log(); ERROR: location 99 reading 99 invalid at 2016-09-02 07:58:21!
Well, we built our fake data, so we know for a fact that there should be five errors returned, but processing stopped right away. We could reduce the exception to a warning to prevent this, yet we can’t always count on downstream functions on being so forgiving. If this function invokes several other sub-functions, we’ll eventually get a fatal exception that will derail the entire processing loop. How can we prevent this?
Perhaps not so obvious, but Postgres blocks can go pretty much anywhere. Here’s the same function without its fatal processing issue:
CREATE OR REPLACE FUNCTION f_process_current_log() RETURNS VOID LANGUAGE plpgsql AS $BODY$ DECLARE log_rec RECORD; BEGIN FOR log_rec IN SELECT location, reading, reading_date FROM sensor_log WHERE reading_date >= CURRENT_DATE LOOP -- Imagine this is some complicated procedure. Maybe we -- call a function, for instance, and consider a bad -- result or a passed exception as fatal. BEGIN IF log_rec.reading > 98 THEN RAISE EXCEPTION 'location % reading % invalid at %!', log_rec.location, log_rec.reading, log_rec.reading_date; END IF; EXCEPTION WHEN OTHERS THEN RAISE NOTICE '%', SQLERRM; END; END LOOP; END; $BODY$; SELECT f_process_current_log(); NOTICE: location 99 reading 99 invalid at 2016-09-02 07:58:21! NOTICE: location 99 reading 99 invalid at 2016-09-02 07:56:41! NOTICE: location 99 reading 99 invalid at 2016-09-02 07:55:01! NOTICE: location 99 reading 99 invalid at 2016-09-02 07:53:21! NOTICE: location 99 reading 99 invalid at 2016-09-02 07:51:41!
By wrapping the “dangerous” external call in another block, we were able to trap and override the exception it raised. This is effectively the same as a try/catch block in other languages, but may be potentially confusing because of its similarity to the outer block syntax. Regardless, PL/pgSQL has most of the trappings of a modern language, even if some appear a bit antiquated by current standards.
The idea with PL/pgSQL is that it highly resembles the SQL language itself, and allows directly embedding SQL statements. Within a database context, this is a massive time saver for native SQL speakers. Consider the mechanism for enabling a function to return row results:
CREATE OR REPLACE FUNCTION f_generate_and_return(new_count INT) RETURNS SETOF sensor_log LANGUAGE plpgsql AS $BODY$ DECLARE n_count INT := 0; new_row RECORD; BEGIN FOR n_count IN 1..new_count LOOP INSERT INTO sensor_log (location, reading, reading_date) SELECT n_count, n_count % 100, now() RETURNING * INTO new_row; RETURN NEXT new_row; END LOOP; END; $BODY$; SELECT * FROM f_generate_and_return(5); id | location | reading | reading_date -----+----------+---------+---------------------------- 501 | 1 | 1 | 2016-09-02 13:39:33.566244 502 | 2 | 2 | 2016-09-02 13:39:33.566244 503 | 3 | 3 | 2016-09-02 13:39:33.566244 504 | 4 | 4 | 2016-09-02 13:39:33.566244 505 | 5 | 5 | 2016-09-02 13:39:33.566244
How about that? Not only were we able to insert new rows, but we fetched them to immediately verify their presence. By using the
SETOF decorator, we told Postgres that the function would return the declared result set. We could do this for any existing table or view within the database, or we could define our own type for special cases. In essence, we are transforming our function into a reentrant version with a persistent stack, and all that implies. Yet everything still resembles SQL, not some other language that happens to support SQL through function calls or other convoluted contortions.
Now, this is an extremely trivial application of this capability. In fact, we highly recommend against looping inserts, but some complicated business logic or external function calls might demand this, or a similar level of expensive processing. Try to consider this as a demonstration, rather than a bible for best practices.
In any case, we hope Pl/pgSQL becomes part of your everyday lexicon!