PG Phriday: Anonymous Blocks and Object Manipulation

Postgres has had anonymous blocks since the release of 9.0 in late 2010. But it must either be one of those features that got lost in the shuffle, or is otherwise considered too advanced, because I rarely see it used in the wild. If that’s the case, it’s a great shame considering the raw power it conveys. Without committing to a function, we can essentially execute any code in the database, with or without SQL input.

Why is that good? One potential element of overhead when communicating with a database is network transfer. If processing millions of rows, forcing Postgres to allocate and push those results over the network will be much slower than manipulating them locally within the database itself. However, the looming specter of ad-hoc scripts is always a threat as well.

It was the latter scenario that prompted this particular discussion. A few weeks ago, I addressed date-based constraints and how they’re easy to get wrong. Knowing this, there’s a good chance we have objects in our database that need revision in order to operate properly. In one particular instance, I needed to correct over 800 existing check constraints an automated system built over the last year.

I hope you can imagine that’s not something I would want to do by hand. So it was a great opportunity to invoke an anonymous block, because there’s very little chance I’d need to do this regularly enough to justify a fully-fledged function. In the end, I came up with something like this:

DO $$
DECLARE
  chk TEXT;
  col TEXT;
  edate DATE;
  sdate DATE;
  tab TEXT;
  ym TEXT;
BEGIN
  FOR tab, chk, col IN 
      SELECT i.inhrelid::REGCLASS::TEXT AS tab,
             co.conname AS cname,
             substring(co.consrc FROM '\w+') AS col
        FROM pg_inherits i
        JOIN pg_constraint co ON (co.conrelid = i.inhrelid)
       WHERE co.contype = 'c'
  LOOP
    ym := substring(tab FROM '......$');
    sdate := to_date(ym, 'YYYYMM01');
    edate := sdate + INTERVAL '1 mon';

    EXECUTE 'ALTER TABLE ' || tab || ' DROP CONSTRAINT ' ||
        quote_ident(chk);

    EXECUTE 'ALTER TABLE ' || tab || ' ADD CONSTRAINT ' ||
        quote_ident(chk) || ' CHECK (' ||
        quote_ident(col) || ' >= ' || quote_literal(sdate) ||
          ' AND ' ||
        quote_ident(col) || ' < ' || quote_literal(edate) || ')';
  END LOOP;
END;
$$ LANGUAGE plpgsql;

I didn’t just use a bunch of unnecessary variables for giggles. The original version of this block used a single RECORD and a subquery to collect all of the necessary substitutions in their calculated forms. However, I felt this discussion needed a simpler step-by-step logic. Now let’s discuss this rather large block of SQL, because it is a very interesting lesson in several different aspects of the PL/pgSQL procedural language.

If you didn’t already know, you can loop through SQL results in a FOR loop, and pull SELECT results into variables while doing so. This is fairly common knowledge, so I won’t dwell on it. We should however, examine the query itself:

SELECT i.inhrelid::REGCLASS::TEXT AS tab,
       co.conname AS cname,
       substring(co.consrc FROM '\w+') AS col
  FROM pg_inherits i
  JOIN pg_constraint co ON (co.conrelid = i.inhrelid)
 WHERE co.contype = 'c'

Here, we’re making use of system catalog tables that help Postgres manage table metadata. First comes pg_inherits for information on child partitions, since they’re extremely likely to inherit from some base table as suggested by the partition documentation. Next, we incorporate information from pg_constraint so we know the name of each check constraint (contype of ‘c’) to modify.

Regarding the SELECT block itself, there is admittedly some magic going on here. The REGCLASS type serves a dual purpose in Postgres. For one, it is compatible with the OID object identifier type used extensively in the catalog tables. And second, when cast to TEXT, it outputs the schema and object name it represents, based on the current namespace. This means that, no matter where we are in the database, we will get a full substitution of the object—wherever it lives.

In that same block, we also abuse the consrc field to obtain the name of the column used in the constraint. There’s probably a more official way to get this, but as it turns out, the \w wildcard will match any word character. By globbing with +, we essentially grab the first series of word characters in the check. It might not work with other check constraints, but date-based partition rules generally only have an upper and lower bound. For these, the first match gives us the column name, and we don’t care about the rest.

Within the loop itself, things are a bit more straight-forward. After a bit of variable juggling, we start by dropping the old check. It was malformed, so good riddance. Then we build the new constraint based on our desired start and end dates. Note the use of quote_literal here. By using this function, the date variables are converted to text and quoted as static values. The end result is a query like this:

ALTER TABLE some_child_201504
  ADD CONSTRAINT ck_insert_date_201504
CHECK (insert_date >= '2015-04-01' AND
       insert_date < '2015-05-01')

Because these static text values do not match the column type, Postgres will automatically cast them in the physical constraint it actually creates. This prevents the check type mismatches we wrestled with in the last article.

So ends this example of fixing some broken DDL with an ad-hoc anonymous block. In the past, it was fairly common for DBAs to write queries using concatenation to write the DDL commands in the SELECT section of the query. Then we would direct that output to a script, and execute it separately. In this particular case, we would need two scripts: one to drop the constraints, and another to re-create them. That approach is certainly an option for those still uncomfortable working with anonymous blocks or EXECUTE statements.

In the end, I always encourage exploring capabilities to their full extent. Dig into Server Programming documentation if you really want to learn more.