PG Phriday: Anonymous Blocks and Object Manipulation

April 17th, 2015 | Published in Database, Tech Talk | No Comments


PGDB has had anonymous blocks since the release of 9.0 in late 2010. But it must either be one of those features that got lost in the shuffle, or is otherwise considered too advanced, because I rarely see it used in the wild. If that’s the case, it’s a great shame considering the raw power it conveys. Without committing to a function, we can essentially execute any code in the database, with or without SQL input.

Why is that good? One potential element of overhead when communicating with a database is network transfer. If processing millions of rows, forcing PGDB to allocate and push those results over the network will be much slower than manipulating them locally within the database itself. However, the looming specter of ad-hoc scripts is always a threat as well.

It was the latter scenario that prompted this particular discussion. A few weeks ago, I addressed date-based constraints and how they’re easy to get wrong. Knowing this, there’s a good chance we have objects in our database that need revision in order to operate properly. In one particular instance, I needed to correct over 800 existing check constraints that an automated system had built over the last year.

I hope you can imagine that’s not something I would want to do by hand. So it was a great opportunity to invoke an anonymous block, because there’s very little chance I’d need to do this regularly enough to justify a fully-fledged function. In the end, I came up with something like this:

DO $$
DECLARE
  chk TEXT;
  col TEXT;
  edate DATE;
  sdate DATE;
  tab TEXT;
  ym TEXT;
BEGIN
  FOR tab, chk, col IN 
      SELECT i.inhrelid::REGCLASS::TEXT AS tab,
             co.conname AS cname,
             substring(co.consrc FROM '\w+') AS col
        FROM pg_inherits i
        JOIN pg_constraint co ON (co.conrelid = i.inhrelid)
       WHERE co.contype = 'c'
  LOOP
    ym := substring(tab FROM '......$');
    sdate := to_date(ym, 'YYYYMM');
    edate := sdate + INTERVAL '1 mon';

    EXECUTE 'ALTER TABLE ' || tab || ' DROP CONSTRAINT ' ||
        quote_ident(chk);

    EXECUTE 'ALTER TABLE ' || tab || ' ADD CONSTRAINT ' ||
        quote_ident(chk) || ' CHECK (' ||
        quote_ident(col) || ' >= ' || quote_literal(sdate) ||
          ' AND ' ||
        quote_ident(col) || ' < ' || quote_literal(edate) || ')';
  END LOOP;
END;
$$ LANGUAGE plpgsql;
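The variable juggling at the top of the loop is easier to see outside the database. Here’s a plain-Python sketch of the same date math, assuming partition names end in a YYYYMM suffix as the substring pattern implies (the table name here is hypothetical):

```python
from datetime import date

# Hypothetical partition name following the YYYYMM suffix scheme
tab = "some_child_201504"

ym = tab[-6:]                               # same idea as substring(tab FROM '......$')
sdate = date(int(ym[:4]), int(ym[4:6]), 1)  # first day of the partition's month

# First day of the following month, analogous to sdate + INTERVAL '1 mon'
if sdate.month == 12:
    edate = date(sdate.year + 1, 1, 1)
else:
    edate = date(sdate.year, sdate.month + 1, 1)

print(sdate, edate)  # 2015-04-01 2015-05-01
```

These two dates become the static lower and upper bounds of the rebuilt check constraint.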

I didn’t use a bunch of extra variables just for giggles. The original version of this block used a single RECORD and a subquery to collect all of the necessary substitutions in their calculated forms. However, I felt this discussion warranted simpler step-by-step logic. Now let’s discuss this rather large block of SQL, because it is a very interesting lesson in several different aspects of the PL/pgSQL procedural language.

If you didn’t already know, you can loop through SQL results in a FOR loop, and pull SELECT results into variables while doing so. This is fairly common knowledge, so I won’t dwell on it. We should, however, examine the query itself:

SELECT i.inhrelid::REGCLASS::TEXT AS tab,
       co.conname AS cname,
       substring(co.consrc FROM '\w+') AS col
  FROM pg_inherits i
  JOIN pg_constraint co ON (co.conrelid = i.inhrelid)
 WHERE co.contype = 'c'

Here, we’re making use of system catalog tables that help PGDB manage table metadata. First comes pg_inherits for information on child partitions, since they’re extremely likely to inherit from some base table as suggested by the partition documentation. Next, we incorporate information from pg_constraint so we know the name of each check constraint (contype of ‘c’) to modify.

Regarding the SELECT block itself, there is admittedly some magic going on here. The REGCLASS type serves a dual purpose in PGDB. For one, it is compatible with the OID object identifier type used extensively in the catalog tables. And second, when cast to TEXT, it outputs the schema and object name it represents, based on the current namespace. This means that, no matter where we are in the database, we will get the fully qualified name of the object, wherever it lives.

In that same block, we also abuse the consrc field to obtain the name of the column used in the constraint. There’s probably a more official way to get this, but as it turns out, the \w escape matches any word character. By adding the + quantifier, we essentially grab the first series of word characters in the check. It might not work with other check constraints, but date-based partition rules generally only have an upper and lower bound. For these, the first match gives us the column name, and we don’t care about the rest.
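As a quick illustration outside the database, the same first-word extraction is easy to mimic with Python’s re module. The consrc text below is a hypothetical example of what PostgreSQL might store for such a check:

```python
import re

# Hypothetical consrc text for a date-range check constraint
consrc = "((insert_date >= '2015-04-01'::date) AND (insert_date < '2015-05-01'::date))"

# \w+ matches the first run of word characters, skipping the leading parentheses
col = re.search(r"\w+", consrc).group()
print(col)  # insert_date
```

Since the column name always leads the expression in these generated constraints, the first match is all we need.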

Within the loop itself, things are a bit more straightforward. After a bit of variable juggling, we start by dropping the old check. It was malformed, so good riddance. Then we build the new constraint based on our desired start and end dates. Note the use of quote_literal here. By using this function, the date variables are converted to text and quoted as static values. The end result is a query like this:

ALTER TABLE some_child_201504
  ADD CONSTRAINT ck_insert_date_201504
CHECK (insert_date >= '2015-04-01' AND
       insert_date < '2015-05-01')

Because these static text values do not match the column type, PGDB will automatically cast them in the physical constraint it actually creates. This prevents the check type mismatches we wrestled with in the last article.

So ends this example of fixing some broken DDL with an ad-hoc anonymous block. In the past, it was fairly common for DBAs to write queries that used concatenation to build the DDL commands in the SELECT section of the query. Then we would direct that output to a script, and execute it separately. In this particular case, we would need two scripts: one to drop the constraints, and another to re-create them. That approach is certainly an option for those still uncomfortable working with anonymous blocks or EXECUTE statements.

In the end, I always encourage exploring capabilities to their full extent. Dig into Server Programming documentation if you really want to learn more.



PG Phriday: Functions and Addressing JSON Data

April 10th, 2015 | Published in Database, Tech Talk | 6 Comments


Fairly recently, a friend of mine presented a problem he wanted to solve with some JSON he had in a table. After he presented the end result he was trying to reach, I made the assumption that this would be pretty easy to do. But then I looked at the JSON Functions to try and find that quick fix. Though I read extensively and used rather liberal interpretations of the functions, there’s no way to directly manipulate JSON object contents with PostgreSQL.

Wait! Before you start yelling at me for being an idiot, I know what you’re going to say. I thought the same thing… at first. Go ahead and look, though. As of PostgreSQL 9.4, there is no built-in functionality to add or remove JSON elements without one or more intermediate transformation steps through PostgreSQL arrays or records. But that isn’t necessarily unexpected. Why?

Because PostgreSQL is a database. Its primary purpose is to store data and subsequently extract and view it. From this perspective, there’s no reason for PostgreSQL to have an entire library of JSON modification functions or operators. Regardless, actions such as data merges and bulk updates still need to be possible. All other field types allow a single update statement to append information, or otherwise perform a native calculation to replace the value in-line. There must be a way to do this with JSON too, without jumping through too many burning hoops.

Luckily there is, but it does require some preliminary work. Let’s start with a simple JSON document, as seen by PostgreSQL:

SELECT '{"Hairy": true, "Smelly": false}'::JSON;

               json
----------------------------------
 {"Hairy": true, "Smelly": false}

Ok. Now, how would I add an attribute named “Wobbly”? Well, I could pull the data into an external application, add it, and store the result. But suppose this was in a table of millions of records? That’s probably the least efficient way to modify them. This could be parallelized to a certain extent, but that requires a lot of scaffolding code and is way too much work for something so simple.
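For comparison, here’s roughly what that client-side round trip looks like in Python. The logic is trivial; it’s repeating the network transfer for millions of rows that makes it painful:

```python
import json

# One document as fetched from the table
doc = '{"Hairy": true, "Smelly": false}'

obj = json.loads(doc)      # parse in the application...
obj["Wobbly"] = False      # ...modify...
updated = json.dumps(obj)  # ...and serialize for the UPDATE round trip

print(updated)
```

Every document means a fetch, a parse, and a write back over the wire.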

Instead, let’s create a function to do it for us. We’ve already established that PostgreSQL JSON manipulation is extremely limited, so what other option do we have? Here’s a Python function:

CREATE or REPLACE FUNCTION json_update(data JSON, key TEXT, value JSON)
RETURNS JSON
AS $$

    if not key:
        return data

    from json import loads, dumps
    js = loads(data)
    js[key] = loads(value)
    return dumps(js)

$$ LANGUAGE plpythonu;

Now we could add the field with ease:

SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON);

                    json_update                    
---------------------------------------------------
 {"Hairy": true, "Smelly": false, "Wobbly": false}

And if we want to get really fancy, there’s always PLV8:

CREATE or REPLACE FUNCTION json_update(data JSON, key TEXT, value JSON)
RETURNS JSON
AS $$
    if (key)
        data[key] = value;
    return data;

$$ LANGUAGE plv8;

SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON);

                 json_update                  
----------------------------------------------
 {"Hairy":true,"Smelly":false,"Wobbly":false}

Though with PLV8, there are a couple of relatively minor caveats.

  1. PLV8 doesn’t work with JSONB yet, which is why all of these examples are in JSON instead.
  2. You might notice that it stripped all the extraneous whitespace, which may not be desirable.

Either way, both of these variants do something that PostgreSQL can’t do on its own. This is one of the reasons PostgreSQL is my favorite database; it’s so easy to extend and enhance.

Just as a thought experiment, which of these functional variants is faster? I didn’t use the IMMUTABLE or STRICT function attributes, so it would be easy to run a loop of a few thousand iterations and see what the final run-time is. Here’s a modification of the test query:

EXPLAIN ANALYZE
SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON)
  FROM generate_series(1, 100000);

On my test VM, the python function took around four seconds, while the PLV8 version only needed a second and a half. Clearly PLV8’s native handling of its own datatype helps here, and python having to repeatedly import the json library hurts its own execution. By adding IMMUTABLE, both fly through all 100-thousand iterations in less than 200ms.
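The repeated-import overhead is easy to probe outside the database with a plain-Python sketch. Note that Python caches modules in sys.modules, so the per-call cost is small but nonzero; actual timings will vary by machine, and this only approximates what happens inside plpythonu:

```python
import timeit
from json import loads, dumps  # imported once at module level

def update_local_import(doc):
    # Mirrors the pl/pythonu body: the import statement runs on every call
    from json import loads, dumps
    js = loads(doc)
    js["Wobbly"] = False
    return dumps(js)

def update_preimported(doc):
    # Relies on the module-level import instead
    js = loads(doc)
    js["Wobbly"] = False
    return dumps(js)

doc = '{"Hairy": true, "Smelly": false}'

# Both produce identical output; only the per-call overhead differs
t_local = timeit.timeit(lambda: update_local_import(doc), number=100000)
t_cached = timeit.timeit(lambda: update_preimported(doc), number=100000)
print(t_local, t_cached)
```

The same principle is why marking the database function IMMUTABLE helps so dramatically: the call itself gets short-circuited for repeated arguments.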

Don’t be afraid to stray from SQL when using PostgreSQL. In fact, this might be a good case for thinking about PostgreSQL in an entirely different light. I might start calling it PGDB from now on, simply to honor its roots and its primary functionality. SQL is no longer the Alpha and Omega when it comes to its capabilities these days. So I feel it’s only right to adapt along with it.

Here’s to the future of PGDB!



Comments Working Again

March 30th, 2015 | Published in Tech Talk | 1 Comment


This is what happens when you don’t have a QA department. :p

Sorry everyone. I upgraded WordPress and a bunch of plugins a few weeks back, and didn’t realize the reCAPTCHA plugin changed providers and has been marking all comments as spam since then. I went through the spam backlog and recovered anything obvious. Just to make sure, I’m going to dig through any tertiary settings and verify legit email addresses haven’t been identified as spam sources, so I don’t have to worry about approving every comment, even from pre-approved submitters.

I know this affects, at most, two or three people that actually visit this site occasionally, but now those people are happy. Or at least less annoyed. :)



PG Phriday: High Availability Through Delayed Replication

March 27th, 2015 | Published in Database, Tech Talk | 6 Comments


High availability of PostgreSQL databases is incredibly important to me. You might even say it’s a special interest of mine. It’s one reason I’m both excited and saddened by a feature introduced in 9.4. I’m excited because it’s a feature I plan to make extensive use of, and saddened because it has flown under the radar thus far. It’s not even listed in the What’s new in PostgreSQL 9.4 Wiki page. If they’ll let me, I may have to rectify that.

What is this mysterious change that has me drooling all over my keyboard? The new recovery_min_apply_delay standby server setting. In name and intent, it forces a standby server to delay application of upstream changes. The implications, however, are much, much more important.

Let me tell you a story; it’s not long, actually. A couple years ago, I had to help a client that was using a hilariously over-engineered stack to prevent data loss. I only say that because at first glance, the number of layers and duplicate servers would shock most people, and the expense would finish the job. This was one of my full recommended stacks, plus a few extra bits for the truly paranoid. DRBD-bonded servers, Pacemaker failover, off-site disaster recovery streaming clones, nightly backup, off-site backup and historical WAL storage, and long-term tape archival in a vault for up to seven years. You would need to firebomb several cities to get rid of this data.

But data permanence and availability are not synonymous. All it took was a single misbehaving CPU to take out the entire constellation of database servers, and corrupt a bunch of recent WAL files for good measure. How this is possible, and how difficult it is to avoid, is a natural extension of using live streaming replicas for availability purposes. We always need to consider one important factor: immediacy applies to everything.

Here’s what actually happened:

  1. A CPU on master-1 went bad.
  2. Data being written to database files was corrupt.
  3. DRBD copied the bad blocks, immediately corrupting master-2.
  4. Shared memory became corrupt.
  5. Streaming replication copied the corrupt data to dr-master-1.
  6. DRBD copied the bad blocks, immediately corrupting dr-master-2.
  7. In turn, PostgreSQL noticed the corruption and crashed on each server.
  8. Monitoring systems started screaming on all channels.

Just like that, a bulletproof high-availability cluster imploded into itself. All we had left at that point were the pristine backups, and the off-site WAL archives. This is one of the major reasons I wrote walctl, actually. Keeping archived WAL files on a tertiary server isolates them from issues that affect the primary or disaster recovery clusters. Further, it means the files can be pulled by any number of new clones without overloading the masters, which are intended to be used for OLTP.

In this case, we pulled a backup from the off-site backup vault, gathered the WAL files that were generated before the CPU went bad, and got the cluster running again in a couple of hours. But this could have easily been much worse, and without the previously-mentioned expensive paranoia and surplus of redundancy levels, it would have been. And despite the fact we recovered everything, there’s still the several-hour outage to address.

You see, we weren’t paranoid enough. For a truly highly available architecture, corruption of the data source should always be considered a possibility. Both DRBD and PostgreSQL strive to copy data as quickly as possible, just as they should. Synchronization delay is another huge, but unrelated problem applications often need to circumvent when communicating with replicas. One way to solve this is to keep a third standby server that uses traditional WAL consumption, and then implement a time delay.

Effectively, this means preventing the extra server from processing WAL files for some period of time. This interval allows a DBA to interrupt replay before corruption reaches a fully online replica. It takes time for monitoring systems to report outages, and for the DBA to log into a server and diagnose the problem. As we’ve seen, it can already be too late; the data is already corrupt, and a backup is the only recourse. But a delayed server is already online, can easily be recovered to a point right before the corruption started, and can drastically reduce the duration of an outage.

There are several ways of imposing this delay, and all of them require at least one more series of scripts or software to strictly regulate file availability. They’re also largely irrelevant since the introduction of PostgreSQL 9.4 and the recovery_min_apply_delay setting. Instead of a cron job, or using a complicated script as the restore_command in recovery.conf, or some other method, we just set this variable and we get the desired offset. Here’s a two-hour window:

recovery_min_apply_delay = '2h'

This works with both streaming replication and more traditional WAL file recovery. There is, however, one caveat to using this setting. Since the replica cannot apply the changes as they’re presented, they are held in the pg_xlog directory until the imposed purgatory expires. On highly transactional systems, this can result in unexpected space usage on replicas that activate the setting. The larger the safety margin, the more files will accumulate awaiting replay.

Barring that, it’s a huge win for anyone who wants to run a highly available cluster. In fact, it can even be integrated into cluster automation, so a delayed server is stopped if the primary system is down. This keeps our desired window intact while we investigate the problem, without us having to stop troubleshooting and shut down the time-offset replica ourselves.

In addition, a delayed server can be used for standard recovery purposes. If a user erroneously deletes data, or a rogue process drops a critical object, there’s a replica ready and waiting to let us recover the data and reintroduce it to the master server.

Having a server sitting around with self-inflicted synchronization offset seems ridiculous at first glance. But from the perspective of a DBA, it can literally save the database if used properly. I highly recommend anyone who can afford to implement this technique, does so. Your system uptime will thank you.



PG Phriday: Date Based Partition Constraints

March 20th, 2015 | Published in Database, Tech Talk | 3 Comments


PostgreSQL has provided table partitions for a long time. In fact, one might say it has always had partitioning. The functionality and performance of table inheritance has increased over the years, and there are innumerable arguments for using it, especially for larger tables consisting of hundreds of millions of rows. So I want to discuss a quirk that often catches developers off guard. In fact, it can render partitioning almost useless or counter-productive.

PostgreSQL has a very good overview in its partitioning documentation. And the pg_partman extension at PGXN follows the standard partitioning model to automate many of the pesky tasks for maintaining several aspects of partitioning. With modules like this, there’s no need to manually manage new partitions, constraint maintenance, or even some aspects of data movement and archival.

However, existing partition sets abound, and not everyone knows about extensions like this; many shops have developed in-house systems instead. Here’s something I encountered recently:

CREATE TABLE sys_order
(
    order_id     SERIAL       PRIMARY KEY,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
);

CREATE TABLE sys_order_part_201502 ()
       INHERITS (sys_order);

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::DATE AND
             order_dt < '2015-02-01'::DATE + INTERVAL '1 mon');

This looks innocuous enough, but PostgreSQL veterans are already shaking their heads. The documentation alludes to how this could be a problem:

Keep the partitioning constraints simple, else the planner may not be able to prove that partitions don’t need to be visited.

The issue in this case is that adding the interval of a month changes the right boundary of this range constraint into a dynamic value. PostgreSQL will not use dynamic values in evaluating check constraints. Here’s a query plan from PostgreSQL 9.4.1, which is the most recent release as of this writing:

EXPLAIN
SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
---------------------------------------------
 Append  (cost=0.00..30.38 rows=9 width=20)
   ->  Seq Scan on sys_order  ...
   ->  Seq Scan on sys_order_part_201502  ...

Well, it looks like the PostgreSQL planner wants to check both tables, even though the constraint we added to the child does not apply. Now, this isn’t a bug per se, but it might present as somewhat counter-intuitive. Let’s replace the constraint with one that does not use a dynamic value and try again:

ALTER TABLE sys_order_part_201502
 DROP CONSTRAINT chk_order_part_201502;

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::DATE AND
             order_dt < '2015-03-01'::DATE);

EXPLAIN
SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
---------------------------------------------
 Append  (cost=0.00..30.38 rows=9 width=20)
   ->  Seq Scan on sys_order  ...
   ->  Seq Scan on sys_order_part_201502  ...

Wait a minute… what happened here? There are no dynamic values; the constraint is a simple pair of static dates. Yet still, PostgreSQL wants to check both tables. Well, this was a trick question of sorts, because the real answer lies in the data types used in the constraint. The TIMESTAMP WITH TIME ZONE type, you see, is not interchangeable with TIMESTAMP. Since the time zone is preserved in this type, the actual time and date can vary depending on how it’s cast.

Watch what happens when we change the constraint to match the column type used for order_dt:

ALTER TABLE sys_order_part_201502
 DROP CONSTRAINT chk_order_part_201502;

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::TIMESTAMPTZ AND
             order_dt < '2015-03-01'::TIMESTAMPTZ);

EXPLAIN
SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
---------------------------------------------
 Append  (cost=0.00..0.00 rows=1 width=20)
   ->  Seq Scan on sys_order  ...

Now all of the types will be directly compatible, removing any possibility of time zones being cast to a different date than the constraint uses. This is an extremely subtle type mismatch, as many developers and DBAs alike consider these types interchangeable. This is further complicated by the fact DATE seems to be the best type to use for the constraint, since time isn’t relevant to the desired boundaries.

It’s important to understand that even experienced developers and DBAs can get types wrong. This is especially true when including information like the time zone appears completely innocent. In fact, it’s the default PostgreSQL datetime type for a very good reason: time zones change. Without the time zone, data in the column is bound to the time zone of wherever the server is running. That this applies to dates as well can come as a bit of a surprise.
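The date-shifting behavior is easy to demonstrate with Python’s datetime module: a timestamp stored at UTC midnight lands on the previous calendar day once rendered in a negative-offset zone. US Central in winter serves here as an arbitrary example:

```python
from datetime import datetime, timezone, timedelta

# Midnight on Feb 1st in UTC...
utc_midnight = datetime(2015, 2, 1, 0, 0, tzinfo=timezone.utc)

# ...is still Jan 31st in a UTC-6 zone such as US Central in winter
central = utc_midnight.astimezone(timezone(timedelta(hours=-6)))
print(central.date())  # 2015-01-31
```

A boundary comparison that drifts by a day like this is exactly how a row can slip past a constraint that looks correct on paper.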

The lesson here is to always watch your types. PostgreSQL removed a lot of automatic casting in 8.3, and received no small amount of backlash for doing so. However, we can see how subtly incompatible types can cause major issues down the line. In the case of partitioning, a type mismatch can be the difference between reading 10-thousand rows, or 10-billion.


