On PostgreSQL View Dependencies

As many seasoned DBAs might know, there’s one area that PostgreSQL still manages to be highly aggravating. By this, I mean the role views have in mucking up PostgreSQL dependencies. The part that annoys me personally, is that it doesn’t have to be this way.

Take, for example, what happens if you try to modify a VARCHAR column so that the column length is higher. We’re not changing the type, or dropping the column, or anything overly complicated. Yet we’re faced with this message:

ERROR:  cannot alter type of a column used by a view or rule
DETAIL:  rule _RETURN on view v_change_me depends on column "too_short"

Though PostgreSQL tells us which view and column prompted this error, that’s the last favor it provides. The only current way to fix this error is to drop the view, alter the column, then recreate the view. In a production 24/7 environment, this is extremely problematic. The system I work with handles over two-billion queries per day; there’s no way I’m dropping a view that the platform depends on, even in a transaction.

This problem is compounded when views depend on other views. The error doesn’t say so, but I defined another view named v_change_me_too that depends on v_change_me, yet I would never know it by the output PostgreSQL generated. Large production systems can have dozens, or even hundreds of views that depend on complex hierarchies of tables and other views. Yet there’s no built-in way to identify these views, let alone modify them safely.

If you want to follow along, this is the code I used to build my test case:

CREATE TABLE change_me ( too_short VARCHAR(30) );
CREATE VIEW v_change_me AS SELECT * FROM change_me;
CREATE VIEW v_change_me_too AS SELECT * FROM v_change_me;

And here’s the statement I used to try and make the column bigger:

ALTER TABLE change_me ALTER too_short TYPE VARCHAR(50);

It turns out we can solve this for some cases, though it takes a very convoluted path. The first thing we need to do is identify all of the views in the dependency chain. To do this, we need a recursive query. Here’s one that should find all the views in our sample chain, starting with the table itself:

WITH RECURSIVE vlist AS (
    SELECT c.oid::REGCLASS AS view_name
      FROM pg_class c
     WHERE c.relname = 'change_me'
     UNION ALL
    SELECT DISTINCT r.ev_class::REGCLASS AS view_name
      FROM pg_depend d
      JOIN pg_rewrite r ON (r.oid = d.objid)
      JOIN vlist ON (vlist.view_name = d.refobjid)
     WHERE d.refobjsubid != 0
)
SELECT * FROM vlist;

If we execute that query, both v_change_me and v_change_me_too will show up in the results. Keep in mind that in actual production systems, this list can be much longer. For systems that can survive downtime, this list can be passed to pg_dump to obtain all of the view definitions. That will allow a DBA to drop the views, modify the table, then accurately recreate them.

For simple cases where we’re just extending an existing column, we can take advantage of the fact the pg_attribute catalog table allows direct manipulation. In PostgreSQL, TEXT-type columns have a length 4-bytes longer than the column limit. So we simply reuse the recursive query and extend that length:

WITH RECURSIVE vlist AS (
    SELECT c.oid::REGCLASS AS view_name
      FROM pg_class c
     WHERE c.relname = 'change_me'
     UNION ALL
    SELECT DISTINCT r.ev_class::REGCLASS AS view_name
      FROM pg_depend d
      JOIN pg_rewrite r ON (r.oid = d.objid)
      JOIN vlist ON (vlist.view_name = d.refobjid)
     WHERE d.refobjsubid != 0
)
UPDATE pg_attribute a
   SET a.atttypmod = 50 + 4
  FROM vlist
 WHERE a.attrelid = vlist.view_name
   AND a.attname = 'too_short';

Now, this isn’t exactly a perfect solution. If views alias the column name, things get a lot more complicated. We have to modify the recursive query to return both the view name, and the column alias. Unfortunately the pg_depend view always sets the objsubid column to 0 for views. The objsubid column is used to determine which which column corresponds to the aliased column.

Without having this value, it becomes impossible to know what to modify in pg_attribute for the views. In effect, instead of being a doubly-linked list, pg_depend is a singly-linked list we can only follow backwards. So we can discover what the aliases depend on, but not what the aliases are. I can’t really think of any reason this would be set for tables, but not for views.

This means, of course, that large production systems will still need to revert to the DROP -> ALTER -> CREATE route for column changes to dependent views. But why? PostgreSQL knows the entire dependency chain. Why is it impossible to modify these in an atomic transaction context? If I have one hundred views on a table, why do I have to drop all of them before modifying the table? And, again, the type of modification in this example is extremely trivial; we’re not going from a TEXT to an INT, or anything that would require drastically altering the view logic.

For highly available databases, this makes it extremely difficult to use PostgreSQL without some type of short outage. Column modifications, while not common, are a necessary evil. Since it would be silly to recommend never using views, we have to live with downtime imposed by the database software. Now that PostgreSQL is becoming popular in enterprise settings, issues like this are gaining more visibility.

Hopefully this is one of those easy fixes they can patch into 9.5 or 9.6. If not, I can see it hampering adoption.