PG Phriday: Growing Pains

April 22nd, 2016 | Published in Database, News, Tech Talk | 1 Comment

Postgres is a great tool for most databases. Larger installations however, pretty much require horizontal scaling; addressing multi-TB tables relies on multiple parallel storage streams thanks to the laws of physics. It’s how all immense data stores work, and for a long time, Postgres really had no equivalent that wasn’t a home-grown shard management wrapper. To that end, we’ve been considering Postgres-XL as a way to fill that role. At first, everything was going well. Performance tests showed huge improvements, initial deployments uncovered no outright incompatibilities, and the conversion was underway.

Then we started to use it.

Much to our chagrin, dev and test environments aren’t always indicative of real scenarios and utilization patterns. Once Postgres-XL reached our semi-production environment, everything started to go wrong. Here’s a list of what happened, and why our 43TB (and growing) database can’t use Postgres-XL until it matures a bit more. As a note, all of the examples listed here were observed using the latest 9.5r1 release.

I also want to strongly emphasize that this is not an indictment of Postgres-XL. Postgres greatly needs the capabilities it adds to truly break into the VLDB market. Sometimes though, projects aren’t as capable as we originally hoped, and subtle issues aren’t always readily apparent until software is field-tested. We’re still not ruling it out.

That said, on to the adventure.

Trigger Support

In short, there isn’t any. This at least, was something we knew about. In our case, it wasn’t a problem because a VLDB installation is commonly a warehouse of some description. These tend to depend on ETL or other source scripts to supply data that’s already prepared and doesn’t rely on post-insert transformations.

For anyone else, this could be a roadblock. Trigger support isn’t always essential to running an active database, but they’re common enough that a conversion process will have to contend with replacing them. This missing feature practically relegates Postgres-XL to warehouse use all by itself. Indeed, like us, many may consider triggers unnecessary in a horizontally distributed database or NoSQL databases would have never garnered so much attention.

But it is a missing feature a Postgres user would expect. This is the first lesson that Postgres-XL is a variant of Postgres with similar functionality, but it isn’t Postgres.


Again, there isn’t any. This is one of those things we only found after scripts started failing. Savepoints allow transactions to retain integrity through several phases if the state and progress are verified. If something unexpected happens, the whole transaction doesn’t have to be abandoned outright. Clever database developers use this as something of a state machine to make multiple attempts, or perform alternate actions in case of failures within a transaction.

Perhaps due to internal transaction implementation details, Postgres-XL can’t retain this functionality. Distributing transactions and involving a transaction manager while also maintaining a state machine within it, is definitely not a trivial problem. That makes this a completely understandable omission, and since the feature is rarely utilized, many won’t even notice it’s missing.

Yet it was something we had to route around. Another thing to keep in mind when converting to Postgres-XL.

Sub-queries and DISTINCT ON Woes

This one caught us totally off-guard. The documentation on SELECT doesn’t seem to mention this, but it is indeed an issue with Postgres-XL. Observe:

CREATE TABLE bar (id INT, nid INT);
SELECT FROM generate_series(1,100) a(id);
  FROM foo f1
  JOIN foo f2 USING (id)
ERROR:  Postgres-XL does NOT currently support ORDER BY IN subqueries

But is that error accurate? If we remove the INSERT preamble so the query executes on its own, we merely see the expected output of rows. So even if the DISTINCT ON is being internally transformed into a sub-query, it isn’t doing so consistently. The error is wrong anyway, because this works just fine:

  FROM foo a

Not only is that an ordered sub-query, it’s a JOIN on an ordered sub-query, an arguably more complex scenario. I can only assume this was missing functionality, but in implementing the necessary code changes to remove the warning, they missed a spot.

In any case, this really threw a wrench into our functions that acted as ETL components. We took the error at face value, and groaned at trying to rewrite DISTINCT ON without using sub-queries or ORDER BY. Ironically an even more convoluted query came to our rescue:

WITH sub AS (
    FROM foo f1
    JOIN foo f2 USING (id)

This works because CTEs instantiate the results as a temporary table, which doesn’t trigger the error. Then the INSERT proceeds normally. Huh. Well, that’s a bit odd, but easily circumvented.

Then our luck ran out.


Now we’re going from missing features and wonky implementations to outright bugs. The next issue that reared its ugly head was related to temporary tables. Well, maybe. We started seeing this in the logs:

ERROR:  could not open relation with OID 0

A quick Google led to this thread, which seemed to indicate a problem with temporary tables in stored procedures called two times consecutively within a session. Unfortunately the test cases provided in the thread do not replicate the problem on our installation. Regardless, there was a good chance it was related, so we removed temp tables anyway. Our solution leveraged UNLOGGED tables instead, because they have stable OIDs and were much less likely to invoke whatever edge case we were triggering. So our functions turned into this:

-- Functions like this caused the OID 0 errors.
  SELECT FROM generate_series(1,100) a(id);
LANGUAGE plpgsql;
-- Using UNLOGGED tables and functions like this worked fine.
-- Of course, now concurrent function calls won't work.
  SELECT FROM generate_series(1,100) a(id);
LANGUAGE plpgsql;

And what would you know? Success! It wasn’t ideal, but we could limp along with the workaround until they fixed the bug. Or so we thought.

We Need ACID

Probably the most amusing part of this whole adventure is that the errors we kept encountering were almost immediately consecutive. Once we worked around one issue, we merely cleared the way for another completely unrelated problem. This time, we saw this error in the logs:

ERROR:  catalog is missing 25 attribute(s) for relid 36564

That’s not encouraging. Back to Google we went, giving us another thread detailing the problem. This time, the issue seems related to autovacuum somehow. It’s also probable that repeatedly truncating UNLOGGED tables wrought havoc on the Postgres system catalog, and Postgres-XL wasn’t properly propagating something. The theory at the end of the thread is unfortunately incorrect. Disabling autovacuum only reduces the bug’s prevalence, as the original reporter noted when I asked how to correct (or remove) the broken tables.

Regardless of the underlying cause, this was the final straw. Once those errors showed up, that meant the system catalog was corrupt. The UNLOGGED tables used by our functions were completely invalid, and worse, we couldn’t even drop them to start over. The only way out is a full dump/restore, a rather drastic measure because some ephemeral table became an actual Ghost in the Shell. And the real worry here—that the missing attributes would affect a real table—became the overriding concern.

Because the D in ACID stands for Duability, and this bug illustrates that the current incarnation of Postgres-XL has none. Database metadata is sacrosanct, because without it, all that remains is indecipherable binary blobs taking up disk space. There’s no way I’d dump 40TB of data on an installation, when history suggests some of that data could mysteriously go missing.

The End of the Road

And that’s the real misfortune. No matter how much we desperately wanted to deploy Postgres-XL into our warehouse, we literally can’t. Doing so at this point would be irresponsible and dangerous. Some of the problems we encountered are understandable and workarounds exist for most. But the outright deluge of roaches in our sundae, with a rotten cherry on top, is impossible to ignore.

I have every confidence in 2ndQuadrant eventually resolving the worst roadblocks. Before Postgres-XC became Postgres-XL, it practically languished in Development Limbo, accumulating code smell and copious edge cases from managing node interactions. I’ll continue installing every subsequent release in our dev and QA environments until there’s a legitimate reliability breakthrough. I still maintain that Postgres-XL is the way forward for highly distributed horizontal VLDB installations.

Just… not yet.

Tags: , , , ,

PG Phriday: Postgres Password Practices

April 15th, 2016 | Published in Database, Tech Talk | 2 Comments

Connecting to a Postgres database can be a headache for end users and DBAs alike. Not because of the work involved, but the general irritation of managing passwords—an issue common to any system that requires authentication controls.

The user wants to say, “Who am I? None of your business!”

While the database is more comfortable with, “Go away, before I taunt you a second time!”

Well, there’s some middle ground everyone can enjoy, and a few Postgres-specific caveats which add a bit of flavor to the whole interaction.

Let’s start with the pg_hba.conf configuration file that coordinates authentication for Postgres. The first format I recommend for most initial setups relies on OS authentication and looks like this:

# TYPE    DB     USER          ADDRESS       METHOD
local     all    all                         peer

The peer method keeps Postgres out of the loop entirely. The assumption here is that the OS already verified the user, so if they connect as themselves, there’s no reason for further interrogation. Unfortunately, this only works with local connections made from the physical server where Postgres is running. Aside from local administrative scripts, it’s extremely unlikely for users to authenticate this way.

So what do we do for TCP/IP connections made from desktops using a tool like pgAdmin, or applications interacting with the database from multiple remote servers? The easiest and probably most common approach is to simply use MD5 authentication against a user tracked within Postgres itself. That process works for any user created like this:

-- Make a group to manage localacct accounts.
CREATE ROLE localacct;
-- Create the user directly.
CREATE USER some_dude WITH PASSWORD 'foobar';
GRANT localacct TO some_dude;
-- Create blank user, then assign a password.
-- Omit the second part for a user with no password.
CREATE USER some_dude;
ALTER USER some_dude WITH PASSWORD 'foobar';
GRANT localacct TO some_dude;

When a user has a local Postgres-managed password, it’s easy to verify through a direct connection. It’s just a quick encrypted hash check, and the connection can be rejected or accepted in short order. In order for this to work, we also need to modify the pg_hba.conf file and tell Postgres to reload the configuration files. In the end, our auth config should look like this:

# TYPE    DB     USER          ADDRESS       METHOD
local     all    all                         peer
host      all    +localacct    md5

By specifying +localacct as the user, we’re telling Postgres that any user in the localacct group should authenticate using MD5. This trick is mentioned in passing within the documentation, but I haven’t seen it used very often in the field. The major benefit it provides, is the ability to control authentication using SQL statuments in the future instead of further modifications to the pg_hba.conf file itself. I went into more detail about this approach a couple of years ago.

Notice also that we limited the connections to the 10.* address mask. The reason for this is to reduce potential connection vectors. If at all possible, always limit connections to, at the very least, the network local to your corporate firewall or local organization. Exposing Postgres directly to the internet is an extremely bad idea. Databases primarily act as a storage and retrieval location for data, much like a filesystem. It makes sense for applications, local users, and GUIs, but why would we expose a raw SQL interface to the world? Beyond that, there are inherent risks we’ll discuss a bit later.

In any case, that covers direct local connections and authenticated users. Another common use case is corporate authentication through some kind of shared password management like Active Directory, Samba, LDAP, or something else. Employees that need database access could have a second password managed by the DBA, or we could just pass the buck and let shared authentication handle it. A very common way to do this is through PAM. Companies with an infrastructure team probably have PAM set up to verify local UNIX access through the primary auth server.

As it turns out, Postgres can leverage that stack very easily. Let’s make another modification to our auth config file:

# TYPE    DB     USER          ADDRESS       METHOD
local     all    all                         peer
host      all    +localacct    md5
host      all    all     pam

While PAM is only one of the methods supported by Postgres, it’s also one of the easiest to leverage regarding delegation. Like the peer method, Postgres trusts whatever response PAM returns regarding the password a user or application supplied. There is however, one giant caveat that necessarily follows the fact PAM must receive the password unmolested: it’s transferred via plain text.

When Postgres uses md5 authentication, the internal connection libraries understand the encryption method Postgres supports. Because of this, the user’s password can be encrypted before it’s ever transferred over the network, and thus it is never exposed. Postgres checks the password hash in its local catalog, and that’s the end of the story. With PAM, the plaintext password must be transmitted over the network connection so PAM can use its own verification process that is completely unknown to Postgres.

This is probably fine over a corporate subnet, but it’s still probably a very good idea to use SSL connections. This at least encrypts the communication protocol itself, so password exposure is greatly reduced. Setting up a bunch of keys everywhere is somewhat inconvenient, but at least can be automated to a degree using configuration management and desktop automation tools.

Given that caveat, why even bother? Well, MD5 is horribly broken. If the Postgres password hashes are ever stolen, MD5 is trivial to crack, and most passwords will be exposed in short order. That in turn, gives an attacker application or superuser-level direct access to potentially sensitive data. There are solutions to this of course, but implementing them would require modifying the Postgres client libraries.

Unfortunately Postgres’ extension system does not apply to the authentication process for reasons stated earlier. The Postgres client library must understand the candidate method so passwords can be encrypted before hashes are transmitted and verified; extensions only work within the database engine itself. As a result, Postgres can’t use bcrypt, scrypt, pbkdf2, or any enhanced hashing system for authentication purposes. That means it remains a potential attack vector if exposed to the public internet.

There’s something else we can do to prevent password exposure, too. Postgres maintains a very handy pgpass configuration file that exists on the client side. When properly set up, users no longer have to supply passwords manually. These files are secured to local access to the user, and are very helpful for automation purposes for application passwords. When using .pgpass, passwords won’t be in application configuration files, and thus won’t accidentally be committed to source management tools like git, svn, or mercurial.

The configuration like for our some_dude user might look like this:

# hostname:port:database:username:password

The application could be installed on the app_engine user account on a UNIX system, but would still be able to connect as some_dude without supplying a password during the actual connection process. This Python snippet, for example would work just fine, even though we set up Postgres to expect an md5 password:

import psycopg2
conn = psycopg2.connect(database='cool_app', user = 'some_dude')

This also allows us to use a configuration management tool to broadcast the .pgpass file to multiple servers as a configuration element. So larger organizations could deploy an application to any number of servers, and passwords would never be stored in source control. Password changes are as simple as syncing the new .pgpass file to all applicable servers. We can take the PAM auth idea even further and create managed application users to remove Postgres from the password loop entirely. Then, our final pg_hba.conf file would look something like this:

local     all    all                     peer
host      all    all    pam

Once we do that, passwords are entirely controlled by the corporate organization itself. Standard password rotations and configuration changes are entirely invisible to Postgres, end-users, and even applications. Nobody except a handful of administrators has access to any password but their own. This indemnifies everyone, even the DBA, from leaking passwords because the automated layer is the only thing with full access to everything. Individuals only know their own passwords, applications follow the same model, and never the two shall meet.

No matter what approach we use, probably the most important thing is that Postgres be behind a firewall of some kind. No matter what kind of front-end application or GUI presents that data to the world, the Postgres port (5432 by default) should be under lock and key. Even should we ignore the inherent insecurity of MD5 as an encryption method in the modern age, PAM or some other method could expose passwords if SSL connections are not strictly observed.

This is the main reason my examples show 10.* network addresses in the examples. Doing so implies only private network access to Postgres from other systems on that same private network. This is in reality, a minimum level. In practice we should go much further, such as isolating production systems from development or QA systems that might run destructive tests or unverified code. Misconfigured systems in these environments might not even realize they’re operating on production data, given how IP addresses and virtual IPs can change, and the ease of copy-and-paste mishaps.

And if a developer ever needs application access to fix a bug even though nobody knows the password? Easy:

GRANT some_app_read TO some_dude;

At most, this should be done in a production context for a limited interval. In dev or QA environments, maybe it makes sense to expose test application data to everyone, but production should be as locked down as possible. No passwords are exchanged here, and the developer still has all necessary components to track down possible issues and obtain test cases for verification and automated test frameworks in future application iterations.

In the end, as long as us DBAs and users are cognizant of the risks, and utilize the tools above, we can access our Postgres database with relative ease. If taken to its logical conclusion, we don’t even need to really worry about tracking which password goes with which thing; that’s all automatic. It’s not quite a world without passwords, but it’s one without typing passwords, and that’s close enough for me.

Tags: , , ,

PG Phriday: JOIN the Club

April 8th, 2016 | Published in Database, Tech Talk | No Comments

I’ve been going on and on about esoteric Postgres features for so long, sometimes I forget my own neophyte beginnings. So let’s back way off and talk about how JOINs work. What are they? What do they do? Why would you use one? What can go wrong? It’s easier than you might imagine, but there are definitely some subtleties to account for.

To make this more relatable, let’s start with a club of pet owners that likes to keep track of everything:

  type_id    SERIAL PRIMARY KEY,
  type_name  VARCHAR NOT NULL
CREATE TABLE pet_owner
  owner_id    SERIAL PRIMARY KEY,
  owner_name  VARCHAR NOT NULL
CREATE TABLE pet_record
  record_id  SERIAL PRIMARY KEY,
  type_id    INT NOT NULL REFERENCES pet_type (type_id),
  owner_id   INT NOT NULL REFERENCES pet_owner (owner_id),
  pet_name   VARCHAR NOT NULL
INSERT INTO pet_type (type_name)
VALUES ('Cat'), ('Dog'), ('Horse');
INSERT INTO pet_owner (owner_name)
VALUES ('Catherine L. Ady'), ('Peter Venkman'), ('Wilbur Post');
INSERT INTO pet_record (type_id, owner_id, pet_name)
VALUES (1, 1, 'Fluffy'), (1, 1, 'Mr. Whiskers'),
       (1, 2, 'Garfield'), (2, 2, 'Spinee'),
       (3, 3, 'Mr. Ed');

Now that we have some data, we want to retrieve it in a way that’s relevant and descriptive. I could go on a long and exhausting description regarding JOIN theory, but those already exist and probably do a better job anyway. Instead, I’ll just say there are two types of JOIN: INNER and OUTER.

INNER JOINs are the default in Postgres. Because it’s the default, we don’t actually need to use the INNER keyword. Legacy systems don’t always allow this, so some ORMs and end users still use INNER JOIN explicitly, but it isn’t necessary or even encouraged with Postgres. The INNER keyword simply means that rows must exist in both tables to appear in the results.

For example, let’s list all pets and their pet type:

SELECT p.pet_name, t.type_name
  FROM pet_record p
  JOIN pet_type t ON (t.type_id = p.type_id);
   pet_name   | type_name 
 Fluffy       | Cat
 Mr. Whiskers | Cat
 Garfield     | Cat
 Spinee       | Dog
 Mr. Ed       | Horse

This is the least complicated type of JOIN there is. In this particular case, the type_id column exists in both pet_type and pet_record. We can use that information to join the tables together so they resemble a single table. From that pool of columns, we can pick and choose the ones that interest us in the SELECT section.

Postgres also supports some shorthand syntax that simplifies the process:

SELECT p.pet_name, t.type_name
  FROM pet_record p
  JOIN pet_type t USING (type_id);

The primary caveat to USING is that the JOIN columns aren’t explicitly listed. If there are multiple tables with similarly named columns involved in a JOIN, Postgres might complain about ambiguity, and then we’d have to revert to the regular syntax. Still, it’s a nice shortcut for most queries.

If that’s an INNER JOIN, what then is an OUTER JOIN? This is a bit more difficult to visualize, but the logic behind it is this: if a row from one table does not exist in the other, represent the missing data as NULL. In other words, since we’re combining two tables to be a larger, virtual table, don’t remove records simply because they’re missing.

As an example, let’s add another pet type and then count how many pets of each type exist. First with a regular JOIN, and then with an OUTER JOIN:

INSERT INTO pet_type (type_name) VALUES ('Cow');
-- First with a standard JOIN
SELECT t.type_name, COUNT(p.record_id) AS total
  FROM pet_type t
  JOIN pet_record p USING (type_id)
 GROUP BY t.type_name;
 type_name | total 
 Cat       |     3
 Horse     |     1
 Dog       |     1
-- Now use an OUTER JOIN
SELECT t.type_name, COUNT(p.record_id) AS total
  FROM pet_type t
  LEFT JOIN pet_record p USING (type_id)
 GROUP BY t.type_name;
 type_name | total 
 Cow       |     0
 Cat       |     3
 Horse     |     1
 Dog       |     1
-- Here's the "table" the OUTER JOIN made
SELECT t.*, p.*
  FROM pet_type t
  LEFT JOIN pet_record p USING (type_id);
 type_id | type_name | record_id | type_id | owner_id |   pet_name   
       1 | Cat       |         2 |       1 |        1 | Fluffy
       1 | Cat       |         3 |       1 |        1 | Mr. Whiskers
       1 | Cat       |         4 |       1 |        2 | Garfield
       2 | Dog       |         5 |       2 |        2 | Spinee
       3 | Horse     |         6 |       3 |        3 | Mr. Ed
       4 | Cow       |           |         |          |

The idea here is that nobody owns a cow, but we still want the pet type listed so we can see that nobody owns a cow. Otherwise, we might not even consider that owning a cow is even an option. But we didn’t even use the OUTER keyword! What’s with this LEFT nonsense? It just so happens that, like INNER, the OUTER keyword is also optional in Postgres. What actually makes a JOIN an OUTER JOIN are the LEFT, RIGHT, and FULL keywords.

Looking at the virtual table produced by the JOIN itself, it’s clear that we only have information supplied by the table that was on the left side of the join. Since this is an OUTER join type, Postgres substituted a bunch of blank data for the missing row in the pet_record table on the right, since its data was optional. If we wanted to represent the same query as a RIGHT join, we just need to move things around a bit:

SELECT t.*, p.*
  FROM pet_record p
  RIGHT JOIN pet_type t USING (type_id);

Now the records on the right are the source of the JOIN and those on the left are represented by NULL if they don’t exist. Otherwise, the output is exactly the same as before. Because of this, it’s fairly uncommon to encounter RIGHT JOIN in the wild. Western countries use writing systems that read left-to-right, so it’s likely joins will also follow that trend.

In that case, how is FULL different? Combine the idea of LEFT and RIGHT, and you get FULL. Essentially when joining two tables with FULL, rows that don’t exist on the right are represented by NULL, and rows that don’t exist on the left are NULL as well. This kind of output makes sense when combining reports or other aggregated output, since some summaries won’t have an equivalent on either side, but we still want the data.

In instances like our relational diagram, it’s not possible to observe the result of a FULL JOIN on the IDs because at least one ID always exists. But… we can fake it by joining on a column they don’t share, and never matches:

SELECT t.*, p.*
  FROM pet_record p
  FULL JOIN pet_type t ON (p.pet_name = t.type_name);
 type_id | type_name | record_id | type_id | owner_id |   pet_name   
       1 | Cat       |           |         |          | 
       4 | Cow       |           |         |          | 
       2 | Dog       |           |         |          | 
         |           |         2 |       1 |        1 | Fluffy
         |           |         4 |       1 |        2 | Garfield
       3 | Horse     |           |         |          | 
         |           |         6 |       3 |        3 | Mr. Ed
         |           |         3 |       1 |        1 | Mr. Whiskers
         |           |         5 |       2 |        2 | Spinee

It’s like we just haphazardly glued the tables together into one giant table. But examining the output itself, the NULL values always occur when there’s no match for the column(s) we used for the join itself. Thus FULL is operating exactly as advertised.

OUTER JOIN is normally used for optional relationships, like in our example where there might be a type of pet nobody owns. It can also be used to find club members that currently own no pets. The point here is that a regular JOIN would exclude these situations and potentially produce erroneous data in a report. OUTER JOIN fills a very defined role, and even though it’s not as dominant, it’s indispensable.

As a final note on JOINs, there is one easily made mistake that generally results in something of a disaster. Imagine we wanted to know who owned each pet, and we rush the query with some cut-and-paste magic. This is what we find:

SELECT o.owner_name, p.pet_name, t.type_name
  FROM pet_record p
  JOIN pet_type t ON (t.type_id = p.type_id)
  JOIN pet_owner o ON (t.type_id = p.type_id);
    owner_name    |   pet_name   | type_name 
 Catherine L. Ady | Fluffy       | Cat
 Catherine L. Ady | Mr. Whiskers | Cat
 Catherine L. Ady | Garfield     | Cat
 Catherine L. Ady | Spinee       | Dog
 Catherine L. Ady | Mr. Ed       | Horse
 Peter Venkman    | Fluffy       | Cat
 Peter Venkman    | Mr. Whiskers | Cat
 Peter Venkman    | Garfield     | Cat
 Peter Venkman    | Spinee       | Dog
 Peter Venkman    | Mr. Ed       | Horse
 Wilbur Post      | Fluffy       | Cat
 Wilbur Post      | Mr. Whiskers | Cat
 Wilbur Post      | Garfield     | Cat
 Wilbur Post      | Spinee       | Dog
 Wilbur Post      | Mr. Ed       | Horse

Oh no! There’s no way every pet is owned by every person! What we have here is a Cartesian Product, and is the bane of databases everywhere. In our case, it happened because we joined the pet_owner table on type_id instead of owner_id. Since the join clause can be anything, Postgres dutifully joined every row in the owner table to every row in the record table. This operation is multiplicative, and so the complexity grows exponentially based on the size of the source tables.

We got off lucky because our tables are small example sets. Imagine what would have happened if we were joining two million-row tables! Producing, transmitting, and receiving one trillion rows instead of a million might just cause some problems. Even small results of a couple thousand rows can be amplified tremendously into millions of results due to a malformed JOIN. However, this example actually provides some valuable insight about what’s going on during the join itself.

Consider the overzealous output again. The rows we want are there, it’s just that the invalid ones weren’t pruned before they were returned to us. There are clearly some efficiency models in place to avoid exhaustive combinations, but this really is how joins work. They combine every row from one table with every row in another table, and weed out the results that don’t match the JOIN criteria. Because of that, the output can be anything from a boring and expected list of pet owners and their pets, or a nonsensical combination of everything. It’s all in the clauses.

Here’s what would have happened if we didn’t suffer from cut-and-paste-itis:

SELECT o.owner_name, p.pet_name, t.type_name
  FROM pet_record p
  JOIN pet_type t ON (t.type_id = p.type_id)
  JOIN pet_owner o ON (o.owner_id = p.owner_id);
    owner_name    |   pet_name   | type_name 
 Catherine L. Ady | Fluffy       | Cat
 Catherine L. Ady | Mr. Whiskers | Cat
 Peter Venkman    | Garfield     | Cat
 Peter Venkman    | Spinee       | Dog
 Wilbur Post      | Mr. Ed       | Horse

Much better. This is one of the main reasons databases like Postgres are so versatile. Not only will they give you exactly what you ask for, but the diversity of the results covers much of what is possible through Set Theory. The mathematics of Set Theory are justifiably impenetrable to someone without a degree in Mathematics—and even some that do—yet the power lurking there is undeniable. For everyone else, the basics are more than sufficient.

That kind of adaptability is a double-edged sword to be sure, but knowing the pitfalls is the best way to avoid them. Postgres is one of those things that’s easy to use and hard to master precisely because of the Math it represents. Now that you have some of the tools necessary to use Postgres, feel free to try and master it.

I still haven’t, but that doesn’t mean I won’t stop trying.

Tags: , ,

PG Phriday: 5 Reasons Postgres Sucks! (You Won’t Believe Number 3!)

April 1st, 2016 | Published in Database, News, Tech Talk | 6 Comments

I’ve been a Postgres DBA since 2005. After all that time, I’ve come to a conclusion that I’m embarrassed I didn’t reach much earlier: Postgres is awful. This isn’t a “straw that broke the camel’s back” kind of situation; there is a litany of ridiculous idiocy in the project that’s, frankly, more than enough to stave off any DBA, end user, or developer. But I’ll limit my list to five, because clickbait.

1. Security Schmecurity

Look, every good Postgres DBA knows that the pg_hba.conf file that handles connection security should have these lines:

local  all  all       trust
host   all  all  all  trust

And then the Postgres developers wonder why it’s so easy to exploit. Here I am, Joe T. Hacker, and I can just do this:

psql -U postgres -h production_server secure_database

And boom! I’m in as the superuser, and I can start dropping tables, replacing tables with whatever I want, screw with data, whatever. Sloppy. Just sloppy.

2. Super Users? Super Losers

Oh yeah, and after I connect to steal sensitive data and then clean up after myself, I can’t even do this:

secure_database=# DROP DATABASE secure_database;
ERROR:  cannot DROP the currently OPEN DATABASE
secure_database=# DROP USER postgres;
ERROR:  CURRENT USER cannot be dropped

Ugh! Are you kidding me? This is basic stuff, people. As a superuser, I should be able to circumvent all security, durability, and even internal consistency. That the backend code doesn’t acknowledge what a superuser really is, merely proves that the database owner is just another user. Useless.

Sure, I could use the provided tool and do it from the command line like this:

dropdb -U postgres -h production_server secure_database

But I’m already connected! Oh, or I could switch databases and then do it like this:

secure_database=# \c postgres
secure_database=# DROP DATABASE secure_database;

But if I’ve hacked in, every keystroke counts. It’s not even necessary! They could just queue the action and apply it after I disconnect, right? This kind of amateur work is why I can’t justify being a Postgres DBA anymore.

3. Awkward JOIN Syntax is Awkward

I cut my teeth as a DBA starting in 1999 with Oracle. Now that is a real database engine! One of the really cool things Oracle does is overload the OUTER JOIN syntax with a neat shortcut. Not only is the OUTER decorator optional, but so are LEFT and RIGHT! Let’s compare the only syntax Postgres supports with the snazzy method Oracle provides, and see which one is obviously way better:

-- Postgres (bleh)
SELECT a.col1, b.col2
  FROM a
  LEFT JOIN b ON (b.a_id =;
-- Oracle (Nice!)
SELECT a.col1, b.col2
  FROM a, b
 WHERE = b.a_id (+);

See that (+) bit? Just slap that on any columns that should join two tables, and it will automatically control how the join works. If it’s on the right, it’s a LEFT JOIN, and if it’s on the left, it’s a RIGHT JOIN. See? Totally intuitive. Sure, Oracle can use the same syntax Postgres childishly demands, but who would purposefully do that? Gross.

4. JSON and the Kitchen Sink

Everyone keeps touting JSON support as some huge benefit to Postgres. Even I did it once or twice. But upon pondering it more seriously, where does it end? What’s the next interface method du jour that we’ll be cobbling onto Postgres next?

First it was XML, then JSON, and next will be what, YAML? They might as well, since you can define your own types anyway.

5. Extensions are the Worst Idea Ever

This is easily the worst of the bunch, and follows naturally from the previous point. Ever since Postgres 9.1, it’s been possible to easily write extensions that are effectively bolted directly to the database engine. Any random schmo can waltz up, whip up some code in basically any language, and it will be able to do any of this. Suddenly Postgres has some “new functionality.”

New functions? New index types? Even override the planner, or fork arbitrary background workers!?

Are you kidding me!? But it gets even worse! There’s a site called PGXN with a related utility for making extensions easy to find and install. I don’t even want to imagine the shenanigans! Any random idiot, including me, could just put an extension there. Just convince a DBA to install it like this:

pgxn install awesome_tool_notanexploitorbotnetiswear

Suddenly they’re pwn3d. Do you want to be pwn3d? I didn’t think so. That’s the kind of risk you run by using Postgres as your primary database engine.


I for one, have seen the light. I can’t believe I was so misled by Postgres for over a decade. How am I even employable at all, having advocated a hacker’s playground for most of my professional career? How can any of us in the Postgres community live with ourselves, knowing we’ve effectively enabled effortless system compromises in the name of functionality?

I guess I’ll go back to using Oracle like the whipped dog I am. It’s what inspired me to be a DBA, and I abandoned it in favor of this travesty. I just pray Larry Ellison will forgive me for my transgressions.

I apologize for everything, and I sincerely hope the rest of the Postgres community joins me in supplication. If our end-users and developers are magnanimous, there might yet be a future for us.

Tags: , , , , ,

PG Phriday: Mining for Metadata

March 18th, 2016 | Published in Database, Tech Talk | No Comments

Every good database engine has a system catalog that describes the myriad of structures that model and preserve our data. Of course this is expected, as it would be somewhat silly for a database system not to use tables to represent its internal mechanisms. But that doesn’t mean they have to be humanly readable, or even make sense without a series of views or esoteric functions to decipher them. The information_schema standard serves a necessary role in that regard, and the Postgres implementation is extremely comprehensive. Yet the regular Postgres catalog is also fairly usable, so let’s explore and see what we find.

At first glance, there is a veritable smorgasbord of catalog tables and views to choose from. So, where to start? Let’s look at something simple and see what tables are available:

  id           SERIAL PRIMARY KEY,
  full_name    TEXT NOT NULL,
  description  TEXT
INSERT INTO picard (full_name, description)
SELECT, 'There are ' || || ' lights!'
  FROM generate_series(1, 1000000) a(id);
ANALYZE picard;
  FROM pg_tables
 WHERE schemaname NOT IN ('information_schema', 'pg_catalog');
-[ RECORD 1 ]--------
schemaname  | public
tablename   | picard
tableowner  | sthomas
tablespace  | 
hasindexes  | t
hasrules    | f
hastriggers | f

This is all pretty basic info, right? Note that we had to remove tables in the information_schema and pg_catalog schemas or we’d have to wade through over 50 other records. Just more proof that Postgres uses its own catalogs to… track its catalogs. What may not be so obvious is that pg_tables is actually a view.

A DBA might find this obvious, given the concept of normalization. For the uninitiated however, database object names might be referred to multiple times through several catalog entries. To avoid using the same text value in multiple places and risking mismatches or modifying the name everywhere should it change, Postgres uses surrogate keys. Again, this approach is extremely common, but makes views a necessity to decode those proxy values.

Let’s use pg_views and examine the definition of pg_tables as well as itself:

SELECT definition
  FROM pg_views
 WHERE viewname = 'pg_tables';
 SELECT n.nspname AS schemaname,
    c.relname AS tablename,
    pg_get_userbyid(c.relowner) AS tableowner,
    t.spcname AS tablespace,
    c.relhasindex AS hasindexes,
    c.relhasrules AS hasrules,
    c.relhastriggers AS hastriggers
   FROM ((pg_class c
     LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
     LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
  WHERE (c.relkind = 'r'::"char");
SELECT definition
  FROM pg_views
 WHERE viewname = 'pg_views';
 SELECT n.nspname AS schemaname,
    c.relname AS viewname,
    pg_get_userbyid(c.relowner) AS viewowner,
    pg_get_viewdef(c.oid) AS definition
   FROM (pg_class c
     LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
  WHERE (c.relkind = 'v'::"char");

What do we have, here? It appears as if both pg_tables and pg_views are derived from information in pg_class. As it turns out, pg_class is probably the most important catalog table in the system. It’s essentially a reference for nearly every database object available. Since it’s in the middle of a vast constellation of other references, practically every column in the table is a key to another low-level catalog.

Why are views a relkind of ‘v’, while tables are designated ‘r’? Well in Postgres, that ‘r’ stands for ‘relation’, as tables describe relationships across data. Probably. Yet the documentation for pg_class describes all contents as relations, not just tables. Further, the column names in pg_class don’t reflect a consistent association to the table name. Let’s look at pg_namespace, which decodes schema (namespace) names, to see why that’s relevant:

\d pg_namespace
  COLUMN  |   TYPE    | Modifiers 
 nspname  | name      | NOT NULL
 nspowner | oid       | NOT NULL
 nspacl   | aclitem[] | 
    "pg_namespace_nspname_index" UNIQUE, btree (nspname)
    "pg_namespace_oid_index" UNIQUE, btree (oid)

Both pg_tables and pg_views utilize this table to decode pg_class.relnamespace into a text value. Note the ‘nsp’ prefix for all of the column names? That naming scheme is fairly consistent across most of the other catalog tables and views. The pg_proc table that tracks functions (procedures) prefixes everything with ‘pro’, while pg_index uses ‘ind’. From that perspective, pg_class is a rather notable exception.

Though a bit perplexing and lost to the annals of history, one benefit of this abnormality is that it makes pg_class easy to remember. It’s a short name, and since it’s used practically everywhere in the other catalog views, it’s almost synonymous with the catalog itself. Regardless of why, pg_class is of the first places to look for nearly anything not covered by a ‘nicer’ view. A fun example of this is exploiting the relpages and reltuples columns:

SELECT relname AS tablename, relpages*8 AS kb_used,
       reltuples AS approx_rows
  FROM pg_class
 WHERE relname = 'picard';
 tablename | kb_used | approx_rows 
 picard    |   66592 |       1e+06

These two are generally used by the query planner in various calculations to estimate query cost. For our purposes, they provide an excellent summary of the table’s overall impact on the database. We can use sorts to find the table with the most or least rows, the biggest table on disk, combine the two to seek out tables with abnormally large rows, and so on.

The relpages column refers to the amount of database pages the table consumes. In Postgres, this defaults to 8KB, so in multiplying by 8, we can see how much physical space the table uses. In the vast majority of cases, this is more than sufficient and a quick shortcut when doing ad-hoc analysis of tables, but there are cases when an installation is compiled with a different default page size. So to be pedantic, we should repeat the same query using a couple provided decoding functions:

SELECT relname AS tablename,
       pg_relation_size(oid) AS bytes_used,
       pg_total_relation_size(oid) AS total_bytes,
       pg_size_pretty(pg_relation_size(oid)) AS pretty_size,
       pg_size_pretty(pg_total_relation_size(oid)) AS pretty_total,
       reltuples AS approx_rows
  FROM pg_class
 WHERE relname = 'picard';
-[ RECORD 1 ]+---------
tablename    | picard
bytes_used   | 68190208
total_bytes  | 90726400
pretty_size  | 65 MB
pretty_total | 87 MB
approx_rows  | 1e+06

To provide a lowest common denominator, the pg_relation_size function always returns the size of the object in bytes. The pg_size_pretty function just makes that value human readable by transforming it into the number of megabytes, gigabytes, or so on. The reason we used both in the query is to demonstrate the functionality along with the fact we don’t want to use human readable output in calculations.

But what about pg_total_relation_size? This function is a neat trick that combines the table size along with any related content, like indexes, TOAST data, and so on. Normally the only way to get that data would be to join pg_class with itself multiple times with reltoastrelid or pg_index. That query is pretty unwieldy, so a function call is much easier to use. If we had omitted or changed our WHERE clause, we could have divined size and row counts for whole swaths of tables in the database.

And what about pg_index? Its job is to act as a swing table between a table and its indexes as described by pg_class, as well as describe which columns are in each index, and other identifying information. But this particular catalog table is somewhat opaque, using multiple vector columns and trees to describe objects which can contain multiple other objects. It’s usually much easier to use pg_indexes instead, which converts all of that into something human readable. For example:

  FROM pg_indexes
 WHERE tablename = 'picard';
-[ RECORD 1 ]----------------------------------------------------------
schemaname | public
tablename  | picard
indexname  | picard_pkey
tablespace | 
indexdef   | CREATE UNIQUE INDEX picard_pkey ON picard USING btree (id)

There isn’t very much information here, actually. But we do get to use the table name to obtain all indexes without any joins, and we can also see the index definition itself. For more complicated dives, like finding all tables that use a particular column in their indexes, especially if we also want to include composite indexes, we’d have to grit our teeth and use pg_index instead. Though even then, it’s not so bad:

SELECT c.relname AS tablename, i.relname AS indexname
  FROM pg_index x
  JOIN pg_class c ON (c.oid = x.indrelid)
  JOIN pg_class i ON (i.oid = x.indexrelid)
  JOIN pg_attribute a ON (
         a.attrelid = x.indrelid AND a.attnum = ANY (x.indkey)
 WHERE a.attname = 'id';
 tablename |  indexname  
 picard    | picard_pkey

While this is hardly a novice-friendly example of multiple catalog tables, it does demonstrate that inquisitive data mining is possible. There’s a surprising amount of wealth stored in Postgres metadata if you’re brave enough to dig for it. Some masochistic DBAs have even constructed unholy amalgams of several catalog tables to derive potential index candidates, optimal column re-ordering for existing indexes, or even which indexes to remove entirely.

How is that even possible? Well, beyond the basic architecture metadata, Postgres collects statistics it stores in yet more tables and views. We’ll leave using these as an exercise for a future article, because they deserve to be explored in depth. Until then, explore the catalog and see what’s buried in its depths; you might be surprised what you find.

Tags: , , , ,

« Older Posts

Newer Posts »