PG Phriday: Derivation Deluge

April 29th, 2016 | Published in Database, Tech Talk | 8 Comments

Having run into a bit of a snag with Postgres-XL, and not wanting to be dead in the water with our project, I went on a bit of a knowledge quest. Database scaling is hard, so I expected a bunch of either abandoned or proprietary approaches. In addition, as a huge fans of Postgres, compatibility or outright use of the Postgres core was a strict prerequisite.

So, what options are out there? Is there even anything worth further investigation? Maybe more importantly, what do you do when you’re under a bit of a scheduling constraint? Projects need to move forward after all, and regardless of preferences, sometimes concessions are necessary. The first step was obviously the list of databases derived from Postgres.

At first glance, that’s a pretty big list. If we look carefully though, we can see that quite a few of those projects were abandoned years ago. Others are commercial, not based on scalability, or both. Being commercial isn’t automatically a disqualification, but most of the commercial options were forked from practically ancient versions of Postgres and never kept up compatibility, or don’t mention the version at all. Amazon Redshift fits that profile, being based on Postgres 8.0, which few would want to use these days. Fujitsu Enterprise is another, which doesn’t even list which version they’re based on, nor do they provide a download for testing purposes.

What’s left? It’s hard to tell from the Wiki page, so I just started with the projects that include some kind of data scaling not based on replication. These candidates present a longer list than I’d initially anticipated, which is always a good problem to have!

Let’s scrutinize the nominees.


It’s not really a secret that CitusDB and Postgres-XL are both tackling the same problem, and are currently the top two contenders. Unlike Postgres-XL and its approach of extending SQL syntax to directly embrace data distribution, CitusDB is actually just a Postgres extension.

As a result, it’s a bit more janky. There’s no CREATE TABLE ... DISTRIBUTE BY magic. Instead, and like other Postgres extensions, we call functions to populate metadata and control CitusDB operation. Fortunately, the CitusDB documentation is amazingly thorough. Not only are there downloads with cut-and-paste tutorials for evaluation, they also provide Docker containers with fully operational test clusters.

I also have to admit that I originally wrote this off entirely, based on my experience with pg_shard last year. I opened a ticket complaining about missing transaction support, and they never closed it. I didn’t realize that was because pg_shard was merged into the new Citus extension, along with a veritable raft of other modifications and fixes.

My easy tutorial install didn’t exhibit the problems I had with pg_shard, so this warrants deeper testing. I’ve got VMs galore, and itchy scripting fingers.


I’ve known about Greenplum for a long time. There was a lot of excitement when Pivotal announced that they were opening the source. A parallel scaling Postgres? No way!

Well, this comes with a giant caveat. If we look at the dates listed in the Wiki, Greenplum is listed as starting in 2005. They’re not kidding, and unfortunately it seems Pivotal executed a “fork it and forget it” maneuver. The documentation admits Greenplum is based on Postgres 8.2, with elements of functionality from 8.3.

Like Amazon’s Redshift, this immediately disqualifies Greenplum from consideration for anyone using a newer version. Our own databases are on 9.4 pending an upgrade plan; there’s no way we could justify such a massive downgrade, even for horizontal scaling improvements. EnterpriseDB had a similar problem when they started selling their version of 8.3; they were far behind for years before they managed to reduce their version lag by only a few months. Greenplum never even bothered.

This may be an amazing product, but we can’t use it to replace existing Postgres 9.4 databases that need scaling. Will Greenplum catch up now that it’s been open-sourced? I can’t say. It would definitely be cool, but I’m not holding my breath. Incidentally, this is one of the reasons all of those projects on the Wiki have definitive end dates. Keeping up with Postgres after forking is extremely difficult if you don’t merge your enhancements back into core. It’s all too easy to fall hopelessly behind and become nothing but an academic concern.


Hadoop is the… uh, “new” kid on the block regarding big data. It’s designed to leverage multiple systems or VMs to spread storage mining, which would be a great way to address a VLDB system. So in 2009, a university project spawned HadoopDB to turn Postgres into a SQL interface and aggregator for Hadoop.

Sadly, that’s where the story ends. The Postgres Wiki says it’s still active, but for all intents and purposes, it has been abandoned. The quick start guide hasn’t been updated since 2009, and Thomas Koch did a presentation as late as 2011 denoting it as an effective proof of concept, but not much else.

In the end, it’s a really cool “glue” between Hadoop and Postgres. But without updates to enhance the interlinks, speed, efficiency, and bugs, it’s not suitable for a production environment. The project lived on in Hadapt before being acquired by Teradata and renamed to presto. That means there’s some potential to contact Teradata and make an existing Hadoop datastore more user friendly. The job of converting an existing Postgres cluster to a Hadoop equivalent is left as an (onerous) exercise for the user.


The Postgres-X2 project is a bit of a conundrum. Unlike Postgres-XL which is active and backed by 2ndQuadrant, Postgres-X2 seems to be a direct continuation of the abandoned Postgres-XC codebase. As a result, they’re still stuck on Postgres 9.3. Further, they likely have similar issues as we encountered with Postgres-XL, or worse due to the project’s stagnancy. After exploring the github repository, it turns out the last update to any code was two years ago.

Maybe there’s another repository elsewhere, but this project should be considered dead unless they pull a Holy Grail and claim otherwise.


I first encountered Stado back when it was GridSQL, one of many EnterpriseDB projects. It works by abstracting several database instances through a port proxy, distributing writes and reads arbitrarily based on its own internal magic. It uses Postgres as a filesystem of sorts, and connecting to Postgres directly reveals this in greater detail. Object names are the result of hash functions, and even databases are only tangentially linked to the desired given nomenclature.

Stado is all about metadata, and Postgres is its chosen host. Because I had experience with a previous incarnation, I made an exception and considered it undead for testing purposes, even though the Wiki says it died in 2011. It’s just a Java proxy after all, so what could it hurt to see if it still works with recent Postgres releases?

As it turns out, it can hurt a lot. It seems my memory of GridSQL was a little hazy, as what’s going on here isn’t merely a distribution proxy. It’s transformative and extremely restrictive, throwing up errors for “unknown” keywords such as SCHEMA. No schema support means there’s no way we can use it, which is something of a shame. The performance metrics were encouraging back in the day, and the concept it represents is sound.

Consider the PMPP extension, for example. When I looked at it late last year, I loved the simplicity. Take a query, broadcast it to every known Postgres node, and present the results. Wrap the function in another query, and it can be re-aggregated to merge everything together. I was hoping Stado did this automatically, and that was all. Nope. Oh well.

The Winner?

If someone could write something that worked like I thought Stado did, I’d probably kiss them. It would require manipulating the Postgres query planner or a proxy of some kind, but that’s all I really want. Send a query to multiple nodes, let them execute it independently in parallel, keep track of aggregate functions used, and apply them to the appropriate columns in the final result. It seems so simple, but the underlying complexity is clearly more involved.

The thing is, large warehouse databases usually contain data that’s already been in another source. Primary key collisions are unlikely, as some kind of engine (Postgres-XL, ETL, etc.) has already distributed data according to some kind of hash function. I just want a query that can invoke the cluster in parallel. That’s all. Extensions like PMPP do half of the job, but short of rewriting existing applications to leverage it properly, it’s only really usable for new projects.

So I’ll probably be looking into CitusDB a bit more. It seems to work the way I want, and adds shard redundancy as an extra benefit. I’ll put it on some VMs and unceremoniously thrash it after dumping hundreds of GB into its lap and record the ensuing chaos. Hopefully these tests go better than when I subjected pg_shard to the same treatment.

Otherwise, the state of Postgres scaling is incomplete, and there are no projects I know of that will suit our requirements. As a Postgres DBA, I probably try too hard to use it as a hammer on every nail, but it’s just so close as to be immensely frustrating.

Wish me luck!

Tags: , , , , , ,

PG Phriday: Growing Pains

April 22nd, 2016 | Published in Database, News, Tech Talk | 1 Comment

Postgres is a great tool for most databases. Larger installations however, pretty much require horizontal scaling; addressing multi-TB tables relies on multiple parallel storage streams thanks to the laws of physics. It’s how all immense data stores work, and for a long time, Postgres really had no equivalent that wasn’t a home-grown shard management wrapper. To that end, we’ve been considering Postgres-XL as a way to fill that role. At first, everything was going well. Performance tests showed huge improvements, initial deployments uncovered no outright incompatibilities, and the conversion was underway.

Then we started to use it.

Much to our chagrin, dev and test environments aren’t always indicative of real scenarios and utilization patterns. Once Postgres-XL reached our semi-production environment, everything started to go wrong. Here’s a list of what happened, and why our 43TB (and growing) database can’t use Postgres-XL until it matures a bit more. As a note, all of the examples listed here were observed using the latest 9.5r1 release.

I also want to strongly emphasize that this is not an indictment of Postgres-XL. Postgres greatly needs the capabilities it adds to truly break into the VLDB market. Sometimes though, projects aren’t as capable as we originally hoped, and subtle issues aren’t always readily apparent until software is field-tested. We’re still not ruling it out.

That said, on to the adventure.

Trigger Support

In short, there isn’t any. This at least, was something we knew about. In our case, it wasn’t a problem because a VLDB installation is commonly a warehouse of some description. These tend to depend on ETL or other source scripts to supply data that’s already prepared and doesn’t rely on post-insert transformations.

For anyone else, this could be a roadblock. Trigger support isn’t always essential to running an active database, but they’re common enough that a conversion process will have to contend with replacing them. This missing feature practically relegates Postgres-XL to warehouse use all by itself. Indeed, like us, many may consider triggers unnecessary in a horizontally distributed database or NoSQL databases would have never garnered so much attention.

But it is a missing feature a Postgres user would expect. This is the first lesson that Postgres-XL is a variant of Postgres with similar functionality, but it isn’t Postgres.


Again, there isn’t any. This is one of those things we only found after scripts started failing. Savepoints allow transactions to retain integrity through several phases if the state and progress are verified. If something unexpected happens, the whole transaction doesn’t have to be abandoned outright. Clever database developers use this as something of a state machine to make multiple attempts, or perform alternate actions in case of failures within a transaction.

Perhaps due to internal transaction implementation details, Postgres-XL can’t retain this functionality. Distributing transactions and involving a transaction manager while also maintaining a state machine within it, is definitely not a trivial problem. That makes this a completely understandable omission, and since the feature is rarely utilized, many won’t even notice it’s missing.

Yet it was something we had to route around. Another thing to keep in mind when converting to Postgres-XL.

Sub-queries and DISTINCT ON Woes

This one caught us totally off-guard. The documentation on SELECT doesn’t seem to mention this, but it is indeed an issue with Postgres-XL. Observe:

CREATE TABLE bar (id INT, nid INT);
SELECT FROM generate_series(1,100) a(id);
  FROM foo f1
  JOIN foo f2 USING (id)
ERROR:  Postgres-XL does NOT currently support ORDER BY IN subqueries

But is that error accurate? If we remove the INSERT preamble so the query executes on its own, we merely see the expected output of rows. So even if the DISTINCT ON is being internally transformed into a sub-query, it isn’t doing so consistently. The error is wrong anyway, because this works just fine:

  FROM foo a

Not only is that an ordered sub-query, it’s a JOIN on an ordered sub-query, an arguably more complex scenario. I can only assume this was missing functionality, but in implementing the necessary code changes to remove the warning, they missed a spot.

In any case, this really threw a wrench into our functions that acted as ETL components. We took the error at face value, and groaned at trying to rewrite DISTINCT ON without using sub-queries or ORDER BY. Ironically an even more convoluted query came to our rescue:

WITH sub AS (
    FROM foo f1
    JOIN foo f2 USING (id)

This works because CTEs instantiate the results as a temporary table, which doesn’t trigger the error. Then the INSERT proceeds normally. Huh. Well, that’s a bit odd, but easily circumvented.

Then our luck ran out.


Now we’re going from missing features and wonky implementations to outright bugs. The next issue that reared its ugly head was related to temporary tables. Well, maybe. We started seeing this in the logs:

ERROR:  could not open relation with OID 0

A quick Google led to this thread, which seemed to indicate a problem with temporary tables in stored procedures called two times consecutively within a session. Unfortunately the test cases provided in the thread do not replicate the problem on our installation. Regardless, there was a good chance it was related, so we removed temp tables anyway. Our solution leveraged UNLOGGED tables instead, because they have stable OIDs and were much less likely to invoke whatever edge case we were triggering. So our functions turned into this:

-- Functions like this caused the OID 0 errors.
  SELECT FROM generate_series(1,100) a(id);
LANGUAGE plpgsql;
-- Using UNLOGGED tables and functions like this worked fine.
-- Of course, now concurrent function calls won't work.
  SELECT FROM generate_series(1,100) a(id);
LANGUAGE plpgsql;

And what would you know? Success! It wasn’t ideal, but we could limp along with the workaround until they fixed the bug. Or so we thought.

We Need ACID

Probably the most amusing part of this whole adventure is that the errors we kept encountering were almost immediately consecutive. Once we worked around one issue, we merely cleared the way for another completely unrelated problem. This time, we saw this error in the logs:

ERROR:  catalog is missing 25 attribute(s) for relid 36564

That’s not encouraging. Back to Google we went, giving us another thread detailing the problem. This time, the issue seems related to autovacuum somehow. It’s also probable that repeatedly truncating UNLOGGED tables wrought havoc on the Postgres system catalog, and Postgres-XL wasn’t properly propagating something. The theory at the end of the thread is unfortunately incorrect. Disabling autovacuum only reduces the bug’s prevalence, as the original reporter noted when I asked how to correct (or remove) the broken tables.

Regardless of the underlying cause, this was the final straw. Once those errors showed up, that meant the system catalog was corrupt. The UNLOGGED tables used by our functions were completely invalid, and worse, we couldn’t even drop them to start over. The only way out is a full dump/restore, a rather drastic measure because some ephemeral table became an actual Ghost in the Shell. And the real worry here—that the missing attributes would affect a real table—became the overriding concern.

Because the D in ACID stands for Duability, and this bug illustrates that the current incarnation of Postgres-XL has none. Database metadata is sacrosanct, because without it, all that remains is indecipherable binary blobs taking up disk space. There’s no way I’d dump 40TB of data on an installation, when history suggests some of that data could mysteriously go missing.

The End of the Road

And that’s the real misfortune. No matter how much we desperately wanted to deploy Postgres-XL into our warehouse, we literally can’t. Doing so at this point would be irresponsible and dangerous. Some of the problems we encountered are understandable and workarounds exist for most. But the outright deluge of roaches in our sundae, with a rotten cherry on top, is impossible to ignore.

I have every confidence in 2ndQuadrant eventually resolving the worst roadblocks. Before Postgres-XC became Postgres-XL, it practically languished in Development Limbo, accumulating code smell and copious edge cases from managing node interactions. I’ll continue installing every subsequent release in our dev and QA environments until there’s a legitimate reliability breakthrough. I still maintain that Postgres-XL is the way forward for highly distributed horizontal VLDB installations.

Just… not yet.

Tags: , , , ,

PG Phriday: Postgres Password Practices

April 15th, 2016 | Published in Database, Tech Talk | 2 Comments

Connecting to a Postgres database can be a headache for end users and DBAs alike. Not because of the work involved, but the general irritation of managing passwords—an issue common to any system that requires authentication controls.

The user wants to say, “Who am I? None of your business!”

While the database is more comfortable with, “Go away, before I taunt you a second time!”

Well, there’s some middle ground everyone can enjoy, and a few Postgres-specific caveats which add a bit of flavor to the whole interaction.

Let’s start with the pg_hba.conf configuration file that coordinates authentication for Postgres. The first format I recommend for most initial setups relies on OS authentication and looks like this:

# TYPE    DB     USER          ADDRESS       METHOD
local     all    all                         peer

The peer method keeps Postgres out of the loop entirely. The assumption here is that the OS already verified the user, so if they connect as themselves, there’s no reason for further interrogation. Unfortunately, this only works with local connections made from the physical server where Postgres is running. Aside from local administrative scripts, it’s extremely unlikely for users to authenticate this way.

So what do we do for TCP/IP connections made from desktops using a tool like pgAdmin, or applications interacting with the database from multiple remote servers? The easiest and probably most common approach is to simply use MD5 authentication against a user tracked within Postgres itself. That process works for any user created like this:

-- Make a group to manage localacct accounts.
CREATE ROLE localacct;
-- Create the user directly.
CREATE USER some_dude WITH PASSWORD 'foobar';
GRANT localacct TO some_dude;
-- Create blank user, then assign a password.
-- Omit the second part for a user with no password.
CREATE USER some_dude;
ALTER USER some_dude WITH PASSWORD 'foobar';
GRANT localacct TO some_dude;

When a user has a local Postgres-managed password, it’s easy to verify through a direct connection. It’s just a quick encrypted hash check, and the connection can be rejected or accepted in short order. In order for this to work, we also need to modify the pg_hba.conf file and tell Postgres to reload the configuration files. In the end, our auth config should look like this:

# TYPE    DB     USER          ADDRESS       METHOD
local     all    all                         peer
host      all    +localacct    md5

By specifying +localacct as the user, we’re telling Postgres that any user in the localacct group should authenticate using MD5. This trick is mentioned in passing within the documentation, but I haven’t seen it used very often in the field. The major benefit it provides, is the ability to control authentication using SQL statuments in the future instead of further modifications to the pg_hba.conf file itself. I went into more detail about this approach a couple of years ago.

Notice also that we limited the connections to the 10.* address mask. The reason for this is to reduce potential connection vectors. If at all possible, always limit connections to, at the very least, the network local to your corporate firewall or local organization. Exposing Postgres directly to the internet is an extremely bad idea. Databases primarily act as a storage and retrieval location for data, much like a filesystem. It makes sense for applications, local users, and GUIs, but why would we expose a raw SQL interface to the world? Beyond that, there are inherent risks we’ll discuss a bit later.

In any case, that covers direct local connections and authenticated users. Another common use case is corporate authentication through some kind of shared password management like Active Directory, Samba, LDAP, or something else. Employees that need database access could have a second password managed by the DBA, or we could just pass the buck and let shared authentication handle it. A very common way to do this is through PAM. Companies with an infrastructure team probably have PAM set up to verify local UNIX access through the primary auth server.

As it turns out, Postgres can leverage that stack very easily. Let’s make another modification to our auth config file:

# TYPE    DB     USER          ADDRESS       METHOD
local     all    all                         peer
host      all    +localacct    md5
host      all    all     pam

While PAM is only one of the methods supported by Postgres, it’s also one of the easiest to leverage regarding delegation. Like the peer method, Postgres trusts whatever response PAM returns regarding the password a user or application supplied. There is however, one giant caveat that necessarily follows the fact PAM must receive the password unmolested: it’s transferred via plain text.

When Postgres uses md5 authentication, the internal connection libraries understand the encryption method Postgres supports. Because of this, the user’s password can be encrypted before it’s ever transferred over the network, and thus it is never exposed. Postgres checks the password hash in its local catalog, and that’s the end of the story. With PAM, the plaintext password must be transmitted over the network connection so PAM can use its own verification process that is completely unknown to Postgres.

This is probably fine over a corporate subnet, but it’s still probably a very good idea to use SSL connections. This at least encrypts the communication protocol itself, so password exposure is greatly reduced. Setting up a bunch of keys everywhere is somewhat inconvenient, but at least can be automated to a degree using configuration management and desktop automation tools.

Given that caveat, why even bother? Well, MD5 is horribly broken. If the Postgres password hashes are ever stolen, MD5 is trivial to crack, and most passwords will be exposed in short order. That in turn, gives an attacker application or superuser-level direct access to potentially sensitive data. There are solutions to this of course, but implementing them would require modifying the Postgres client libraries.

Unfortunately Postgres’ extension system does not apply to the authentication process for reasons stated earlier. The Postgres client library must understand the candidate method so passwords can be encrypted before hashes are transmitted and verified; extensions only work within the database engine itself. As a result, Postgres can’t use bcrypt, scrypt, pbkdf2, or any enhanced hashing system for authentication purposes. That means it remains a potential attack vector if exposed to the public internet.

There’s something else we can do to prevent password exposure, too. Postgres maintains a very handy pgpass configuration file that exists on the client side. When properly set up, users no longer have to supply passwords manually. These files are secured to local access to the user, and are very helpful for automation purposes for application passwords. When using .pgpass, passwords won’t be in application configuration files, and thus won’t accidentally be committed to source management tools like git, svn, or mercurial.

The configuration like for our some_dude user might look like this:

# hostname:port:database:username:password

The application could be installed on the app_engine user account on a UNIX system, but would still be able to connect as some_dude without supplying a password during the actual connection process. This Python snippet, for example would work just fine, even though we set up Postgres to expect an md5 password:

import psycopg2
conn = psycopg2.connect(database='cool_app', user = 'some_dude')

This also allows us to use a configuration management tool to broadcast the .pgpass file to multiple servers as a configuration element. So larger organizations could deploy an application to any number of servers, and passwords would never be stored in source control. Password changes are as simple as syncing the new .pgpass file to all applicable servers. We can take the PAM auth idea even further and create managed application users to remove Postgres from the password loop entirely. Then, our final pg_hba.conf file would look something like this:

local     all    all                     peer
host      all    all    pam

Once we do that, passwords are entirely controlled by the corporate organization itself. Standard password rotations and configuration changes are entirely invisible to Postgres, end-users, and even applications. Nobody except a handful of administrators has access to any password but their own. This indemnifies everyone, even the DBA, from leaking passwords because the automated layer is the only thing with full access to everything. Individuals only know their own passwords, applications follow the same model, and never the two shall meet.

No matter what approach we use, probably the most important thing is that Postgres be behind a firewall of some kind. No matter what kind of front-end application or GUI presents that data to the world, the Postgres port (5432 by default) should be under lock and key. Even should we ignore the inherent insecurity of MD5 as an encryption method in the modern age, PAM or some other method could expose passwords if SSL connections are not strictly observed.

This is the main reason my examples show 10.* network addresses in the examples. Doing so implies only private network access to Postgres from other systems on that same private network. This is in reality, a minimum level. In practice we should go much further, such as isolating production systems from development or QA systems that might run destructive tests or unverified code. Misconfigured systems in these environments might not even realize they’re operating on production data, given how IP addresses and virtual IPs can change, and the ease of copy-and-paste mishaps.

And if a developer ever needs application access to fix a bug even though nobody knows the password? Easy:

GRANT some_app_read TO some_dude;

At most, this should be done in a production context for a limited interval. In dev or QA environments, maybe it makes sense to expose test application data to everyone, but production should be as locked down as possible. No passwords are exchanged here, and the developer still has all necessary components to track down possible issues and obtain test cases for verification and automated test frameworks in future application iterations.

In the end, as long as us DBAs and users are cognizant of the risks, and utilize the tools above, we can access our Postgres database with relative ease. If taken to its logical conclusion, we don’t even need to really worry about tracking which password goes with which thing; that’s all automatic. It’s not quite a world without passwords, but it’s one without typing passwords, and that’s close enough for me.

Tags: , , ,

PG Phriday: JOIN the Club

April 8th, 2016 | Published in Database, Tech Talk | No Comments

I’ve been going on and on about esoteric Postgres features for so long, sometimes I forget my own neophyte beginnings. So let’s back way off and talk about how JOINs work. What are they? What do they do? Why would you use one? What can go wrong? It’s easier than you might imagine, but there are definitely some subtleties to account for.

To make this more relatable, let’s start with a club of pet owners that likes to keep track of everything:

  type_id    SERIAL PRIMARY KEY,
  type_name  VARCHAR NOT NULL
CREATE TABLE pet_owner
  owner_id    SERIAL PRIMARY KEY,
  owner_name  VARCHAR NOT NULL
CREATE TABLE pet_record
  record_id  SERIAL PRIMARY KEY,
  type_id    INT NOT NULL REFERENCES pet_type (type_id),
  owner_id   INT NOT NULL REFERENCES pet_owner (owner_id),
  pet_name   VARCHAR NOT NULL
INSERT INTO pet_type (type_name)
VALUES ('Cat'), ('Dog'), ('Horse');
INSERT INTO pet_owner (owner_name)
VALUES ('Catherine L. Ady'), ('Peter Venkman'), ('Wilbur Post');
INSERT INTO pet_record (type_id, owner_id, pet_name)
VALUES (1, 1, 'Fluffy'), (1, 1, 'Mr. Whiskers'),
       (1, 2, 'Garfield'), (2, 2, 'Spinee'),
       (3, 3, 'Mr. Ed');

Now that we have some data, we want to retrieve it in a way that’s relevant and descriptive. I could go on a long and exhausting description regarding JOIN theory, but those already exist and probably do a better job anyway. Instead, I’ll just say there are two types of JOIN: INNER and OUTER.

INNER JOINs are the default in Postgres. Because it’s the default, we don’t actually need to use the INNER keyword. Legacy systems don’t always allow this, so some ORMs and end users still use INNER JOIN explicitly, but it isn’t necessary or even encouraged with Postgres. The INNER keyword simply means that rows must exist in both tables to appear in the results.

For example, let’s list all pets and their pet type:

SELECT p.pet_name, t.type_name
  FROM pet_record p
  JOIN pet_type t ON (t.type_id = p.type_id);
   pet_name   | type_name 
 Fluffy       | Cat
 Mr. Whiskers | Cat
 Garfield     | Cat
 Spinee       | Dog
 Mr. Ed       | Horse

This is the least complicated type of JOIN there is. In this particular case, the type_id column exists in both pet_type and pet_record. We can use that information to join the tables together so they resemble a single table. From that pool of columns, we can pick and choose the ones that interest us in the SELECT section.

Postgres also supports some shorthand syntax that simplifies the process:

SELECT p.pet_name, t.type_name
  FROM pet_record p
  JOIN pet_type t USING (type_id);

The primary caveat to USING is that the JOIN columns aren’t explicitly listed. If there are multiple tables with similarly named columns involved in a JOIN, Postgres might complain about ambiguity, and then we’d have to revert to the regular syntax. Still, it’s a nice shortcut for most queries.

If that’s an INNER JOIN, what then is an OUTER JOIN? This is a bit more difficult to visualize, but the logic behind it is this: if a row from one table does not exist in the other, represent the missing data as NULL. In other words, since we’re combining two tables to be a larger, virtual table, don’t remove records simply because they’re missing.

As an example, let’s add another pet type and then count how many pets of each type exist. First with a regular JOIN, and then with an OUTER JOIN:

INSERT INTO pet_type (type_name) VALUES ('Cow');
-- First with a standard JOIN
SELECT t.type_name, COUNT(p.record_id) AS total
  FROM pet_type t
  JOIN pet_record p USING (type_id)
 GROUP BY t.type_name;
 type_name | total 
 Cat       |     3
 Horse     |     1
 Dog       |     1
-- Now use an OUTER JOIN
SELECT t.type_name, COUNT(p.record_id) AS total
  FROM pet_type t
  LEFT JOIN pet_record p USING (type_id)
 GROUP BY t.type_name;
 type_name | total 
 Cow       |     0
 Cat       |     3
 Horse     |     1
 Dog       |     1
-- Here's the "table" the OUTER JOIN made
SELECT t.*, p.*
  FROM pet_type t
  LEFT JOIN pet_record p USING (type_id);
 type_id | type_name | record_id | type_id | owner_id |   pet_name   
       1 | Cat       |         2 |       1 |        1 | Fluffy
       1 | Cat       |         3 |       1 |        1 | Mr. Whiskers
       1 | Cat       |         4 |       1 |        2 | Garfield
       2 | Dog       |         5 |       2 |        2 | Spinee
       3 | Horse     |         6 |       3 |        3 | Mr. Ed
       4 | Cow       |           |         |          |

The idea here is that nobody owns a cow, but we still want the pet type listed so we can see that nobody owns a cow. Otherwise, we might not even consider that owning a cow is even an option. But we didn’t even use the OUTER keyword! What’s with this LEFT nonsense? It just so happens that, like INNER, the OUTER keyword is also optional in Postgres. What actually makes a JOIN an OUTER JOIN are the LEFT, RIGHT, and FULL keywords.

Looking at the virtual table produced by the JOIN itself, it’s clear that we only have information supplied by the table that was on the left side of the join. Since this is an OUTER join type, Postgres substituted a bunch of blank data for the missing row in the pet_record table on the right, since its data was optional. If we wanted to represent the same query as a RIGHT join, we just need to move things around a bit:

SELECT t.*, p.*
  FROM pet_record p
  RIGHT JOIN pet_type t USING (type_id);

Now the records on the right are the source of the JOIN and those on the left are represented by NULL if they don’t exist. Otherwise, the output is exactly the same as before. Because of this, it’s fairly uncommon to encounter RIGHT JOIN in the wild. Western countries use writing systems that read left-to-right, so it’s likely joins will also follow that trend.

In that case, how is FULL different? Combine the idea of LEFT and RIGHT, and you get FULL. Essentially when joining two tables with FULL, rows that don’t exist on the right are represented by NULL, and rows that don’t exist on the left are NULL as well. This kind of output makes sense when combining reports or other aggregated output, since some summaries won’t have an equivalent on either side, but we still want the data.

In instances like our relational diagram, it’s not possible to observe the result of a FULL JOIN on the IDs because at least one ID always exists. But… we can fake it by joining on a column they don’t share, and never matches:

SELECT t.*, p.*
  FROM pet_record p
  FULL JOIN pet_type t ON (p.pet_name = t.type_name);
 type_id | type_name | record_id | type_id | owner_id |   pet_name   
       1 | Cat       |           |         |          | 
       4 | Cow       |           |         |          | 
       2 | Dog       |           |         |          | 
         |           |         2 |       1 |        1 | Fluffy
         |           |         4 |       1 |        2 | Garfield
       3 | Horse     |           |         |          | 
         |           |         6 |       3 |        3 | Mr. Ed
         |           |         3 |       1 |        1 | Mr. Whiskers
         |           |         5 |       2 |        2 | Spinee

It’s like we just haphazardly glued the tables together into one giant table. But examining the output itself, the NULL values always occur when there’s no match for the column(s) we used for the join itself. Thus FULL is operating exactly as advertised.

OUTER JOIN is normally used for optional relationships, like in our example where there might be a type of pet nobody owns. It can also be used to find club members that currently own no pets. The point here is that a regular JOIN would exclude these situations and potentially produce erroneous data in a report. OUTER JOIN fills a very defined role, and even though it’s not as dominant, it’s indispensable.

As a final note on JOINs, there is one easily made mistake that generally results in something of a disaster. Imagine we wanted to know who owned each pet, and we rush the query with some cut-and-paste magic. This is what we find:

SELECT o.owner_name, p.pet_name, t.type_name
  FROM pet_record p
  JOIN pet_type t ON (t.type_id = p.type_id)
  JOIN pet_owner o ON (t.type_id = p.type_id);
    owner_name    |   pet_name   | type_name 
 Catherine L. Ady | Fluffy       | Cat
 Catherine L. Ady | Mr. Whiskers | Cat
 Catherine L. Ady | Garfield     | Cat
 Catherine L. Ady | Spinee       | Dog
 Catherine L. Ady | Mr. Ed       | Horse
 Peter Venkman    | Fluffy       | Cat
 Peter Venkman    | Mr. Whiskers | Cat
 Peter Venkman    | Garfield     | Cat
 Peter Venkman    | Spinee       | Dog
 Peter Venkman    | Mr. Ed       | Horse
 Wilbur Post      | Fluffy       | Cat
 Wilbur Post      | Mr. Whiskers | Cat
 Wilbur Post      | Garfield     | Cat
 Wilbur Post      | Spinee       | Dog
 Wilbur Post      | Mr. Ed       | Horse

Oh no! There’s no way every pet is owned by every person! What we have here is a Cartesian Product, and is the bane of databases everywhere. In our case, it happened because we joined the pet_owner table on type_id instead of owner_id. Since the join clause can be anything, Postgres dutifully joined every row in the owner table to every row in the record table. This operation is multiplicative, and so the complexity grows exponentially based on the size of the source tables.

We got off lucky because our tables are small example sets. Imagine what would have happened if we were joining two million-row tables! Producing, transmitting, and receiving one trillion rows instead of a million might just cause some problems. Even small results of a couple thousand rows can be amplified tremendously into millions of results due to a malformed JOIN. However, this example actually provides some valuable insight about what’s going on during the join itself.

Consider the overzealous output again. The rows we want are there, it’s just that the invalid ones weren’t pruned before they were returned to us. There are clearly some efficiency models in place to avoid exhaustive combinations, but this really is how joins work. They combine every row from one table with every row in another table, and weed out the results that don’t match the JOIN criteria. Because of that, the output can be anything from a boring and expected list of pet owners and their pets, or a nonsensical combination of everything. It’s all in the clauses.

Here’s what would have happened if we didn’t suffer from cut-and-paste-itis:

SELECT o.owner_name, p.pet_name, t.type_name
  FROM pet_record p
  JOIN pet_type t ON (t.type_id = p.type_id)
  JOIN pet_owner o ON (o.owner_id = p.owner_id);
    owner_name    |   pet_name   | type_name 
 Catherine L. Ady | Fluffy       | Cat
 Catherine L. Ady | Mr. Whiskers | Cat
 Peter Venkman    | Garfield     | Cat
 Peter Venkman    | Spinee       | Dog
 Wilbur Post      | Mr. Ed       | Horse

Much better. This is one of the main reasons databases like Postgres are so versatile. Not only will they give you exactly what you ask for, but the diversity of the results covers much of what is possible through Set Theory. The mathematics of Set Theory are justifiably impenetrable to someone without a degree in Mathematics—and even some that do—yet the power lurking there is undeniable. For everyone else, the basics are more than sufficient.

That kind of adaptability is a double-edged sword to be sure, but knowing the pitfalls is the best way to avoid them. Postgres is one of those things that’s easy to use and hard to master precisely because of the Math it represents. Now that you have some of the tools necessary to use Postgres, feel free to try and master it.

I still haven’t, but that doesn’t mean I won’t stop trying.

Tags: , ,

PG Phriday: 5 Reasons Postgres Sucks! (You Won’t Believe Number 3!)

April 1st, 2016 | Published in Database, News, Tech Talk | 6 Comments

I’ve been a Postgres DBA since 2005. After all that time, I’ve come to a conclusion that I’m embarrassed I didn’t reach much earlier: Postgres is awful. This isn’t a “straw that broke the camel’s back” kind of situation; there is a litany of ridiculous idiocy in the project that’s, frankly, more than enough to stave off any DBA, end user, or developer. But I’ll limit my list to five, because clickbait.

1. Security Schmecurity

Look, every good Postgres DBA knows that the pg_hba.conf file that handles connection security should have these lines:

local  all  all       trust
host   all  all  all  trust

And then the Postgres developers wonder why it’s so easy to exploit. Here I am, Joe T. Hacker, and I can just do this:

psql -U postgres -h production_server secure_database

And boom! I’m in as the superuser, and I can start dropping tables, replacing tables with whatever I want, screw with data, whatever. Sloppy. Just sloppy.

2. Super Users? Super Losers

Oh yeah, and after I connect to steal sensitive data and then clean up after myself, I can’t even do this:

secure_database=# DROP DATABASE secure_database;
ERROR:  cannot DROP the currently OPEN DATABASE
secure_database=# DROP USER postgres;
ERROR:  CURRENT USER cannot be dropped

Ugh! Are you kidding me? This is basic stuff, people. As a superuser, I should be able to circumvent all security, durability, and even internal consistency. That the backend code doesn’t acknowledge what a superuser really is, merely proves that the database owner is just another user. Useless.

Sure, I could use the provided tool and do it from the command line like this:

dropdb -U postgres -h production_server secure_database

But I’m already connected! Oh, or I could switch databases and then do it like this:

secure_database=# \c postgres
secure_database=# DROP DATABASE secure_database;

But if I’ve hacked in, every keystroke counts. It’s not even necessary! They could just queue the action and apply it after I disconnect, right? This kind of amateur work is why I can’t justify being a Postgres DBA anymore.

3. Awkward JOIN Syntax is Awkward

I cut my teeth as a DBA starting in 1999 with Oracle. Now that is a real database engine! One of the really cool things Oracle does is overload the OUTER JOIN syntax with a neat shortcut. Not only is the OUTER decorator optional, but so are LEFT and RIGHT! Let’s compare the only syntax Postgres supports with the snazzy method Oracle provides, and see which one is obviously way better:

-- Postgres (bleh)
SELECT a.col1, b.col2
  FROM a
  LEFT JOIN b ON (b.a_id =;
-- Oracle (Nice!)
SELECT a.col1, b.col2
  FROM a, b
 WHERE = b.a_id (+);

See that (+) bit? Just slap that on any columns that should join two tables, and it will automatically control how the join works. If it’s on the right, it’s a LEFT JOIN, and if it’s on the left, it’s a RIGHT JOIN. See? Totally intuitive. Sure, Oracle can use the same syntax Postgres childishly demands, but who would purposefully do that? Gross.

4. JSON and the Kitchen Sink

Everyone keeps touting JSON support as some huge benefit to Postgres. Even I did it once or twice. But upon pondering it more seriously, where does it end? What’s the next interface method du jour that we’ll be cobbling onto Postgres next?

First it was XML, then JSON, and next will be what, YAML? They might as well, since you can define your own types anyway.

5. Extensions are the Worst Idea Ever

This is easily the worst of the bunch, and follows naturally from the previous point. Ever since Postgres 9.1, it’s been possible to easily write extensions that are effectively bolted directly to the database engine. Any random schmo can waltz up, whip up some code in basically any language, and it will be able to do any of this. Suddenly Postgres has some “new functionality.”

New functions? New index types? Even override the planner, or fork arbitrary background workers!?

Are you kidding me!? But it gets even worse! There’s a site called PGXN with a related utility for making extensions easy to find and install. I don’t even want to imagine the shenanigans! Any random idiot, including me, could just put an extension there. Just convince a DBA to install it like this:

pgxn install awesome_tool_notanexploitorbotnetiswear

Suddenly they’re pwn3d. Do you want to be pwn3d? I didn’t think so. That’s the kind of risk you run by using Postgres as your primary database engine.


I for one, have seen the light. I can’t believe I was so misled by Postgres for over a decade. How am I even employable at all, having advocated a hacker’s playground for most of my professional career? How can any of us in the Postgres community live with ourselves, knowing we’ve effectively enabled effortless system compromises in the name of functionality?

I guess I’ll go back to using Oracle like the whipped dog I am. It’s what inspired me to be a DBA, and I abandoned it in favor of this travesty. I just pray Larry Ellison will forgive me for my transgressions.

I apologize for everything, and I sincerely hope the rest of the Postgres community joins me in supplication. If our end-users and developers are magnanimous, there might yet be a future for us.

Tags: , , , , ,

« Older Posts