PG Phriday: Postgres Password Practices

Connecting to a Postgres database can be a headache for end users and DBAs alike. Not because of the work involved, but the general irritation of managing passwords—an issue common to any system that requires authentication controls.

The user wants to say, “Who am I? None of your business!”

While the database is more comfortable with, “Go away, before I taunt you a second time!”

Well, there’s some middle ground everyone can enjoy, and a few Postgres-specific caveats which add a bit of flavor to the whole interaction.

Let’s start with the pg_hba.conf configuration file that coordinates authentication for Postgres. The first format I recommend for most initial setups relies on OS authentication and looks like this:

# TYPE    DB     USER          ADDRESS       METHOD
local     all    all                         peer

The peer method keeps Postgres out of the loop entirely. The assumption here is that the OS already verified the user, so if they connect as themselves, there’s no reason for further interrogation. Unfortunately, this only works with local connections made from the physical server where Postgres is running. Aside from local administrative scripts, it’s extremely unlikely for users to authenticate this way.

So what do we do for TCP/IP connections made from desktops using a tool like pgAdmin, or applications interacting with the database from multiple remote servers? The easiest and probably most common approach is to simply use MD5 authentication against a user tracked within Postgres itself. That process works for any user created like this:

-- Make a group to manage localacct accounts.
CREATE ROLE localacct;

-- Create the user directly.
CREATE USER some_dude WITH PASSWORD 'foobar';
GRANT localacct TO some_dude;

-- Create blank user, then assign a password.
-- Omit the second part for a user with no password.
CREATE USER some_dude;
ALTER USER some_dude WITH PASSWORD 'foobar';
GRANT localacct TO some_dude;

When a user has a local Postgres-managed password, it’s easy to verify through a direct connection. It’s just a quick encrypted hash check, and the connection can be rejected or accepted in short order. In order for this to work, we also need to modify the pg_hba.conf file and tell Postgres to reload the configuration files. In the end, our auth config should look like this:

# TYPE    DB     USER          ADDRESS       METHOD
local     all    all                         peer
host      all    +localacct    10.0.0.0/8    md5

By specifying +localacct as the user, we’re telling Postgres that any user in the localacct group should authenticate using MD5. This trick is mentioned in passing within the documentation, but I haven’t seen it used very often in the field. The major benefit it provides, is the ability to control authentication using SQL statuments in the future instead of further modifications to the pg_hba.conf file itself. I went into more detail about this approach a couple of years ago.

Notice also that we limited the connections to the 10.* address mask. The reason for this is to reduce potential connection vectors. If at all possible, always limit connections to, at the very least, the network local to your corporate firewall or local organization. Exposing Postgres directly to the internet is an extremely bad idea. Databases primarily act as a storage and retrieval location for data, much like a filesystem. It makes sense for applications, local users, and GUIs, but why would we expose a raw SQL interface to the world? Beyond that, there are inherent risks we’ll discuss a bit later.

In any case, that covers direct local connections and authenticated users. Another common use case is corporate authentication through some kind of shared password management like Active Directory, Samba, LDAP, or something else. Employees that need database access could have a second password managed by the DBA, or we could just pass the buck and let shared authentication handle it. A very common way to do this is through PAM. Companies with an infrastructure team probably have PAM set up to verify local UNIX access through the primary auth server.

As it turns out, Postgres can leverage that stack very easily. Let’s make another modification to our auth config file:

# TYPE    DB     USER          ADDRESS       METHOD
local     all    all                         peer
host      all    +localacct    10.0.0.0/8    md5
host      all    all           10.0.0.0/8    pam

While PAM is only one of the methods supported by Postgres, it’s also one of the easiest to leverage regarding delegation. Like the peer method, Postgres trusts whatever response PAM returns regarding the password a user or application supplied. There is however, one giant caveat that necessarily follows the fact PAM must receive the password unmolested: it’s transferred via plain text.

When Postgres uses md5 authentication, the internal connection libraries understand the encryption method Postgres supports. Because of this, the user’s password can be encrypted before it’s ever transferred over the network, and thus it is never exposed. Postgres checks the password hash in its local catalog, and that’s the end of the story. With PAM, the plaintext password must be transmitted over the network connection so PAM can use its own verification process that is completely unknown to Postgres.

This is probably fine over a corporate subnet, but it’s still probably a very good idea to use SSL connections. This at least encrypts the communication protocol itself, so password exposure is greatly reduced. Setting up a bunch of keys everywhere is somewhat inconvenient, but at least can be automated to a degree using configuration management and desktop automation tools.

Given that caveat, why even bother? Well, MD5 is horribly broken. If the Postgres password hashes are ever stolen, MD5 is trivial to crack, and most passwords will be exposed in short order. That in turn, gives an attacker application or superuser-level direct access to potentially sensitive data. There are solutions to this of course, but implementing them would require modifying the Postgres client libraries.

Unfortunately Postgres’ extension system does not apply to the authentication process for reasons stated earlier. The Postgres client library must understand the candidate method so passwords can be encrypted before hashes are transmitted and verified; extensions only work within the database engine itself. As a result, Postgres can’t use bcrypt, scrypt, pbkdf2, or any enhanced hashing system for authentication purposes. That means it remains a potential attack vector if exposed to the public internet.

There’s something else we can do to prevent password exposure, too. Postgres maintains a very handy pgpass configuration file that exists on the client side. When properly set up, users no longer have to supply passwords manually. These files are secured to local access to the user, and are very helpful for automation purposes for application passwords. When using .pgpass, passwords won’t be in application configuration files, and thus won’t accidentally be committed to source management tools like git, svn, or mercurial.

The configuration like for our some_dude user might look like this:

# hostname:port:database:username:password
*:5432:cool_app:some_dude:foobar

The application could be installed on the app_engine user account on a UNIX system, but would still be able to connect as some_dude without supplying a password during the actual connection process. This Python snippet, for example would work just fine, even though we set up Postgres to expect an md5 password:

import psycopg2

conn = psycopg2.connect(database='cool_app', user = 'some_dude')

This also allows us to use a configuration management tool to broadcast the .pgpass file to multiple servers as a configuration element. So larger organizations could deploy an application to any number of servers, and passwords would never be stored in source control. Password changes are as simple as syncing the new .pgpass file to all applicable servers. We can take the PAM auth idea even further and create managed application users to remove Postgres from the password loop entirely. Then, our final pg_hba.conf file would look something like this:

# TYPE    DB     USER    ADDRESS    METHOD
local     all    all                     peer
host      all    all       10.0.0.0/8    pam

Once we do that, passwords are entirely controlled by the corporate organization itself. Standard password rotations and configuration changes are entirely invisible to Postgres, end-users, and even applications. Nobody except a handful of administrators has access to any password but their own. This indemnifies everyone, even the DBA, from leaking passwords because the automated layer is the only thing with full access to everything. Individuals only know their own passwords, applications follow the same model, and never the two shall meet.

No matter what approach we use, probably the most important thing is that Postgres be behind a firewall of some kind. No matter what kind of front-end application or GUI presents that data to the world, the Postgres port (5432 by default) should be under lock and key. Even should we ignore the inherent insecurity of MD5 as an encryption method in the modern age, PAM or some other method could expose passwords if SSL connections are not strictly observed.

This is the main reason my examples show 10.* network addresses in the examples. Doing so implies only private network access to Postgres from other systems on that same private network. This is in reality, a minimum level. In practice we should go much further, such as isolating production systems from development or QA systems that might run destructive tests or unverified code. Misconfigured systems in these environments might not even realize they’re operating on production data, given how IP addresses and virtual IPs can change, and the ease of copy-and-paste mishaps.

And if a developer ever needs application access to fix a bug even though nobody knows the password? Easy:

GRANT some_app_read TO some_dude;

At most, this should be done in a production context for a limited interval. In dev or QA environments, maybe it makes sense to expose test application data to everyone, but production should be as locked down as possible. No passwords are exchanged here, and the developer still has all necessary components to track down possible issues and obtain test cases for verification and automated test frameworks in future application iterations.

In the end, as long as us DBAs and users are cognizant of the risks, and utilize the tools above, we can access our Postgres database with relative ease. If taken to its logical conclusion, we don’t even need to really worry about tracking which password goes with which thing; that’s all automatic. It’s not quite a world without passwords, but it’s one without typing passwords, and that’s close enough for me.