PG Phriday: Cluster Control

It has occurred to me that I may have been spending a bit too much time being excited about new Postgres features and developments in the community. One of the intents of this weekly article was for educational purposes, so this week, let’s get back to basics. To that end, the topic for this week boils down to the tools available for managing Postgres instances, and how to use them. Surprisingly, it’s not as straight-forward as you might think.

Having used Postgres for almost 15 years now, it’s easy to forget (or mentally block!) the early struggles. With new tools and wrappers always being developed, it’s not always clear what the best practices for managing a Postgres cluster actually are. Indeed, it often depends on how Postgres is installed.

Let’s start with a basic source-code installation. I won’t cover that process, but we can assume that after the smoke clears, the binaries are somewhere in /usr/bin or /usr/local/bin, and are thus in our execution path. Given that, let’s say we have an available mount point at /data and want to create a new cluster there. Here’s how that might go:

sudo mkdir /data/pgsql
sudo chown postgres:postgres /data/pgsql
sudo su - postgres

initdb -D /data/pgsql/my_db
pg_ctl -D /data/pgsql/my_db start

We now have an active Postgres instance at /data/pgsql/my_db. It’s extremely common to use the postgres OS user for this kind of thing, hence all of our sudo commands to prepare. It’s entirely possible to do this as a regular user, but I usually don’t recommend that approach.

In any case, this type of installation essentially depends on the pg_ctl command-line tool. It does everything related to controlling a Postgres instance. But it’s also annoying to use the -D parameter all the time when using Postgres tools, so there are several environment variables that can also do the job. This lets us prime our environment with .bashrc, for example. Let’s stop the instance:

export PGDATA=/data/pgsql/my_db

pg_ctl stop -m fast

Why the “-m fast” part? By default, Postgres is exceedingly polite. If we had simply asked it to stop, it would patiently wait for any pending transactions to complete before stopping. But if there are several users connected, some may leave transactions idle, or a really long process may be running, or maybe we’re just in a hurry. Using a fast shutdown tells Postgres to abort any transactions and stop as soon as it can—safely, of course. There’s no data loss, but a few users or applications may grumble a bit.

At the system level, it’s very common for a global service management to wrap much of this process. To start or stop Postgres on CentOS or RedHat, we’d do something like this:

sudo service postgresql-9.4 start
sudo service postgresql-9.4 stop

On a Debian or Ubuntu system, it would be this:

sudo service postgresql start
sudo service postgresql stop

And this is where things start to get complicated. These two major Linux flavors can’t even agree on what to name the service control mechanism, and both have extremely differing views on handling multiple versions. Both however, go to great lengths to “hide” the Postgres-provided binaries so that only the known wrappers can access them unless we cheat and add the binary location directly to our PATH variable.

By and large, that isn’t actually necessary, but it does add a major complication: standardization. As in, there isn’t any. Depending on distribution, wrappers will take different parameters, install Postgres in varying locations, and have divergent control methods. Let’s dive a little deeper into Debian/Ubuntu and their slightly more comprehensive wrappers.

Ubuntu assumes any number of Postgres versions may be installed at any one time. As such, starting and stopping the postgresql service on an Ubuntu system will cascade that operation to any configured instances hosted on the machine. What if we only want to manage a single instance? First, we need to know which instances are even available. This is where the cluster tools come in.

Because a single Postgres instance can represent several individual databases, it’s not uncommon to refer to it as a cluster. Debian and thus Ubuntu took that concept and ran with it and made something that’s actually pretty handy. Let’s get a look at a test system with multiple running clusters:

sudo su - postgres

pg_lsclusters

Ver Cluster Port Status Owner    Data directory               Log file
9.4 main    5432 online postgres /var/lib/postgresql/9.4/main /var/log/postgresql/postgresql-9.4-main.log

Neat, eh? This tool gives us a lot of information that’s immediately useful. With multiple clusters running on one system, we need to know which port each is running on. Knowing where the data resides, and where to find logs has obvious merit. And to differentiate each, the clusters are named. Let’s repeat that instance creation from above using a couple more wrapper tools:

pg_createcluster 9.4 my_db -D /data/pgsql/my_db
pg_ctlcluster 9.4 my_db start
pg_lsclusters

Ver Cluster Port Status Owner    Data directory               Log file
9.4 main    5432 online postgres /var/lib/postgresql/9.4/main /var/log/postgresql/postgresql-9.4-main.log
9.4 my_db   5433 online postgres /data/pgsql/my_db            /var/log/postgresql/postgresql-9.4-my_db.log

The pg_createcluster tool takes the place of initdb on Debian-based systems like Ubuntu. It’s aware of the other wrappers and recommended settings, locations, and so on. It also did us the favor of automatically incrementing the port so the clusters can run simultaneously. And pg_ctlcluster supplants pg_ctl for similar reasons. It should be fairly clear the tools are inherently designed such that the version is a required parameter. They’re pretty dedicated to the concept of hosting multiple clusters and Postgres versions on a single server.

Since they’re wrappers, they also accept many of the same parameters. If we wanted to stop the my_db cluster quickly, we could use a similar command as before:

pg_ctlcluster 9.4 my_db stop -m fast

Note that there’s one major omission here: we didn’t specify the data directory. Of course, the wrappers know where all of the clusters live, so we don’t need to. The version and name uniquely identify all clusters under this architecture, removing the necessity of memorizing cluster locations or hacking our environment with variables.

All of this comes with something of a caveat related to configuration. Normally, Postgres configuration files reside within the data directory itself, making the database a self-contained entity that is not tied to any one server. If the data is on a SAN for instance, it could be mounted on a replacement server and started with no modifications. But to have wrappers, the wrappers need a standard location to find information regarding the clusters they can control. On systems that use these tools, that’s the /etc/postgresql directory.

Here’s what the configuration directories look like on our test system:

find /etc/postgresql -type d

/etc/postgresql
/etc/postgresql/9.4
/etc/postgresql/9.4/my_db
/etc/postgresql/9.4/main

It’s a simple organization to remember, and closely resembles the version + name format all of the tools require. All of the configuration files the Postgres documentation or online articles refer to, actually live here instead. Most UNIX systems expect configuration files somewhere in the /etc directory, so that’s not unexpected. But it does introduce a rather unusual complication.

Some Postgres tools unrelated to this organization expect configuration files to be in the Postgres data directory. If we were to restore a backup of the data directory on an unrelated server, we would be unable to start it, and we would have no idea who should be allowed to connect. To compound the issue, the cluster tools themselves would have no way to administer it, because the /etc files would be missing.

The easy solution would be to include the corresponding /etc/postgresql folder in the backup process itself. We could (and should) also use configuration management to distribute these files to replacement or restoration servers. But that doesn’t fix external tools that look for config files in the data directory, so I tend to also do a bit of file linking. For example:

for x in /etc/postgresql/9.4/my_db/*.conf; do
  ln -s $x /data/pgsql/my_db
done

This way we don’t have to worry; the necessary files are all there and we don’t lose any benefits provided by the cluster tools.

In any case, I hope any developers reading this are now better equipped to experiment with local instances. It’s unfortunate there’s not enough standardization across the distributions in this regard, but at least now you have a starting point and some insight at how it got this way. Enjoy!