PG Phriday: DIY in the CLI (Part 1)

On a higher level, Postgres has a bevy of libraries, interfaces, and clients for accessing a database instance. From language APIs to GUIs like pgAdmin, or SaaS entries like JackDB, every flavor of interaction is covered. And yet, that’s only a small part of the story. For those who dare to tread into the watery depths, there’s also the world of dark incantations that is the command-line.

While most are aware of psql, the Postgres command-line client for accessing databases, there are far more creatures lurking in the black oblivion which deserve more visibility. In the spirit of getting the most from our tools, let’s take a look at these and consider how they might be useful in scenarios that aren’t restricted to boring old DBA considerations. To that end, let’s cover some of the standard client applications that anyone might use.

To begin, there are several commands that cover standard SQL operations. Things like creating or dropping databases, users, or procedural languages are all possible without ever opening a psql session. Normally, we’d have to use the equivalent SQL commands:

CREATE DATABASE foo;
DROP DATABASE foo;

CREATE USER kitty_cat WITH PASSWORD 'M30w9|_|rr';
DROP USER kitty_cat;

CREATE LANGUAGE plpythonu;
DROP LANGUAGE plpythonu;

While these are all possible to run by forcing them through psql -c, that makes automation, testing, and other tasks somewhat awkward, or forces sysadmins to learn SQL for no reason. So Postgres provides functional wrappers we can use instead. They’re named about as we might expect, too:

createdb foo
dropdb foo

createuser kitty_cat --pwprompt
dropuser kitty_cat

createlang plpythonu
droplang plpythonu
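For comparison, the psql -c route mentioned above would look something like this; connecting to the postgres maintenance database is just one reasonable choice of target:

psql -d postgres -c "CREATE DATABASE foo"
psql -d postgres -c "DROP USER kitty_cat"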

As would be expected, all of these commands include the optional parameters from their SQL equivalents. This means a couple of handy things: flags can be listed with -? or --help, and longer descriptions are available via comprehensive man pages. The Postgres online documentation is great, and having all of this information at our fingertips is even better.
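For example, either of these will remind us what createdb accepts without ever leaving the terminal:

createdb --help
man createdb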

Of course, we’re not limited to databases, users, and languages. There are a few commands to streamline basic maintenance as well, and as might be expected, these also have SQL equivalents. Well, mostly. Consider these three scenarios:

  1. We want to VACUUM every database every night following a busy day of transactions. We’ve disabled autovacuum because we have a very active database, and have instead opted for nightly maintenance.
  2. Over the years, our indexes on the user_session and login_state tables have gotten a bit fragmented, and we want to rebuild them.
  3. We have clustered multiple tables, sorting their disk pages along beneficial indexes. Due to MVCC storage, tables that experience updates need periodic re-clustering to maintain this ordering. To do this, we need to run CLUSTER regularly.

Assuming we have two databases named ‘trading’ and ‘operations’, we could do all of this with SQL commands in a psql session:

\c trading
VACUUM ANALYZE;
CLUSTER;
REINDEX TABLE user_session;
REINDEX TABLE login_state;

\c operations
VACUUM ANALYZE;
CLUSTER;

This is all perfectly serviceable. Yet for administrators or end users that don’t really care about SQL syntax, or for operations we want to automate, there is an easier way. We could get the same result using three command-line tools:

vacuumdb --all --analyze
reindexdb --table=user_session --table=login_state trading
clusterdb --all

# Or with short options:

vacuumdb -az
reindexdb -t user_session -t login_state trading
clusterdb -a

Unlike their SQL equivalents, the command-line tools can combine operations or target multiple objects. We leveraged that to reindex both tables with a single command, and vacuum or cluster all databases in our instance. This is about the time our tools become more than functional wrappers of the SQL commands. Parameters like --all illustrate client-level looping at the very least, and provide a great segue into heavier utilities.
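That looping is also exactly what we want for the automation mentioned earlier. As a minimal sketch of scenario 1, a crontab entry for the postgres user might look like this; the 2 AM schedule is purely an assumption:

# Nightly maintenance: vacuum and analyze every database at 2 AM.
0 2 * * * vacuumdb --all --analyze --quiet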

Backing up databases and restoring them, for instance, leverages multiple high and low level Postgres interfaces. Due to this complexity, there is no SQL analog. The Postgres backup and restore utilities have also matured a lot over the years, gaining parallel dumping functionality as recently as 9.3.

Taking this into account, we could clone one of the previously mentioned databases using provided tools in two different ways:

# Just copy a target database by itself:

pg_dump --jobs=2 --format=directory --file=backup_dir trading
createdb trading_clone
pg_restore --jobs=2 --dbname=trading_clone backup_dir

# Copy the whole freaking instance at the binary level.

pg_basebackup -D clone_dir

Just so we’re clear, both approaches have pluses and minuses as well as applicable scenarios beyond the scope of this article. We’re merely illustrating very basic usage. Beyond that, the first method is a variant of the common dump/restore pattern used since time immemorial to perform Postgres upgrades until pg_upgrade hit the scene in 9.0. One of the primary reasons it fell out of favor was due to the growing trend of immense databases.
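For the curious, that venerable pattern is still perfectly usable. A rough sketch, assuming the old instance listens on port 5432 and a freshly initialized new one on 5433:

pg_dumpall -p 5432 | psql -p 5433 -d postgres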

Even using parallel functionality, dumping the contents of every table in a database 200GB or larger will be extremely slow. Restoring that information is even worse, as we not only have to import all of the same data, but all indexes and constraints must be rebuilt from scratch. Before pg_restore was capable of parallel restores in 8.4, restores were even more frustrating. This, along with the new online replication capabilities, is the origin of pg_basebackup.

By default, pg_basebackup merely utilizes the Postgres data replication stream and reconstructs the host instance in a specified directory. This means we can start Postgres from that directory as a replica, or use pg_upgrade to test newer versions while the old one keeps running. This encourages online failbacks in case of failed upgrades, multiple upgrade test runs, or running applications on new versions to test for edge cases and compatibility concerns.
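As a minimal sketch of the replica case with 9.x-era tools: the -R flag writes a basic recovery.conf so the copy comes up in standby mode, and the alternate port is only there to avoid colliding with the original instance:

pg_basebackup -D clone_dir -R --xlog-method=stream
pg_ctl -D clone_dir -o "-p 5433" start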

Principally, it allows us to separate instance management from content management. We still need pg_dump and pg_restore, but they’re no longer primarily backup tools. This is especially true for pg_dump. It has multiple export formats which can theoretically be used to transfer table data into other storage systems. Or maybe we just want to export and inspect raw schema creation commands.
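A couple of quick sketches of that export role, reusing the trading database from earlier (the output file names are arbitrary): one grabs nothing but the schema DDL, the other dumps a single table as plain SQL.

pg_dump --schema-only --file=trading_schema.sql trading
pg_dump --table=user_session --format=plain trading > user_session.sql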

What else is left on the client end of things? For those who want to build Postgres extensions, or get information about the Postgres binaries installed on a server, we have pg_config. Its output is basically a bunch of environment settings that were used to build Postgres itself. This is especially useful with distribution-packaged builds that have long lists of arbitrary configure flags. After all, here’s how Ubuntu compiles it:

pg_config --configure

'--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-openssl' '--with-libxml' '--with-libxslt' '--with-tclconfig=/usr/lib/x86_64-linux-gnu/tcl8.6' '--with-includes=/usr/include/tcl8.6' 'PYTHON=/usr/bin/python' '--mandir=/usr/share/postgresql/9.4/man' '--docdir=/usr/share/doc/postgresql-doc-9.4' '--sysconfdir=/etc/postgresql-common' '--datarootdir=/usr/share/' '--datadir=/usr/share/postgresql/9.4' '--bindir=/usr/lib/postgresql/9.4/bin' '--libdir=/usr/lib/x86_64-linux-gnu/' '--libexecdir=/usr/lib/postgresql/' '--includedir=/usr/include/postgresql/' '--enable-nls' '--enable-integer-datetimes' '--enable-thread-safety' '--enable-tap-tests' '--enable-debug' '--disable-rpath' '--with-uuid=e2fs' '--with-gnu-ld' '--with-pgport=5432' '--with-system-tzdata=/usr/share/zoneinfo' 'CFLAGS=-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I/usr/include/mit-krb5 -fPIC -pie -DLINUX_OOM_SCORE_ADJ=0 -fno-omit-frame-pointer' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -L/usr/lib/mit-krb5 -L/usr/lib/x86_64-linux-gnu/mit-krb5' '--with-krb5' '--with-gssapi' '--with-ldap' '--with-selinux' 'CPPFLAGS=-D_FORTIFY_SOURCE=2

Some of that might be nice to know on occasion.
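Beyond the configure flags, a few other pg_config queries tend to come in handy when building extensions or hunting down a particular installation:

pg_config --version
pg_config --bindir
pg_config --libdir
pg_config --includedir-server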

We also have pg_isready, a “new” addition as of 9.3. It’s just a quick connection check that reports through a standard exit status. This is good for basic monitoring, but not much else. On the other hand, we no longer have to run a bogus command through psql and deal with the much more verbose client-oriented output. On the automation front, that’s a major step forward.
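As a minimal monitoring sketch, assuming a local instance on the default port: the -q flag suppresses all output and leaves only the exit status for a script to act on.

if pg_isready -q -h localhost -p 5432; then
    echo "Postgres is accepting connections."
else
    echo "Postgres is down or refusing connections."
fi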

And finally, there’s benchmarking with pgbench. Perhaps we’re testing new hardware, or a new Postgres version, or maybe we’re just curious. Having only recently escaped existence as a “mere” contrib tool in 9.5, now anyone and their dog can use it to beat up Postgres for fun and profit. It’s great for obtaining data for articles like this, too.

Here’s a quick example:

createdb pgbench
pgbench --initialize --scale=100 pgbench
pgbench --client=4 --jobs=2 --select-only --time=10 pgbench

starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 4
number of threads: 2
duration: 10 s
number of transactions actually processed: 168447
latency average: 0.237 ms
tps = 16844.169409 (including connections establishing)
tps = 16853.718425 (excluding connections establishing)

Generally, read/write tests should use a data set larger than memory to really stress a system, which calls for a much larger scale on most modern hardware. At scale 100 we get about 1.5GB across 10M rows of data, so even on a small 8GB test VM, we’d need a scale of 600 or higher to start hitting the disks.
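Following that math, a hypothetical larger-than-memory run on such a VM might look like the following; scale 600 works out to roughly 9GB of data, and the five-minute duration is arbitrary:

pgbench --initialize --scale=600 pgbench
pgbench --client=4 --jobs=2 --time=300 pgbench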

Beyond the basic test framework, we can also execute arbitrary SQL scripts. This makes it possible to stress-test transactional databases with multiple parallel clients, running anything from simple SELECT statements on our own tables, to intricate stored procedures that update fact tables. As such, it’s one of the most valuable tools in the arsenal of both users and DBAs, provided they even know it exists.
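As a quick sketch of that capability: drop whatever SQL we want to hammer into a file, and point pgbench at it with --file. The query here against the generated pgbench_accounts table is purely illustrative:

cat > my_test.sql <<'EOF'
SELECT count(*) FROM pgbench_accounts WHERE bid = 5;
EOF

pgbench --client=4 --jobs=2 --time=10 --file=my_test.sql pgbench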

There are a couple more tools I’ve left out, but they’re related to logical and stream replication and are better covered in a more in-depth manner. Beyond that, take a look at your Postgres binary directory; you might be surprised what’s hiding there.

Because distributions like to hide all the cool stuff, we may have to go looking for some of these utilities. Debian and Ubuntu users can find binaries in /usr/lib/postgresql/9.x/bin, while RHEL variants like CentOS should look in /usr/pgsql-9.x/bin.

Explore often, but do so cautiously!