PG Phriday: Who Died and Made You Boss?!

Page content

Postgres is great, but it can’t run itself in all cases. Things come up. Queries go awry. Hardware fails, and users leave transactions open for interminable lengths of time. What happens if one of these things occur while the DBA themselves has a hardware fault? While they’re down for maintenance, someone still has to keep an eye on things. For the last PG Phriday of the year completely unrelated to my upcoming surgery, let’s talk about what happens when your DBA becomes inoperative due to medical complications.

This is Fake Postgres DBA 101!

Getting Around

When in doubt, SSH is the name of the game. Database accounts are either locked-down or lack sufficient permissions to do everything, but usually the postgres user itself has total access. Invoke your favorite SSH client to connect to the host running the database in question, and use sudo to become the postgres user:

ssh my-db-host
sudo su -l postgres

Afterwards, it’s a good idea to add our public key to the postgres user’s .ssh/authorized_keys file so we can log in as the postgres user without the intermediate sudo. Here’s a good guide for doing that. If there’s configuration management like salt or Puppet involved, that file is probably part of the stack and needs to be modified there or it’ll be overwritten.

Either way, we’re in. If we’re lucky enough to be on a Debian derivative like Ubuntu or Mint, we can use the pg_lsclusters command to see which database instances are running on this server. Here’s what that looks like on a sample VM:

pg_lsclusters 

Ver Cluster Port Status Owner    Data directory          Log file
9.5 main    5432 online postgres /data/pgsql/main/data   /var/log/postgresql/postgresql-9.5-main.log
9.6 96test  5440 online postgres /data/pgsql/96test/data /var/log/postgresql/postgresql-9.6-96test.log

The next thing we need is the psql command-line client—the nexus of Postgres interoperability. Postgres instances monitor the listed port for connection attempts, so if we wanted to target a specific instance, we’d want to refer to this output. The default is 5432 if we don’t pass any value.

Let’s get a list of all databases in the 96test Postgres instance on this system.

psql -p 5440 -l

                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 examples  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 sensors   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(5 rows)

I’ve covered the template databases, and they can safely be ignored. The only “real” databases in this installation are examples, postgres, and sensors. The postgres database is a default and is often used as scratch space; hopefully nobody is using it as their actual production database.

Each of these is distinct and can not interact with the data contained in the others. Admins familiar with other database engines might find this odd, but that’s how Postgres works. Usually databases that require lots of data sharing use schemas to keep tables in namespaces within the database. And of course, I have a long-winded explanation of this as well.

Either way, we have a list of databases from the instance we want to examine. How do we connect? Well, if the -l flag shows us a list of available databases, what happens if we remove it and append the name of a database?

psql -p 5440 sensors

psql (9.6.1)
Type "help" for help.

sensors=# SELECT 'Hello World!';

   ?column?   
--------------
 Hello World!
(1 row)

sensors=# help

You are using psql, the command-line interface to PostgreSQL.
Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

Sweet! we connected to the sensors database, successfully executed a basic query, and got a bit of assistance from the client software itself. The psql client is an extremely robust tool for interacting with a Postgres database. There are a lot of shortcut commands, and entering \? lists all of them. Be ready to scroll!

The one that really matters is \d and its variants. It’s short for “describe” and does exactly that. It can retrieve lists of tables, schemas, views, indexes, or any other database object. It can also provide greater detail about the object in question. Let’s use it to list the available schemas, see the contents of a schema, and get more insight on a particular table.

sensors=# \dn

  List of schemas
  Name  |  Owner   
--------+----------
 logs   | postgres
 public | postgres
(2 rows)

sensors=# \dt logs.*

           List of relations
 Schema |    Name    | Type  |  Owner   
--------+------------+-------+----------
 logs   | sensor     | table | postgres
 logs   | sensor_log | table | postgres
(2 rows)

sensors=# \d logs.sensor

                                           Table "logs.sensor"
    Column     |            Type             |                         Modifiers                          
---------------+-----------------------------+------------------------------------------------------------
 sensor_id     | integer                     | not null default nextval('sensor_sensor_id_seq'::regclass)
 location      | character varying           | not null
 reading       | bigint                      | not null
 modified_date | timestamp without time zone | not null
Indexes:
    "sensor_pkey" PRIMARY KEY, btree (sensor_id)
    "idx_sensor_location" btree (location)

The \dn command shows the namespaces in the current database. From that list, we see that the logs schema might have something interesting in it. So we then rely on \dt to list all of the tables the logs schema contains. From there, we can just use the regular \d describe command to get all of the information Postgres has about the structure of the logs.sensor table, complete with indexes, constraints, triggers, and so on.

Now that we can properly introspect to see where we are, it’s time to look deeper.

Peering Into the Void

A common action upon connecting to a database is to view connections and see what they’re doing. This is when we turn to the Postgres system catalog, a series of tables and views that reflect the current state of the database. The pg_stat_activity view will tell us anything we need to know regarding user connections. For example:

SELECT pid, datname, usename, state, query_start
  FROM pg_stat_activity;

  pid  | datname | usename  |        state        
-------+---------+----------+---------------------
 10392 | sensors | postgres | active
 16202 | sensors | postgres | idle in transaction

\x
Expanded display is on.

SELECT * FROM pg_stat_activity WHERE pid=16202;

-[ RECORD 1 ]----+------------------------------------
datid            | 16384
datname          | sensors
pid              | 16202
usesysid         | 10
usename          | postgres
application_name | psql
client_addr      | 
client_hostname  | 
client_port      | -1
backend_start    | 2016-12-16 08:58:44.584559-06
xact_start       | 2016-12-16 08:58:46.34946-06
query_start      | 2016-12-16 08:59:01.336305-06
state_change     | 2016-12-16 08:59:01.336895-06
wait_event_type  | 
wait_event       | 
state            | idle in transaction
backend_xid      | 
backend_xmin     | 
query            | SELECT * FROM logs.sensor LIMIT 10;

The first query is a pretty basic terse listing of the activity of all clients. We noticed one of them was idle within a transaction and decided to view everything Postgres knew about that connection. From the various listed times, we can see that it’s only been idle for a few seconds, so there’s no cause for alarm. We can also see the last query the user executed against the database, even if it completed long ago. This is all useful debugging information.

Viewing all of that as a single row would have been extremely inconvenient, so we utilized another of the psql commands and used \x to enable extended output. When this option is active, psql presents every column within a result as a key/value row pair. It’s not especially convenient for hundreds or thousands of rows, but it’s pretty indispensable when viewing the results of a wide column list.

A good combination of the columns in the pg_stat_activity view might look something like this:

-- This query for 9.6

SELECT pid, datname, usename, state, client_addr, wait_event,
       now() - query_start AS duration,
       substring(query, 1, 30) AS query_part
  FROM pg_stat_activity
 WHERE state != 'idle'
 ORDER BY now() - query_start DESC
 LIMIT 10;

-- This query for 9.2 to 9.5

SELECT pid, datname, usename, state, client_addr, waiting,
       now() - query_start AS duration,
       substring(query, 1, 30) AS query_part
  FROM pg_stat_activity
 WHERE state != 'idle'
 ORDER BY now() - query_start DESC
 LIMIT 10;

  pid  | datname | usename  |        state        | client_addr | wait_event |    duration     |           query_part           
-------+---------+----------+---------------------+-------------+------------+-----------------+--------------------------------
 22503 | sensors | postgres | idle in transaction | 10.2.2.91   |            | 00:38:21.952753 | SELECT * FROM logs.sensor LIMI
 10392 | sensors | postgres | active              |             |            | 00:00:00        | SELECT pid, datname, usename, 

This query only reports non-idle connections prioritized by activity duration. From here, we can see that the connection that was idle in transaction is still idle several minutes later. Nothing is waiting on it, but maybe it’s time to clean up a bit anyway.

The Terminator

Maybe that idle connection is blocking someone and it needs to go away. There are two ways to make that happen directly within Postgres. The first and safest option is to try and cancel the offending query with pg_cancel_backend. If that doesn’t work, we escalate to pg_terminate_backend which actually breaks the connection and rolls back any pending transactions. Let’s try those now:

SELECT pg_cancel_backend(22503);

 pg_cancel_backend 
-------------------
 t

SELECT pid, state, query
  FROM pg_stat_activity
 WHERE pid = 22503;

  pid  |        state        |                query                
-------+---------------------+-------------------------------------
 22503 | idle in transaction | SELECT * FROM logs.sensor LIMIT 10;
(1 row)

SELECT pg_terminate_backend(22503);

 pg_terminate_backend 
----------------------
 t

SELECT pid, state, query
  FROM pg_stat_activity
 WHERE pid = 22503;

 pid | state | query 
-----+-------+-------
(0 rows)

The pg_cancel_backend function didn’t “work” because it only cancels the currently operating query. If there’s a transaction in place, the user will simply return to their database prompt. An application that was stuck may be fixed by canceling a stuck query. If the application is threaded, it may have started a transaction and is injecting commands as another portion of the program does work elsewhere. If that process gets interrupted, the transaction may never complete.

That’s when we roll out pg_terminate_backend, which discards such polite approaches. This is the kill shot most DBAs are familiar with. Queries are canceled. Transactions are rolled back. Connections are broken. Resources are reclaimed. This isn’t quite cold-blooded murder, though. From the perspective of Postgres, we’re merely pointing a rifle at the offending connection and politely requesting it to vacate the premises.

Unlimited Cosmic Power

For DBAs more comfortable with GUIs, pgAdmin might be a better interface than psql. Version 3 is a stand-alone client for Windows, Linux, or OSX. In smaller environments, it’s not entirely uncommon to already have connection capabilities to servers hosting Postgres instances. Just connect and thrill in exploring the drilldown menus, object creation scripts, query interface, and all of the other niceties it provides.

Many of the operations that really matter, like terminating unruly connections, are only available to superusers. There are a couple ways to get this kind of access. In a pinch, it’s easy to connect to the host server as above and grant superuser access directly to ourselves from psql:

ALTER USER sthomas WITH SUPERUSER;

\du sthomas

           List of roles
 Role name | Attributes | Member of 
-----------+------------+-----------
 sthomas   | Superuser  | {}

Hopefully though, the current DBA set aside a dedicated role for this kind of thing. Giving users direct superuser access is usually frowned upon. Imagine we have a sysdba role with superuser capabilities and we want to share in the glory. This would be the correct approach:

ALTER USER sthomas WITH NOSUPERUSER;

GRANT sysdba TO sthomas;

\du s*

                  List of roles
 Role name |       Attributes        | Member of 
-----------+-------------------------+-----------
 sthomas   |                         | {sysdba}
 sysdba    | Superuser, Cannot login | {}

However we should note that since it’s the role with superuser access, we aren’t actually superusers. Like with sudo, we need to “activate” our new powers before they’ll work. Postgres allows users to adopt roles they’re members of. If we wanted to perform superuser actions in Postgres from pgAdmin, we could issue this command from a query pane:

SET ROLE sysdba;

Is there more? Oh, there’s a lot more. But this is enough for a competent techie to start digging around and becoming familiar with the territory. Quick, seize control while the DBA isn’t around to stop you! They’ll never know! Viva la resistance!

Of course, an experienced Postgres DBA will know and revoke your abilities later, but it’s fun to pretend. Take the opportunity to learn more anyway; just be careful on production systems.