PG Phriday: Who Died and Made You Boss?!
Postgres is great, but it can’t run itself in all cases. Things come up. Queries go awry. Hardware fails, and users leave transactions open for interminable lengths of time. What happens if one of these things occurs while the DBA themselves has a hardware fault? While they’re down for maintenance, someone still has to keep an eye on things. For the last PG Phriday of the year, completely unrelated to my upcoming surgery, let’s talk about what happens when your DBA becomes inoperative due to medical complications.
This is Fake Postgres DBA 101!
Getting Around
When in doubt, SSH is the name of the game. Database accounts are often either locked down or lack sufficient permissions to do everything, but the postgres user itself usually has total access. Invoke your favorite SSH client to connect to the host running the database in question, and use sudo to become the postgres user:
ssh my-db-host
sudo su -l postgres
Afterwards, it’s a good idea to add our public key to the postgres user’s .ssh/authorized_keys file so we can log in as the postgres user without the intermediate sudo. Here’s a good guide for doing that. If configuration management like Salt or Puppet is involved, that file is probably part of the stack and needs to be modified there, or it’ll be overwritten.
Either way, we’re in. If we’re lucky enough to be on a Debian derivative like Ubuntu or Mint, we can use the pg_lsclusters command to see which database instances exist on this server and whether they’re online. Here’s what that looks like on a sample VM:
pg_lsclusters
Ver Cluster Port Status Owner    Data directory          Log file
9.5 main    5432 online postgres /data/pgsql/main/data   /var/log/postgresql/postgresql-9.5-main.log
9.6 96test  5440 online postgres /data/pgsql/96test/data /var/log/postgresql/postgresql-9.6-96test.log
The next thing we need is the psql command-line client, the nexus of Postgres interoperability. Each Postgres instance listens for connections on the port listed in this output, so to target a specific instance, we pass that port to psql with the -p flag. If we don’t pass any value, psql normally uses the default port, 5432.
Let’s get a list of all databases in the 96test Postgres instance on this system.
psql -p 5440 -l
List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 examples  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 sensors   | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(5 rows)
I’ve covered the template databases, and they can safely be ignored. The only “real” databases in this installation are examples, postgres, and sensors. The postgres database is a default and is often used as scratch space; hopefully nobody is using it as their actual production database.
Each of these is distinct, and a query in one cannot access the data contained in the others. Admins familiar with other database engines might find this odd, but that’s how Postgres works. Applications that need to share lots of data usually keep it in a single database and use schemas to organize tables into namespaces. And of course, I have a long-winded explanation of this as well.
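If we’d rather get that list with plain SQL, perhaps from a client other than psql, the pg_database catalog holds the same information. This is a minimal sketch that filters out anything flagged as a template via datistemplate; the size column is an extra that plain psql -l doesn’t show, and it assumes we’re connected as a role allowed to connect to each database, like postgres.

```sql
-- The same databases psql -l lists, minus the templates, plus their sizes.
SELECT datname,
       pg_get_userbyid(datdba) AS owner,
       pg_encoding_to_char(encoding) AS encoding,
       pg_size_pretty(pg_database_size(datname)) AS size
  FROM pg_database
 WHERE NOT datistemplate
 ORDER BY datname;
```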
Either way, we have a list of databases from the instance we want to examine. How do we connect? Well, if the -l flag shows us a list of available databases, what happens if we remove it and append the name of a database?
psql -p 5440 sensors
psql (9.6.1)
Type "help" for help.
sensors=# SELECT 'Hello World!';
?column?
--------------
Hello World!
(1 row)
sensors=# help
You are using psql, the command-line interface to PostgreSQL.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
Sweet! We connected to the sensors database, successfully executed a basic query, and got a bit of assistance from the client software itself. The psql client is an extremely robust tool for interacting with a Postgres database. There are a lot of shortcut commands, and entering \? lists all of them. Be ready to scroll!
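Before poking at anything, it’s worth confirming exactly where we landed, especially on a server hosting several instances. The psql \conninfo command reports the connection details, and a quick query like this sketch does much the same from any client:

```sql
-- Confirm which database, role, port, and server version this session is using.
SELECT current_database()      AS db_name,
       current_user            AS role_name,
       current_setting('port') AS port,
       version();
```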
The one that really matters is \d and its variants. It’s short for “describe” and does exactly that. It can retrieve lists of tables, schemas, views, indexes, or any other database object. It can also provide greater detail about the object in question. Let’s use it to list the available schemas, see the contents of a schema, and get more insight into a particular table.
sensors=# \dn
List of schemas
Name | Owner
--------+----------
logs | postgres
public | postgres
(2 rows)
sensors=# \dt logs.*
List of relations
Schema | Name | Type | Owner
--------+------------+-------+----------
logs | sensor | table | postgres
logs | sensor_log | table | postgres
(2 rows)
sensors=# \d logs.sensor
Table "logs.sensor"
Column | Type | Modifiers
---------------+-----------------------------+------------------------------------------------------------
sensor_id | integer | not null default nextval('sensor_sensor_id_seq'::regclass)
location | character varying | not null
reading | bigint | not null
modified_date | timestamp without time zone | not null
Indexes:
"sensor_pkey" PRIMARY KEY, btree (sensor_id)
"idx_sensor_location" btree (location)
The \dn command shows the namespaces (schemas) in the current database. From that list, we see that the logs schema might have something interesting in it, so we rely on \dt to list all of the tables it contains. From there, we can use the regular \d describe command to get all of the information Postgres has about the structure of the logs.sensor table, complete with indexes, constraints, triggers, and so on.
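Those backslash commands are shortcuts for catalog queries; starting psql with -E (or running \set ECHO_HIDDEN on) prints the exact SQL each one issues. For something usable from any client, a rough equivalent built on the standard information_schema views might look like this sketch. Keep in mind that these views only show objects the current role has some privilege on, so they can be less complete than \d when we aren’t a superuser.

```sql
-- Roughly what \dt logs.* reports: ordinary tables in the logs schema.
SELECT table_schema, table_name
  FROM information_schema.tables
 WHERE table_schema = 'logs'
   AND table_type = 'BASE TABLE'
 ORDER BY table_name;

-- Roughly the column portion of \d logs.sensor.
SELECT column_name, data_type, is_nullable, column_default
  FROM information_schema.columns
 WHERE table_schema = 'logs'
   AND table_name = 'sensor'
 ORDER BY ordinal_position;
```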
Now that we can properly introspect to see where we are, it’s time to look deeper.
Peering Into the Void
A common action upon connecting to a database is to view connections and see what they’re doing. This is when we turn to the Postgres system catalog, a series of tables and views that reflect the current state of the database. The pg_stat_activity view tells us nearly everything we need to know about user connections. For example:
SELECT pid, datname, usename, state
FROM pg_stat_activity;
pid | datname | usename | state
-------+---------+----------+---------------------
10392 | sensors | postgres | active
16202 | sensors | postgres | idle in transaction
\x
Expanded display is on.
SELECT * FROM pg_stat_activity WHERE pid=16202;
-[ RECORD 1 ]----+------------------------------------
datid | 16384
datname | sensors
pid | 16202
usesysid | 10
usename | postgres
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2016-12-16 08:58:44.584559-06
xact_start | 2016-12-16 08:58:46.34946-06
query_start | 2016-12-16 08:59:01.336305-06
state_change | 2016-12-16 08:59:01.336895-06
wait_event_type |
wait_event |
state | idle in transaction
backend_xid |
backend_xmin |
query | SELECT * FROM logs.sensor LIMIT 10;
The first query is a basic, terse listing of the activity of all clients. We noticed one of them was idle within a transaction and decided to view everything Postgres knew about that connection. From the various listed times, we can see that it’s only been idle for a few seconds, so there’s no cause for alarm. We can also see the last query the user executed, even if it completed long ago. This is all useful debugging information.
Viewing all of that as a single row would have been extremely inconvenient, so we used another psql command, \x, to enable expanded output. When this option is active, psql displays each column of a result row on its own line as a key/value pair. It’s not especially convenient for hundreds or thousands of rows, but it’s pretty indispensable when viewing the results of a wide column list.
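Rather than eyeballing those timestamps, we can let Postgres do the subtraction. This sketch assumes the 9.2+ column names used above: xact_start marks when the transaction began, and state_change marks when the session entered its current state, so subtracting each from now() gives the transaction’s age and how long it has been sitting idle.

```sql
-- Age of each open-but-idle transaction, and how long it has sat idle.
SELECT pid, datname, usename,
       now() - xact_start   AS transaction_age,
       now() - state_change AS idle_for,
       substring(query, 1, 30) AS last_query
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
 ORDER BY state_change;
```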
A good combination of the columns in the pg_stat_activity view might look something like this:
-- For Postgres 9.6
SELECT pid, datname, usename, state, client_addr, wait_event,
now() - query_start AS duration,
substring(query, 1, 30) AS query_part
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY now() - query_start DESC
LIMIT 10;
-- For Postgres 9.2 through 9.5
SELECT pid, datname, usename, state, client_addr, waiting,
now() - query_start AS duration,
substring(query, 1, 30) AS query_part
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY now() - query_start DESC
LIMIT 10;
pid | datname | usename | state | client_addr | wait_event | duration | query_part
-------+---------+----------+---------------------+-------------+------------+-----------------+--------------------------------
22503 | sensors | postgres | idle in transaction | 10.2.2.91 | | 00:38:21.952753 | SELECT * FROM logs.sensor LIMI
10392 | sensors | postgres | active | | | 00:00:00 | SELECT pid, datname, usename,
This query only reports non-idle connections, prioritized by activity duration. From here, we can see a connection that has been idle in transaction for over 38 minutes. The wait_event column is empty for both sessions, so nothing appears to be stuck waiting, but maybe it’s time to clean up a bit anyway.
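An empty wait_event column is a decent hint, but on 9.6 and later we can ask the question directly. The pg_blocking_pids function returns the process IDs holding locks that a given session is waiting for, so a sketch like this lists only sessions that are actually blocked, along with their blockers. The Postgres documentation warns that frequent calls to this function can affect performance, so it’s better suited to ad hoc checks than to a tight monitoring loop.

```sql
-- 9.6+: sessions currently waiting on a lock, and the PIDs blocking them.
SELECT pid, usename,
       pg_blocking_pids(pid) AS blocked_by,
       now() - query_start AS duration,
       substring(query, 1, 30) AS query_part
  FROM pg_stat_activity
 WHERE cardinality(pg_blocking_pids(pid)) > 0;
```

If 22503 showed up in any blocked_by array, that would be a strong argument for the next step.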
The Terminator
Maybe that idle connection is blocking someone and it needs to go away. There are two ways to make that happen directly within Postgres. The first and safest option is to try to cancel the offending query with pg_cancel_backend. If that doesn’t work, we escalate to pg_terminate_backend, which actually breaks the connection and rolls back any open transaction. Let’s try those now:
SELECT pg_cancel_backend(22503);
pg_cancel_backend
-------------------
t
SELECT pid, state, query
FROM pg_stat_activity
WHERE pid = 22503;
pid | state | query
-------+---------------------+-------------------------------------
22503 | idle in transaction | SELECT * FROM logs.sensor LIMIT 10;
(1 row)
SELECT pg_terminate_backend(22503);
pg_terminate_backend
----------------------
t
SELECT pid, state, query
FROM pg_stat_activity
WHERE pid = 22503;
pid | state | query
-----+-------+-------
(0 rows)
The pg_cancel_backend function didn’t “work” because it only cancels the currently running query, and a session that is idle in transaction isn’t running one. Even when there is a query to cancel, the connection survives; the user simply returns to their database prompt. An application stuck behind a long-running query may recover once that query is canceled. But if the application is threaded, it may have started a transaction and be injecting commands while another portion of the program does work elsewhere. If that process gets interrupted, the transaction may never complete.
That’s when we roll out pg_terminate_backend, which discards such polite approaches. This is the kill shot most DBAs are familiar with. Queries are canceled. Transactions are rolled back. Connections are broken. Resources are reclaimed. This isn’t quite cold-blooded murder, though. From the perspective of Postgres, we’re merely pointing a rifle at the offending connection and politely requesting it to vacate the premises.
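Hunting down idle sessions one PID at a time gets old quickly. Since pg_terminate_backend is an ordinary function, we can call it from a query. This sketch terminates every session that has been idle in a transaction for more than 30 minutes; that threshold is an arbitrary example, not a recommendation, and the pg_backend_pid check keeps us from shooting ourselves. Running it once without the pg_terminate_backend column first shows who would be caught in the blast radius.

```sql
-- Terminate sessions idle in a transaction for over 30 minutes.
-- The 30-minute threshold is an arbitrary example; adjust it to suit.
SELECT pid, datname, usename,
       now() - state_change AS idle_for,
       pg_terminate_backend(pid) AS terminated
  FROM pg_stat_activity
 WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
   AND now() - state_change > interval '30 minutes'
   AND pid <> pg_backend_pid();
```

On 9.6 and later, the idle_in_transaction_session_timeout setting can have Postgres do this automatically, though changing it is a decision best left to the DBA once they’re back on their feet.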
Unlimited Cosmic Power
For DBAs more comfortable with GUIs, pgAdmin might be a better interface than psql. Version 3 is a stand-alone client for Windows, Linux, or OS X. In smaller environments, it’s not unusual to already have network access to the servers hosting Postgres instances. Just connect and thrill in exploring the drilldown menus, object creation scripts, query interface, and all of the other niceties it provides.
Many of the operations that really matter, like terminating other users’ unruly connections, are largely restricted to superusers. There are a couple of ways to get this kind of access. In a pinch, it’s easy to connect to the host server as above and grant superuser access directly to ourselves from psql:
ALTER USER sthomas WITH SUPERUSER;
\du sthomas
List of roles
Role name | Attributes | Member of
-----------+------------+-----------
sthomas | Superuser | {}
Hopefully though, the current DBA set aside a dedicated role for this kind of thing. Giving users direct superuser access is usually frowned upon. Imagine we have a sysdba role with superuser capabilities and we want to share in the glory. This would be the correct approach:
ALTER USER sthomas WITH NOSUPERUSER;
GRANT sysdba TO sthomas;
\du s*
List of roles
Role name | Attributes | Member of
-----------+-------------------------+-----------
sthomas | | {sysdba}
sysdba | Superuser, Cannot login | {}
Note, however, that since the sysdba role is what has superuser access, we aren’t actually superusers ourselves; Postgres never inherits the superuser attribute through role membership. Like with sudo, we need to “activate” our new powers before they’ll work, and Postgres allows users to adopt any role they’re members of. If we wanted to perform superuser actions in Postgres from pgAdmin, we could issue this command from a query pane:
SET ROLE sysdba;
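To double-check which hat we’re wearing, we can compare session_user, which SET ROLE doesn’t change, against current_user, which does. The built-in is_superuser setting reports whether the current role has superuser rights. This is a sketch: the first query lists the roles our login could adopt, and RESET ROLE hands the borrowed powers back when we’re done.

```sql
-- Roles our login role could adopt with SET ROLE (always including itself;
-- a superuser login is a member of every role, so all of them appear).
SELECT r.rolname, r.rolsuper
  FROM pg_roles r
 WHERE pg_has_role(session_user, r.oid, 'MEMBER')
 ORDER BY r.rolname;

-- After SET ROLE sysdba, current_user should report sysdba and
-- is_superuser should be on, while session_user is unchanged.
SELECT session_user, current_user,
       current_setting('is_superuser') AS is_superuser;

-- Return to the privileges of the login role.
RESET ROLE;
```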
Is there more? Oh, there’s a lot more. But this is enough for a competent techie to start digging around and becoming familiar with the territory. Quick, seize control while the DBA isn’t around to stop you! They’ll never know! Viva la resistance!
Of course, an experienced Postgres DBA will know and revoke your abilities later, but it’s fun to pretend. Take the opportunity to learn more anyway; just be careful on production systems.