PG Phriday: Constipated Connections

We’ve all had it happen. One day, we run a query or function and we wait for the result. And we wait. And we wait. Eventually, we realize something is wrong and find a DBA and yell at them.

“Hey Postgres dude!” we scream. “The database is slow!”

Or maybe we can’t even get a connection. Postgres just keeps saying something about too many clients. This application isn’t launching, there are ten managers breathing down our necks, and we panic.

“Hey Postgres dude!” we cry. “The database is down or something!”

Is it, though?

Maybe. Sometimes. It happens, after all. Other times, the situation isn’t so obvious and we have to dig a little deeper. All too often, the application stack is its own worst enemy and actually caused the problem to begin with. Let’s explore how that happens, starting with the usual setup:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE - (s.id || 's')::INTERVAL
  FROM generate_series(1, 100) s(id);

ANALYZE sensor_log;

We now have a hilariously simple table with a mere 100 rows. Surely, nothing can go wrong here!

That’s a bold statement though, and entirely wrong. To explain why, we first need to dig a bit deeper into how Postgres handles locking. We’ve discussed MVCC in the past, so here’s a quick summary of the natural guarantees it provides:

  • Readers do not block readers.
  • Readers do not block writers.
  • Writers do not block readers.
  • Writers may block writers.

Just to be complete, let’s test each of these assertions. Run this on two separate connections:

BEGIN TRANSACTION;
SELECT * FROM sensor_log
 WHERE id = 10;

-- Then type this to clear the transaction
ROLLBACK;

Before running the ROLLBACK, we should get no delays from either connection. Each should produce the same results, and do so extremely quickly. This is no surprise, or at least it shouldn’t be. Readers aren’t modifying the data, so why should they block other readers?

Once we introduce a process that modifies data, however, the scenario changes quite a bit. To better illustrate this, and perhaps provide a bit more insight into how MVCC works in Postgres, we’ll run another simulation. This time, we’ve included the ctid column, one of many system columns Postgres uses for its own purposes.

Here’s a quick interaction between one connection that just wants to read from sensor_log, and another that is writing to it.

-- On connection 1

BEGIN TRANSACTION;
SELECT ctid,* FROM sensor_log
 WHERE id = 10;

  ctid  | id | location | reading |    reading_date     
--------+----+----------+---------+---------------------
 (0,10) | 10 | 10       |      10 | 2016-07-28 23:59:50

-- On connection 2

BEGIN TRANSACTION;
UPDATE sensor_log
   SET reading = 89
 WHERE id = 10;

SELECT ctid,* FROM sensor_log
 WHERE id = 10;

  ctid   | id | location | reading |    reading_date     
---------+----+----------+---------+---------------------
 (0,101) | 10 | 10       |      89 | 2016-07-28 23:59:50

-- Then back to connection 1

SELECT ctid,* FROM sensor_log
 WHERE id = 10;

  ctid  | id | location | reading |    reading_date     
--------+----+----------+---------+---------------------
 (0,10) | 10 | 10       |      10 | 2016-07-28 23:59:50

-- Then type this to clear the transaction on both connections.

ROLLBACK;

No blocking, right? Can a writer block a reader? Apparently not. In some databases, holding an open transaction that has read a row will force a subsequent write to wait until the reading transaction completes. Here, after the second connection modifies the data, we can see that the two connections see different versions of the row, yet Postgres had no reason to block either operation.

Now let’s observe the ctid in those results. Its encoding is (page, item), giving the physical location of the row version in the table’s data file. After modifying the row in the second connection, we can see that the ctid changed. This is MVCC in action; the updated row version sits at the end of the table, yet the previous version also exists unmolested. That old version is what the first connection will see until the second connection commits the change. Neat, right?
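
For the curious, a couple more of those system columns make the versioning even easier to see. Here xmin is the transaction that created a row version and xmax is the one that deleted or locked it; this is just an optional aside to run on either connection during the experiment above:

-- xmin: transaction that created this row version
-- xmax: transaction that deleted or locked it (0 if none)
SELECT ctid, xmin, xmax, *
  FROM sensor_log
 WHERE id = 10;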

But then we must consider the final case, where we have two independent entities writing data to the database. In circumstances where both connections are interested in the same rows, we need implicit and explicit locking to prevent inconsistent results.

With that in mind, let’s try something a bit more invasive:

-- On connection 1

BEGIN TRANSACTION;
SELECT ctid,* FROM sensor_log
 WHERE id = 10
   FOR UPDATE;

  ctid  | id | location | reading |    reading_date     
--------+----+----------+---------+---------------------
 (0,10) | 10 | 10       |      10 | 2016-07-28 23:59:50

-- On connection 2, this should hang

BEGIN TRANSACTION;
UPDATE sensor_log
   SET reading = 89
 WHERE id = 10;

-- Execute this on connection 1 to clear the write lock.

ROLLBACK;

The second connection hung after attempting to execute the update, didn’t it? By setting FOR UPDATE, we’ve declared our intent to modify the row we’re viewing, so Postgres treats it like any other update. With that lock in place, the second connection will wait indefinitely for the transaction to commit before making its own modifications. This is where most database engines necessarily stop when resolving write precedence.
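
As an aside, nothing forces the second connection to queue forever. Postgres lets a client refuse to wait at all; here’s a minimal sketch using NOWAIT, purely illustrative and not part of the walkthrough above (the lock_timeout setting offers a similar, time-bounded escape hatch):

-- An alternative for connection 2: rather than a blocking UPDATE,
-- error out immediately if the row lock is already held elsewhere.
BEGIN TRANSACTION;

SELECT * FROM sensor_log
 WHERE id = 10
   FOR UPDATE NOWAIT;

ROLLBACK;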

If we return to connection two after rolling back connection one’s transaction, we can see that it’s no longer waiting and has completed the update. But a major problem occurs if connection one never finishes its transaction. What if someone running ad-hoc queries against the database forgot to commit their last transaction and left for the night? Or maybe an application got stuck in a loop somewhere down the line after starting a transaction and modifying several tables.

What then? Well, there are two possible outcomes for this roadblock:

  1. Connection two waits forever for a lock it can never obtain.
  2. The application times out without canceling the previous attempt, so it grabs a new connection and tries again. It does this until all available connections are exhausted.

Of these two outcomes, the first is inevitable. In most cases the problem simply ends here, and this is where most users notice something is wrong. If we’re the second connection, we can’t tell whether the database is slow or something is preventing us from proceeding. The second situation is what automation brings us: once all available connections are in use, Postgres will refuse to accept more unless they’re reserved for a superuser.
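
If the application really has exhausted its pool, a quick check against the connection ceiling will show it. A rough sketch:

-- The hard ceiling on concurrent sessions (a few are reserved for superusers)
SHOW max_connections;

-- How many backends currently occupy a slot, and how many are stuck
SELECT count(*) AS connections,
       count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_txn
  FROM pg_stat_activity;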

If we examine the Postgres system catalog, we can see the problem explicitly by checking from a third connection:

SELECT pid, state, waiting, now() - query_start AS duration,
       substring(query, 1, 15) AS query
  FROM pg_stat_activity
 ORDER BY query_start;

 pid  |        state        | waiting |    duration     |      query      
------+---------------------+---------+-----------------+-----------------
 5564 | idle in transaction | f       | 00:01:48.564482 | SELECT ctid,* F
  574 | active              | t       | 00:01:43.940607 | UPDATE sensor_l
 5688 | active              | f       | 00:00:00        | SELECT pid, sta

Beware those idle in transaction connections, for they are the bane of every Postgres DBA. Unfortunately, there’s no built-in way to find and correct these aside from scheduling a script that enforces a threshold by cancelling or terminating connections that overstay their welcome.
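
Such a script usually boils down to a single query run from cron or similar. A minimal sketch, assuming a ten-minute threshold is acceptable; adjust to taste, and note that pg_terminate_backend kills the whole connection rather than just cancelling the current query:

-- Terminate any connection that has sat idle in a transaction
-- for more than ten minutes.
SELECT pid, pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
   AND now() - state_change > INTERVAL '10 minutes';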

Given what we now know, we can go even further than this. Let’s cause a deadlock!

-- On connection 1

BEGIN TRANSACTION;
SELECT ctid,* FROM sensor_log
 WHERE id = 10
   FOR UPDATE;

  ctid  | id | location | reading |    reading_date     
--------+----+----------+---------+---------------------
 (0,10) | 10 | 10       |      10 | 2016-07-28 23:59:50

-- Then connection 2

BEGIN TRANSACTION;
SELECT ctid,* FROM sensor_log
 WHERE id = 11
   FOR UPDATE;

  ctid  | id | location | reading |    reading_date     
--------+----+----------+---------+---------------------
 (0,11) | 11 | 11       |      11 | 2016-07-28 23:59:49

-- Now back to connection 1

UPDATE sensor_log
   SET reading = 911
 WHERE id = 11;

-- And finally connection 2 again

UPDATE sensor_log
   SET reading = 911
 WHERE id = 10;

ERROR:  deadlock detected

Anyone following along probably noticed a short delay before the deadlock detected error popped up. That delay is governed by the deadlock_timeout configuration setting, which defaults to one second. Once a backend has waited on a lock for that long, Postgres checks for circular lock dependencies and cancels the transaction of one offender when it finds any. The deadlock was so easy to replicate because we simulated the basic criteria (we’ll peek at the resulting lock waits right after the list):

  1. Transaction A has locked object x.
  2. Transaction B has locked object y.
  3. Transaction A wants a lock on object y.
  4. Transaction B wants a lock on object x.
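
During any long lock wait, including the brief window before the deadlock detector fires, the stuck requests are visible in the system catalogs. A quick sketch for spotting them:

-- How long a backend waits on a lock before running the deadlock check
SHOW deadlock_timeout;

-- Any lock requests that have not yet been granted
SELECT pid, locktype, relation::regclass AS relation, mode, granted
  FROM pg_locks
 WHERE NOT granted;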

Such a circular loop of dependencies is what causes a deadlock, and if nothing resolved it, both connections would hang indefinitely. Some databases would indeed do just that. Still, our example is a bit contrived. The more common way for this to happen is between two separate tables being modified by different jobs.

Something like this:

-- On connection 1

BEGIN TRANSACTION;
UPDATE table_a SET col = 'whatever' WHERE ...;
UPDATE table_b SET col = 'whatever' WHERE ...;

-- While simultaneously on connection 2

BEGIN TRANSACTION;
UPDATE table_b SET col = 'whatever' WHERE ...;
UPDATE table_a SET col = 'whatever' WHERE ...;

If there is a series of operations we want to perform on multiple tables or rows, they should always come in the same order, every time. In the above example, depending on the WHERE clause, we may or may not get a deadlock. There’s no way to tell, and there may be no evidence in development or QA systems simply because the volume is insufficient to trigger one.
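
For row-level work inside a single table, one common trick is to grab all the row locks up front in a deterministic order. Here’s a sketch of what that might look like against our sensor_log table; if both connections from the earlier deadlock example did this, whichever arrived second would simply wait instead of deadlocking:

BEGIN TRANSACTION;

-- Lock both rows in ascending id order before touching either one.
SELECT id FROM sensor_log
 WHERE id IN (10, 11)
 ORDER BY id
   FOR UPDATE;

UPDATE sensor_log SET reading = 911 WHERE id = 10;
UPDATE sensor_log SET reading = 911 WHERE id = 11;

COMMIT;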

This is most common when there are multiple access vectors. Imagine an application and a job server developed by different teams. With those teams separated by a role barrier, operation order might not even be a consideration. Yet without a defined and consistent approach to handling database interactions, interlocked systems that work fine in development may become a deadlock salad in production.

Systems with a lot of row contention will see frequent periods of inexplicably long waits for seemingly simple write operations. Given the right (or wrong!) conditions, this will escalate into a deadlock or two… hundred. In either case, excessive application-driven locking massively delays standard database operations, and is often why the “database is slow” or appears inaccessible for short periods.

So go ahead and yell at your DBA; they’re just as likely to yell back. And that’s OK! Nobody really cares as long as somebody fixes it. So far as preventing the problem in the first place… well, that’s an article for a different day.