Maintaining a Postgres database can involve a lot of busywork. This is especially true for more robust architectures that allocate at least one replica for failover purposes. It’s still fairly common for a DBA to create a replica to accommodate emergency or upgrade scenarios, only to have to repeat the process when it came time to revert to the original master system. It’s not safe to simply subscribe the original primary to the newly promoted secondary, so this leaves either creating a new clone, or using
rsync to synchronize all of the files first.
This not only makes the DBA’s job harder, it also complicates disaster recovery tests or other scenarios that require reversing the replication stream. That’s where Postgres 9.5 and its inclusion of the
pg_rewind utility comes in. The only real criteria for use is that the
wal_log_hints setting is enabled in
Given that stipulation, imagine we have a 9.5 Postgres instance located at
/data/testdb. From here, let’s create some fake data with
pgbench and create a new replica:
pgbench -i -s 100 postgres pg_basebackup -D /data/testdb-clone -R
Next we modify the
postgresql.conf file for the clone and change the
port parameter so both instances can run simultaneously. Anything will work here, so long as it’s different from the parent instance. Let’s just say the new port is 5433 for this test.
Then comes the fun part: promoting and modifying the replica. A 30-second
pgbench test should modify enough data such that the replica and parent are no longer even minimally interchangeable.
pg_ctl -D /data/testdb-clone start pg_ctl -D /data/testdb-clone promote pg_ctl -D /data/testdb stop -m fast pgbench -p 5433 -T 30 postgres
The reason we stopped the old primary after promoting the new one is to simulate them being out of sync. Even one transaction that differs on either instance is enough to cause this effect. In fact, the previous two code blocks will be used again later, so keep them in mind.
To have some comparison, we need to test the “old” method of using
rsync. Our assumption here is that the database is too large to use
pg_basebackup. Let’s see it in action:
pg_ctl -D /data/testdb-clone stop -m fast time rsync -a /data/testdb-clone/ /data/testdb/ real 0m15.019s user 0m9.804s sys 0m10.909s
Nothing especially complicated, right? 15 seconds isn’t bad, considering we’re dealing with about 2GB of test data. If this were a database that measured in TBs instead, we’d be extremely grateful for any files rsync chose not to transfer. But what if we used
pg_rewind instead? How would that even work? Oddly enough, it’s actually pretty similar.
pg_ctl -D /data/testdb-clone stop -m fast time pg_rewind --source-pgdata=/data/testdb-clone \ --target-pgdata=/data/testdb servers diverged at WAL position 0/9E0004D8 on timeline 1 rewinding from last common checkpoint at 0/9E000430 on timeline 1 Done! real 0m0.679s user 0m0.024s sys 0m0.653s
We’re not quite done yet. Executing
pg_rewind merely “reverts” the old primary to a point where it can safely consume transaction logs from the new primary. After this, it needs to run in recovery mode and apply any pending changes as if it were a newly allocated replica. This means we need a new
recovery.conf pointing at the replica we promoted at the beginning of this process.
standby_mode = 'on' primary_conninfo = 'user=postgres port=5433'
Note that we’re only using the
postgres user to simplify our examples. We don’t recommend using the primary superuser for replication if it can be avoided. Beyond that, the only remaining steps are to start the new replica and see how long it takes to catch up.
pg_ctl -D /data/testdb start sleep 5 egrep 'standby|ready' /data/testdb/pg_log/postgresql-*.log 2015-11-06 17:56:49 GMT LOG: entering standby mode 2015-11-06 17:56:50 GMT LOG: database system is ready to accept read only connections
Wow! That’s quite a difference! Add that up and we spend about two seconds for the whole process. One to rewind the data, and another to re-apply the data modified since the systems were last in sync. Further, we can use the same process on
testdb-clone to safely convert it back into a replica.
Part of running a Postgres cluster of any description is coordination. Previously, swapping between two systems frequently was extremely slow and annoying for larger databases. This made maintenance and failovers to secondary pairs or DR tests much more complicated than necessary. Neither of those scenarios promote a healthy high availability architecture.
We want to swap to the secondary to perform maintenance as often as necessary, since that’s its role. We want to test our disaster recovery environment to prove its functionality. When doing so requires nearly a full synchronization in both directions every time, we’re much less likely to schedule these events very frequently. The
pg_rewind utility finally gives us a safe way to do that.
As a final note, while directly included as part of Postgres 9.5,
pg_rewind is actually available for 9.4 and 9.3 as well. I highly recommend integrating it into any robust HA stack; the QA and Infrastructure departments will be glad when the DR test can finish ahead of schedule for once!