Finally Done With High Availability

Well, my publisher recently informed me that the book I’ve been slaving over for almost a year is finally finished. I must admit that PostgreSQL 9 High Availability Cookbook is somewhat awkward as a title, but that doesn’t detract from the contents. Primarily, I’d like to discuss why I wrote it.

When Packt first approached me in October of 2013, I was skeptical. I have to admit that I’m not a huge fan of the “cookbook” style they’ve been pushing lately. Yet the more I thought about it, the more I realized it was something the community needed. I’ve worked almost exclusively with PostgreSQL since late 2005, with databases big and small. It was always the big ones that presented difficulties.

Back then, disaster recovery nodes were, at best, warm standbys kept current through continuous recovery, and pg_basebackup didn’t exist. Nor did pg_upgrade, actually. Everyone had their own favorite backup script, and major upgrades required dumping the entire database and importing it into the new version. Working with PostgreSQL then required a much deeper understanding than is necessary now. Those days forced me to really understand how PostgreSQL functions, which caveats to acknowledge, and which ones needed redress.

One caveat that still called out to me was adoption. As recent releases of PostgreSQL smoothed away many of the rough edges, usage increased in small and large businesses alike. I fully expected PostgreSQL to be used in a relatively small customer acquisition firm, for instance, but then I started seeing it in heavy-duty financial platforms. Corporate deployments of PostgreSQL require various levels of high availability, from redundant hardware all the way to WAL stream management and automated failover systems.

When I started working with OptionsHouse in 2010, their platform handled 8,000 database transactions per second. Over the years, that has increased to around 17,000, and I’ve seen spikes over 20,000. At these levels, standard storage solutions break down, and even failover systems are disruptive. Any outage must be as short as possible, and the database must come back instantly with little to no dependency on cache warming. Our backup system had to run on the warm standby or risk slowing down our primary database. Little by little, I broke the database cluster into assigned roles to stave off the total destruction I felt was imminent.

I was mostly scared of the size of the installation and its amount of activity. Basic calculations told me the database handled over a billion queries per day, at such a rate that even one minute of downtime could potentially cost us tens of thousands in commissions. But I had no playbook. There was nothing I could use as a guide to know what to look for when things went wrong, or how to build a stable stack that generally took care of itself. It was overwhelming.

This book, as overly verbose as the title might be, is my contribution to all of the DBAs out there who might have to administer a database that demands high availability. It’s as in-depth as I could get without diverging too much from the cookbook style, and there are plenty of links for those who want to learn beyond the scope of its content. The core, however, is there. Anyone with a good understanding of Linux could pick it up and weave a highly available cluster of PostgreSQL systems without worrying or having to build too many of their own tools.

If I’ve helped even one DBA with this high availability book, I’ll consider my mission accomplished. It’s the culmination of years of experimentation, research, and performance testing. I owe it to the PostgreSQL community, which has helped me out of many jams, to share my experience however I can.

Thanks, everyone!