Postgres

Page content

I’ve been an active Postgres user and advocate since 2001, but didn’t really dive in until 2005. Over the years, I’ve contributed a respectable amount of utilities, presentations, and even a book or two.

Books

While not extensive, I do have published books related to Postgres. It’s my way of being useful beyond stuff I blog about on Friday.

PostgreSQL High Availability Cookbook - Third Edition

In 2014, I wrote a book to help other Postgres DBAs and users put together a solid cluster environment that could practically survive an outright Apocalypse. When I first started using Postgres in serious environments, I would have done nearly anything for a book like this. When Packt first approached me, I jumped at the opportunity to give back to the community. The most recent 3rd edition includes critical discussions on architecture, updated monitoring stacks such as TIG, Multi-Master replication systems, zero-downtime upgrades, and backup management. Content was also updated for Postgres version 12.

Instant PostgreSQL Backup and Restore How-to

This is more of a pamphlet than a book, but goes over the basics of backing up or restoring a Postgres database. Unfortunately it’s only recent up to 2013, meaning features of Postgres 9.4 and beyond are not covered. It also doesn’t really cover utilities like Barman or pgBackRest, both of which I highly recommend in favor of relying on the supplied tools.

Presentations

On occasion, I’ve been known to speak at Postgres conferences and other venues. Here are a few of the slide sets for those. I seem to have lost a few, but will post them here when I locate the slide sources. If slide sets were not officially released or were considered proprietary, I’ll still list the talk.

What’s new in Postgres 14 - PDF

Presented at the Chicago Postgres User Group on January 18th, 2022.

This was basically just a select overview of the “high points” from the Postgres 14 release notes with examples where necessary. Material covered here includes Performance enhancements, new monitoring capabilities in system catalogs and views, various replication features, and additional functionality added to the release. It’s a good summary for those who just want to know the yearly scoop on Postgres.

The Do’s and Don’ts of Postgres High Availability

Presented at Postgres Build 2021 on December 1st 2021, the synopsis had this to say:

When most engineers think about High Availability, they tend to focus on keeping a resource online at all costs. Often omitted from consideration are the implications of Split Brain, polling intervals, Witness nodes, failure conditions, and so on. It’s easy to read a manual, follow the directions, and have a quasi-functional HA stack at the end.

But is it correct? How does the stack fulfill the actual recovery time and objectives set by the company? Are we safe because we have a synchronous standby nodes? What does “safe” even mean? Is Quorum enough to safeguard against Split Brain? Do we know the Postgres settings that can reduce node rebuild time? What is the true cost of a false failover? Is it better to go without a Primary for a certain amount of time than risk these effects? What about backup restoration? Monitoring?

To practice High Availability with Postgres, we need a strong grasp of the theory as well as how Postgres replication and storage backend contributes. A good HA stack tries to answer as many questions as possible before it is ever built. CAP, PACELC, RPO, RTO, SLA, DCS, MVCC, Pacemaker, EFM, repmgr, Patroni, Stolon, pg_auto_failover, Barman, pgBackRest, the list of techniques, tools, and combinations, is both interminable and overwhelming. It’s almost as if you could write a book on the subject and barely scratch the surface. I know, because I’ve done it.

But at the center of it all, is the kernel of right and wrong, the “Do’s” and “Don’ts”. The rest, such as how we get there, derives from them. If we can answer a few simple questions about what we’re trying to achieve with our Postgres stack within the capabilities it provides, the rest practically becomes a trivial exercise.

Best Practices for Bulk Data Loading in PostgreSQL

This was a company Webinar presented in November of 2020 on the topic of loading data into Postgres. Unfortunately the video was lost during the merger with EDB. It may seem a trivial topic, but I covered everything from INSERTs to COPY statements, with everything in between such as multi-statement inserts. Toward the end, I bring in batches, parallelism, and how various Postgres settings can be tweaked for even faster performance. It’s even possible to forgo certain table structures (temporarily) for an additional improvement.

Enhancing Postgres High Availability - YouTube (partial)

This was a company Webinar presented in July of 2020 covering several aspects of High Availability and Postgres. This is more of an architectural discussion and covers a lot of theoretical topics such as PACELC, Consensus, Quorum, Split Brain, Network Partitions, and much more. I cover the importance of connection routing and the specific role of tools like HAProxy in more advanced stacks. This was also the preview of HARP, the High Availability Routing for Postgres stack I designed for BDR which introduced consensus-driven failover in Multi-Master Postgres.

The video link is only a 7-minute compilation. The full video was lost in the merger with EDB and is no longer available.

Help! My Database is Empty! -

Presented at Postgres Open 2019 on September 12th, let’s just go with the abstract on this one.

Do you know what ETL is, and hate it with a fiery passion? Maybe you’re sick of exporting to various reporting systems, just so Frank from Accounting can complain about everything being slow. Or perhaps there’s an app that’s manually slurping in data from various sources, only to glue it back together using a bunch of loops and other code. Gross.

Let’s use Postgres to play the ultimate shell game, and show how an empty database… may be a great tool.

Banking on Postgres - YouTube

This was a company Webinar presented in April of 2019 where I discussed the various considerations banks and financial institutions will have to take into account when running on Postgres. I go over architecture, failover techniques, backups, various server parameters, VACUUM settings, and a few other odds and ends.

Conflicts and You - The Implications of Multi-Master Operation

Presented at 2QPGConf 2018 on December 5th, this was a talk that covered the ins and outs of Multi-Master conflicts. Topics such as the various types of conflicts, how they might arise, how they can be avoided, and how some particularly onerous divergent conflicts can complicate matters. I covered some basics on cluster theory as well, and special data types such as Conflict Free Replicating Datatypes (CRDTs) to merge data rather than overwrite it. This presentation also includes my infamous “Don’t do that” slide which summarizes the best ways to get the most from BDR while avoiding conflicts.

The Evolution of Postgres High Availability

Presented at Postgres Open 2018 on September 12th, let’s just go with the abstract on this one.

Postgres has embodied several iterations of availability in its storied history. Continuous file-based recovery, mirrored storage, failover automation tooling of various description, replication; it’s an endless march of progress.

This talk will focus on the journey Postgres has taken, how each step engendered the next. We’ll discuss the strengths and weaknesses of each landmark, and the mad scramble toward the elusive zenith. And most importantly, we’ll see where this wild ride will eventually lead. Is BDR and Multi-Master the deliverance we’ve been waiting for?

There’s only one way to find out.

Data Integration with PostgreSQL - YouTube

This was a company Webinar presented February 13th, 2018 where I covered the use of Postgres as a kind of application middleware. It was a bit of a worksheet where MySQL and Postgres interacted and exchanged data, and I explained how query pushdown and other data marshaling techniques actually worked. I also explained more advanced topics such as representing a REST interface with a foreign table through MultiCorn. I even brought in Kafka and MongoDB for an unholy amalgam where Postgres was merely the glue facilitating queries among truly disparate sources.

BDR Around the Globe

Presented at 2QPGConf 2017 on November 9th, this was a walk around of how Postgres BDR is intended to work in a full cluster. It covers standard bi-directional replication topics like latency, conflict management, distributed sequences, basic deployment, and how DDL replication is managed.

The Immortal Postgres Web

Presented at Postgres Open 2017 on September 8th, let’s just go with the abstract on this one.

N elephants went out to play, out on a spider’s web one day. They had such enormous fun, that they lived until the end of time had come.

In the world of High Availability, ensuring Postgres stays online has normally been an adventure in wrestling the ultimate failover stack: DRBD + LVM + Pacemaker. But this unweildy monstrosity has defeated many a worthy DBA. What if there was a stack that wasn’t nearly impossible to set up (or even comprehend), and was arguably even more versatile? Thanks to Patroni and its ilk, it’s easier than ever to make a Postgres cluster practically indestructible. And even though it was generally designed for a single cluster, it’s possible to harness it into a meandering web of multiple clusters. Each a thundering herd of primaries and followers, each leveraging a carefully cultivated architecture of shared resources.

In this session, we’ll have a long talk about Postgres, Patroni, HAProxy, and Zookeeper. We’ll discuss why we selected these components, and how they compare to HA stacks of the past. How we can organize these pieces such that they can support a single Postgres cluster, or ten thousand. No more VIPs or migrant DNS entries. No more network headaches to try and force everything onto the same WAN or VPN. Connect to any Postgres cluster from any datacenter, and always reach a writable (or read-only) target. Kill the primary, and the cluster will survive. Add a node to the pool, and it will grow stronger.

This is Postgres’ Final Form.

Elephant Herd as a Service - PDF

Presented at Postgres Open in 2016. This covers a Postgres management suite named ElepHaaS. I wrote this specifically to manage large clusters of Postgres servers. At the time I wrote it, we had over 120 Postgres instances on about 80 servers. It’s not really feasible to perform managed failovers, replica rebuilds, and any number of other actions once cluster count gets this high.

Deep Database Dive - PDF

Presented at UIUC in 2016. The professor of the Database Design and Management course at UIUC asked me to put together a sample database design from start to finish. Along the way, I also go into some basic fundamentals students might want to know, but may not consider. It’s the first time I’ve “taught” a university class.

Bones of High Availability - PDF

Presented at Postgres Open in 2015. This is essentially a quick summary of my 2014 book on the same subject. The presentation starts with a blank slate and constructs all of the elements of what makes a scalable and robust Postgres architecture.

Monitoring PostgreSQL With Style - PDF

Presented at Postgres Open in 2014. What better way to monitor Postgres states by mining the system catalog with arbitrary queries? This presentation goes over combining Graphite and collectd to build a client-server aggregate system for monitoring Postgres statistics. It’s certainly a pretty handy stack. This information can also be found in chapter 5 of Postgres High Availability Cookbook.

High Availability with PostgreSQL and Pacemaker - PDF

Presented at Postgres Open in 2012. Pacemaker is an extremely versatile piece of automation software. There are lots of these presentations on combining it with Postgres and DRBD to create a fully automated Postgres failover pair. The cool thing about this is that it operates below the filesystem itself, meaning Postgres doesn’t even know it’s running on another server if a failover occurs. Not even streaming replication is as up-to-date as this approach unless you’re using synchronous replication. In which case, there’s much greater overhead. This information can also be found in chapter 9 and 10 of Postgres High Availability Cookbook.

NVRam for Fun and Profit - PDF

Presented at Postgres Open in 2011. Back before Fusion-io became a subsidiary of SanDisk, they made some of the first NVRAM-based PCIe storage devices. These things were ridiculously fast, operating at almost half the speed of RAM. At the time, our SAN wasn’t up to the task of serving the morning rush of users, so we found something that was. The summary of this presentation is: Holy crap you need one of these!