With all of the upheaval in the Postgres world thanks to advancements in extensions, foreign data wrappers, and background workers, it’s getting pretty difficult to keep track of everything! One of these rapidly moving targets is Postgres-XL and its role in helping Postgres scale outward. Large warehouses have a critical need for horizontal scaling, as the very laws of physics make it effectively impossible to perform aggregate queries on tables consisting of several billion rows.
A lot of DBAs are quite adamant regarding ACID compliance. I count myself among them. But unlike the other parts of the acronym, there are times when data durability isn’t actually a high priority. Data staging holding areas, temporary tables that need visibility across sessions, and other transient information do not require zealous protection. As a DBA it feels weird saying it, but there’s just some data we simply don’t care about losing.
It has occurred to me that I may have been spending a bit too much time being excited about new Postgres features and developments in the community. One of the intents of this weekly article was for educational purposes, so this week, let’s get back to basics. To that end, the topic for this week boils down to the tools available for managing Postgres instances, and how to use them. Surprisingly, it’s not as straight-forward as you might think.
A couple days ago, Robert Haas announced that he checked in the first iteration of parallel sequence scans in the Postgres 9.6 branch. And no, that’s not a typo. One of the great things about the Postgres devs is that they have a very regimented system of feature freezes to help ensure timely releases. Thus even though 9.5 just released its second beta, they’re already working on 9.6.
So what is a sequence scan, and why does this matter?
Maintaining a Postgres database can involve a lot of busywork. This is especially true for more robust architectures that allocate at least one replica for failover purposes. It’s still fairly common for a DBA to create a replica to accommodate emergency or upgrade scenarios, only to have to repeat the process when it came time to revert to the original master system. It’s not safe to simply subscribe the original primary to the newly promoted secondary, so this leaves either creating a new clone, or using rsync to synchronize all of the files first.