PG Phriday: How Far We've Come
With extremely fortuitous timing for my first article following the holidays, Postgres 9.5 was officially been released into the wild just yesterday. I tend to think about past releases when new versions come out, and consider everything that has changed since the early days. How early? I’ve personally been using Postgres since 2001, when my new employer bellyached about their Postgres 6.5 database crashing frequently and generally making their lives more difficult. So as a newly minted advocate of the LAMP stack, I purged it in favor of MySQL.
Saying “things are different now,” would be the understatement of the year. To understand just how different, let’s review advancements in our favorite database software in just the last five years.
Postgres 9.0 came out shortly after my birthday in September of 2010. With it, came probably the biggest feature that launched Postgres into the Big Leagues: replication. Before then, all we had were warm standbys held in a continuous recovery state suitable only for disaster recovery or backup purposes. Unbelievably, that’s just the tip of the iceberg. In fact, let’s take a quick look at how performance has changed between 9.0 and 9.5 on a spare 16-core, 64GB VM we have lying around.
As usual, these numbers are from pgbench at a scale of 100 for 10 million rows in the main account table. In each case, tests are read-only prepared queries and ran for one minute—only the client count changed between runs.
Clients | 9.0 (tps) | 9.5 (tps) | % diff |
---|---|---|---|
1 | 9100 | 8700 | -4.4 |
2 | 17700 | 17200 | -2.8 |
4 | 31000 | 33400 | 7.7 |
8 | 50900 | 58900 | 15.7 |
16 | 56800 | 79400 | 39.8 |
32 | 59100 | 87600 | 48.2 |
64 | 57300 | 76500 | 33.5 |
128 | 51900 | 71300 | 37.4 |
As we can see, there’s a pretty consistent 2-5% decrease in performance for low client counts. That seems a bit odd to me, but maybe a dev will pipe in later to explain a potential cause. Otherwise, something magical happens as concurrency increases, demonstrating where newer versions really shine. Not only does the performance gap steadily widen with client count, but the drop after exceeding our available CPUs is far less drastic. Also paradoxically, our best performance isn’t reached until we have twice as many clients as CPUs.
Some of these differences could be attributed to quirks within our virtual environment, but the overall picture is fairly clear at this point. Of course, if performance didn’t improve over five years of advancements, something is terribly wrong.
But what about everything else? It’s easy to lose the plot after so many years, but here’s a quick list of the major features added in every release since 9.0:
Introduced in 9.0
- Streaming replication and hot standby. I can’t overstate how huge this was for Postgres. Now we can have as many online readable database replicas as we need, vastly improving scalability potential.
- Anonymous code blocks. Now we can randomly use various procedural languages to perform actions without having to wrap it in a function first. This makes advanced scripting much easier.
- The pg_upgrade utility hits core. No more dump/restore for upgrades, a massive boon for extremely large instances.
VACUUM FULL
rewrites tables instead of being a glorified (and abominably slow) defragmentation.
Introduced in 9.1
- Synchronous replication for those who really must have 100% data representation across multiple instances. I don’t use this because a replica can cause delays on the primary, but this is fine for non-OLTP systems.
- Foreign tables. Being able to address remote data as if it were local greatly simplifies several use cases. This is where Postgres starts becoming middleware and worming its way into tons of other projects.
- The extension API! Adding features to Postgres gets hilariously easy. I’ve written three extensions myself, and I’m just some random dude. People who actually know what they’re doing have used this to bring Postgres to the next level in business contexts.
- Unlogged tables. These are so great, they got their own article.
Introduced in 9.2
- Index-only scans. Now any query that only uses data in an indexed column doesn’t need the extra random disk read to retrieve the actual data row. Under the right circumstances, this is another excellent performance boost.
- Cascading replication. Want a huge constellation of replicas, and don’t want to overwhelm the primary server? Chain! Coordinating this is another issue entirely, but that’s a story for another time.
- The JSON datatype. This has been the feature that got Postgres in the press most recently with all of the Javascript-based engines and storage systems. With JSON as a tight data-exchange format, this is another massive win for the community.
- The pg_stat_statements extension. Install this now if you haven’t already. It’s like running pgBadger all the time without all of the irritatingly slow log analysis.
Introduced in 9.3
- Materialized views. Views that get used frequently are sometimes slow and annoying to use based on their complexity. So why not store their contents in a table that can be indexed and refreshed at will? There were extensions that did this, but having it in core makes it far more accessible.
- Updatable views. Updating table rows through simple views is something users had been requesting for a while.
- Writable foreign data wrappers, pushing Postgres even further in the direction of middleware.
- New event triggers. This makes it possible to block, audit, or otherwise control manipulation of database objects instead of just table contents.
- Data checksums. This requires re-initializing the database, but includes actual integrity calculations within the data itself, making it much easier to catch hardware-based corruption.
Introduced in 9.4
- The JSONB datatype. With a binary representation of JSON, it becomes faster and easier to manipulate internally. This includes potential for extremely advanced indexing and key path dereferencing.
- Change settings in
postgresql.conf
throughALTER SYSTEM
. No more fidgety config-file parsing! - Concurrent refresh of materialized views. Before this, refreshing a view prevented using it.
- Logical WAL decoding. Another Great Leap Forward for Postgres replication. With the right wrapper or software, replication can now span Postgres versions, or even database engines. Database write queries and contents themselves are exposed for capture, making things like multi-master (bi-directional replication) possible in the future.
- Background workers. Now database sessions can fork worker processes to carry out specific tasks. This greatly enhances advanced features of extensions, and adds potential for parallel query functionality in future versions.
Introduced in 9.5
- “Upsert” is now possible. Transforming an
INSERT
into anUPDATE
can result in a race condition that’s hard to resolve in asynchronous contexts experienced by databases. Tons of users wanted this capability without annoying workarounds like functional exception loops, and now it’s here. - Import foreign schemas. Before this, interacting with a foreign database meant manually creating every table locally. Now, we can just import the whole schema at once!
- Row-level security. This makes it possible to control what content within a table is visible based on who is reading it. Locking down data access gets even easier!
And this isn’t even including all of the performance, back-end, and syntax improvements. Merely the major bullet-points alone are intensely sobering to read; Postgres has come a long, long way in five short years. I’ve even argued in the past that Postgres has become more of an advanced middleware than a simple database engine. Based on all the above feature additions, it’s easy to see why. Postgres can scale to dozens of machines, and within that web, facilitate communication through JSON or foreign data wrappers with dozens of data formats, using several programming languages.
And every year, that web and the materials that comprise it continues to expand. Yet with all of that, performance improves as its functionality evolves. That’s not exactly unprecedented, but if there is any reason to use Postgres, it’s certainly an enviable beginning.