PG Phriday: Why Postgres
There are a smorgasbord of database engines out there. From an outside perspective, Postgres is just another on a steadily growing pile of structured data storage mechanisms. Similarly to programming languages like Rust and Go, it’s the new and shiny database systems like MongoDB that tend to garner the most attention. On the other hand, more established engines like Oracle or MySQL have a vastly larger lead that seems insurmountable. In either case, enthusiasm and support is likely to be better represented in exciting or established installations.
So why? Why out of the myriad choices available, use Postgres? I tend to get asked this question by other DBAs or systems engineers that learn I strongly advocate Postgres. It’s actually a pretty fair inquiry, so why not make it the subject of the first PG Phriday for 2017? What distinguishes it from its brethren so strongly that I staked my entire career on it?
Postgres isn’t new. It didn’t enjoy the popularity that practically made MySQL a household name as part of the LAMP stack. It didn’t infiltrate corporations several decades ago as the de facto standard for performance and security like Oracle. It isn’t part of a much larger supported data environment like SQL Server. It isn’t small and easy like SQLite. It’s not a distributed hulk like Hadoop, or enticingly sharded like MongoDB or Cassandra. It’s not in-memory hotness like VoltDB.
It’s just a regular, plain old ACID RDBMS.
It does after all, have all of the basics many expect in an RDBMS:
- Tables, Views, Sequences, etc.
- Functions in various languages
- Point In Time Recovery
Certain… other database platforms weren’t so complete. As a consequence, Postgres was preferred by those who knew the difference and needed that extra functionality without breaking the bank. It’s not much, but it’s a great way to develop a niche. From there, things get more interesting.
For the most part, being boring but reliable was a fact of life until the release of 9.0 when Postgres introduced streaming replication and hot standby. Postgres was still a very capable platform before that juncture, but built-in high-availability made it more viable in a business context. Now the secondary copy could be online and supply query results. Now the replica would lag behind the primary by a small handful of transactions instead of entire 16MB segments of transaction log files.
Postgres had finally joined the rest of the world in that regard. MySQL used a different mechanism, but that was one of its selling-points for years before Postgres. The primary distinction is that Postgres streams the changes at a binary level, meaning very little calculation is necessary to apply them. As a result, Postgres replicas are much less likely to fall behind the upstream primary.
The second I tested this feature in 2010, any lingering doubts about the future of Postgres vanished.
Cult of Extensibility
The next huge—and arguably most important—advancement in Postgres accompanied the release of 9.1: extensions. The true implications here are hard to overstate. Not all of the internal API is exposed, but extensions make it possible for practically any inclined individual to just bolt functionality onto Postgres. When Postgres 9.2 added foreign data wrappers, even arbitrary alternative backends became a possibility.
Hijack the query planner to route data through video card CPUs? Got it. Add basic sharding and distributed query support? No problem. Interact with Cassandra, SQL Server, or even Facebook? Easy peasy. Store data in analytic-friendly column structure? Child’s play.
Perl has the dubious honor of being labeled the Swiss Army Chainsaw of languages because it enables a practitioner do anything. Extensions convey almost that same capability to Postgres. And while a badly written extension can crash your database, good ones can elevate it beyond the imaginations and time constraints of the core developers.
Extensions that provide enough added value have even inspired fully supported internal adaptations, as in the case of materialized views in Postgres 9.3. What other database does that?
Consider what happens when these features are combined.
- Create a materialized view that refers to a remote table.
- Refresh the above view before using it in a report.
- Alternatively, siphon updated rows from the view into a more permanent aggregate summary table.
- Get local data processing performance in ad-hoc analytics over heterogeneous platforms.
Now Postgres can be the central nexus for a constellation of various data environments and provide snapshot analysis for the live data. Without a convoluted ETL infrastructure. Technically the materialized views or intermediate aggregate tables aren’t strictly necessary, so Postgres wouldn’t even need to store actual data. Such a configuration would be hilariously slow, but now the ironic scenario exists where Postgres can power a database empty of actual contents.
The 9.2 release transformed Postgres into a platform, and one of the reasons I don’t use the SQL part of PostgreSQL anymore.
Developers, Developers, Developers!
The folks hacking on the Postgres code are both insanely skilled and notoriously available. It’s almost a running joke to guess which of the core devs will answer a basic SQL question first. There’s practically a race to answer questions in the mailing lists regardless of sophistication or perceived merit, and anyone subscribed to the list can participate.
Their dedication to fostering community interaction is unrelenting. While not quite as organized as the Linux kernel developers thanks to Linus’ role as benevolent dictator, they’ve pushed Postgres forward every year. Due to their strict commit-fests and automated testing and code review, they’ve delivered a stable update roughly every year since 2008. Is there another database engine that can boast the same?
And every release has at least one headliner feature that makes upgrading worth the effort. Every. Last. Version.
- 8.4: Window functions + CTEs
- 9.0: Streaming replication
- 9.1: Foreign tables, extensions
- 9.2: Cascading replication, JSON support
- 9.3: Materialized views, event triggers, data checksums
- 9.4: JSONB, background workers, logical WAL decoding
- 9.5: Upsert
- 9.6: Parallel execution
- 10.0?: Standby quorum, native table partitioning
While it would be wrong to demand that kind of dedication and quality, appreciating it is quite a different story. The community pushes Postgres forward because the devs give it a voice. That’s rare in any project.
In the end, I consider it a privilege to even participate from the sidelines. Is it perfect? Of course not; I’ve pointed out serious flaws in Postgres performance that have yet to be successfully addressed. Yet given the alternatives, and what Postgres really delivers when it’s fully leveraged, I can’t even think of a better commercial RDBMS.
Why Postgres? Maybe the better question is: why not?