These days with multiple Postgres database environments a commonality, it’s not unheard of to copy data from one to another. Perhaps a production extract is necessary to properly vet data in a staging environment. Maybe the development environment needs to update its data to reflect an especially pernicious and intractable edge case someone observed. In any of these scenarios, we are likely to extract data from multiple tables to import it elsewhere. The question is: how?
In light of recent events where GitLab suffered a massive database loss, this is a great opportunity to examine what happened from a Postgres perspective. Since Simon Riggs over at 2ndQuadrant has already chimed in on improvements Gitlib might consider in their procedures, maybe we should walk the conversation back slightly.
Foreign tables have been a headline feature of Postgres ever since the release of version 9.2. Combined with extensions, they’re the secret sauce that allows Postgres to pull data from other database engines, flat files, REST interfaces, and possibly every gas station, residence, warehouse, farmhouse, hen house, outhouse, and doghouse in the area.
Not a lot of people remember what Postgres was like before version 8.4. In many ways, this was the first “modern” release of the database engine. CTEs, Window Functions, column level permissions, in-place upgrade compatible with subsequent versions, collation support, continuous query statistic collection; it was just a smorgasbord of functionality.
Of these, CTEs or Common Table Expressions, probably enjoy the most user-level exposure; for good reason. Before this, there was no way to perform a recursive query in Postgres, which really hurts in certain situations. Want to display all related child threads in an online discussion? How about fetching the components of an organization chart by following management assignments? Better get ready for a lot of queries in a loop.
In addition to that, complicated queries were difficult to logically simplify. Reporting queries are especially prone to frequent sequences of aggregates and subqueries. It’s not uncommon to build a query that’s several pages long in this kind of context. Optimizing such an unwieldy beast is often difficult or even impossible simply due to all of the components and confusing nesting.
CTEs changed these things for the better and in the eyes of many, finally brought Postgres to parity with Oracle and its long-established recursive query support. So let’s explore what CTEs really deliver, and how they can improve our Postgres experience—caveats and all.
There are a smorgasbord of database engines out there. From an outside perspective, Postgres is just another on a steadily growing pile of structured data storage mechanisms. Similarly to programming languages like Rust and Go, it’s the new and shiny database systems like MongoDB that tend to garner the most attention. On the other hand, more established engines like Oracle or MySQL have a vastly larger lead that seems insurmountable. In either case, enthusiasm and support is likely to be better represented in exciting or established installations.
So why? Why out of the myriad choices available, use Postgres? I tend to get asked this question by other DBAs or systems engineers that learn I strongly advocate Postgres. It’s actually a pretty fair inquiry, so why not make it the subject of the first PG Phriday for 2017? What distinguishes it from its brethren so strongly that I staked my entire career on it?