Back in 2005, I started a new job with a company that did work for other businesses. Their data model was designed by developers and they had no DBA, as is pretty common in smaller organizations. A critical part of our main application relied on an event log that captured customer activity and relayed their final decision to the client for reconciliation. One day someone noticed that there was a bug in the system that resolved the final decision from the event stack, and panic ensued.
Sure, managing a Postgres cluster is great. We’ve got the software installed, more tool wrappers than we can wave several sticks at, and a running cluster that’s ready to do our bidding. But what do we do with it? In fact, what is a Postgres cluster anyway? How are databases involved? What about schemas? There’s critically important nomenclature that new users may find somewhat non-intuitive, so let’s demystify things a bit.
Say that three times fast! Joking aside, managing database object access is a headache for users and DBAs alike. Postgres isn’t really an exception in that regard, but it does provide a few tools to greatly reduce the pain involved. The crazy thing is that few people even know this feature exists. I’ve known about it for a while myself, but it always slips my mind because it feels so wrong not to explicitly grant permissions.
I recently noted that the COPY command in Postgres doesn’t have syntax to skip columns in source data when importing it into a table. This necessitates using one or more junk columns to capture data we’ll just be throwing away. During that, I completely forgot that friendly devs had contributed alternative file handling methods as Foreign Data Wrappers. Most people think of foreign wrappers as a method for interacting with remote databases.
I’ve maintained since about 2011, that the problem with scaling Postgres to truly immense installations could be solved by a query coordinator. Why? Most sharding systems utilize an application-level distribution mechanism, which may or may not leverage an inherent hashing algorithm. This means each Postgres instance can be treated independently of all the others if the distribution process is known. On a cleverly architected system, the application is algorithm aware, and can address individual shards through a driver proxy or accessor class.