Politically Bankrupt

Dear subhuman filth,

I know you’re probably too busy fornicating with your toothless inbred sister to read this, and your unkempt diaper-strewn trailer likely isn’t compatible with such technological advancements such as the internet, but we need to talk, provided you’re even capable of understanding English sentences that don’t include phrases such as “Y’all” or “Hold my beer!”, you hopelessly ignorant Redneck. Stop beating your pitbull with your fourth extra copy of the Holy Bible for one goddamn second, and listen here.

It’s your fault we’re stuck with this incomprehensibly idiotic, corrupt, rapist, Nazi, buffoon of a president. While you’re busy sodomizing hapless men in the woods and telling them to “squeal like a pig, boy!”, Donald Trump is personally, literally, doing the same to our country. While you slack-jawed country yokels chug bathtub moonshine out of chipped mason jars that still contain the remnants of last night’s “good eatin'”, Trump is visiting each country and urinating on our reputation. And every time you contemptuous reprobates “saddle on up” to the crusty old outhouse across the dirt yard from the dilapidated shanty you drunkenly refer to as your trailer’s attached garage, Trump sells your worthless support to the highest bidder that happens along and wants to help screw the country into the ground.

It’s thanks to you, and your bible-thumping, white privileged, racist, Nazi, homophobic, science denying, flat Earth, misogynist, traditional bullshit beliefs, that a man who is literally Hitler is now in the process of transforming the United States into a Fascist dictatorship. Everything you are, have been, or ever will be, is an affront to all that is Good in the world, and your mere existence is a pox upon the taint of Humanity. Jesus Tapdancing Christ, how it’s possible you have enough functioning brain matter to remember how to breath is a mystery even the greatest minds could never solve. Please do the entire planet a favor, and go die in a fire, preferably by igniting that questionable slurry of chemicals you and your buddies pour into that lake in your back yard and chuckle at the “purdy colors”.

Sincerely yours,

The Sane Half of the Country

It’s incredibly sad this appears to be the level of discourse we’ve sunk to. Regardless of how I personally feel about Trump, who is quite likely the most incompetent and self-serving person to ever hold the office, the amount of hyperbole surrounding his administration is staggering. What’s worse, the indignant zeal, the sheer vehemence directed toward those who voted for him, is nothing short of appalling. Here’s a partial collection of headlines from a quick Google search on the topic:

Not only is this incredibly divisive, it’s promoting a narrative of one-sided moral superiority. And the character assassination doesn’t just stop at Trump or his voters. Dave Rubin is a married gay man, and has been called everything from alt-right to a Nazi. Milo Yiannopoulos, while an admitted provocateur, is also married to a black man and somehow apparently a racist Nazi. Jordan Peterson is a prolific clinical psychologist, and also an alt-right Nazi. The amount of articles denigrating the character of these people is as vitriolic as it is confusing. Carl Benjamin, perhaps better known as Sargon of Akkad, is a traditional liberal who has also been tarred by the same increasingly wide brush. And while Christina Hoff Sommers is what many may consider an “old school” Feminist, she too is a misogynist Nazi. Even Lindsay Shepherd, a graduate student who made the mistake of using a video featuring Jordan Peterson, is now an alt-right super-Nazi.

It’s become quite evident that social norms are beginning to consider the fundamental Liberal concepts of Free Speech and Democracy, as passé, or possibly even tools of oppression.

Like the disdain for anyone who would dare to vote for Trump, these articles either directly advocate the position, or paint an illustration of a turning tide: Free Speech enables Hate Speech, which is used by Nazis, ergo, it must be dismantled. What is bewildering to me, is how short-sighted and ham-fisted such a result would be. What starts out as apparently reasonable limits on Hate Speech now, will become arbitrary directives barring opposing political discourse in the future. Assuming Trump is the epitome of evil, imagine him with the power to declare the act of criticizing his administration as Hate Speech. That’s the kind of world we would eventually have; no social norm is perpetual, and the pendulum of opinion will forever swing from one extreme to another.

That’s how far the extremism has gone. The very bedrock of freedom in the United States, and those who happen to disagree with the Democratic party enough to vote for the opposition, are now indicative of White Supremacy and tantamount to the worst atrocities in the history of the world. In what is essentially Coke vs. Pepsi tribalism inherent to the Human species, the only thing we’ve learned is to further de-humanize those whose opinions are not our own.

What particularly saddens me most about this, is that the side which presumably stood for logic, science, and reason, is now perpetuating some of the coarsest assertions. We all remember the bilge Rush Limbaugh and his ilk regularly pumped out, with all their talk of Lib-tards. That the discourse of The Left would stoop to that level, and even in some cases burrow even further into the bedrock, serves mainly to suggest an enemy is fair game. In reality this is hardly surprising, as multiple decades of studies suggest othering is what enables a human being to trick, harm, or destroy another person. It’s the solution to cognitive dissonance against the act of harming a member of the tribe. Someone who isn’t considered human is fair game.

So the fact that these labels: Alt-Right, Nazi, and White Supremacist, are all being willfully applied to even factually inaccurate targets, isn’t that startling. It’s much easier to hurl insults at someone who’s barely human, or not even worthy of that designation. You can punch a Nazi, so if someone disagrees with you, and you want to punch them, they must ergo be a Nazi. Demand they be fired. Confound their business prospects. Ensure they live in a gutter, if they even survive at all. They deserve it, after all. None of this is really new.

What I dislike most of all however, is the hypocrisy. The current moral busybodies on the Left are no different from the Christian Right that plagued the 80’s and 90’s. They’re just as myopic, just as indignant, and just as willing to unilaterally impose their worldview under the guise of moral righteousness. And they’re just as reprehensible. The people who voted for Trump are human beings, many with families and concerns that diverge from the narrative. They’re not a mass of grunting troglodytes that interchangeably oppress women and lynch minorities. Yes, “even they” have largely moved beyond that in the interceding decades. But the continuous and divisive rhetoric may reverse that trend. “If everything I do is racist, well I guess I’m racist, then.” In for a penny, in for a pound, as they say.

Do I disagree with them? Absolutely. Will I treat them as human trash unworthy to even lick my chamber pot? Launch a moral crusade against them when they regurgitate something on Twitter that makes me roll my eyes? Castigate them at every possible opportunity simply to signal how virtuous I am? No. I’ll tell them I believe they’re wrong, and why. I’ll be a decent human being and take a reasonable approach that used to be considered an honorable tactic. Why spend so much energy on hate? What’s even the point?

Asserting someone is evil only serves to galvanize them against you, and you’ve gained nothing. Alternatively, a black man can convince hundreds of people to quit the KKK by simply being himself. How would that scenario have played out if he took a purely adversarial approach, I wonder? Did people really forget the old adage “You catch more flies with honey than you do with vinegar” so soon? Is the mad scramble for views, clicks, and smug superiority, so enamoring that it’s worth tearing apart the brief flirtation we’ve had with equality and acceptance?

I’d like to think it isn’t, but there’s a reason I often refer to myself as a misanthrope. Despite the progress we’ve made, we’re still animals, prone to our baser instincts and reactions. I purport that we’re largely not equipped for this brave new world of instantaneous communication and self-reinforcing filter bubbles that imply our own opinions are the majority in spite of reality. In a world absent of truly Free Speech, there’s nothing left to challenge incorrect assumptions, nobody around to remind you that another whole half of the world exists that thinks you’re wrong. Without being confronted regularly by intellectual diversity, it’s easy to consider opposing viewpoints as aberrant, offensive, or even evil. And things which are evil must be destroyed.

I don’t like where that’s going. I don’t like that it’s regularly accepted and even encouraged. We’re better than that, if we want to be. The trouble is, we must actively resist the temptation to consider ourselves above reproach, and that’s a rapidly diminishing resource. It’s much easier to turn those efforts outwards than accept our own fallibility, especially when everything and everyone in our purview says we’re right. I don’t like this world we’re making in our own image, and see no easy way to stop the momentum.

I just wish people would be civil. I don’t mean hugs, unicorns, and rainbows. Just to refrain from the tactics of othering, assigning blame, and character assassination. We can move forward, if only we’d bother to try.

Adventures in Server Sitting

To support more of my tinkering in an effort to test various Postgres cluster configurations, I decided it would be really nifty to have a virtual server. I could not only spin up VMs and containers to validate architectures, but experiment to my heart’s content with other potential technologies.

At first, I was going to buy an Antsle. But the fact such a thing existed made me wonder what other kinds of dedicated virtual device hardware might exist. That led me in endless random directions until I stumbled over a link to eBay that boasted a price that seemed both ridiculous and impossible. Apparently old servers are incredibly cheap and plentiful these days. After the shock wore off, I eventually settled on a Dell PowerEdge R710. This is what I ended up getting for about $300:

The iDRAC6 Enterprise is important, as it enables a virtual console over TCP through a Java app that launches from a web browser pointed to the configured IP address. No need for monitors, here! It’s an optional module and cheap by itself, but the particular model I bought had one already. Well, it would have, but they forgot to send that part. Though I fiddled with the server after it arrived, I couldn’t really go nuts with it without the iDRAC card, because that would mean dragging it over to my desk and actually hooking it up to one of the other inputs on my monitor. No thanks.

I called the seller’s support line and got them to send me the missing card. Having taken care of that, I did some more searches to decide how I wanted to configure the beast. For example, how should I configure the storage so that it’s expandable in the future? RAID 10? JBOD with software RAID and LVM for the sake of flexibility?

Naturally I searched Reddit for R710 threads, and stumbled into the Homelab reddit. Apparently it’s one of the most popular servers to start with because it’s so hilariously inexpensive and plentiful. A few helpful replies later and I had my answer: install two more 1TB drives and install ZFS.

But that left me with a new problem. ZFS likes to have direct control of devices, and the included PERC 6/i RAID card didn’t have device pass-through mode. Fortunately Homelab was helpful in that regard as well. It’s possible to flash a H200 RAID card to Host Bus Adapter (HBA) mode because it’s basically a re-branded LSI card. Well, as with all parts for the R710, such a card is also comparatively cheap, as are the cables necessary to connect it to the SAS backplane for the drives.

Which of course led to the next stumbling block. ZFS on Linux is still a bit sketchy as a boot device since its license isn’t compatible with the Linux Kernel. As such, it’s only included by Ubuntu, and I’d be one sketchy kernel away from having an unbootable system. It’s usually better to install Linux on a separate boot device and use the hard drives as a storage pool. Well, I’m not using the included DVD drive for anything, so out it goes. It’s being replaced with a 250GB Samsung 850 Evo. Not only can that act as a stable filesystem, but I can partition it as a ZFS read/write cache as well. After reserving 50GB for the OS, 200GB beats a 1GB RAID controller any day.

I finally got most of the parts and spent the weekend reconfiguring everything. I have to say, re-flashing a RAID controller into an unsupported configuration isn’t easy, especially when the available instructions are conflicting, haven’t been updated in a year, and weren’t written specifically for the R710. These were the most comprehensive instructions I could find, and supplied all of the files I’d need to pull off the entire surgery.

First I needed to get the iDRAC6 working so I could decouple the server from my monitor. That took over an hour because the included module includes its own dedicated LAN port which is required to access the virtual console. I didn’t know it gets bypassed unless you boot into the iDRAC and change its configuration to use the dedicated port instead. That took a lot of reboots. Did I mention that the R710 takes several minutes to boot? It’s true, and that’s something that’s apparently common with most server hardware, even if you don’t count the numerous prompts to configure various add-on components.

Well, I finally got that working, and moved on to the H200. I followed the instructions I found, and got stuck because they were wrong. For whatever reason, the R710 or the H200 don’t like the megarec.exe utility, and hung every time I tried to use it for the first two steps. Cue a few more reboots while I figured out it wasn’t just really slow at flashing the firmware and was actually stuck. With a bit more research, I eventually got everything working with these commands:

  1. Erase the existing BIOS and firmware, then replace it with a generic 6GB SAS profile.

    sas2flsh.exe -o -e 6
    sas2flsh.exe -o -f 6GBPSAS.fw -b mptsas2.rom
  2. Reboot.

  3. Flash the generic LSI HBA firmware.

    sas2flsh.exe -o -f 2118it.bin

I say I eventually got it to work, because the instructions I used were based on the megarec.exe utility working as described. Since I had to use sas2flsh.exe to erase the card’s firmware instead, I also removed the card’s BIOS. So even though the card had the right software, there was no BIOS to actually invoke it. That took several more reboots to diagnose and remedy.

But finally after three befuddling hours, everything was working. I didn’t fully remove the existing RAID card just in case all of this went sideways, so I’ll be finishing the operation later. At that point, I’ll remove the old controller so it stops complaining about having no cables attached. The 850 Evo hasn’t arrived yet, so I’m in no hurry. Once it comes in, it will replace the DVD drive, and then I can shove the server in my closet where it will supply me with VMs and containers until it dies.

The next leg of my adventure will be installing Proxmox to manage the VM army. I experimented with it today to verify all four drives function properly in a RAID-10 configuration with the new controller card, but that’s about all. All I know is that I managed to get things mostly working, and I’ve got a lot further to travel before I have something I can use as a virtual laboratory.

But hey, at least I’m closer than I was before.

PG Phriday: Community Edition

Postgres is one of those database engines that carves out a niche and garners adherents with various levels of religious zeal. The community, while relatively small when compared to that of something like MongoDB, is helpful almost to a fault. Members from the freshest minted newb to the most battle tested veteran will often trip over themselves to answer questions found in the various dedicated forums, mailing lists, and chat rooms. To that end, let’s answer one particular question that ties everything together: what exactly is available to someone who wants to participate with the Postgres universe these days?

First and foremost, the Postgres site has a community page to start our trip into those most arcane depths. For obvious reasons, this is probably the most comprehensive resource available for interacting with the Postgres community in one way or the other. Let’s explore a bit.

Easily Digestible

At first glance, it’s just two innocuous paragraphs with a sprinkling of links and a couple of sidebars. Delve into the mailing lists however, and our trip down the rabbit hole begins in earnest. There are quite a few categories of interest, and subscriptions work in the usual ways: regular delivery, digest, or lurk-only for those who only feel comfortable asking questions rather than answering them. I personally subscribe to these, but there’s no reason to restrict yourself to my esoteric interests:

  • pgsql-admin: Admins and admin wannabes; the perfect place to just talk shop with other DBAs.
  • pgsql-general: The potpourri of database discussion. They’re also not kidding when they say “Please note that many of the developers monitor this area.” Several of the most prolific committers take time to answer questions from this list.
  • pgsql-performance: I spent a lot of time with a database that handled over a billion queries per day, and needed this list in order to survive. After a while, I picked up a few things and started answering questions myself. Anyone can follow that path.
  • pgsql-hackers: The Postgres devs post here. And boy do they take advantage of every opportunity to do so. Threads range from proposed features to detailed critiques of potential patch-sets. These discussions have been known to go on for years, covering hundreds of back-and-forth messages before something finally becomes fully incorporated into Postgres. There’s no better way to keep up with the future of Postgres and watch the feature vetting process in action.

That is just a tiny fragment of what’s available there, and the message traffic is fast and furious.

Chattier Than a Chat Bot

It would seem that a sizable contingent of the Postgres community maintains careful watch of the #postgresql channel over at irc.freenode.net. But it wouldn’t be the New Web if there wasn’t also a lively group over at the Postgres Team Slack.

While the mailing lists are great and responsive, there’s something to be said about live responses. In the case of the Slack channel, there’s also the benefit of formatted code pasting. Unfortunately neither of these resources are readily archivable to be available in perpetuity, nor is there any easy delineation of topics since chat is a free-for-all. This makes it somewhat difficult to refer to past discussions with any reliability—such is the black-hole of transient conversations.

Despite that, being in a virtual room with other Postgres users makes it much easier to informally swap examples and ideas. If AOL or Yahoo chat rooms were still a thing, there would likely be Postgres communities there as well. Maybe I’m looking in the wrong spots, but I find this particular resource critically under-utilized. Social media has purportedly replaced chat rooms, but it’s a demonstrably different concept that hopefully makes a comeback.


Speaking of social media, there is a Postgres Facebook group, and of course a PostgreSQL Google+ group with a related PostgreSQL community. And we can’t leave out the PostgreSQL Twitter.

Beyond these, local communities often have their own user groups so people can actually get together and treat Postgres like a hobby. I joined the Chicago PUG back when it started, and try to attend whenever I’m in town on business. There’s something to be said for the personal touch. It’s not official, but many of these groups tend to coordinate their activities through Meetup, which is handy for users of that service.

For those who live in an area without a local Postgres group and don’t feel comfortable starting one, Postgres has a bevy of conferences running through the year. One or more of these might be nearby, so why not attend and meet some of the people who make Postgres tick? I don’t go to all of these, but Postgres Open started in Chicago, so I’ve tried to make a regular appearance when I have interesting (and presentable) material. I also hear good things about PGConf US over in New Jersey.

And that brings us to PgUS and its ilk. While Postgres has no official owner, there is a lot of coordination between the various enclaves to reach some sort of consensus as to the direction the project takes. Some of this is in the form of advocacy and education, so there’s no need to be a Postgres prodigy to contribute.

For those stark few who find database engines exciting, meet up with your fellow “non-robots” and spread the word!


Part of learning about Postgres and spreading its use is consuming the copious blogs, wikis, and manuals dedicated to the subject. You likely came here from Planet PostgreSQL, which acts as the official blogroll/RSS-feed for all of the various articles that regularly churn out from sources such as myself. If not, there is no better way to stay abreast of new Postgres techniques, features, or just useful knowledge lost to the annals of history. Many of the Big Names(tm) regularly post on their personal or company blogs, and all of these critical insights get syndicated in a single place for everyone to enjoy.

Lest we forget, there’s also the PostgreSQL Wiki. Any sufficiently vetted community member can make entries here to make the Postgres world a bit more complete. Unlike the official documentation which concentrates specifically on describing functionality in the context of reference material, the Wiki is more human friendly and elastic. I often refer to new release pages to act as a cheat sheet when sharing new features. And where else can they put a complete list of foreign data wrappers or procedural languages, when most are not even officially Postgres projects?

I have to admit I usually feel too timid to taint any official Postgres resource with my inane rambling, but someone did it for me a while ago, so maybe that fear is unwarranted. We’ll see what the future holds!


The Postgres community is part of the reason I am where I am, and know what I do. I learned enough on the job and through the mailing lists that I wrote a book on High Availability and Postgres to share the wealth. It is an invaluable resource for anyone who is willing to learn, and potentially a better one for those who are capable of contributing. The people are friendly, capable, and enthusiastic. That last one always makes me laugh considering the relatively dry nature of the underlying material: database software. We’re excited users and advocates of database software. If anyone had told me as a child that this would be my future, I would have just responded with: boring!

But it isn’t. It really isn’t, and I have no real answer as to why. Maybe it’s the challenge, or perhaps the community itself and its limitless encouragement. Whatever the case, Postgres is my bag. To that end, this will also be the last official PG Phriday on this site. Perhaps it was inevitable given my dedication to the cause, but I’m now a member of the team at 2nd Quadrant. As such, it only makes sense that my Postgres-related material originate from that unified front. Their blog is also part of the Planet PostgreSQL feed, so if you haven’t already, consider this a second entreaty to become a regular reader of that site.

Here’s to the future of Postgres, and everyone that works to ensure it’s a bright one!

PG Phriday: Who Died and Made You Boss?! (The Investigatining!)

The Postgres system catalog is a voluminous tome of intriguing metadata both obvious and stupendously esoteric. When inheriting a Postgres database infrastructure from another DBA, sometimes it falls upon us to dig into the writhing confines to derive a working knowledge of its lurking denizens. The trick is to do this before they burst forth and douse us with the database’s sticky innards and it experiences a horrible untimely demise.

Bane of unwatched databases everywhere.

To prevent that from happening, it’s a good idea to check various system views on occasion. The alternative isn’t always outright disaster, but why take the chance?

An ideal place to start is the pg_stat_activity view. It tells us what each session is (or was) doing, where it originated, who owns it, and a myriad of other juicy details. There’s just one problem:

SELECT pid, usename, query FROM pg_stat_activity;
  pid  | usename  |                       query                       
 11415 | postgres | <insufficient privilege>
   482 | sthomas  | SELECT pid, usename, query FROM pg_stat_activity;

While logged on as an unprivileged user, we can only see our own activity. The query column is protected such that only superusers can view its contents for all users. This is a security measure since it’s possible a query has sensitive information embedded somewhere. However, there are a lot of useful contextual or debugging elements in that field that a service monitor or other automated tool might find illuminating.

How do we give a user access to this data—automated or otherwise—without committing the greatest of sins by making them a superuser? One common technique is to simply wrap the view with a function that returns rows. Like this:

CREATE ROLE monitor;
CREATE OR REPLACE FUNCTION public.pg_stat_activity()
RETURNS SETOF pg_catalog.pg_stat_activity
AS $$
  SELECT * FROM pg_catalog.pg_stat_activity;
REVOKE ALL ON FUNCTION public.pg_stat_activity() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION public.pg_stat_activity() TO monitor;
GRANT monitor TO sthomas;

Note that we also created a role that we can use for this kind of elevated access. By granting access to the role, we have an abstract set of privileges we can grant to, or revoke from, other users. Why track down every single grant a user might have, when we can just revoke a few roles instead?

The function is set as a SECURITY DEFINER so it runs as the user who owns it. The presumption here is that we create the function as another superuser (usually postgres), and then the query column is no longer protected for users given access to the function. Be wary when doing this however! Did you notice that we prepended the pg_catalog schema in the SELECT statement itself? This prevents the user from changing their search path and tricking the system into giving them superuser access to a view that’s really a select on a sensitive table. Sneaky!

We also needed to explicitly revoke execution access from the PUBLIC domain of all users. By default, functions created in Postgres can be executed by anyone. There are ways to change this, but most DBAs either don’t know about it, or haven’t done so. As a consequence, there are a lot of functions out there that are unnecessarily promiscuous. Elevated functions especially need this step!

After granting access to the new role, we can attempt the previous activity query again:

SELECT pid, usename, query FROM pg_stat_activity();
  pid  | usename  |                       query                       
 11415 | postgres | GRANT monitor TO sthomas;
   482 | sthomas  | SELECT pid, usename, query FROM pg_stat_activity();

Success! It feels inherently wrong to deliberately circumvent that kind of security measure, but we do as needs must. At least we did it safely. Having access to the query column is important in several contexts, especially if our application stack isn’t particularly sensitive.

Now that we have curated access to session activity, we may also need to observe how the database is working with its hardware resources. When multiple queries access data in shared memory for instance, knowing the contents might help us size it properly. It’s possible to access this information by activating the pg_buffercache extension.

With that knowledge firmly in hand and a bit of window function magic, here’s a query that we might work. In this case, I created a benchmark database and ran a couple iterations on it to generate some buffer activity.

CREATE EXTENSION pg_buffercache;
       round(COUNT(*) * 8.0 / 1024, 2) AS mb_used,
       round(c.relpages * 8.0 / 1024, 2) AS object_mb,
       round(COUNT(*) * 100.0 / c.relpages, 2) AS object_pct,
       round(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS buffer_pct
  FROM pg_buffercache b
  JOIN pg_class c USING (relfilenode)
  JOIN pg_namespace n ON (c.relnamespace = n.oid)
 WHERE n.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
 GROUP BY c.oid, c.relpages
 LIMIT 10;
          oid          | mb_used | object_mb | object_pct | buffer_pct 
 pgbench_accounts_pkey |    3.22 |     21.45 |      15.01 |      53.16
 pgbench_accounts      |    2.75 |    128.08 |       2.15 |      45.42
 pgbench_history       |    0.05 |      0.62 |       8.86 |       0.90
 pgbench_tellers       |    0.02 |      0.01 |     200.00 |       0.26
 pgbench_branches      |    0.02 |      0.01 |     200.00 |       0.26

The database here is only using the default 8MB buffer, so there isn’t a whole lot of room for actual database caching. However, we can see that the account table is extremely active, and only 15% of it is cached. From this information, it’s clear we should increase shared memory to accommodate the full size of the accounts primary key, and possibly a larger portion of the accounts table.

There are, of course, other ways we can view table activity in Postgres. Consider for instance that Postgres constantly aggregates statistics as tables are written to, or read from. The statistics collector is actually rather prolific.

This query is a great way of seeing which tables are the most active:

SELECT relname AS TABLE_NAME, SUM(n_tup_ins) AS inserts,
       SUM(n_tup_upd) AS updates,
       SUM(n_tup_del) AS deletes
  FROM pg_stat_user_tables
 ORDER BY SUM(n_tup_ins + n_tup_upd + n_tup_del) DESC
 LIMIT 10;
    TABLE_NAME    | inserts | updates | deletes 
 pgbench_accounts | 1000000 |   15431 |       0
 pgbench_tellers  |     100 |   15431 |       0
 pgbench_branches |      10 |   15431 |       0
 pgbench_history  |   15431 |       0 |       0

Remember that benchmark I mentioned earlier? We can actually see how many rows each table started with, how many were modified during the benchmark itself, and the fact that the history table was part of the benchmark transaction.

These statistics are lost a number of ways because they’re generally only relevant while the server is actually running. We can also manually reset them with the pg_stat_reset() function, in case we are doing more active forensics and want to see live accumulation.

And this is only the write data. What about reads? Elements such as sequential or index scans are equally important if we want to know how well our indexes are performing, or identify tables that might need more or better indexes.

Here’s a different use of the same stat table:

       s.seq_scan, s.idx_scan,
       s.idx_tup_fetch, c.reltuples AS ROW_COUNT
  FROM pg_stat_user_tables s
  JOIN pg_class c ON (c.oid = s.relid)
 ORDER BY s.seq_scan DESC, idx_scan DESC
 LIMIT 10;
    TABLE_NAME    | seq_scan | idx_scan | idx_tup_fetch | ROW_COUNT 
 pgbench_branches |    15433 |        0 |             0 |        10
 pgbench_tellers  |    15431 |        0 |             0 |       100
 pgbench_accounts |        3 |    30862 |         30862 |     1e+06
 pgbench_history  |        0 |          |               |     23148

Those numbers line up pretty well. The fact that the account table has three sequential scans is because I was trying to force the data into the shared buffers, so I read the entire contents of the table a few times. Since the branch and teller tables are so small, reading their entire contents is probably the most efficient approach, though it may warrant investigation later if the behavior persists.

If we focus on the account table where all the real activity is taking place, it’s index fetches at all times, except for my manual scans. For a 1-million row table, that’s exactly what we want to see. The fact that there’s a 1-1 relationship with the number of scans to fetches suggests they’re single fetches from the primary key, and also tells us we have good selectivity.

And now that we know a lot of information about our tables, it’s time to move on to the users of those tables.

It only seems that way

More specifically, consider roles. In Postgres, best practices are to create a role and grant it access for multiple tables, functions, or views. Then we would grant the role to specific users that require that access. This is a lot safer than having direct assignments because it’s far easier to revoke and share among related users. Say, multiple people in a reporting department for instance.

But what if we’re given a username and want to see a full list of what they can actually access? That’s not as obvious as it might seem at first glance. Let’s do some permissions magic starting with this example group nesting.

CREATE ROLE benchmark;
CREATE ROLE nested_benchmark;
GRANT benchmark TO nested_benchmark;
GRANT nested_benchmark TO sthomas;
SELECT table_schema, TABLE_NAME
  FROM information_schema.table_privileges
 WHERE grantee IN ('sthomas', 'nested_benchmark');
 table_schema | TABLE_NAME 
(0 ROWS)

What the what!? Just to be clear, the sthomas user does have full read access to all of the pgbench tables. So why aren’t they showing up in the list?

As handy as the information schema is, a critical flaw is that it only considers directly assigned privileges. Nested roles completely circumvent its ability to report access. We need a way to fix that.

Enter the wondrous recursive CTE syntax. We can create a view that “flattens” the role nesting:

CREATE OR REPLACE VIEW v_recursive_group_list AS
  SELECT r.rolname AS user_name, g.rolname AS group_name
    FROM pg_authid r
    JOIN pg_auth_members m ON (m.member=r.oid)
    JOIN pg_authid g ON (m.roleid=g.oid)
  SELECT ag.user_name, g.rolname AS group_name
    FROM pg_authid r
    JOIN pg_auth_members m ON (m.member=r.oid)
    JOIN pg_authid g ON (m.roleid=g.oid)
    JOIN all_groups ag ON (r.rolname = ag.group_name)
SELECT * FROM all_groups;

Again, this isn’t as difficult as it looks. We’re just bootstrapping the results with the existing list of user/role associations. The UNION combines each of these with successive levels of nesting until we’ve exhausted all of them. The result is a username/group row for every role that is granted to a user, or a role which was granted to that role, and so on.

That’s something we can combine with the information schema to actually derive the full list of objects a user can access. Let’s see how that works:

  FROM v_recursive_group_list
 WHERE user_name = 'sthomas';
 user_name |    group_name    
 sthomas   | nested_benchmark
 sthomas   | monitor
 sthomas   | benchmark
SELECT t.table_schema, t.table_name
  FROM v_recursive_group_list gl
  JOIN information_schema.table_privileges t ON (t.grantee IN (gl.user_name, gl.group_name))
 WHERE gl.user_name = 'sthomas';
 table_schema |    TABLE_NAME    
 public       | pgbench_tellers
 public       | pgbench_branches
 public       | pgbench_accounts
 public       | pgbench_history
 public       | pg_buffercache

Huzzah! Now we know what sessions are doing, how active tables are, and have a firm grasp of basic security and exposure. What else is left? How about locks?

There’s a deadlock somewhere in here…

Yes, locks. Sometimes session activity gets out of hand, or transactions a little to fast and furious, and things go somewhat awry. Being able to unravel that mess is essential to keeping a database operating smoothly.

If we combine the contents of pg_stat_activity and pg_locks, we can get a lot of additional information regarding session activity. Before we were only interested in the query that was running and maybe some surrounding context. Now we can see exactly which tables, indexes, views, and other objects are locked, what kind of lock is involved, and so on.

This is probably one of my favorite views to use for this kind of work:

CREATE OR REPLACE VIEW v_activity_locks AS
SELECT a.pid, s.mode, s.locktype, a.wait_event, a.state,
       a.usename, a.query_start::TIMESTAMP(0), a.client_addr,
       now() - a.query_start AS time_used, a.query, s.tables
  FROM pg_stat_activity() a
         SELECT pid AS pid, mode, locktype,
                string_agg(relname::text, ', ') AS TABLES
           FROM (SELECT l.pid, l.mode, l.locktype, c.relname
                   FROM pg_locks l
                   JOIN pg_class c ON (l.relation=c.oid)
                  WHERE c.relkind = 'r'
                  ORDER BY pid, relname) agg
          GROUP BY 1, 2, 3
       ) s USING (pid);
GRANT SELECT ON v_activity_locks TO monitor;
SELECT pid, usename, state,
       EXTRACT(minutes FROM time_used) AS minutes,
       SUBSTRING(query, 1, 30) AS query_part,
  FROM v_activity_locks;
  pid  | usename  | state  | minutes |          query_part           |              TABLES              
  1927 | postgres | idle   |         |                               | 
 28305 | postgres | idle   |      49 | SELECT c.oid::REGCLASS::TEXT, | 
 11415 | postgres | idle   |       4 | GRANT monitor TO sthomas;     | 
 16022 | sthomas  | active |       0 | SELECT pid, usename, state,   | pg_authid, pg_class, pg_database

Despite being horrifically ugly—and hence why it’s a view—I prefer it to simply joining the two tables. The magic is in the tables column, which lists every table that is locked by the session, in alphabetical order, all in the same result. No more tracing through multiple rows to see all of the participants in a locking mess, it’s all there for everyone to see. One line per session. Add a filter to remove idle queries. Maybe an extra predicate to only consider queries which have been active for more than a few seconds. In the end, we have a single query that can instantly identify areas worthy of additional forensics, and all involved resources.

While we’re thinking about locks, the previous view only really told us what resources a particular session was using and perhaps how it was doing so. If we sorted these results by the query_start column, it wouldn’t be too difficult to see whether or not one session was blocking another. But it’s not exactly a scenario that requires an alibi, and related activity can obscure our results if the database is particularly busy.

Thanks to the newly available pg_blocking_pids function in Postgres 9.6, we can actually see what is blocking a certain action. Before this function was introduced, the only way to figure out what was causing the block was to trace which connections were using the same resources and had their locks granted, versus those that didn’t. In a sufficiently busy system, this wasn’t necessary a causal relationship, but it provided a good starting point. Now we can see exactly what’s causing the block, and we can use that information to our benefit.

Here’s an example of a lingering modification in a transaction that caused a block for another session:

SELECT DISTINCT l1.pid AS blocker_pid, a.query AS blocker_query,
       a.usename AS blocker_user, a.client_addr AS blocker_client,
       l2.pid AS blocked_pid, a2.query AS blocked_query,
       a2.usename AS blocked_user, a2.client_addr AS blocked_client
  FROM pg_locks l1
  JOIN pg_stat_activity() a ON (a.pid = l1.pid)
  JOIN pg_locks l2 ON (l1.pid = ANY(pg_blocking_pids(l2.pid)))
  JOIN pg_stat_activity() a2 ON (a2.pid = l2.pid)
 WHERE l1.granted
   AND NOT l2.granted;
-[ RECORD 1 ]--+------------------------------------------
blocker_pid    | 11415
blocker_query  | ALTER TABLE pgbench_accounts ADD foo INT;
blocker_user   | postgres
blocker_client | 
blocked_pid    | 12337
blocked_query  | SELECT COUNT(*) FROM pgbench_accounts ;
blocked_user   | sthomas
blocked_client |

Instead of using the pg_locks view and pg_stat_activity() function, we could use the v_activity_locks view to see both the blocked session and what blocked it on the same row. This would be indispensable in a really busy system, especially if a single connection has blocked several others. Such an occurrence would practically glow and demand immediate attention. Heck, it would even make an ideal automated system check.

And isn’t that what all of this really boils down to? Some of this is something that requires direct observation. Tracing user access makes sense during security audits, and the table stats can help with optimizing the database itself in many cases. But the rest is really just novel ways of combining the system views to produce quantifiable data that stands out in some way. How many queries have been running for over a minute? How many sessions have been waiting on a lock for over ten seconds?

Those first approximations are the bloodhound to a seasoned DBA. It would be silly to run all of these manually on a regular basis, but when there is a problem, all bets are off. These queries and views barely even scratch the surface of what’s really available in the Postgres system catalog. The Postgres metadata isn’t just there to reflect object organization, and curiosity goes a long way when exploring it.

So follow that bloodhound to the crime scene. There are more than enough tools available to identify, apprehend, and even punish the culprit. Or maybe just explore the stats and tweak to wring extra performance from the existing tables. This should be a good start for doing either, and infinitely more.

PG Phriday: Design Pattern Workshop

Recently on the pgsql-performance mailing list, a question popped up regarding Postgres RAM usage. In this instance Pietro wondered why Postgres wasn’t using more RAM, and why his process was taking so long. There were a few insightful replies, and they’re each interesting for reasons that aren’t immediately obvious. Let’s see what is really going on here, and perhaps answer a question while we’re at it.

Pietro presents several postgresql.conf settings, but here are the ones that matter:

shared_buffers = 24GB
effective_cache_size = 72GB
work_mem = 512MB
maintenance_work_mem = 1GB

The configured system has 96GB of total RAM. Keeping that in consideration, we see that shared_buffers is set to 24GB, which is a quarter of the server’s available memory. The only potential issue here is that it’s possible a checkpoint could be a very painful experience if a significant portion of that memory becomes dirty. Since there are no complaints of high IO waits, let’s just assume that isn’t happening.

Common advice suggests setting effective_cache_size to an amount equivalent to half of system RAM + shared_buffers. A setting of 72GB looks high, but Postgres does not allocate any of that on its own. This parameter merely influences the query planner, since higher values imply an increased probability a disk page is in either Postgres or OS caches. This is an optimization setting, not a memory allocation.

A work_mem value of 512MB is extremely high. In some cases however, that’s perfectly acceptable. The amount specified here will be used for all internal query execution operations that require a memory allocation. This means sorts, hashes, bitmaps, materializations, and so on. Thus a single query can instantiate multiple 512MB segments for each necessary operation. Effectively the user has greatly reduced the likelihood Postgres will utilize disk storage for any query operations.

And finally we have maintenance_work_mem. This is so often set to 1GB, it might as well be the default. This is the amount of RAM Postgres uses when doing things like creating indexes, vacuuming, and anything else that might need vast amounts of working space to proceed. We don’t know Pietro’s procedures, but we can make some inferences later. This setting really only matters if some part of his data onboarding includes adding new indexes or triggers an autovacuum.

Then we see how memory is performing once Postgres is running:

              total        used        free      shared  buff/cache   available
Mem:            94G         28G         46G         17M         19G         64G
Swap:           15G          0B         15G
Total:         109G         28G         61G

That’s not a lot of usage. Of that 28GB, we know Postgres has 24 simply due to the setting for shared_buffers. This means that the actual queries are only consuming 4GB at most.

This is particularly relevant because if we skip to the end of the post, we see this output from top:

 9686 postgres  20   0 24.918g 214236  12628 R 100.0  0.2   2872:38 postgres
 9687 postgres  20   0 24.918g 214212  12600 R 100.0  0.2   2872:27 postgres
 9688 postgres  20   0 25.391g 709936  12708 R 100.0  0.7   2872:40 postgres
 9691 postgres  20   0 24.918g 214516  12900 R 100.0  0.2   2865:23 postgres
 9697 postgres  20   0 24.918g 214284  12676 R 100.0  0.2   2866:05 postgres

All 16 of the backends are pegged at 100% CPU, and have been for almost 48 hours each (2872/60). That definitely corroborates Pietro’s claim that this workload has been running for 48 hours. This suggests little to no waiting for disk IO during the entire process.

So will more RAM help? Probably not. Pietro says he’s doing “time series expansion and some arithmetics”. This kind of procedure doesn’t generally require vast amounts of RAM, as calculations don’t often invoke sorting or other temporary memory structures. So what’s really going on?

The second reply gets at the real heart of the matter immediately:

You’d be better off to ask for help in optimizing your queries IMHO.

Why would Scott say that?

Well, there are a lot of ways to do the same thing. In the database world, some of these are highly preferable to others. How exactly is Pietro expanding those time series, for example? He could be doing something like this:

DO $$
  a INT;
  b INT;
  new_a INT;
  new_b INT;
  FOR a, b, ... IN SELECT col_a, col_b, ...
    new_a = some_calculation(a, b, c, ..., n);
    new_b = other_calculation(a, b, c, ..., n);
    INSERT INTO target_table VALUES (new_a, new_b, ...);
$$ LANGUAGE plpgsql;

If the necessary logic is complicated enough, that might be the right approach. However, it is extremely expensive to loop through each individual row from a driver query, allocate variables, perform calculations, and invoke a single insert. This kind of process would peg a CPU at 100% while still being highly inefficient by multiple orders of magnitude.

There’s also the possibility Pietro knows this, and the calculations were fairly simple. So maybe he decided the process could be inlined a bit more. Instead of that anonymous block (or something equivalent on the application side), he decides to write helper functions to do the calculations instead:

CREATE OR REPLACE FUNCTION some_calculation(a INT, b INT, ...)
  retval INT;
  -- Some calculations here
  RETURN retval;

Being a smart guy, Pietro knows immutable function results are cached much better than otherwise, and strict functions won’t even run if a parameter is null. With these two declarations, if any invocations take the same parameters, or a parameter is empty, Postgres may not even execute the function at all. Avoiding a jump operation is a huge optimization all by itself, so consider the benefits of preventing the overhead of a function call and its associated logic.

By defining the calculations in this manner, his process would look like this instead:

INSERT INTO target_table
SELECT some_calculation(col_a, col_b, ...),
       other_calculation(col_d, col_e, ...),
  FROM ...
 WHERE ...;

Or maybe Pietro knows the value of data staging. Caching is great, but in sufficiently complicated examples of business logic, transforming it into immutable functions may be somewhat difficult. In addition, some “calculations” may rely on the results of another query. This is where temporary and unlogged tables, or CTEs come in.

Consider the fact databases natively communicate using Set Theory. They are highly optimized, and specifically engineered to bulk-process rows in ways user-generated applications simply can’t match in all but the edgiest of cases. So the final iteration of the data onboarding might resemble this:

CREATE TEMP TABLE expand_series AS
CREATE TEMP TABLE relate_series AS
  FROM expand_series
  JOIN relate_series ON (...);
INSERT INTO target_table
  FROM basic_tally
  JOIN ...
 WHERE ...;

Or maybe Pietro likes CTEs more, so uses those instead of multiple temporary table steps. Postgres will fully materialize everything anyway. The primary benefit to temporary tables is that repeated execution of multiple steps won’t require re-running all previous elements. If this were some kind of one-time report, a CTE would arguably make more sense as it operates atomically.

Regardless, this kind of iterative transformation allows us to fully embrace Set Theory. Don’t operate on one row, but all of them. If we can organize calculations into distinct categories or steps, then we’re actually requesting the database to transform the data as a whole for each operation. For this entire data set, add some derived date column. For this entire data set, produce a standard deviation variant along this window. For this entire data set, derive these additional columns from these formulae.

That’s how databases think, and Postgres is no different in this regard. By looping through each individual result, we’re actively disrupting that process and literally making it hundreds or thousands of times slower. We could have all the memory in the world, and even find some way to fully utilize it, and it wouldn’t influence that universal truth.

Now, we don’t know what Pietro is actually doing, just that he’s convinced more RAM might help. The evidence he provided suggests that isn’t the case. Having sixteen processes running at full blast for two days either means he has a massive amount of data, or there’s some unaddressed inefficiency in his overall approach.

I suspect the latter only because it’s so common. People don’t usually think like databases, and really, who can blame them? What kind of nutcase would willingly do this for a living?

Figure 1: A Nutcase