PG Phriday: Tackling Intractable ACLs

February 5th, 2016 | Published in Database, Tech Talk | 2 Comments

Say that three times fast! Joking aside, managing database object access is a headache for users and DBAs alike. Postgres isn’t really an exception in that regard, but it does provide a few tools to greatly reduce the pain involved. The crazy thing is that few people even know this feature exists. I’ve known about it for a while myself, but it always slips my mind because it feels so wrong not to explicitly grant permissions.

What does that mean? Well, consider we have a schema with some existing tables, and a role we want to have read permission on tables in that schema. Here’s one as an example:

CREATE TABLE acl_app.t1 (foo INT);
CREATE TABLE acl_app.t2 (bar INT);
CREATE ROLE acl_read;
GRANT USAGE ON SCHEMA acl_app TO acl_read;

If this were a preexisting schema, normally we would grant read permission to tables like this:

GRANT SELECT ON acl_app.t1 TO acl_read;
GRANT SELECT ON acl_app.t2 TO acl_read;

And that’s also the usual suggestion for grants after tables are created. Create the table, then grant the permissions. It’s fairly straight-forward, and an expected part of database administration. But what about when we have an existing table with dozens or hundreds of tables? Doing the grants manually as shown above would be monumentally irritating! In fact, for many database systems, the only way forward is to use system catalogs to generate a script, and then execute the output in the database. Postgres lets you do that:

  SELECT 'GRANT SELECT ON acl_app.' || tablename ||
         ' TO acl_read;'
    FROM pg_tables
   WHERE schemaname = 'acl_app'
) TO '/tmp/grant_perms.sql';
\i /tmp/grant_perms.SQL

But the kind Postgres devs have also provided us with some extremely useful shorthand, because while usable, the script approach is something of an ugly hack. Here’s how that looks:

   TO acl_read;
\z acl_app.*
 Schema  | Name | TYPE  |     Access privileges     
 acl_app | t1   | TABLE | postgres=arwdDxt/postgres
         |      |       | acl_read=r/postgres       
 acl_app | t2   | TABLE | postgres=arwdDxt/postgres
         |      |       | acl_read=r/postgres

As we can see here, the permissions show that the acl_read role can read the tables in this schema. Unfortunately this is a one-time operation. All subsequent tables created in the future will not be readable by the acl_read role. Here’s what happens if we create a new table now:

CREATE TABLE acl_app.t3 (baz INT);
\z acl_app.*
 Schema  | Name | TYPE  |     Access privileges     
 acl_app | t1   | TABLE | postgres=arwdDxt/postgres
         |      |       | acl_read=r/postgres       
 acl_app | t2   | TABLE | postgres=arwdDxt/postgres
         |      |       | acl_read=r/postgres       
 acl_app | t3   | TABLE |

Now we have a new table that’s only accessible by the postgres user, since that’s who created it. In most database software, that’s where the story ends. We can either re-run the above command after adding new tables, or ask developers to add an explicit GRANT statement following every new table, for every user or role that needs access.

That’s not so hard when using a template. But templates aren’t always maintained, new hires might not know where they are or use them properly, and sometimes people just plain forget. If testing or staging environments don’t exactly match production, or developers use these environments with elevated privileges, missing grants might not show up until the new tables and the associated code hit production. What then?

Well, Postgres provides some non-standard syntax that solves this conundrum. We can actually modify the default privileges for several database objects, so they’re explicitly set upon creation. Let’s try this out with another new table:

   IN SCHEMA acl_app
   TO acl_read;
CREATE TABLE acl_app.t4 (buz SERIAL);
\z acl_app.*
 Schema  |    Name    |   TYPE   |     Access privileges     
 acl_app | t1         | TABLE    | postgres=arwdDxt/postgres
         |            |          | acl_read=r/postgres       
 acl_app | t2         | TABLE    | postgres=arwdDxt/postgres
         |            |          | acl_read=r/postgres       
 acl_app | t3         | TABLE    |
 acl_app | t4         | TABLE    | postgres=arwdDxt/postgres
         |            |          | acl_read=r/postgres       
 acl_app | t4_buz_seq | SEQUENCE |

As we can see, the new t4 table has the expected grant. But this is actually a trick example. When tables are declared in Postgres with a SERIAL or BIG SERIAL column type, a new sequence is created, and grants to the parent table do not cascade to this new sequence. Thus, if a role needed to insert into this table and we granted INSERT privileges, it would not have that capability so long as it used the sequence.

Of course, the way to fix that comes through the same mechanism. For roles that require insert capabilities, two default modifications cover that scenario. Here’s another role meant for inserting into our tables, and how those permissions show up:

CREATE ROLE acl_writer;
GRANT USAGE ON SCHEMA acl_app TO acl_writer;
   IN SCHEMA acl_app
   TO acl_writer;
   IN SCHEMA acl_app
   TO acl_writer;
CREATE TABLE acl_app.t5 (bla SERIAL);
\z acl_app.t5*
 Schema  |    Name    |   TYPE   |     Access privileges
 acl_app | t5         | TABLE    | postgres=arwdDxt/postgres
         |            |          | acl_read=r/postgres
         |            |          | acl_writer=arw/postgres
 acl_app | t5_buz_seq | SEQUENCE | postgres=rwU/postgres
         |            |          | acl_writer=U/postgres

Now, when new tables are created that automatically generate an associated sequence, the acl_writer role will be able to utilize both. We can also see that the acl_read role also worked as expected. The best part is that we don’t need to remember anything when creating new objects, so long as we set everything up beforehand. And if we forget and don’t add this stuff until later? Well, we can always use the ALL TABLES IN grant syntax to fix any existing objects.

There is one caveat we must point out, however. New functions in Postgres are always granted usage to PUBLIC following creation. This is mostly because Postgres doesn’t differentiate between functions and procedures, so there is no assumption that a function will act on database objects. Normally this isn’t a problem, but if a function is created with SECURITY DEFINER, it executes as the user who created it. If the function does modify table data, that means any user with access to the database can invoke it, and that’s a massive security hole.

We recommend fixing this in all production databases with these commands:


The last command should be repeated for any other schema in the database. From that point on, all function grants must either be explicit, or use the ALTER DEFAULT... technique we’ve covered in this article. It probably seems draconian, and in most cases that might be the case. However, in critical or sensitive database systems, sometimes being a little zealous can be beneficial. Preventing accidental privilege escalations also stops malicious ones.

And data can rarely be too safe.

Tags: , ,

PG Phriday: COPY and Alternative Import Methods

January 29th, 2016 | Published in Database, Tech Talk | 4 Comments

I recently noted that the COPY command in Postgres doesn’t have syntax to skip columns in source data when importing it into a table. This necessitates using one or more junk columns to capture data we’ll just be throwing away. During that, I completely forgot that friendly devs had contributed alternative file handling methods as Foreign Data Wrappers. Most people think of foreign wrappers as a method for interacting with remote databases. Perusing the full list however, reveals some surprising data sources. Twitter? Hive? Video cards?!

Well, let’s take a quick look at a real case. Here’s a table we want to import data into. We can’t insert into it directly, because of that extra column. So we’ll use an intermediate table as a raw import target, then insert into the target table. Let’s use COPY as a first approximation with 500,000 rows:

CREATE UNLOGGED TABLE option_quote_raw
  ext_quote_id BIGINT,
  quote_date DATE NOT NULL,
  quote_time_ns BIGINT NOT NULL,
  sub_seq_no BIGINT NOT NULL,
  exchange_id INT NOT NULL,
  time_us BIGINT NOT NULL,
  feed_side VARCHAR NOT NULL,
  flag INT NOT NULL,
  bid NUMERIC(16, 8) NOT NULL,
  ask NUMERIC(16, 8) NOT NULL,
  bid_size INT NOT NULL,
  ask_size INT NOT NULL,
  trash TEXT
  ext_quote_id BIGINT,
  quote_date DATE NOT NULL,
  quote_time_ns BIGINT NOT NULL,
  sub_seq_no BIGINT NOT NULL,
  exchange_id INT NOT NULL,
  time_us BIGINT NOT NULL,
  feed_side VARCHAR NOT NULL,
  flag INT NOT NULL,
  bid NUMERIC(16, 8) NOT NULL,
  ask NUMERIC(16, 8) NOT NULL,
  bid_size INT NOT NULL,
  ask_size INT NOT NULL
CREATE INDEX idx_quote_id ON option_quote (ext_quote_id);
\timing ON
COPY option_quote_raw FROM '/tmp/quote_source.csv' WITH CSV;
TIME: 1705.967 ms
INSERT INTO option_quote
SELECT ext_quote_id, quote_date, quote_time_ns, seq_no,
       sub_seq_no, exchange_id, time_us, feed_side, flag,
       bid, ask, bid_size, ask_size
  FROM option_quote_raw;
TIME: 2062.863 ms

Just under four seconds for half a million records on a table that has a pre-existing index is pretty good. But that process is pretty convoluted. Going back to that list of foreign wrappers, there’s an included file_fdw for tying a table directly to a file. Let’s repeat the same process from above by using FDW syntax:

  ext_quote_id BIGINT,
  quote_date DATE NOT NULL,
  quote_time_ns BIGINT NOT NULL,
  sub_seq_no BIGINT NOT NULL,
  exchange_id INT NOT NULL,
  time_us BIGINT NOT NULL,
  feed_side VARCHAR NOT NULL,
  flag INT NOT NULL,
  bid NUMERIC(16, 8) NOT NULL,
  ask NUMERIC(16, 8) NOT NULL,
  bid_size INT NOT NULL,
  ask_size INT NOT NULL,
  trash TEXT
SERVER file_server
OPTIONS (format 'csv', filename '/tmp/quote_source.csv');
\timing ON
INSERT INTO option_quote
SELECT ext_quote_id, quote_date, quote_time_ns, seq_no,
       sub_seq_no, exchange_id, time_us, feed_side, flag,
       bid, ask, bid_size, ask_size
  FROM file_quote;
TIME: 3867.848 ms

Looks like the overall time is roughly the same. Of course, the best use case for this approach is when using a standard file path location that doesn’t change. The contents of the “table” do not resolve until it’s used, so the filename option could represent a standard file upload location. This means the foreign file table can be repeatedly reused, while the raw import table in our first attempt needs to be truncated and refilled before the INSERT statement.

There’s also the case of archives, where CSV files are compressed for long term storage. What if we want to examine the contents of these files, or import them at a later date? Luckily, the Multicorn extension exists as a kind of generic python FDW API. Using that API, anyone can use Python to build a quick foreign data wrapper. Someone did just that and contributed an extension that can process gzip compressed files.

Let’s see how that works:

    WRAPPER 'csvgz_fdw.CSVGZForeignDataWrapper'
  ext_quote_id BIGINT,
  quote_date DATE NOT NULL,
  quote_time_ns BIGINT NOT NULL,
  sub_seq_no BIGINT NOT NULL,
  exchange_id INT NOT NULL,
  time_us BIGINT NOT NULL,
  feed_side VARCHAR NOT NULL,
  flag INT NOT NULL,
  bid NUMERIC(16, 8) NOT NULL,
  ask NUMERIC(16, 8) NOT NULL,
  bid_size INT NOT NULL,
  ask_size INT NOT NULL,
  trash TEXT
SERVER csvgz
OPTIONS (format 'csv', file_name '/tmp/quote_source.csv.gz');
\timing ON
INSERT INTO option_quote
SELECT ext_quote_id, quote_date, quote_time_ns, seq_no,
       sub_seq_no, exchange_id, time_us, feed_side, flag,
       bid, ask, bid_size, ask_size
  FROM file_quote_gz;
TIME: 16563.306 ms

Though this takes just shy of 17 seconds for the same 500k records, probably because the stack includes Multicorn, which depends on Python. Despite multiple levels of redirection, at least it’s possible! There is a native C version, but the author neglected to include build instructions. It’s possible performance of this extension could be improved with better streaming, as zcat on the same file requires less than a second. It also opens up the exciting potential of another extension that has the compression engine as another option, so tables could also access bzip2, xz, or other compressed formats.

Compressed files aside, is there any way to handle wildcards? Sometimes it makes sense to process all CSV files in a single folder, for example. Well, that doesn’t exist yet, but at the rate new wrappers are appearing, it’s only a matter of time. It’s also possible there’s already one out there, but it simply isn’t on the Wiki page, or the author hasn’t formalized it for contribution yet. In fact, there are probably quite a few foreign data wrappers that share that particular attribute.

It’s great to see other import methods appearing to address deficits in COPY functionality. I’ll definitely be watching the Wiki page for more.

Tags: , ,

PG Phriday: Joining the Big Leagues

January 22nd, 2016 | Published in Database, Tech Talk | No Comments

I’ve maintained since about 2011, that the problem with scaling Postgres to truly immense installations could be solved by a query coordinator. Why? Most sharding systems utilize an application-level distribution mechanism, which may or may not leverage an inherent hashing algorithm. This means each Postgres instance can be treated independently of all the others if the distribution process is known. On a cleverly architected system, the application is algorithm aware, and can address individual shards through a driver proxy or accessor class.

The real complications arise when there is no clear natural column to evenly distribute data over a wide cluster. Further, when accessing that data in an OLAP context, individual IDs derived through an algorithm are essentially worthless. To address data in general, a query must be broadcast to every individual node, and the results collated and re-aggregated at some final step. Without this key component, obtaining aggregate results from a large database cluster is an exercise in futility, or over-engineered application-level query parsers.

This is one of the reasons I like Postgres-XL and its ilk over something like CitusDB. The Postgres-XL approach treats all of the data nodes like nothing more than dumb containers, and the coordinator does all of the heavy lifting. It broadcasts queries to all nodes, combines the results, and does so in a parallel context. Going back to my article on PMPP, we can see that a query must be broadcast to all of the shards, and then another query must be wrapped around the function call to summarize all the disparate results. Postgres-XL does that without the extra step, and for general users, this is a massive difference.

Given all of this, there is a limit to the current implementation, and it’s likely a side-effect of Postgres-XL coming from Postgres-XC. This code fell so far behind core, the latest stable release is 9.2. They’re working hard on merging their patches into a 9.5 branch and including other enhancements, but until that happens, this is the current state of Big Postgres.

So where are we? I’ve recently been given access to what I consider hilariously immense servers with ridiculous performance. We’re talking 48-core, 512GB servers, with 50TB of storage each. Eight of them. If this doesn’t represent the maximum possible performance of Postgres in a large cluster, I’m afraid to ask what would.

But results are what matters. It turns out that the Postgres-XL 9.2 coordinator node crashes at cluster sizes greater than 36 data nodes. With that constraint in mind, I used one server as a coordinator, and six more servers as hosts for data nodes. Then I ran a battery of load tests consisting of up to six nodes per server, with up to eight concurrent import processes. That gives us a maximum of 36 nodes, and no server is handling more than 48 simultaneous queries—perfect!

Next, I took a 90-million row sample of our 20-billion row data set, and imported it into a simple table that exactly matched the CSV format under various permutations. To make this a real-world test, I also added a single index on a column that represented stock symbols, so the imports had to do some actual work.

The results—listed below—are pretty illuminating.

Nodes / Server Loader Procs Time (s) Rows / s
1 1 213 419694
1 2 87 1027526
1 4 46 1943365
1 8 42 2128448
2 1 172 519737
2 2 89 1004436
2 4 55 1625360
2 8 21 4256896
4 1 184 485841
4 2 84 1064224
4 4 43 2078949
4 8 24 3724784
6 1 172 519737
6 2 86 1039474
6 4 43 2078949
6 8 24 3724784

It’s pretty clear that one node per server is a big loser. After that, there’s quite a bit of variance in the results, probably stemming from the fact these servers aren’t completely idle and dedicated to this testing. But there’s enough here to draw a few conclusions.

First, we hit a parallel import bottleneck relatively quickly. This is perfectly in-line with past analysis on parallel COPY regarding patches in 9.4 locking. This means the 9.5 XL branch should provide much better results. Not that three to four million imports a second is sluggish.

Second, after the outlier of one node per server, performance seems relatively consistent across the board. More nodes don’t appear to diminish (or improve) import performance. This is important considering the work the coordinator is doing in splitting up and broadcasting the CSV contents. Assuming the nodes act as independent agents, they could theoretically import CSV data directly, provided they do this while using the GTM to prevent transaction mismatches. Removing the coordinator as the bottleneck here could drastically improve import speed on extremely large clusters.

Finally, maximizing the node count is the best approach to future query needs. Even were our tests to imply that import speeds are inversely impacted by scale, the difference would need to be much larger to suggest fewer nodes. This means a naive sequential scan over the entire 90-million rows takes about half a second on this hardware due to 36 parallel workers. And in theory, up to eight queries that cause up to 100% CPU on all of the worker nodes could operate this way.

Of course, all of this data fits in memory. We’d have to load much more data than a mere 32GB to overwhelm the 3TB available on these servers. Even all 20-billion rows only consume about 2.4TB. This means the current day of data could be available real-time, while older information requiring disk seeks would be understandably slower. And yet even those queries would be a minimum of 6x faster than a single Postgres instance could operate.

All of this seems to indicate that maximizing read scalability using an architecture like Postgres-XL requires several much smaller servers, each with its own independent storage array. While these 50TB behemoths are fun to play with, a sequential scan issued from each of the nodes would quickly saturate the local storage, and likely cause severe disk contention. Six nodes each pulling several 1GB files from the same array would resemble random IO to the controller or SAN. At the very least, there will be elevated IO and queue waits. And unfortunately, 50TB of SSD-based storage (per server!) is still staggeringly expensive.

The bottom line seems to be this: the maximum scale and performance of a large Postgres-XL cluster seems to be 36 medium-size servers. If using larger servers like in these tests, storage should be provided with independent LUNs so that each node gets its own allocation. This allows read performance to scale nearly linearly and equally reduces index or vacuum maintenance concerns by spreading the data over so many nodes.

Once XL can go beyond 36 nodes, we’ll see what Postgres can really do. Hopefully this happens with the release of 9.5. Until then, Postgres can be big, but not yet truly gargantuan. Well… and still capable of returning queries before the heat death of the universe.

Tags: , , ,

PG Phriday: Database Creation Workshop

January 15th, 2016 | Published in Database, Tech Talk | No Comments

Postgres theory, feature discussion, and advocacy are fun. But even I’ll admit it’s nice to have some practical application every once in a while. This week, we’re going to build an actual database.

But what would be small enough for a proof of concept, yet somewhat interesting? Well, I’m a fan of Hearthstone. It’s a silly card game much like Magic: The Gathering, but has the distinct aura of “eSports!” Regardless, it’s a fun little time waster, and has a few hundred data points we can manipulate.

Annoyingly, there’s no official card list. But fans are nothing if not dedicated, and several folks maintain spreadsheets for their own purposes and often share them. This one had a pretty good breakdown and is up to date with the latest expansions, so it makes a good starting point.

The second page of this Excel workbook is a card list, which I saved as a quoted tab-delimited file. The first three lines are garbage, so away they went. The last four lines are also empty, so into the digital bucket in the sky they passed. What’s left is something that we can import into a junk table:

  junk1 VARCHAR,
  junk2 VARCHAR,
  junk3 VARCHAR,
  junk4 VARCHAR,
  expansion VARCHAR,
  card_name VARCHAR,
  character_class VARCHAR,
  rarity VARCHAR,
  category VARCHAR,
  tribe VARCHAR,
  mana_cost VARCHAR,
  attack VARCHAR,
  hitpoints VARCHAR,
  card_text VARCHAR,
  junk5 VARCHAR,
  junk6 VARCHAR,
  junk7 VARCHAR
COPY hs_raw_import FROM '/tmp/hearthstone_cards.tsv'
ANALYZE hs_raw_import;

All of the “junk” columns are an unfortunate reality of the current COPY command; there’s no way to ignore columns in source files. That’s OK in this case because we’re going to discard the import table when everything is done, and it’s certainly speedy. Importing all 743 rows on a test VM required less than 5ms.

Now we need to transform the data into something a bit more normalized. To do that, we need to look at the data. A technique that works really well is to examine how the data is distributed by checking the contents of the pg_stats table. Let’s look at this data:

SELECT attname, n_distinct
  FROM pg_stats
 WHERE tablename='hs_raw_import'
   AND attname NOT LIKE 'junk%';
     attname     | n_distinct 
 expansion       |          9
 card_name       |         -1
 hitpoints       |         12
 card_text       |   -0.89502
 character_class |         10
 rarity          |          5
 category        |          3
 tribe           |          7
 mana_cost       |         13
 attack          |         12

Looking at the statistics, any negative number means there’s a proportional relationship between the total row count and the number of distinct values. Positive numbers are absolute counts, and those are the ones we want to focus on for normalization.

After leveraging a bit of knowledge regarding the actual game mechanics, we can ignore elements that act as measurable metrics. Things like hitpoints, attack, and mana_cost may not have a lot of distinct values, but don’t have further associated information. But what about the rest? Let’s dig a little deeper:

  • expansion : As with most CCGs, new cards and mechanics are added through expansions. These have their own associated attributes we might want to track independently. This one warrants a new table.
  • character_class : In Hearthstone, cards are separated into nine distinct classes, with a tenth as neutral cards any class can leverage. Again, if we were writing an app, we might list several other data points about each class, so we’d want that in a separate table.
  • rarity : This one is a bit tricky. There are currently only five rarity levels, each of which has an occurrence percentage when opening new card packs. Yet Blizzard is not likely to add further rarity levels. For now, let’s leave percentages as an academic exercise, and just leave rarity as a text description.
  • category : Currently there are only minions, spells, and weapons. This is extremely likely to change in the future, assuming the game remains popular for several more decades, as Magic has. This one gets a new table.
  • tribe : Some players consider this a subcategory. Minions can sometimes be further broken down into a taxonomy that will alter game mechanics via synergistic effects. We definitely want to track this separately.

Not listed here are attributes inherent in the card text itself. The taunt attribute for example, is a significant variable game mechanic, which we’ll want to mine from the text. Since card text is often conditional, these optional elements should also be tracked separately through an attribute table.

Given all of this, here’s an extremely simplified architecture for representing the cards themselves, with ‘etc’ representing further descriptive columns. Queries to bootstrap them from the source table are at the end:

CREATE TABLE expansion (
  expansion_id    SERIAL PRIMARY KEY,
  expansion_name  VARCHAR NOT NULL,
  release_date    DATE NULL,
  etc             VARCHAR NULL
CREATE TABLE character_class (
  class_id       SERIAL PRIMARY KEY,
  class_name     VARCHAR NOT NULL,
  hero_power     VARCHAR NULL,
  etc            VARCHAR NULL
CREATE TABLE category (
  category_id    SERIAL PRIMARY KEY,
  category_name  VARCHAR NOT NULL,
  etc            VARCHAR NULL
CREATE TABLE minion_tribe (
  tribe_id    SERIAL PRIMARY KEY,
  tribe_name  VARCHAR NOT NULL,
  etc         VARCHAR NULL
CREATE TABLE attribute (
  attribute_id  SERIAL PRIMARY KEY,
  att_name      VARCHAR NOT NULL,
  effect        TEXT NULL,
  etc           VARCHAR NULL
  card_id       SERIAL PRIMARY KEY,
  expansion_id  INT NOT NULL REFERENCES expansion,
  card_name     VARCHAR NOT NULL,
  class_id      INT NOT NULL REFERENCES character_class,
  rarity        VARCHAR NOT NULL,
  category_id   INT NOT NULL REFERENCES category,
  tribe_id      INT NULL REFERENCES minion_tribe,
  mana_cost     INT NOT NULL,
  attack        INT NULL,
  hitpoints     INT NULL,
  card_text     TEXT
CREATE INDEX idx_card_expansion ON card (expansion_id);
CREATE INDEX idx_card_class ON card (class_id);
CREATE INDEX idx_card_category ON card (category_id);
CREATE INDEX idx_card_tribe ON card (tribe_id);
CREATE TABLE card_attribute (
  card_id       INT NOT NULL REFERENCES card,
  attribute_id  INT NOT NULL REFERENCES attribute
CREATE INDEX idx_card_attribute_attribute_id
    ON card_attribute (attribute_id);
INSERT INTO expansion (expansion_name)
SELECT DISTINCT expansion FROM hs_raw_import;
INSERT INTO character_class (class_name)
SELECT DISTINCT character_class FROM hs_raw_import;
INSERT INTO category (category_name)
SELECT DISTINCT category FROM hs_raw_import;
INSERT INTO minion_tribe (tribe_name)
SELECT DISTINCT tribe FROM hs_raw_import;
INSERT INTO attribute (att_name) VALUES ('taunt'), ('enrage');
    expansion_id, card_name, class_id, rarity, category_id,
    tribe_id, mana_cost, attack, hitpoints, card_text
SELECT e.expansion_id, r.card_name, cl.class_id, r.rarity,
       ca.category_id, t.tribe_id, r.mana_cost::INT,
       r.attack::INT, r.hitpoints::INT, r.card_text
  FROM hs_raw_import r
  JOIN expansion e ON (e.expansion_name = r.expansion)
  JOIN character_class cl ON (cl.class_name = r.character_class)
  JOIN category ca ON (ca.category_name = r.category)
  LEFT JOIN minion_tribe t ON (t.tribe_name = r.tribe);
INSERT INTO card_attribute (card_id, attribute_id)
SELECT c.card_id, a.attribute_id
  FROM card c
  JOIN attribute a ON (c.card_text ~* a.att_name);
ANALYZE attribute;
ANALYZE card_attribute;
ANALYZE expansion;
ANALYZE character_class;
ANALYZE category;
ANALYZE minion_tribe;

Whew! That was a lot of work! But in the end, we have a constellation that accurately represents how cards work, with future potential of adding more mechanics without the necessity of reorganizing our architecture. There are obviously more potential attributes than taunt and enrage, but we’ll leave the data mining as an exercise for anyone interested enough to really flesh out the model.

Postgres offers us a nice way to leverage this data, too. Having to decode all of these IDs is pretty annoying, so we probably want a view. Further, cards don’t change very often, and new expansions are only released every few months. This makes them a great candidate for a materialized view. Here’s how we’d do that:

SELECT e.expansion_name AS expansion, c.card_name,
       cl.class_name AS class, c.rarity,
       ca.category_name AS category,
       t.tribe_name AS tribe, c.mana_cost,
       c.attack, c.hitpoints, c.card_text,
         SELECT a.att_name
           FROM card_attribute tr
           JOIN attribute a USING (attribute_id)
          WHERE tr.card_id = c.card_id
       ) AS attributes
  FROM card c
  JOIN expansion e USING (expansion_id)
  JOIN character_class cl USING (class_id)
  JOIN category ca USING (category_id)
  LEFT JOIN minion_tribe t USING (tribe_id);
CREATE INDEX idx_all_expansion ON v_all_cards (expansion);
CREATE INDEX idx_all_class ON v_all_cards (class);
CREATE INDEX idx_all_category ON v_all_cards (category);
CREATE INDEX idx_all_tribe ON v_all_cards (tribe);
SELECT card_name, class, rarity
  FROM v_all_cards
 WHERE expansion = 'Classic'
   AND 'enrage' = ANY(attributes);
      card_name      |  class  |  rarity   
 Angry Chicken       | Neutral | Rare
 Amani Berserker     | Neutral | Common
 Raging Worgen       | Neutral | Common
 Tauren Warrior      | Neutral | Common
 Spiteful Smith      | Neutral | Common
 Grommash Hellscream | Warrior | Legendary

And of course, this particular rabbit hole can go much, much deeper. With only a few hundred rows, it may seem silly to design the database this way, but slap a front-end on this, and it’s a collection management tool that rivals the one built into the actual game client. The official client only allows players to search their collection by expansion, mana cost, and text, while our structure has far more flexibility.

This is just one example for using Postgres in an everyday scenario. There’s much more out there if you’re interested in looking. Happy hunting!

Tags: , , , ,

PG Phriday: How Far We’ve Come

January 8th, 2016 | Published in Database, Tech Talk | 1 Comment

With extremely fortuitous timing for my first article following the holidays, Postgres 9.5 was officially been released into the wild just yesterday. I tend to think about past releases when new versions come out, and consider everything that has changed since the early days. How early? I’ve personally been using Postgres since 2001, when my new employer bellyached about their Postgres 6.5 database crashing frequently and generally making their lives more difficult. So as a newly minted advocate of the LAMP stack, I purged it in favor of MySQL.

Saying “things are different now,” would be the understatement of the year. To understand just how different, let’s review advancements in our favorite database software in just the last five years.

Postgres 9.0 came out shortly after my birthday in September of 2010. With it, came probably the biggest feature that launched Postgres into the Big Leagues: replication. Before then, all we had were warm standbys held in a continuous recovery state suitable only for disaster recovery or backup purposes. Unbelievably, that’s just the tip of the iceberg. In fact, let’s take a quick look at how performance has changed between 9.0 and 9.5 on a spare 16-core, 64GB VM we have lying around.

As usual, these numbers are from pgbench at a scale of 100 for 10 million rows in the main account table. In each case, tests are read-only prepared queries and ran for one minute—only the client count changed between runs.

Clients 9.0 (tps) 9.5 (tps) % diff
1 9100 8700 -4.4
2 17700 17200 -2.8
4 31000 33400 7.7
8 50900 58900 15.7
16 56800 79400 39.8
32 59100 87600 48.2
64 57300 76500 33.5
128 51900 71300 37.4

As we can see, there’s a pretty consistent 2-5% decrease in performance for low client counts. That seems a bit odd to me, but maybe a dev will pipe in later to explain a potential cause. Otherwise, something magical happens as concurrency increases, demonstrating where newer versions really shine. Not only does the performance gap steadily widen with client count, but the drop after exceeding our available CPUs is far less drastic. Also paradoxically, our best performance isn’t reached until we have twice as many clients as CPUs.

Some of these differences could be attributed to quirks within our virtual environment, but the overall picture is fairly clear at this point. Of course, if performance didn’t improve over five years of advancements, something is terribly wrong.

But what about everything else? It’s easy to lose the plot after so many years, but here’s a quick list of the major features added in every release since 9.0:

Introduced in 9.0

  • Streaming replication and hot standby. I can’t overstate how huge this was for Postgres. Now we can have as many online readable database replicas as we need, vastly improving scalability potential.
  • Anonymous code blocks. Now we can randomly use various procedural languages to perform actions without having to wrap it in a function first. This makes advanced scripting much easier.
  • The pg_upgrade utility hits core. No more dump/restore for upgrades, a massive boon for extremely large instances.
  • VACUUM FULL rewrites tables instead of being a glorified (and abominably slow) defragmentation.

Introduced in 9.1

  • Synchronous replication for those who really must have 100% data representation across multiple instances. I don’t use this because a replica can cause delays on the primary, but this is fine for non-OLTP systems.
  • Foreign tables. Being able to address remote data as if it were local greatly simplifies several use cases. This is where Postgres starts becoming middleware and worming its way into tons of other projects.
  • The extension API! Adding features to Postgres gets hilariously easy. I’ve written three extensions myself, and I’m just some random dude. People who actually know what they’re doing have used this to bring Postgres to the next level in business contexts.
  • Unlogged tables. These are so great, they got their own article.

Introduced in 9.2

  • Index-only scans. Now any query that only uses data in an indexed column doesn’t need the extra random disk read to retrieve the actual data row. Under the right circumstances, this is another excellent performance boost.
  • Cascading replication. Want a huge constellation of replicas, and don’t want to overwhelm the primary server? Chain! Coordinating this is another issue entirely, but that’s a story for another time.
  • The JSON datatype. This has been the feature that got Postgres in the press most recently with all of the Javascript-based engines and storage systems. With JSON as a tight data-exchange format, this is another massive win for the community.
  • The pg_stat_statements extension. Install this now if you haven’t already. It’s like running pgBadger all the time without all of the irritatingly slow log analysis.

Introduced in 9.3

  • Materialized views. Views that get used frequently are sometimes slow and annoying to use based on their complexity. So why not store their contents in a table that can be indexed and refreshed at will? There were extensions that did this, but having it in core makes it far more accessible.
  • Updatable views. Updating table rows through simple views is something users had been requesting for a while.
  • Writable foreign data wrappers, pushing Postgres even further in the direction of middleware.
  • New event triggers. This makes it possible to block, audit, or otherwise control manipulation of database objects instead of just table contents.
  • Data checksums. This requires re-initializing the database, but includes actual integrity calculations within the data itself, making it much easier to catch hardware-based corruption.

Introduced in 9.4

  • The JSONB datatype. With a binary representation of JSON, it becomes faster and easier to manipulate internally. This includes potential for extremely advanced indexing and key path dereferencing.
  • Change settings in postgresql.conf through ALTER SYSTEM. No more fidgety config-file parsing!
  • Concurrent refresh of materialized views. Before this, refreshing a view prevented using it.
  • Logical WAL decoding. Another Great Leap Forward for Postgres replication. With the right wrapper or software, replication can now span Postgres versions, or even database engines. Database write queries and contents themselves are exposed for capture, making things like multi-master (bi-directional replication) possible in the future.
  • Background workers. Now database sessions can fork worker processes to carry out specific tasks. This greatly enhances advanced features of extensions, and adds potential for parallel query functionality in future versions.

Introduced in 9.5

  • “Upsert” is now possible. Transforming an INSERT into an UPDATE can result in a race condition that’s hard to resolve in asynchronous contexts experienced by databases. Tons of users wanted this capability without annoying workarounds like functional exception loops, and now it’s here.
  • Import foreign schemas. Before this, interacting with a foreign database meant manually creating every table locally. Now, we can just import the whole schema at once!
  • Row-level security. This makes it possible to control what content within a table is visible based on who is reading it. Locking down data access gets even easier!

And this isn’t even including all of the performance, back-end, and syntax improvements. Merely the major bullet-points alone are intensely sobering to read; Postgres has come a long, long way in five short years. I’ve even argued in the past that Postgres has become more of an advanced middleware than a simple database engine. Based on all the above feature additions, it’s easy to see why. Postgres can scale to dozens of machines, and within that web, facilitate communication through JSON or foreign data wrappers with dozens of data formats, using several programming languages.

And every year, that web and the materials that comprise it continues to expand. Yet with all of that, performance improves as its functionality evolves. That’s not exactly unprecedented, but if there is any reason to use Postgres, it’s certainly an enviable beginning.

Tags: , ,

« Older Posts