PG Phriday: Mining for Metadata

March 18th, 2016 | Published in Database, Tech Talk | No Comments


Every good database engine has a system catalog that describes the myriad of structures that model and preserve our data. Of course this is expected, as it would be somewhat silly for a database system not to use tables to represent its internal mechanisms. But that doesn’t mean they have to be humanly readable, or even make sense without a series of views or esoteric functions to decipher them. The information_schema standard serves a necessary role in that regard, and the Postgres implementation is extremely comprehensive. Yet the regular Postgres catalog is also fairly usable, so let’s explore and see what we find.

At first glance, there is a veritable smorgasbord of catalog tables and views to choose from. So, where to start? Let’s look at something simple and see what tables are available:

\x
 
CREATE TABLE picard (
  id           SERIAL PRIMARY KEY,
  full_name    TEXT NOT NULL,
  description  TEXT
);
 
INSERT INTO picard (full_name, description)
SELECT a.id::TEXT, 'There are ' || a.id || ' lights!'
  FROM generate_series(1, 1000000) a(id);
 
ANALYZE picard;
 
SELECT *
  FROM pg_tables
 WHERE schemaname NOT IN ('information_schema', 'pg_catalog');
 
-[ RECORD 1 ]--------
schemaname  | public
tablename   | picard
tableowner  | sthomas
tablespace  | 
hasindexes  | t
hasrules    | f
hastriggers | f

This is all pretty basic info, right? Note that we had to remove tables in the information_schema and pg_catalog schemas or we’d have to wade through over 50 other records. Just more proof that Postgres uses its own catalogs to… track its catalogs. What may not be so obvious is that pg_tables is actually a view.

A DBA might find this obvious, given the concept of normalization. For the uninitiated however, database object names might be referred to multiple times through several catalog entries. To avoid using the same text value in multiple places and risking mismatches or modifying the name everywhere should it change, Postgres uses surrogate keys. Again, this approach is extremely common, but makes views a necessity to decode those proxy values.
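To see those surrogate keys in the raw, we can skip the views and hit pg_class directly. Here’s a quick illustration using our picard table, where relnamespace is nothing but an OID pointing at pg_namespace (which we’ll revisit shortly):

SELECT c.relname, c.relnamespace, n.nspname
  FROM pg_class c
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE c.relname = 'picard';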

Let’s use pg_views and examine the definition of pg_tables as well as itself:

SELECT definition
  FROM pg_views
 WHERE viewname = 'pg_tables';
 
                           definition                           
----------------------------------------------------------------
 SELECT n.nspname AS schemaname,
    c.relname AS tablename,
    pg_get_userbyid(c.relowner) AS tableowner,
    t.spcname AS tablespace,
    c.relhasindex AS hasindexes,
    c.relhasrules AS hasrules,
    c.relhastriggers AS hastriggers
   FROM ((pg_class c
     LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
     LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
  WHERE (c.relkind = 'r'::"char");
 
SELECT definition
  FROM pg_views
 WHERE viewname = 'pg_views';
 
                          definition                          
--------------------------------------------------------------
 SELECT n.nspname AS schemaname,
    c.relname AS viewname,
    pg_get_userbyid(c.relowner) AS viewowner,
    pg_get_viewdef(c.oid) AS definition
   FROM (pg_class c
     LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
  WHERE (c.relkind = 'v'::"char");

What do we have, here? It appears as if both pg_tables and pg_views are derived from information in pg_class. As it turns out, pg_class is probably the most important catalog table in the system. It’s essentially a reference for nearly every database object available. Since it’s in the middle of a vast constellation of other references, practically every column in the table is a key to another low-level catalog.
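For a quick sense of that breadth, we can count catalog entries by relkind; a minimal sketch:

SELECT relkind, COUNT(*)
  FROM pg_class
 GROUP BY relkind
 ORDER BY relkind;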

Why are views a relkind of ‘v’, while tables are designated ‘r’? Well in Postgres, that ‘r’ stands for ‘relation’, as tables describe relationships across data. Probably. Yet the documentation for pg_class describes all contents as relations, not just tables. Further, the column names in pg_class don’t reflect a consistent association to the table name. Let’s look at pg_namespace, which decodes schema (namespace) names, to see why that’s relevant:

\d pg_namespace
 
  Column  |   Type    | Modifiers 
----------+-----------+-----------
 nspname  | name      | not null
 nspowner | oid       | not null
 nspacl   | aclitem[] | 
Indexes:
    "pg_namespace_nspname_index" UNIQUE, btree (nspname)
    "pg_namespace_oid_index" UNIQUE, btree (oid)

Both pg_tables and pg_views utilize this table to decode pg_class.relnamespace into a text value. Note the ‘nsp’ prefix for all of the column names? That naming scheme is fairly consistent across most of the other catalog tables and views. The pg_proc table that tracks functions (procedures) prefixes everything with ‘pro’, while pg_index uses ‘ind’. From that perspective, pg_class is a rather notable exception.

Though the reason is a bit perplexing and lost to the annals of history, one benefit of this abnormality is that it makes pg_class easy to remember. It’s a short name, and since it’s used practically everywhere in the other catalog views, it’s almost synonymous with the catalog itself. Regardless of why, pg_class is one of the first places to look for nearly anything not covered by a ‘nicer’ view. A fun example of this is exploiting the relpages and reltuples columns:

SELECT relname AS tablename, relpages*8 AS kb_used,
       reltuples AS approx_rows
  FROM pg_class
 WHERE relname = 'picard';
 
 tablename | kb_used | approx_rows 
-----------+---------+-------------
 picard    |   66592 |       1e+06

These two are generally used by the query planner in various calculations to estimate query cost. For our purposes, they provide an excellent summary of the table’s overall impact on the database. We can use sorts to find the table with the most or least rows, the biggest table on disk, combine the two to seek out tables with abnormally large rows, and so on.
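As a hedged sketch of that last idea, here’s one way to combine the two and flag tables with unusually wide rows across all user schemas:

SELECT c.relname AS tablename,
       c.relpages*8 AS kb_used,
       c.reltuples AS approx_rows,
       round(c.relpages::BIGINT * 8192 / NULLIF(c.reltuples, 0)) AS bytes_per_row
  FROM pg_class c
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE c.relkind = 'r'
   AND n.nspname NOT IN ('pg_catalog', 'information_schema')
 ORDER BY bytes_per_row DESC NULLS LAST;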

The relpages column refers to the number of database pages the table consumes. In Postgres, pages default to 8KB, so by multiplying by 8, we can see how much physical space the table uses in kilobytes. In the vast majority of cases, this is more than sufficient and a quick shortcut when doing ad-hoc analysis of tables, but there are cases when an installation is compiled with a different page size. So to be pedantic, we should repeat the same query using a couple of provided decoding functions:

\x
 
SELECT relname AS tablename,
       pg_relation_size(oid) AS bytes_used,
       pg_total_relation_size(oid) AS total_bytes,
       pg_size_pretty(pg_relation_size(oid)) AS pretty_size,
       pg_size_pretty(pg_total_relation_size(oid)) AS pretty_total,
       reltuples AS approx_rows
  FROM pg_class
 WHERE relname = 'picard';
 
-[ RECORD 1 ]+---------
tablename    | picard
bytes_used   | 68190208
total_bytes  | 90726400
pretty_size  | 65 MB
pretty_total | 87 MB
approx_rows  | 1e+06

To provide a lowest common denominator, the pg_relation_size function always returns the size of the object in bytes. The pg_size_pretty function just makes that value human readable by transforming it into the number of megabytes, gigabytes, or so on. We used both in the query to demonstrate the functionality, and as a reminder that human readable output shouldn’t be used in calculations.

But what about pg_total_relation_size? This function is a neat trick that combines the table size along with any related content, like indexes, TOAST data, and so on. Normally the only way to get that data would be to join pg_class with itself multiple times with reltoastrelid or pg_index. That query is pretty unwieldy, so a function call is much easier to use. If we had omitted or changed our WHERE clause, we could have divined size and row counts for whole swaths of tables in the database.
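For instance, dropping the table name and keeping only a schema filter gives a rough sketch of a whole-database size survey:

SELECT n.nspname AS schemaname, c.relname AS tablename,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       c.reltuples AS approx_rows
  FROM pg_class c
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE c.relkind = 'r'
   AND n.nspname NOT IN ('pg_catalog', 'information_schema')
 ORDER BY pg_total_relation_size(c.oid) DESC;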

And what about pg_index? Its job is to act as a swing table between a table and its indexes as described by pg_class, as well as describe which columns are in each index, and other identifying information. But this particular catalog table is somewhat opaque, using multiple vector columns and trees to describe objects which can contain multiple other objects. It’s usually much easier to use pg_indexes instead, which converts all of that into something human readable. For example:

\x
 
SELECT *
  FROM pg_indexes
 WHERE tablename = 'picard';
 
-[ RECORD 1 ]----------------------------------------------------------
schemaname | public
tablename  | picard
indexname  | picard_pkey
tablespace | 
indexdef   | CREATE UNIQUE INDEX picard_pkey ON picard USING btree (id)

There isn’t very much information here, actually. But we do get to use the table name to obtain all indexes without any joins, and we can also see the index definition itself. For more complicated dives, like finding all tables that use a particular column in their indexes, especially if we also want to include composite indexes, we’d have to grit our teeth and use pg_index instead. Though even then, it’s not so bad:

SELECT c.relname AS tablename, i.relname AS indexname
  FROM pg_index x
  JOIN pg_class c ON (c.oid = x.indrelid)
  JOIN pg_class i ON (i.oid = x.indexrelid)
  JOIN pg_attribute a ON (
         a.attrelid = x.indrelid AND a.attnum = ANY (x.indkey)
       )
 WHERE a.attname = 'id';
 
 tablename |  indexname  
-----------+-------------
 picard    | picard_pkey

While this is hardly a novice-friendly example of multiple catalog tables, it does demonstrate that inquisitive data mining is possible. There’s a surprising amount of wealth stored in Postgres metadata if you’re brave enough to dig for it. Some masochistic DBAs have even constructed unholy amalgams of several catalog tables to derive potential index candidates, optimal column re-ordering for existing indexes, or even which indexes to remove entirely.

How is that even possible? Well, beyond the basic architecture metadata, Postgres collects statistics it stores in yet more tables and views. We’ll leave using these as an exercise for a future article, because they deserve to be explored in depth. Until then, explore the catalog and see what’s buried in its depths; you might be surprised what you find.



May the Worst Man Win

March 16th, 2016 | Published in Rant | No Comments


This Super Tuesday, it became readily apparent that Bernie Sanders and his unprecedented run were finally done for. So now that we’ve finally dispensed with the one candidate that genuinely cared, who remains? Donald Biff Tannen Trump, Ted Insane Zealot Cruz, and Hillary Nixon Clinton. Well, if those are my choices, then I may just vote Trump to finally burn the whole thing down, because we clearly deserve it.

But wait, Hillary isn’t a narcissistic populist or a religious fanatic, so why do I hate her?

It’s not because she lies constantly, or that she voted for the Iraq war and the PATRIOT act, or that she was against gay marriage until it was politically convenient. It’s not because she’s wholly owned by financial institutions that put our very economy at risk. It’s not because she set up a private email server to circumvent FOIA, and removed classified headers to freely share sensitive documents, or that said server was hacked by at least one foreign entity. It’s not even because she won’t release her Wall Street speech transcripts until everyone else does, since that’s how a leader acts.

No. While those are all completely viable reasons, they’re basically politics as usual at this point. The principal source of my frothing rage is that her last name is Clinton. This isn’t the UK, where The Queen is basically just a figurehead. The president in this country has actual power. That we’ve come full circle and essentially erected a political dynasty when we first elected Bush Junior was bad enough. That we are cheerfully repeating that misadventure with Hillary has transformed a bad joke into an ongoing caricature. At this point, would it be any surprise to see Chelsea eventually bid for the presidency? After all, if Hillary gets two full terms, 2024 would be her earliest opportunity to embrace the family calling. It’s only fair, since nobody in a country of over 300 million people is more qualified than one or two families.

And what incentive is there to vote for her at all? Because she would be the first female president? What about Carly Fiorina, or Sarah Palin? Simply being a woman is not a credential, or either of them would have been fine choices. How about Supreme Court nominations? Assuming Republicans can thwart Obama’s selections until the election, Hillary would be more likely to push for somewhat liberal candidates. How about joining the rest of the developed world and opting for single payer healthcare, so we stop spending twice as much? Well, Hillary says that will never happen, despite her support for it in 1993. How about legalizing Cannabis? Well, she currently supports state efforts and taking it off Schedule I at least.

Of course, Bernie would have pushed for legalization as well as a liberal Supreme Court, so those are just generic Democratic party expectations. Yet he would have done those things without the copious political baggage. But what do we expect when Debbie Wasserman Schultz, the head of the Democratic National Committee, worked on Hillary’s campaign in 2008? And when major Clinton supporters do things like buy The Onion. Satire that recursive will probably end the universe all by itself. In any other world, Bernie would have been the political underdog taking on the establishment candidate and been a media darling. Instead, he is ignored, ridiculed, and gets 16 negative articles in 16 hours written about him just by the Washington Post. Could they at least try not to be so blatantly biased?

This entire election is a Shit Show. No matter who wins now, we all lose, and deservedly so. In that case, why not vote for Trump? Sure, it’ll severely damage our international reputation, trade partners, and potentially result in attempts to flee the US Dollar as the world reserve currency. But it sends a message. Trump is the global “None of the Above” choice, because he’s so utterly pompous and satirical, the message is unambiguous. Beyond that, we deserve it anyway. We deserve all of it. We watched Idiocracy and gleefully transformed it into a documentary. If there were ever a refutation of American Exceptionalism, the 2016 election makes the best case I’ve seen. It’s a travesty, it’s insane, it’s idiotic, and it’s a perfect representation of what the US has become.

We let this happen. Hopefully we’ll learn from it and do better in 2020, but I’m not optimistic.



PG Phriday: Secret of the Ooze

March 11th, 2016 | Published in Database, Tech Talk | 1 Comment


A few days ago, a developer came to me with that inevitable scenario that every DBA secretly dreads: a need for a dynamic table structure. After I’d finished dying inside, I explained the various architectures that could give him what he needed, and then I excused myself to another room so I could weep silently without disturbing my coworkers. But was it really that bad? Databases have come a long way since the Bad Old Days when there were really only two viable approaches to table polymorphism. Postgres in particular adds two options that greatly reduce the inherent horror of The Blob. In fact, I might even say it’s a viable strategy now that Postgres JSON support is so good.

Why might a dev team need or even want a table with semi-defined columns? That seems like a silly question to ask in the days of NoSQL databases. Yet an application traditionally tied to an RDBMS can easily reach a situation where a strict table is outright inadequate. Our particular case arose because application variance demands columnar representation of precalculated values. But the generators of those values are themselves dynamic, as are the labels. One version may have ten defined columns beyond the standard core, while another might have fifteen. So we need a dynamic design of some description.

Before we go crazy in our history lesson, let’s start with a base table we’ll be expanding in various ways through the article. It’ll have a million data points ranging from today to a few months in the past, with datapoints every ten seconds.

CREATE TABLE sensor_log (
  id            INT PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (id, location, reading, reading_date)
SELECT s.id, s.id % 1000, s.id % 100,
       '2016-03-11'::TIMESTAMP - ((s.id * 10) || 's')::INTERVAL
  FROM generate_series(1, 1000000) s(id);
 
CREATE INDEX idx_sensor_log_location ON sensor_log (location);
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
 
ANALYZE sensor_log;

To understand just why a DBA might shudder in abject terror at the prospect of a dynamic table polluting a database, we need to examine the ancient methods of achieving it. The first of which is simply adding columns directly. New columns for every calculation. Columns for every misspelling of a word. Potentially useful columns. All columns. Every column. Hundreds. Thousands. Postgres can have up to 1600, so let’s max that baby out! But then what? Another table with even more columns! Glued together into an unholy amalgam with an extremely inadvisable JOIN.

This is understandably… frowned upon. Instead, probably the most common technique leverages an attribute table. Attribute tables are just key/value pairs tied to the primary key of the source table. This lets us theoretically have infinite columns based on need. They commonly resemble something like this:

CREATE TABLE sensor_log_attr (
  id   INT,
  key  VARCHAR NOT NULL,
  val  VARCHAR NOT NULL,
  UNIQUE (id, key)
);

The primary concern here is one of efficiency. Each and every supplementary column requires a JOIN to represent the data. These virtual columns are impossible to index within the context of the base table itself. We also get no statistics for how values are distributed, because again, these columns aren’t really part of the table structure in question. This approach only really works for descriptive information.

Let’s see how we might use an attribute table by adding another day of data and a view to “flatten” the table for consumption:

INSERT INTO sensor_log (id, location, reading, reading_date)
SELECT 1000000 + s.id, s.id % 1000, s.id % 100,
       '2016-03-11'::TIMESTAMP + ((s.id * 10) || 's')::INTERVAL
  FROM generate_series(1, 8640) s(id);
 
INSERT INTO sensor_log_attr (id, key, val)
SELECT 1000000 + s.id, 'frequency', (s.id % 120) || 'hz'
  FROM generate_series(1, 8640) s(id);
 
INSERT INTO sensor_log_attr (id, key, val)
SELECT 1000000 + s.id, 'jitter', round(1.0 / ((s.id % 10)+10), 3)::TEXT
  FROM generate_series(1, 8640) s(id);
 
CREATE OR REPLACE VIEW v_attr_flat AS
SELECT l.*, f.val AS frequency, j.val::DECIMAL AS jitter
  FROM sensor_log l
  LEFT JOIN sensor_log_attr f ON (f.id = l.id AND f.key = 'frequency')
  LEFT JOIN sensor_log_attr j ON (j.id = l.id AND j.key = 'jitter');
 
ANALYZE sensor_log_attr;
 
SELECT avg(jitter)
  FROM v_attr_flat
 WHERE reading_date > '2016-03-12'::TIMESTAMP - INTERVAL '10 minutes';

The query on this particular set of data on our test hardware required about 0.8ms to run. This is a best case scenario, unfortunately. Every column added means another join and more work for the underlying view. Every row retrieved is likely another iteration of a nested loop over that complicated structure. On the other hand, even if there are hundreds of column variants, only a relatively small handful will apply in any one use case. As such, there is an upper bound on the performance penalty incurred by multiple joins and incomplete statistics.

Regardless, attribute tables are still the slowest technique available in our toolbox. For some database engines, this is where the story ends, much to the dismay of their DBAs. Postgres however, has table inheritance. Some Postgres veterans probably immediately considered that approach, and why not? Optional columns are easy to add. Instead of using multiple views, we can simply create a table for each major variant to extend the common base of the root table. It’s a great place to start.

Our previous example added some data to the sensor_log table we want to use as our base, so let’s start by resetting its contents:

TRUNCATE TABLE sensor_log;
 
INSERT INTO sensor_log (id, location, reading, reading_date)
SELECT s.id, s.id % 1000, s.id % 100,
       '2016-03-11'::TIMESTAMP - ((s.id * 10) || 's')::INTERVAL
  FROM generate_series(1, 1000000) s(id);
 
ANALYZE sensor_log;

Now consider our frequency and jitter columns. In the attribute case, both of these had to be TEXT type because it’s the only type that can contain any data. Our inheritance model means we can customize the data types as we normally would. In addition, we get the full benefit of table statistics, including a far simpler query plan while addressing the new table. As such, inheritance is probably the most efficient approach available to us.

Here’s how we could use an inherited table for the new columns:

CREATE TABLE sensor_log_upgraded (
  frequency  TEXT,
  jitter     DECIMAL
)
INHERITS (sensor_log);
 
INSERT INTO sensor_log_upgraded (
  id, location, reading, reading_date, frequency, jitter
)
SELECT 1000000 + s.id, s.id % 1000, s.id % 100,
       '2016-03-11'::TIMESTAMP + ((s.id * 10) || 's')::INTERVAL,
       (s.id % 120) || 'hz', round(1.0 / ((s.id % 10)+10), 3)
  FROM generate_series(1, 8640) s(id);
 
CREATE INDEX idx_sensor_upgraded_location
    ON sensor_log_upgraded (location);
 
CREATE INDEX idx_sensor_upgraded_date
    ON sensor_log_upgraded (reading_date);
 
ANALYZE sensor_log_upgraded;
 
SELECT avg(jitter)
  FROM sensor_log_upgraded
 WHERE reading_date > '2016-03-12'::TIMESTAMP - INTERVAL '10 minutes';

For this very simple example, the query ran about 50% faster than the attribute version. And why wouldn’t it? The row count in the inherited table is much lower than that of the base table. This situation would persist for every child table modeled this way. Each would have its own set of extended columns customized to suit the intended task, with the performance that implies. Imagine we used the “one giant flat table” model instead. Query performance would steadily degrade as more variants were introduced because they’re all held in the same underlying table. Include the necessity of wading through dozens of unused columns, and the situation becomes even more dire.

If there’s any caveat to table inheritance, it’s shared aggregates. Beyond the common root columns held in the base table, it’s impossible to access supplementary fields from the base table without convoluted unions and liberal use of the ONLY decorator. The impact of this can be reduced by putting likely aggregate columns in the base table itself, or using inheritance chains to group variants together.
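A quick sketch makes the limitation obvious; the extended columns simply don’t exist as far as the base table is concerned, and base-only queries need ONLY:

-- Fails: jitter exists only in the child table.
SELECT avg(jitter) FROM sensor_log;
 
-- Works, but every variant must be named explicitly (or UNIONed).
SELECT avg(jitter) FROM sensor_log_upgraded;
 
-- Counts only rows physically stored in the base table.
SELECT COUNT(*) FROM ONLY sensor_log;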

This concept of chained inheritance has a lot of potential. If a child table itself acts as a base for another table with some special case column, it’s far more likely the two share enough underlying elements to enable aggregation. With a solid naming scheme and targeted queries, it’s a completely viable—if a bit clunky—method for accommodating extremely dynamic applications.

Yet there’s still one more approach to consider. Ever since Postgres added JSON as a supported datatype, it threatened to eat the lunch of NoSQL engines. But why? In a way, it improves table inheritance by making shared aggregates possible across variants. Like inheritance, the JSON data will only contain columns relevant to the application use case. As a bonus, table statistics and indexing of the JSON data are still possible, if a bit kludgy behind the scenes.

Here’s how we might implement JSON polymorphism:

DROP TABLE sensor_log_upgraded;
 
ALTER TABLE sensor_log ADD poly_data JSONB;
 
INSERT INTO sensor_log (
  id, location, reading, reading_date, poly_data
)
SELECT 1000000 + s.id, s.id % 1000, s.id % 100,
       '2016-03-11'::TIMESTAMP + ((s.id * 10) || 's')::INTERVAL,
       json_build_object(
         'frequency', (s.id % 120) || 'hz',
         'jitter', round(1.0 / ((s.id % 10)+10), 3)
       )::JSONB
  FROM generate_series(1, 8640) s(id);
 
ANALYZE sensor_log;
 
CREATE OR REPLACE VIEW v_poly_flat AS
SELECT id, location, reading, reading_date,
       poly_data->>'frequency' AS frequency,
       (poly_data->>'jitter')::DECIMAL AS jitter
  FROM sensor_log;
 
SELECT avg(jitter)
  FROM v_poly_flat
 WHERE reading_date > '2016-03-12'::TIMESTAMP - INTERVAL '10 minutes';

Like the attribute technique, we require a view to unroll optional fields. But we don’t need joins to do it, and statistics are at least somewhat available from the JSON contents. As a result, the sample query is only slightly slower than the inheritance example. Similar to inheritance architecture, we can create views to represent specific use cases and deployment versions. Better yet, the syntax for these views is relatively simple, and easy to automate so applications can generate their own views as needs dictate. Altering one giant table, or regularly creating new tables carries far more risk.

As we mentioned before, another major benefit is that indexes propagate through the view, so indexable poly fields are possible. This would be a given for table inheritance, but completely impossible for an attribute table. Here’s how we could use it to specifically target sensor jitter:

CREATE INDEX idx_sensor_log_jitter
    ON sensor_log ( ((poly_data->>'jitter')::DECIMAL) )
 WHERE ((poly_data->>'jitter')::DECIMAL) IS NOT NULL;
 
EXPLAIN ANALYZE
SELECT COUNT(*)
  FROM v_poly_flat
 WHERE jitter > 0.095;
 
                             QUERY PLAN
--------------------------------------------------------------------
 Aggregate  (cost=14301.94..14301.95 rows=1 width=0)
            (actual time=0.690..0.690 rows=1 loops=1)
   ->  Bitmap Heap Scan on sensor_log
           (cost=6298.08..19506.34 rows=336213 width=0)
           (actual time=0.151..0.631 rows=864 loops=1)
         Recheck Cond:
             (((poly_data ->> 'jitter'::text))::numeric > 0.095)
         Heap Blocks: exact=115
         ->  Bitmap Index Scan on idx_sensor_log_jitter
                 (cost=0.00..6214.03 rows=336213 width=0)
                 (actual time=0.130..0.130 rows=864 loops=1)
               Index Cond:
                   (((poly_data ->> 'jitter'::text))::numeric > 0.095)

This is pretty exciting in a couple of ways. First, we can see that the planner used our jitter index to restrict candidate rows as expected. Secondly, we were able to leverage partial indexes to completely ignore rows that don’t have the jitter column. Consider how powerful that is in the context of a dynamic table structure. A large majority of rows will simply lack most attributes, and that makes the index one or more orders of magnitude smaller than it would be otherwise. In our case, the index on the location column was 28MB, while our jitter index was only 208kB.

For data that would instead leverage exact matches instead of ranges, there are even better indexing techniques specifically for JSON. And if the advanced Vodka indexes get integrated into Postgres core, the situation gets even better.
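As a hedged sketch of the exact-match case, a GIN index over the entire JSONB column supports containment queries directly, and the jsonb_path_ops operator class shrinks it further if containment is all we need:

CREATE INDEX idx_sensor_log_poly
    ON sensor_log USING GIN (poly_data);
 
-- Find readings whose poly_data contains a specific frequency value.
SELECT COUNT(*)
  FROM sensor_log
 WHERE poly_data @> '{"frequency": "60hz"}';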

On the potentially concerning side, we can see that the planner was confused a bit since only a tiny fraction of our data even has poly_data contents. In a real case, it’s likely mis-estimates won’t be quite as drastic, but its presence in this example proves there’s still work to be done on the planner. We got a good query plan this time, but with three orders of magnitude separating the estimate from the actual results, things could have been far worse.

In the end, I still don’t like the prospect of supporting, or the general concept of polymorphic tables. I’m still a DBA at heart, after all. Yet with the tools Postgres provides, my urge to swim in an acid volcano at the mere thought isn’t quite as intense. In fact, I can even visualize a future where Postgres JSON features bridge the gap between a need for NoSQL and the importance of standardized data structures. And I can do it without feeling eternally soiled.

I guess Postgres makes anything possible.



PG Phriday: Being A Tattletale

March 4th, 2016 | Published in Database, Tech Talk | 1 Comment


In a heterogeneous database environment, it’s not uncommon for object creation and modification to occur haphazardly. Unless permissions are locked down to prevent it, users and applications will create tables, modify views, or otherwise invoke DDL without the DBA’s knowledge. Or perhaps permissions are exceptionally draconian, yet they’ve been circumvented or a superuser account has gone rogue. Maybe we just need to audit database modifications to fulfill oversight obligations. Whatever the reason, Postgres has it covered with event triggers.

Now, event triggers have only been around a relatively short while, having appeared in version 9.3. Even though I was personally excited to see the feature, priorities changed and they fell off my RADAR for quite a while. Yet the functionality they offer is exceedingly useful and worthy of exploration. So here’s a simple scenario: email DDL (Data Definition Language) to a DBA team whenever it occurs. This would mean anything from CREATE TABLE to DROP RULE, or anything else from this chart.

This would normally be fairly easy, but our first choice of language, the native PL/pgSQL, doesn’t have email functionality. Can we use Python? Let’s try:

CREATE OR REPLACE FUNCTION sp_email_command()
RETURNS event_trigger AS
$$
    print 'Event Trigger!'
$$ LANGUAGE plpythonu;
 
ERROR:  PL/Python functions cannot return type event_trigger

Nope. While this is a somewhat unfortunate oversight, it’s not a roadblock. We can use a standard PL/pgSQL function as a wrapper to call our Python email routine. So let’s just do that.

The next thing to consider is what we should include in the email. Thankfully there’s a wealth of information available regarding database sessions. It’s always a good idea to be specific while snitching, so our email should minimally report who the user is, where they came from, where they went, and what they did. Here’s a very quick and dirty event trigger that does all of that:

CREATE OR REPLACE FUNCTION sp_tattle_ddl()
RETURNS event_trigger AS
$$
BEGIN
    RAISE NOTICE 'Type: %', TG_TAG;
    RAISE NOTICE 'Command: %', current_query();
    RAISE NOTICE 'DB Name: %', current_database();
    RAISE NOTICE 'DB User: %', session_user;
    RAISE NOTICE 'DB Port: %', inet_server_port();
    RAISE NOTICE 'Server Host: %', inet_server_addr();
    RAISE NOTICE 'Client Host: %', inet_client_addr();
END;
$$ LANGUAGE plpgsql;
 
CREATE EVENT TRIGGER t_event_audit_all
    ON ddl_command_start
EXECUTE PROCEDURE sp_tattle_ddl();
 
-- Let's see this in action!
 
CREATE TABLE foo (bar INT);
 
NOTICE:  Type: CREATE TABLE
NOTICE:  Command: CREATE TABLE foo (bar INT);
NOTICE:  DB Name: postgres
NOTICE:  DB User: postgres
NOTICE:  DB Port: 5432
NOTICE:  Server Host: pg-dev-1
NOTICE:  Client Host: win-desktop-15

Not a bad start! The reason we used session_user in lieu of current_user, is because the latter is transient. We want to know exactly who connected, not who they’re masquerading as. This makes it much easier to perform forensics if such becomes necessary.
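A minimal sketch shows the difference, assuming a connection as the postgres superuser and a made-up audit_role:

CREATE ROLE audit_role;
SET ROLE audit_role;
 
-- current_user now reports audit_role, while session_user
-- still reports the account that actually logged in.
SELECT current_user, session_user;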

Now that we know we can capture the DDL and the surrounding environment, let’s create a Python function that performs the email action for us. We don’t need to worry especially about overhead, since DDL is not nearly as time-sensitive as regular queries. Modifying a database shouldn’t happen often, so we should feel free to perform reverse DNS or other potentially time-consuming operations, as long as we don’t get excessive.

CREATE OR REPLACE FUNCTION sp_email_command(
    target TEXT, tg_type TEXT, command TEXT, db_name TEXT,
    db_user TEXT, db_port INT, server_addr INET, client_addr INET
)
RETURNS VOID AS
$$
    import smtplib
    import socket
    import email
 
    # For the sake of readability, we should translate server
    # and client host names from address information when
    # possible. If these weren't sent, that means we're dealing
    # with a local UNIX socket connection, and should label it
    # with the server our trigger fired on.
 
    local_name = socket.gethostname()
    server_name = local_name
    client_name = local_name
 
    if server_addr:
        try:
            server_name = socket.gethostbyaddr(server_addr)[0]
        except:
            server_name = server_addr
 
    if client_addr:
        try:
            client_name = socket.gethostbyaddr(client_addr)[0]
        except:
            client_name = client_addr
 
    # Put together the email message. If we use a standard label
    # format, automated systems or rules can easily route these.
 
    msg = email.message_from_string(
        'Container : %s\n' % server_name +
        'DB Port : %s\n' % (db_port or 'N/A') +
        'DB User : %s\n' % db_user +
        'DB Name : %s\n' % db_name +
        'Client Host : %s\n' % client_name +
        'Full Command : \n\n' + command
    )
 
    msg['Subject'] = '%s DDL Change on %s Detected!' % (tg_type, server_name)
    msg['From'] = 'postgres@%s' % server_name
    msg['To'] = target
 
    # Transmit the actual message to the local server. It
    # should then forward to the appropriate upstream system.
 
    s = smtplib.SMTP('localhost')
    s.sendmail('postgres@%s' % server_name, target, msg.as_string())
    s.quit()
 
$$ LANGUAGE plpythonu;

This is a pretty simple email. There are no attachments, the body is just a glorified series of captured session attributes and the modification. If we were more ambitious, we could implement templating or produce an HTML-encoded version. For now though, this function works well and even covers a couple of edge cases for local connections, and IPs that don’t reverse.

Our final task is to replace the trigger wrapper function with something that will call the Python version. Here we go:

CREATE OR REPLACE FUNCTION sp_tattle_ddl()
RETURNS event_trigger AS
$$
BEGIN
    PERFORM sp_email_command(
        'dbas@my-company.com', TG_TAG, current_query(),
        current_database(), session_user,
        inet_server_port(), inet_server_addr(),
        inet_client_addr()
    );
END;
$$ LANGUAGE plpgsql;

Now instead of issuing a NOTICE for each desirable attribute, we produce an email that details everything. Otherwise, not much has changed. A more complicated version might only call the email for some specific DDL, or perhaps CC the end user in certain cases.
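The filtering case doesn’t even require changes to the function; event triggers accept a tag filter directly in their definition. A sketch that only reports table-related commands might look like this:

DROP EVENT TRIGGER t_event_audit_all;
 
CREATE EVENT TRIGGER t_event_audit_tables
    ON ddl_command_start
  WHEN TAG IN ('CREATE TABLE', 'ALTER TABLE', 'DROP TABLE')
EXECUTE PROCEDURE sp_tattle_ddl();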

What about the Postgres logs? After all, the log_statement option has a setting specifically for logging database modifications. It’s a good start, and could indeed be consumed by applications like Logstash or Elasticsearch. Event triggers however, give us much more control over presentation and circumstances. Instead of an email, we could insert these details into a remote auditing database, or to a federal agency. We could utilize foreign data wrappers to directly invoke APIs from any number of systems.

But primarily, the trigger approach is immediate and far more difficult to forge as a source. The integrity of our chain of custody is just as important as the audited DDL itself. The modification wasn’t scraped from a log, or parsed from related activity, but reported from the database instance as it happened. Written properly, event triggers can even reject any modifications that aren’t properly reported or acknowledged by the expected recipient.
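Here’s a minimal sketch of the rejection mechanism itself; an exception raised inside the trigger function aborts the offending DDL, and a real implementation would only raise it when the notification fails or goes unacknowledged:

CREATE OR REPLACE FUNCTION sp_block_ddl()
RETURNS event_trigger AS
$$
BEGIN
    -- Aborting here rolls back the command that fired the trigger.
    RAISE EXCEPTION '% rejected: notify the DBA team first.', TG_TAG;
END;
$$ LANGUAGE plpgsql;
 
CREATE EVENT TRIGGER t_event_block_all
    ON ddl_command_start
EXECUTE PROCEDURE sp_block_ddl();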

Such ruthless control over database architecture probably isn’t warranted in the vast majority of implementations. But Postgres makes it possible, even easy, to crank the paranoia level up to 11 when the situation calls for it. Once again, Postgres excels in its versatility.

The principal caveat here is that event triggers don’t fire on themselves, making it possible to circumvent them. Hopefully this is corrected in a future version so extremely security-conscious admins can use it as another reliable layer in the onion.



PG Phriday: Corralling the Hordes

February 26th, 2016 | Published in Database, Tech Talk | 2 Comments


Ah, users. They log in, query things, accidentally delete critical data, and drop tables for giggles. Bane or boon, user accounts are a necessary evil in all databases for obvious reasons. And what would any stash of data be if nobody had access? Someone needs to own the objects, at the very least. So how can we be responsible with access vectors while hosting a Postgres database? We already covered automating grants, so let’s progress to the next step: building a “best practice” access stack.

More experienced (and smarter) people than me have given this process a lot of thought, so why not learn from one of their implementations? In UNIX systems, file access is handled through up to nine grants based on the owner, an arbitrary group, or the unwashed masses, each with read, write, or execute permissions. In the context of Postgres, the shambling hordes can be associated with PUBLIC, USER with ownership, and GROUP for the usual buckets. Similarly, we have SELECT for read, INSERT, UPDATE, or DELETE for write, and EXECUTE as well.

This means we actually have a bit more control in certain areas than UNIX filesystems. In fact, since multiple users or groups have distinct privileges on each object, we have the opportunity to be really creative. If we start with the analogous case, but expand it by leveraging multiple groups, we have a set of standardized roles that resemble a filesystem’s privileges, but with greater granularity. We can even create something similar to a bitmask, so all new objects adhere to our desired grant structure.

So how do we start? The easy case would have the owner, a group for writing, a group for reading, and a group for function execution. Let’s set it up and include the default ACLs from the previous article:

CREATE USER myapp_owner WITH PASSWORD 'test';
CREATE SCHEMA myapp AUTHORIZATION myapp_owner;
ALTER USER myapp_owner SET search_path = myapp;
 
CREATE GROUP myapp_reader;
CREATE GROUP myapp_writer;
CREATE GROUP myapp_exec;
 
GRANT USAGE ON SCHEMA myapp TO myapp_reader;
GRANT USAGE ON SCHEMA myapp TO myapp_writer;
GRANT USAGE ON SCHEMA myapp TO myapp_exec;
 
ALTER DEFAULT PRIVILEGES
  FOR USER myapp_owner
   IN SCHEMA myapp
GRANT SELECT ON TABLES
   TO myapp_reader;
 
ALTER DEFAULT PRIVILEGES
  FOR USER myapp_owner
   IN SCHEMA myapp
GRANT INSERT, UPDATE, DELETE ON TABLES
   TO myapp_writer;
 
ALTER DEFAULT PRIVILEGES
  FOR USER myapp_owner
   IN SCHEMA myapp
GRANT USAGE ON SEQUENCES
   TO myapp_writer;
 
ALTER DEFAULT PRIVILEGES
  FOR USER myapp_owner
   IN SCHEMA myapp
GRANT EXECUTE ON FUNCTIONS
   TO myapp_exec;
 
ALTER DEFAULT PRIVILEGES
  FOR USER myapp_owner
   IN SCHEMA myapp
REVOKE EXECUTE ON FUNCTIONS
 FROM PUBLIC;

What we’ve done beyond creating the owning user and the groups themselves, is set defaults for new objects so everything is automatically enforced. The myapp_writer role has two of these because tables may rely on sequences for automatically generated IDs, so we need grants for both. We also neglected granting read access to the writer since writing can come from any number of vectors, some of which might not require reading the data they insert. The last thing we do is revoke execution of all stored procedures from PUBLIC, because that’s a silly default and smells of elderberries.

We should also note that, while these commands should be executed by a superuser initially, ownership itself does not require such elevated privileges. It’s unfortunately common for DBAs and operators to create objects as the postgres user itself. Since this user is often the primary superuser for the Postgres instance, deployment systems must be given access to it as well. That poses an implicit risk to the entire database constellation, and the only alternative is for the DBA to deploy all database objects.

While certainly possible, this acts as an impediment to continuous integration, automated testing, and software deployment. Proprietary applications might also depend on database objects that must be created and maintained by the application itself. The DBA literally can’t inject themselves into every potential source.

But the myapp_owner user can. Let’s use it to build some simple object in the myapp schema:

CREATE TABLE test_tab (
  id          SERIAL PRIMARY KEY,
  somedata    VARCHAR NOT NULL,
  created_dt  TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now()
);
 
CREATE INDEX idx_test_tab_created_dt
    ON test_tab (created_dt);
 
CREATE VIEW v_test_data AS
SELECT * FROM test_tab
 WHERE somedata LIKE 'test%';
 
CREATE FUNCTION f_hello()
RETURNS VARCHAR AS
$$
 
  SELECT 'Hello ' || somedata
    FROM test_tab
   ORDER BY created_dt DESC
   LIMIT 1;
 
$$ LANGUAGE SQL;

Note that we still have no users beyond the initial owner. That comes later, once we know what username an application or user might desire or require for limited operation. We also strongly recommend that any database users for day-to-day operation should be distinct from the object owner. This prevents tool exploits from escalating to full data control. If a web user only has read access, a security bug would only expose the data, not allow the cracker the capability of changing data or dropping database structures.

Given these constraints, here’s a sample of how we might use our ACLs to constrict access. This is where a DBA could step in:

CREATE USER myapp_web WITH PASSWORD 'webthing';
ALTER USER myapp_web SET search_path = myapp;
GRANT myapp_reader TO myapp_web;
 
CREATE USER myapp_import WITH PASSWORD 'util';
ALTER USER myapp_import SET search_path = myapp;
GRANT myapp_reader TO myapp_import;
GRANT myapp_writer TO myapp_import;
 
CREATE USER myapp_report WITH PASSWORD 'calculate';
ALTER USER myapp_report SET search_path = myapp;
GRANT myapp_reader TO myapp_report;
GRANT myapp_exec TO myapp_report;

Now the web user can read all of the data it wants. The import jobs can shove data into the tables and read when reconciliation is necessary. Data reports can call functions to update fact tables or data cubes, and read the results.

This is a simple access model that might be ideal for a standard application deployment that doesn’t need to lock down individual tables. More complicated variations would have no default privilege definitions, and several more groups. For example, we might have created myapp_web as a group instead, and granted access to a limited subset of objects within the schema. Of course, provided we revoke the grants above, we still have that option.
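As a hedged sketch of that stricter variation, using a hypothetical myapp_web_limited group so it doesn’t collide with the user we already created:

-- Hypothetical group name; myapp_web already exists as a user above.
CREATE GROUP myapp_web_limited;
GRANT USAGE ON SCHEMA myapp TO myapp_web_limited;
GRANT SELECT ON myapp.v_test_data TO myapp_web_limited;
GRANT EXECUTE ON FUNCTION myapp.f_hello() TO myapp_web_limited;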

And that’s where this kind of approach really wins out. While we defined a standardized series of default privileges that make it easy to distribute basic access, we can go much further while still maintaining those standards.

What about PUBLIC you ask? No. Stop asking that. They don’t get access. Barring functions, which we fixed, PUBLIC gets no default access to anything, ever. PUBLIC and all other users which don’t match our carefully organized roles, exist merely as objects of ridicule as they swarm fruitlessly at our impenetrable fortifications. The hordes are to be thwarted by any means available, which Postgres satisfies in spades.

After all, we all have our standards.

