PG Phriday: Forensic Fundamentals

August 19th, 2016 | Published in Database, Tech Talk | No Comments


All database engines, even Postgres, occasionally present a seemingly intractable problem that will drive everyone insane while attempting to isolate it. All it takes is the perfect storm of situational circumstances, and even a perfectly running stack of software will grind to a screeching halt. It’s situations like this that we must turn to various Postgres forensic tools to track down the issue before management starts firing people in frustration.

We’ve already discussed how connections can be blocked by improper operation ordering. But what if things aren’t so straightforward? When there are more than two actors involved, the potential for weird interactions increases to a point where replicating the issue in a clean environment is often difficult or impossible. So let’s make sure a few things are in place.

First things first: logging. We should tweak several log settings from their defaults, since the logs provide a wealth of after-the-fact diagnostic information. All of these should be in postgresql.conf:

log_checkpoints = on
log_statement = ddl
log_min_duration_statement = 1000
log_line_prefix = '%p|%u|%d|%r|%t|'

There are a lot of other log settings, but these are the ones I personally consider essential modifications. Why? Let’s examine them one by one.

  • log_checkpoints: A checkpoint happens when Postgres writes pending modifications to table files. If this setting is enabled, we learn when that process started, how long it took, how much data was written, how much time was spent syncing to disk and thus waiting for the underlying OS to flush the data, and so on. Enabling this can single-handedly track down insufficient write performance of a storage device, or reveal times of heavy write activity that may suggest better transaction log coverage, and so on. It’s a crime this isn’t enabled by default.
  • log_statement: The ddl setting will log any modification to a database object. Create a table? Logged. Modify a view? Logged. Drop an index? Logged. This is the minimal level of activity auditing a DBA should expect. Modifying database structures, especially in production, should be strictly controlled, and we need to know when such modifications occur, who made them, and so on.
  • log_min_duration_statement: In most databases, queries execute on the order of tens of milliseconds. Those that exceed one full second are almost always an aberration that requires further investigation. Reporting-oriented servers might err toward higher values as their queries are generally slower, but this should still be set to something. This doesn’t just reveal slow queries, but any query that ran long for whatever reason, and that alone makes this kind of query tracking an essential tool.
  • log_line_prefix: There are numerous environmental details in place for any session or particular operation. We know the account in question, where it connected from, which database it’s using, the PID of the Postgres backend it was assigned, and so on. It’s best to tailor this to fit organizational needs, and almost anything is better than the default. If a problem occurs, we might not be available or able to view it right away, so it’s up to the log to fill any gaps.

There is a huge exception here that will be relevant soon: the TRUNCATE command. Postgres classifies it as a data modification rather than DDL, so our chosen log_statement value will not log table truncations! This matters because TRUNCATE doesn’t just empty a table; it destroys it. What we see when truncating a table is that it becomes empty. In reality, Postgres creates a new, empty copy of the table’s structure and performs an atomic swap, discarding the original data files.

We could use that to argue TRUNCATE is DDL, and should be logged as such. We’re fundamentally modifying the table itself, and the documentation agrees:

TRUNCATE acquires an ACCESS EXCLUSIVE lock on each table it operates on, which blocks all other concurrent operations on the table.

This exclusive access lock is the key to understanding situations where TRUNCATE can be problematic. We have a process that isn’t logged as DDL, yet acts like DDL, and could originate from any application or script that touches the database. Consider the implications.
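
As an aside, if catching truncations in the log matters, the documented mod level of log_statement covers data-modifying statements such as INSERT, UPDATE, DELETE, COPY FROM, and TRUNCATE, at the cost of a much noisier log:

log_statement = mod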

Now let’s construct the perfect storm. Imagine two access vectors: one reads data from a table, the other writes to it intermittently. They look like this:

CREATE TABLE foo (bar INT);
 
-- Connection 1, reads from a table:
 
BEGIN;
SELECT * FROM foo;
ROLLBACK;
 
-- Connection 2, writes from some script source:
 
BEGIN;
INSERT INTO foo (bar) VALUES (42);
COMMIT;

These look relatively harmless, but there are a couple clues if we examine more closely. Notice that the read-only connection is wrapped in a transaction? This looks very odd since it’s completely unnecessary, but it’s actually very common. Many database connectors actually consider this beneficial, since it allows developers to perform endless operations without affecting the database until they’re ready to commit.

Unfortunately for applications that are read-only, this can be a lot of unnecessary overhead. Further, the author of a read-only script may not be cognizant they’re operating within a transaction. It’s understandably common for such a process to request a large amount of data from Postgres and then process it, unaware that Postgres is calmly waiting for them to commit or roll back their work. We can see this in Postgres if we examine the pg_stat_activity and pg_locks views.

Imagine a connection retrieved some data and is spending hours processing it. This is what Postgres would see:

\x on
 
SELECT * FROM pg_stat_activity
 WHERE state = 'idle in transaction';
 
-[ RECORD 1 ]----+------------------------------
datid            | 12411
datname          | postgres
pid              | 19690
usesysid         | 10
usename          | postgres
application_name | psql
client_addr      | 
client_hostname  | 
client_port      | -1
backend_start    | 2016-08-19 11:14:11.194043-05
xact_start       | 2016-08-19 11:14:28.376349-05
query_start      | 2016-08-19 11:14:28.376627-05
state_change     | 2016-08-19 11:14:28.376977-05
waiting          | f
state            | idle in transaction
backend_xid      | 
backend_xmin     | 
query            | SELECT * FROM foo;
 
\x off
 
SELECT l.pid, a.waiting, l.relation::regclass::text,
       l.locktype, l.mode
  FROM pg_locks l
  JOIN pg_stat_activity a USING (pid)
 WHERE a.state = 'idle in transaction';
 
  pid  | waiting | relation |  locktype  |      mode       
-------+---------+----------+------------+-----------------
 19690 | f       | foo      | relation   | AccessShareLock
 19690 | f       |          | virtualxid | ExclusiveLock

This information paints the session as ultimately innocuous. It’s idle in transaction, but it only has a share lock on one table, which shouldn’t block anything trying to write to it. Imagine our surprise when applications and users start reporting they keep getting errors about “too many connections”. When we look at the sessions, we see something like this:

SELECT COUNT(*)
  FROM pg_stat_activity
 WHERE waiting;
 
 count 
-------
    97

How is that possible? It’s easy if we have a large cluster of web-driven servers all trying to write to a single table. Or even just an aggressively threaded bulk loading application that pushes data as hard as it can without any kind of throttle checks. Both of these situations are fine when the database is behaving, so nobody ever thought about coding around scenarios where it becomes unresponsive.

Sensing an issue, developers, operators, or system administrators might simply start bouncing applications or even Postgres itself in a frantic effort to clear the logjam. In that case, we can’t check pg_stat_activity for the cause, and all we have are the logs. Remember those? Let’s see what they have to say about things, given we know roughly when the problem occurred:

grep 11:42 /var/log/postgresql/postgresql-9.5-main.log
 
25956|postgres|postgres|::1(49763)|2016-08-19 11:42:21 CDT|LOG:  duration: 660488.381 ms  statement: TRUNCATE TABLE foo;

Introducing our third vector, a periodic cleanup job that empties a shared table before an extremely aggressive bulk operation fills it again. Before this process began, our read-only script started running and was calmly processing query results, unaware it had inadvertently shut down the entire company.

How? Remember that access share lock it’s holding while idle in a transaction? Well the TRUNCATE command requires an access exclusive lock, which it can’t get. It needs the exclusive lock because it needs to annihilate the table and replace it with a shady clone. While it waits for a lock it will never receive, the aggressive bulk loading application consumes all available connections with inserts that will never complete.

The inserts will never complete because each of them needs a row exclusive lock on the table before it can write. They can’t acquire it because the pending TRUNCATE is already queued for its access exclusive lock, and Postgres grants conflicting lock requests in order. Meanwhile, the loading engine dutifully eats up all available connections with subsequent attempts that simply exacerbate the problem. Once those connections are gone, standard applications can no longer connect at all for unrelated work, and suddenly the world stops.
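
When the system is wedged like this, a quick way to see who is stuck behind whom is to compare granted and un-granted locks on the same relation. A rough sketch against pg_locks, to be adjusted to taste:

-- Pair every waiting lock request with the granted locks on the same table
SELECT blocked.pid AS blocked_pid,
       blocked.mode AS wanted_mode,
       blocker.pid AS blocking_pid,
       blocker.mode AS held_mode,
       blocked.relation::regclass AS table_name
  FROM pg_locks blocked
  JOIN pg_locks blocker
    ON (blocker.relation = blocked.relation)
 WHERE NOT blocked.granted
   AND blocker.granted
   AND blocker.pid != blocked.pid;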

All because of a single unnecessary transaction. If this sounds ridiculous and contrived, consider the actors. All of them operate independently of one another and function properly in a vacuum. It’s hilariously easy for that to go awry. I’ve seen this happen dozens of times in as many separate organizations across developers at all experience levels. I’ve never encountered a database-driven app that didn’t eventually trigger a DDoS on itself through some unlikely interaction between separate components.

So how can we solve this conundrum? The easiest way is to simply stop using implicit transactions wherever they may lurk. Most—if not all—database connectors provide an attribute or method that completely eliminates this behavior. In Python for example, we could tell psycopg2 we want to handle our own transactions:

import psycopg2

conn = psycopg2.connect("my connection string")

# Disable the implicit transaction psycopg2 would otherwise open for us.
conn.autocommit = True

cur = conn.cursor()

# When we genuinely need a transaction, we manage it explicitly.
cur.execute('BEGIN')
# Some other db-related write code here
cur.execute('COMMIT')

This reverts Postgres to standard operation: all queries resolve immediately unless commands need to be grouped together. In those cases, we can control that explicitly. This solves our first problem of dangling transactions.

Then applications and scripts need to clean up after themselves. Even in garbage-collected languages, if an insert procedure gets stuck, the whole stack needs to be canceled and cleaned up before invoking subsequent attempts. This prevents the kind of out-of-control resource consumption that starves out other access vectors. The Postgres DBA can assist here by setting role-level connection limits:

ALTER USER loader_thing WITH CONNECTION LIMIT 20;

This won’t prevent misbehaving scripts from choking themselves to death, but the rest of the application stack should continue unaffected. Of course, if different applications are sharing accounts, that needs to be stopped immediately. We can’t use high granularity controls on coarse access vectors, otherwise we’ve just moved the problem instead of solving it.
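
Keeping an eye on how close each role gets to its ceiling is straightforward, too. A sketch against the system catalogs might look like this:

-- Connections in use per login role, next to each role's limit
-- (rolconnlimit is -1 when no limit is set)
SELECT r.rolname, r.rolconnlimit,
       COUNT(a.pid) AS connections_in_use
  FROM pg_roles r
  LEFT JOIN pg_stat_activity a ON (a.usename = r.rolname)
 WHERE r.rolcanlogin
 GROUP BY r.rolname, r.rolconnlimit
 ORDER BY connections_in_use DESC;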

Above all, remember that this is only one possible scenario that requires forensics to track down and eliminate. There are plenty of others, but the tools demonstrated here can identify almost all of them. We were even able to derive recommended practices to prevent future issues based on our observations. It’s the kind of balanced approach that emerges naturally from Postgres itself, a stage where everyone has a role to play.

Except implicit transactions; those things are cancer.



PG Phriday: Inevitable Interdiction

August 12th, 2016 | Published in Database, Tech Talk | No Comments


“Hey! That row shouldn’t be in that table! How the heck did that get there!? Alright, who wrote the application client filters, because you’re fired!”

Good application developers know never to trust client input, but not all realize that a single app is rarely the only vector into a database. Databases don’t just preserve data with various levels of paranoia; they’re also the central nexus of a constellation of apps, scripts, APIs, GUIs, BMIs, HMOs, and STDs. As such, unless every single one of those shares a common ancestor that sanitizes, boils, renders, and formats content before storage, there are bound to be inconsistencies. That’s just how things work.

One of the major benefits of using an RDBMS is that engines like Postgres provide several mechanisms for ensuring data integrity beyond crash durability and transaction control. Continuing our discussion on database objects from last week, we vaguely referred to other types of constraints. So, what other mechanisms are available aside from primary keys and unique constraints?

Let’s start with foreign keys, since they illustrate how tables are related and enforce that relation to reject invalid content. Here are two simply related tables and a couple of rows:

CREATE TABLE pet_type
(
  type_id  SERIAL   PRIMARY KEY,
  animal   VARCHAR  UNIQUE NOT NULL
);
 
CREATE TABLE pet
(
  pet_id      SERIAL   PRIMARY KEY,
  type_id     INT      NOT NULL,
  pet_name    VARCHAR  NOT NULL,
  owner_name  VARCHAR  NOT NULL
);
 
ALTER TABLE pet
  ADD CONSTRAINT fk_pet_pet_type FOREIGN KEY (type_id)
      REFERENCES pet_type (type_id);
 
INSERT INTO pet_type (animal) VALUES ('cat'), ('dog');
 
INSERT INTO pet
  (type_id, pet_name, owner_name)
VALUES
  (1, 'Meow Meow Fuzzyface', 'Cedric'),
  (2, 'Mr. Peanutbutter', 'Paul');

Foreign keys guard against invalid modifications by enforcing the relationship. In this case, we can’t remove “dog” as a pet type because at least one pet references it. We also can’t insert a pet fish, because there’s no corresponding type.

The other job the foreign key fills is to normalize the type names. If we had used a VARCHAR column in the pet table instead, we could have types of “Dog”, “dog”, “GDo”, or any number of typos preserved for eternity and forever complicating searches. This helps illustrate and sanitize the relationship and all affected data. Let’s see it in action:

DELETE FROM pet_type WHERE animal = 'dog';
 
ERROR:  update or delete on table "pet_type" violates foreign key
        constraint "fk_pet_pet_type" on table "pet"
 
INSERT INTO pet
  (type_id, pet_name, owner_name)
VALUES
  (3, 'Wanda', 'Lisa');
 
ERROR:  insert or update on table "pet" violates foreign key
        constraint "fk_pet_pet_type"

Foreign keys do have other modes of operation. We could, for instance, declare the constraint as ON DELETE CASCADE. That would enable us to delete from the parent table, but a delete cascade means every row in any table that references the deleted value would also be removed. That’s a fairly dangerous operation, and a pretty good reason it’s not the default in Postgres.
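
For illustration only, here’s what a cascading variant of our earlier constraint might look like. With this in place, deleting the ‘dog’ type would silently take Mr. Peanutbutter along with it:

ALTER TABLE pet
 DROP CONSTRAINT fk_pet_pet_type;

ALTER TABLE pet
  ADD CONSTRAINT fk_pet_pet_type FOREIGN KEY (type_id)
      REFERENCES pet_type (type_id)
      ON DELETE CASCADE;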

Foreign keys are fairly limited in application, however. Beyond describing a relationship and enforcing its integrity, there isn’t much left. If we wanted to impose actual rules on the data itself, we need to go a bit further into the toolbox.

For example, suppose we want to ensure pet birth dates are firmly established in the past. None of the previously mentioned constraints will let us apply arbitrary rules on column values, right? That’s where CHECK constraints come in!

ALTER TABLE pet ADD birth_date DATE;
 
ALTER TABLE pet
  ADD CONSTRAINT ck_pet_no_future
CHECK (CURRENT_DATE - birth_date > 0);
 
INSERT INTO pet
  (type_id, pet_name, owner_name, birth_date)
VALUES
  (1, 'Princess Carolyn', 'Amy', '2017-08-12');
 
ERROR:  new row for relation "pet" violates check
        constraint "ck_pet_no_future"

Not bad, eh? Check constraints are exceptionally useful when there are very simple rules we want to enforce. Maybe prices should always be positive. Perhaps invoice line items should be positive unless they’re a credit. There is a lot of potential here, but there’s also room for abuse. There’s technically no limit on the number of conditionals a check constraint enforces, or the number of checks we prescribe, so we must be judicious or risk performance degradation.
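
To make those ideas concrete, here’s a quick sketch on a hypothetical invoice table; the table and names are purely illustrative:

CREATE TABLE invoice_line
(
  line_id    SERIAL   PRIMARY KEY,
  line_type  VARCHAR  NOT NULL DEFAULT 'charge',
  amount     NUMERIC  NOT NULL,
  CONSTRAINT ck_amount_sane
        CHECK (amount > 0 OR line_type = 'credit')
);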

Still, preventing critical data flaws prior to insert is a stupendous capability. Can we go further, though? Of course we can! The final constraint type is for data exclusion. Imagine in our examples that pets can change owners, but can’t be owned by two people simultaneously. Well, we can’t use check constraints for that since they only operate on the current row, and a unique constraint won’t work either.

Let’s watch EXCLUDE handle the situation with ease:

CREATE EXTENSION btree_gist;
 
ALTER TABLE pet ADD owner_range TSRANGE;
 
ALTER TABLE pet
  ADD CONSTRAINT ex_owner_overlap
      EXCLUDE USING GIST (
        pet_name WITH =, owner_range WITH &&
      );
 
INSERT INTO pet_type (animal) VALUES ('horse');
 
INSERT INTO pet
  (type_id, pet_name, owner_name, birth_date, owner_range)
VALUES 
  (3, 'Bojack', 'Will', '1964-06-12', '[1964-06-12,2014-09-07)');
 
INSERT INTO pet
  (type_id, pet_name, owner_name, birth_date, owner_range)
VALUES
  (3, 'Bojack', 'Arnett', '1964-06-12', '[2013-09-07,)');
 
ERROR:  conflicting key value violates exclusion
        constraint "ex_owner_overlap"

There are a couple prerequisites for using exclusion this way, of course. Since the gist index type wasn’t designed to handle types like INT or VARCHAR natively, we need to give it B-Tree capability first with an extension.

Beyond that oddity, we merely added the new ownership date range and then added the constraint itself. The exclusion syntax is to list the column and then the type of operator that should be applied. For ranges, that operator is && to indicate overlap. For our particular example, no pet with the same name can have overlapping ownership ranges. This is a tiny universe indeed!

Exclusion constraints work better for things like scheduling, preventing archive overlaps, and other operations that would be awkward or impossible otherwise. With other database engines, an application might have to search for a date range and other parameters and self-verify that a record is safe to insert. Under this paradigm, any tertiary data vector that isn’t so diligent would be free to ignore scheduling conflicts.
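
To make the scheduling case concrete, here’s a sketch of a hypothetical room booking table that rejects double-bookings outright, again assuming btree_gist is available:

CREATE TABLE room_booking
(
  room_name  VARCHAR  NOT NULL,
  booked     TSRANGE  NOT NULL,
  EXCLUDE USING GIST (room_name WITH =, booked WITH &&)
);

INSERT INTO room_booking VALUES
  ('Broom Closet', '[2016-08-12 10:00,2016-08-12 11:00)');

-- This overlapping booking would be rejected
INSERT INTO room_booking VALUES
  ('Broom Closet', '[2016-08-12 10:30,2016-08-12 11:30)');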

But no application, script, or API can ignore rules the database itself enforces. That’s what constraints are for: those times when being a shared resource is a liability. Judicious application of various constraints can protect as well as describe the data, and make things easier (and safer) for everyone.

And if that isn’t the Postgres motto, maybe it should be.



PG Phriday: Elephantary, My Dear

August 5th, 2016 | Published in Database, Tech Talk | 2 Comments


Occasionally with a lot of similar-sounding terminology, there’s ample room for misunderstandings. This is especially relevant with overloaded terms like ‘index’, which can be anything from a data lookup to a term used in mutual funds. This is further complicated if a developer’s first experience with databases is with another variant with sufficiently divergent syntax, methodology, or assumptions. To prevent future code refactors born of misunderstandings, let’s build a basic Postgres glossary to act as an executive summary for app devs.

Let’s consider this a continuation of our exploration of databases and schemas. We need a solid glossary of terms centered around frequently conflated concepts.

Tables

At its core, a Postgres table is just a container for columns of various types. Each record in a table is stored as a row of one or more columns. We can see that easily enough here:

CREATE TABLE animal_sound
(
  animal  VARCHAR,
  sound   VARCHAR
);
 
INSERT INTO animal_sound (animal, sound) VALUES ('Cow', 'Moo');
INSERT INTO animal_sound (animal, sound) VALUES ('Cat', 'Meow');
 
SELECT * FROM animal_sound;
 
 animal | sound 
--------+-------
 Cow    | Moo
 Cat    | Meow

Nothing groundbreaking here, right? Even the most rank database newbie should know this much. It’s when we start adding elements on top that things start to go sideways. Let’s take this one step at a time to prevent that from happening.

Sequences

The next thing a developer will probably want is an auto-incrementing field of some kind. MySQL does this with their AUTO_INCREMENT decorator. MS SQL uses an IDENTITY column property that takes parameters to determine its operation. Oracle requires something called a SEQUENCE that must either be used explicitly or tied to the table with a TRIGGER. Postgres uses sequences too, but they tend to be hidden behind the SERIAL data type.

DROP TABLE animal_sound;
 
CREATE TABLE animal_sound
(
  id      SERIAL,
  animal  VARCHAR,
  sound   VARCHAR
);
 
INSERT INTO animal_sound (animal, sound) VALUES ('Cow', 'Moo');
INSERT INTO animal_sound (animal, sound) VALUES ('Cat', 'Meow');
 
SELECT * FROM animal_sound;
 
 id | animal | sound 
----+--------+-------
  1 | Cow    | Moo
  2 | Cat    | Meow
 
\d animal_sound
 
 Column |       Type        |                   Modifiers                   
--------+-------------------+------------------------------------------------
 id     | integer           | not null default nextval('animal_sound_id_seq')
 animal | character varying | 
 sound  | character varying |

Behind the scenes, Postgres is transforming the SERIAL type into a SEQUENCE with a DEFAULT value for the column. This makes the Postgres approach something of a middle-ground between the strict Oracle approach, and the much looser MySQL decorator. We saw this explicitly by asking Postgres to tell us the structure of the table.

If we truly desired to use a manual approach, it would have looked like this:

DROP TABLE animal_sound;
 
CREATE SEQUENCE animal_sound_id_seq;
 
CREATE TABLE animal_sound
(
  id      INT DEFAULT NEXTVAL('animal_sound_id_seq'),
  animal  VARCHAR,
  sound   VARCHAR
);
 
ALTER SEQUENCE animal_sound_id_seq OWNED BY animal_sound.id;

So that SERIAL type really saved us quite a bit of work. We didn’t need to explicitly create the sequence, and we didn’t need to marry it to the table column that uses it. That last ALTER SEQUENCE statement is necessary so Postgres knows the sequence is associated with the table. This way, if we drop the table the sequence goes with it, and if we export the table into a dump file, the sequence (and its most recent value!) is included. Basically, it becomes integral to the table’s operation.

Since a sequence is a separate database object, it has its own attributes. Beyond just the starting value and increment size, we can tweak wraparound, caching, and a couple of other things. There really is very little magic in a Postgres database, and anything that seems magical is probably just a wrapper for something we could have done by hand. This makes for a very powerful hybrid approach that covers both the common and advanced use cases.
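
For the curious, here’s a sketch of a standalone sequence with several of those knobs turned; the names are hypothetical:

CREATE SEQUENCE ticket_number_seq
  START WITH 1000    -- first value handed out
  INCREMENT BY 10    -- step size between values
  MAXVALUE 99990
  CYCLE              -- wrap around instead of erroring out
  CACHE 20;          -- each backend pre-allocates 20 values

SELECT NEXTVAL('ticket_number_seq');       -- 1000
SELECT SETVAL('ticket_number_seq', 5000);  -- jump ahead manually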

Keep in mind that sequences being a separate entity from the table means the value is not affected by underlying operations on the table or its data. What happens to a sequence if we delete all data from a table it’s associated with?

DELETE FROM animal_sound;
 
SELECT last_value FROM animal_sound_id_seq;
 
 last_value 
------------
          3

So far as the sequence is concerned, the next ‘id’ value will be 3 unless it’s explicitly set to a different value. In MySQL or SQL Server, truncating the table would have reset the counter back to 1; a Postgres sequence doesn’t care what happens to the table’s contents. This may seem like a limitation, but it means sequences can be used for things other than incrementing data for a single table column. One sequence can be shared by many tables, used to generate other values, or just act as a seed for some other process. These are all uncommon or otherwise advanced techniques, but they remain possible because sequences are distinct entities.
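
As one example of that flexibility, nothing stops a single sequence from feeding several tables. A sketch with hypothetical tables:

CREATE SEQUENCE shared_id_seq;

CREATE TABLE invoice
(
  id     INT DEFAULT NEXTVAL('shared_id_seq'),
  total  NUMERIC
);

CREATE TABLE shipment
(
  id      INT DEFAULT NEXTVAL('shared_id_seq'),
  weight  NUMERIC
);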

Indexes

At its most basic, an index is just a functional calculation of some kind, performed on the value of one or more table columns. The result is then processed into some kind of storage structure that is optimized for instant (or at least very fast) lookup. The Postgres default is to use a B-tree, but there are numerous other options.
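
As a quick sketch of both points, the ‘functional calculation’ can be an actual function, and the storage structure doesn’t have to be a B-tree at all:

-- An expression index: the stored keys are LOWER(animal), not animal
CREATE INDEX idx_animal_sound_lower
    ON animal_sound (LOWER(animal));

-- A different storage structure entirely (hash indexes have caveats
-- in this era of Postgres, but they illustrate the point)
CREATE INDEX idx_animal_sound_hash
    ON animal_sound USING hash (animal);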

Let’s add an index to the id column of our fake table:

CREATE INDEX idx_animal_sound_id ON animal_sound (id);
 
\d animal_sound
 
 Column |       Type        |                   Modifiers                   
--------+-------------------+------------------------------------------------
 id     | integer           | not null default nextval('animal_sound_id_seq')
 animal | character varying | 
 sound  | character varying | 
Indexes:
    "idx_animal_sound_id" btree (id)

Just like other database-level objects, indexes exist distinctly and separately from every other object in the database. They utilize storage, and incur overhead. After all, new or updated data necessarily invokes the function that maintains indexed versions of database columns. This is why we don’t simply index every column in a table.

Due to this, there are only two cases where Postgres will automatically create an index: for primary keys and unique constraints. We’ll discuss these special cases next. For now though, consider that indexes exist separately as objects we can manipulate independently as with other base-level objects. They do not replace the column data, nor even represent it. Indexes are just a lookup structure bolted onto the table, with implicit routines to maintain themselves for fast lookup.

Indexes come into use when dereferencing the lookup is faster than just reading the entire table. As such, our example table is far too small to see an index in action. But we can get around that:

CREATE TABLE huge_blob AS
SELECT id FROM generate_series(1, 1000) id;
 
CREATE INDEX idx_huge_blob_id ON huge_blob (id);
 
EXPLAIN
SELECT *
  FROM huge_blob
 WHERE id = 17;
 
                     QUERY PLAN 
-----------------------------------------------------
 Index Only Scan using idx_huge_blob_id on huge_blob
       (cost=0.28..8.29 rows=1 width=4)
   Index Cond: (id = 17)

See that? Instead of reading the entire table, Postgres searched the B-tree for the lookup value of 17, and used the matching entry to find the row. There’s a longer discussion to have regarding good index candidates, optimal index types, index creation parameters and other minutiae, but we’ll save that for a future article.

Constraints

Constraints are rules we want to apply to our table columns. Maybe they should only represent specific values, or reject certain combinations. The most common however, are PRIMARY KEY and UNIQUE constraints. And why not? A primary key effectively represents the main lookup value for a row in our table, and the need for unique combinations is easily understood.

Let’s start with primary keys, since it’s very rare to encounter a table without one. Let’s modify our sample table a bit and show one in action:

DROP TABLE animal_sound;
 
CREATE TABLE animal_sound
(
  id      SERIAL,
  animal  VARCHAR,
  color   VARCHAR,
  sound   VARCHAR
);
 
ALTER TABLE animal_sound
  ADD CONSTRAINT animal_sound_pkey
      PRIMARY KEY (id);
 
INSERT INTO animal_sound (id, animal, color, sound)
       VALUES (1, 'Cow', 'Brown', 'Moo');
INSERT INTO animal_sound (id, animal, color, sound)
       VALUES (1, 'Cow', 'Spotted', 'Moo?');
 
ERROR:  duplicate key value violates unique
        constraint "animal_sound_pkey"
 
\d animal_sound
 
 Column |       Type        |                   Modifiers                   
--------+-------------------+------------------------------------------------
 id     | integer           | not null default nextval('animal_sound_id_seq')
 animal | character varying | 
 color  | character varying | 
 sound  | character varying | 
Indexes:
    "animal_sound_pkey" PRIMARY KEY, btree (id)

We can see here that the primary key prevented a duplicate value in the id column of our table. Adding the primary key also gave us a “free” index on the column as well. This should be expected, since we need to check candidates to prevent utilizing existing id values. Without this constraint, both insert statements would have succeeded, and we’d have no easy way to differentiate the data at a later date without using every column.

With the primary key in place, we can always rely on the id column being unique, and acting as the direct representative of each table row. Do not confuse a primary key with an index however! A primary key is a constraint that uses an index.
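
If the distinction seems pedantic, consider that Postgres even lets us build the index first and attach the constraint afterward. A sketch on a throwaway table:

CREATE TABLE gadget (gadget_id INT NOT NULL);

CREATE UNIQUE INDEX udx_gadget_id ON gadget (gadget_id);

-- The constraint adopts the existing index (and renames it to match)
ALTER TABLE gadget
  ADD CONSTRAINT gadget_pkey
      PRIMARY KEY USING INDEX udx_gadget_id;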

This is a statement that also applies to unique constraints. Let’s add one and test it out:

DROP TABLE animal_sound;
 
CREATE TABLE animal_sound
(
  id      SERIAL PRIMARY KEY,
  animal  VARCHAR,
  color   VARCHAR,
  sound   VARCHAR
);
 
ALTER TABLE animal_sound
  ADD CONSTRAINT udx_animal_sound_by_animal_color
      UNIQUE (animal, color);
 
INSERT INTO animal_sound (animal, color, sound)
       VALUES ('Cow', 'Brown', 'Moo');
INSERT INTO animal_sound (animal, color, sound)
       VALUES ('Cow', 'Spotted', 'Moo?');
INSERT INTO animal_sound (animal, color, sound)
       VALUES ('Cow', 'Brown', 'MooOOOoOo');
 
ERROR:  duplicate key value violates unique
        constraint "udx_animal_sound_by_animal_color"
 
\d animal_sound
 
 Column |       Type        |                   Modifiers                   
--------+-------------------+------------------------------------------------
 id     | integer           | not null default nextval('animal_sound_id_seq')
 animal | character varying | 
 color  | character varying | 
 sound  | character varying | 
Indexes:
    "animal_sound_pkey" PRIMARY KEY, btree (id)
    "udx_animal_sound_by_animal_color" UNIQUE CONSTRAINT, btree (animal, color)

There are a couple new pieces here. First, notice that we took advantage of some shorthand to create the primary key this time around. By adding the PRIMARY KEY decorator after a column definition, Postgres will automatically add the primary key constraint and choose a name for us following a fairly simple naming scheme. It’s still the same constraint, but like SERIAL, this method saves us a lot of work.

Next, we can see that our unique constraint automatically created an index on our behalf. This works the same way as the primary key; Postgres prevents non-unique values by checking new rows against the existing record of unique combinations. In our case, we’re allowed to have a brown or spotted cow, but not two separate brown cows. Again, a unique constraint is not an index, but uses a unique index to enforce the constraint.

There’s a lot to learn about constraints, but for now, they’re beyond the scope of this article. We’ll probably cover them in the future, of course!

Summary

In the end, we only really need to remember that all of the base elements discussed here are individually distinct. A column is not an index. A sequence is not a primary key. An index is not a constraint. A primary key is not an index. But a column can be indexed. Sequences can provide default values to a column. Constraints can utilize an index to enforce themselves.

Tables, sequences, and indexes can be tied together with syntactical glue, but exist independently. Don’t get these things mixed up because other database engines use them differently or deviate conceptually. While invoking the handy SERIAL and PRIMARY KEY shorthand, keep in mind what’s really going on under the hood. There may be occasions where it matters, or we need to leverage the distinction in our favor for advanced use cases.

At least Postgres gives us that option.



PG Phriday: Constipated Connections

July 29th, 2016 | Published in Database, Tech Talk | 2 Comments


We’ve all had it happen. One day, we run a query or function and we wait for the result. And we wait. And we wait. Eventually, we realize something is wrong and find a DBA and yell at them.

“Hey Postgres dude!” we scream. “The database is slow!”

Or maybe we can’t even get a connection. Postgres just keeps saying something about too many clients. This application isn’t launching and there are ten managers breathing down our neck and we panic.

“Hey Postgres dude!” we cry. “The database is down or something!”

Is it, though?

Maybe. Sometimes. It happens, after all. Other times, the situation isn’t so obvious and we have to dig a little deeper. All too often, the application stack is its own worst enemy and actually caused the problem to begin with. Let’s explore how that happens, starting with the usual setup:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE - (s.id || 's')::INTERVAL
  FROM generate_series(1, 100) s(id);
 
ANALYZE sensor_log;

We now have a hilariously simple table with a mere 100 rows. Surely, nothing can go wrong here!

That’s a bold statement though, and entirely wrong. To explain why, we first need to explore a bit deeper into how Postgres handles locking. We’ve discussed MVCC in the past, so a quick summary is that it provides us with natural guarantees:

  • Readers do not block readers.
  • Readers do not block writers.
  • Writers do not block readers.
  • Writers may block writers.

Just so we’re complete, let’s test each of these assertions. Run this on two separate connections:

BEGIN TRANSACTION;
SELECT * FROM sensor_log
 WHERE id = 10;
 
-- Then type this to clear the transaction
ROLLBACK;

Before running the ROLLBACK, we should get no delays from either connection. Each should produce the same results, and do so extremely quickly. This is no surprise, or at least it shouldn’t be. Readers aren’t modifying the data, so why should they block other readers?

Once we introduce a process that modifies data however, the scenario changes quite a bit. To better illustrate this, and perhaps provide a bit more insight into how MVCC works in Postgres, we’ll run another simulation. This time, we’ve included the ctid column, one of many system columns Postgres uses for its own purposes.

Here’s a quick interaction between one connection that just wants to read from sensor_log, and another that is writing to it.

-- On connection 1
 
BEGIN TRANSACTION;
SELECT ctid,* FROM sensor_log
 WHERE id = 10;
 
  ctid  | id | location | reading |    reading_date     
--------+----+----------+---------+---------------------
 (0,10) | 10 | 10       |      10 | 2016-07-28 23:59:50
 
-- On connection 2
 
BEGIN TRANSACTION;
UPDATE sensor_log
   SET reading = 89
 WHERE id = 10;
 
SELECT ctid,* FROM sensor_log
 WHERE id = 10;
 
  ctid   | id | location | reading |    reading_date     
---------+----+----------+---------+---------------------
 (0,101) | 10 | 10       |      89 | 2016-07-28 23:59:50
 
-- Then back to connection 1
 
SELECT ctid,* FROM sensor_log
 WHERE id = 10;
 
  ctid  | id | location | reading |    reading_date     
--------+----+----------+---------+---------------------
 (0,10) | 10 | 10       |      10 | 2016-07-28 23:59:50
 
-- Then type this to clear the transaction on both connections.
 
ROLLBACK;

No blocking, right? Can a writer block a reader? Apparently not. Having an open transaction and reading a row in some databases will result in the write waiting until the read is complete. After the second connection modifies the data, we can see that the row is different between the two, yet there was no call for Postgres to block either operation.

Now let’s observe the ctid in those results. The particular encoding for this is (page, item), giving the physical location of the row version within the table’s data file. After modifying the row in the second connection, we can see that the ctid changed. This is MVCC in action; the updated row is at the end of the table, yet the previous row also exists unmolested. That old row is what the first connection will see until the second connection commits the change. Neat, right?
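
The ctid isn’t alone, either. If we want to peek a bit deeper at the MVCC bookkeeping, the hidden xmin and xmax columns show which transactions created and expired each row version. A sketch we could run from either connection:

SELECT ctid, xmin, xmax, id, reading
  FROM sensor_log
 WHERE id = 10;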

But then we must consider the final case, where we have two independent entities writing data to the database. In circumstances where both connections are interested in the same rows, we need implicit and explicit locking to prevent inconsistent results.

With that in mind, let’s try something a bit more invasive:

-- On connection 1
 
BEGIN TRANSACTION;
SELECT ctid,* FROM sensor_log
 WHERE id = 10
   FOR UPDATE;
 
  ctid  | id | location | reading |    reading_date     
--------+----+----------+---------+---------------------
 (0,10) | 10 | 10       |      10 | 2016-07-28 23:59:50
 
-- On connection 2, this should hang
 
BEGIN TRANSACTION;
UPDATE sensor_log
   SET reading = 89
 WHERE id = 10;
 
-- Execute this on connection 1 to clear the write lock.
 
ROLLBACK;

The second connection hung after attempting to execute the update, didn’t it? By setting FOR UPDATE, we’ve declared our intent to modify the row we’re viewing, so Postgres treats it like any other update. With that lock in place, the second connection will wait indefinitely for the transaction to commit before making its own modifications. This is where most database engines necessarily stop when resolving write precedence.

If we return to connection two after running the rollback, we can see that it’s no longer waiting and has completed the update. But a major problem occurs if connection one never finishes its transaction. What if someone running ad-hoc queries against the database forgot to commit their last transaction and left for the night? Or maybe an application got stuck in a loop somewhere down the line after starting a transaction and making multiple modifications to several tables.

What then? Well, there are two possible outcomes for this roadblock:

  1. Connection two waits forever for a lock it can never obtain.
  2. The application times out without canceling the previous attempt, so it grabs a new connection and tries again. It does this until all available connections are exhausted.

Of these two results, the first is inevitable. In most cases, the problem just ends here, and this is where most users notice something is wrong. If we’re the second connection, we can’t tell whether the database is slow or something is preventing us from proceeding. The second situation is what automation brings us. Once all connections are utilized, Postgres will refuse to accept more unless they come from a superuser.

If we examine the Postgres system catalog, we can see the problem explicitly by checking from a third connection:

SELECT pid, state, waiting, now() - query_start AS duration,
       SUBSTRING(query, 1, 15) AS query
  FROM pg_stat_activity
 ORDER BY query_start;
 
 pid  |        state        | waiting |    duration     |      query      
------+---------------------+---------+-----------------+-----------------
 5564 | idle in transaction | f       | 00:01:48.564482 | SELECT ctid,* F
  574 | active              | t       | 00:01:43.940607 | UPDATE sensor_l
 5688 | active              | f       | 00:00:00        | SELECT pid, sta

Beware those idle in transaction connections, for they are the bane of every Postgres DBA. Unfortunately there’s no built-in way to find and correct these aside from scheduling a script to enforce a certain threshold by cancelling or killing connections that overstay their welcome.
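
Such a script usually boils down to a single query. A rough sketch that evicts anything idle in transaction for more than ten minutes might look like this; the threshold is arbitrary, so pick one that fits the workload:

SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
   AND now() - state_change > INTERVAL '10 minutes';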

Given what we now know, we can go even further than this. Let’s cause a deadlock!

-- On connection 1
 
BEGIN TRANSACTION;
SELECT ctid,* FROM sensor_log
 WHERE id = 10
   FOR UPDATE;
 
  ctid  | id | location | reading |    reading_date     
--------+----+----------+---------+---------------------
 (0,10) | 10 | 10       |      10 | 2016-07-28 23:59:50
 
-- Then connection 2
 
BEGIN TRANSACTION;
SELECT ctid,* FROM sensor_log
 WHERE id = 11
   FOR UPDATE;
 
  ctid  | id | location | reading |    reading_date     
--------+----+----------+---------+---------------------
 (0,11) | 11 | 11       |      11 | 2016-07-28 23:59:49
 
-- Now back to connection 1
 
UPDATE sensor_log
   SET reading = 911
 WHERE id = 11;
 
-- And finally connection 2 again
 
UPDATE sensor_log
   SET reading = 911
 WHERE id = 10;
 
ERROR:  deadlock detected

Anyone following along probably noticed a short delay before the deadlock detected error popped up. This delay is the deadlock_timeout configuration setting, which has a default of one second. Any backend that has waited on a lock for that long will check for circular lock dependencies, and Postgres will cancel the transaction of one offender when a deadlock is detected. It’s so easy to replicate because we simulated the basic criteria:

  1. Transaction A has locked object x.
  2. Transaction B has locked object y.
  3. Transaction A wants a lock on object y.
  4. Transaction B wants a lock on object x.

This circular loop of dependencies is what causes a deadlock, and without resolving it, both connections would have hung indefinitely. Some databases would indeed do just that. Still, our example is a bit contrived. The more common way for this to happen is between two separate tables being modified by different jobs.

Something like this:

-- On connection 1
 
BEGIN TRANSACTION;
UPDATE table_a SET col = 'whatever' WHERE ...;
UPDATE table_b SET col = 'whatever' WHERE ...;
 
-- While simultaneously on connection 2
 
BEGIN TRANSACTION;
UPDATE table_b SET col = 'whatever' WHERE ...;
UPDATE table_a SET col = 'whatever' WHERE ...;

If there are a series of operations we want to perform on multiple tables or rows, they should always come in the same order every time. In the above example, depending on the WHERE clause, we may or may not get a deadlock. There’s no way to tell, and there may be no evidence in development or QA systems simply because the volume is insufficient to trigger a deadlock.
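
The usual cure is mechanical: pick an ordering convention and stick to it everywhere. Sketched with a consistent order, the same two jobs will simply queue behind one another on contested rows instead of interlocking across tables:

-- On both connections, always table_a first, then table_b

BEGIN TRANSACTION;
UPDATE table_a SET col = 'whatever' WHERE ...;
UPDATE table_b SET col = 'whatever' WHERE ...;
COMMIT;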

This is most common when there are multiple access vectors. Imagine an application and a job server that were developed by different teams. Separated by a role barrier, operation order might not even be a consideration. Yet without a defined and consistent approach to handling database interactions, interlocked systems that work fine in development may become a deadlock salad in production.

Systems with a lot of row contention will see frequent periods of inexplicably long waits for seemingly simple write operations. Given the right (or wrong!) conditions, this will escalate into a deadlock or two… hundred. In either case, excessive application-driven locking massively delays standard database operations, and is often why the “database is slow” or appears inaccessible for short periods.

So go ahead and yell at your DBA; they’re just as likely to yell back. And that’s OK! Nobody really cares as long as somebody fixes it. So far as preventing the problem in the first place… well, that’s an article for a different day.



PG Phriday: The Audacity of NoSQL

July 22nd, 2016 | Published in Database, Tech Talk | 11 Comments


The pure, unadulterated, presumptuous impudence of NoSQL. Engines like MongoDB recklessly discard concepts like ACID in some futile quest to achieve “web scale”, and end up accomplishing neither. RDBMS systems have literally decades of history to draw upon, and have long since conquered the pitfalls NoSQL platforms are just now encountering. There may be something to a couple of them, but by and large, they’re nothing we really need.

At least, that’s something I might have said a couple of weeks ago.

Now, I’ve never really subscribed to the supposition that NoSQL is the strict antithesis of traditional SQL-driven systems. The intro paragraph is something of a hyperbolic exaggeration of the general disdain that seems to permeate the NoSQL vs SQL debate. Most DBAs I’ve met basically say “meh” and move on with their day, optimizing queries, tweaking storage metrics, and what not. Usually NoSQL is on our radar, but we have other stuff to worry about.

Last week, I was forced to reconsider due to an existing internal project that needed my input. That analysis really showed me why devs and data guys from all walks of life are seduced to the Dark Side of data. After another week slogging through documentation, experimenting with self-healing replication sets, self-balancing sharding, and taking copious notes on everything, something finally clicked. The Dark Side, it seems, has some merit beyond simplifying data structure.


So what can Postgres potentially learn from its eternal adversaries? Beyond stealing their encoding mechanisms via JSON and JSONB datatypes, that is. Though my analysis thus far has focused mostly on MongoDB, it alone incorporates several fundamental concepts that I’ve always felt Postgres lacked. For now, I’ll focus on the two that have—in my opinion, anyway—really prevented Postgres from reaching its full potential.

Break Me

Postgres has replication, but no meaningful application beyond having a hot standby or a few read slaves. Why no master election consensus system like RAFT? Last year, I came across an excellent article that discussed implementing it with HAProxy, Governor, and etcd instead of the more historic (and infinitely more complicated) Pacemaker stack. Someone even forked Governor into Patroni to incorporate alternative consensus approaches. Still, that’s a lot of external dependencies.

Postgres has all of the parts to handle this itself, it just doesn’t. It has the replication stream. It knows which of all the replica nodes has the highest transaction id. It has a process for cloning an existing instance in pg_basebackup. If an old primary node needs to become a replica, it can either do so directly, or use pg_rewind if necessary. Yet these fragments must be combined manually. Some use scripts, others use Pacemaker or some variant of the Governor approach above, but it’s much easier for a new project to simply choose another platform.

The specific issue at hand is that Postgres elects to consider itself an autonomous process. In this view, a database is a self-contained data management structure, where everything goes into a single extremely high integrity bucket. As a consequence, Postgres has no concept of a cluster at all, except perhaps a slight awareness of currently streaming replicas.

MongoDB takes itself much less seriously. In the MongoDB world, a mongod instance is a bucket that will accept any and all content, and do a passable job of protecting it. As such, it’s much less concerned with spillage, giving it greater leverage for passing that bucket around. The authors of MongoDB clearly wanted to focus on data availability rather than integrity. To get that, they settled on a consensus system that some might argue is inadequate when combined with their data writing strategies.

Despite that, it works. MongoDB is a concept beyond mere data storage. Where Postgres is becoming more of a storage middleware through additions like extensions and foreign data wrappers, MongoDB is “where your data goes.” That’s a very alluring attitude, and a subtle shift in focus few (if any) mainstream RDBMS have adopted. First and foremost, engines like MongoDB are a cluster, incorporating multiple nodes that interchangeably share roles but present a united face to the world. Postgres has no equivalent.

But it could! Imagine a universe where we have the integrity guarantee of an ACID database, with a Mongo-like front-end that manages the nodes such that our bulletproof data is always available. Speaking of front ends…

Into Pieces

Shards. I have 50TB of data, and I have no Earthly idea what to do with it. I’d love to blow it up into a trillion little bits, but Postgres gives me no easy way to do that. Here’s the second place MongoDB trumps Postgres in its current incarnation. It’s not exactly a cake-walk, considering I drew up this chart while compiling my notes:

[Chart: Mongo cluster architecture]

But all the tools are there. Anyone with a dozen servers or VMs can build a scalable data store without writing their own distribution implementation. That’s hard to ignore.

For Postgres, it really boils down to the citus extension and Postgres-XL. Unfortunately citus has severe limitations with joins that include non-distributed tables. Meanwhile, Postgres-XL has been trying to merge the scaling code into the 9.5 branch for a year now. Despite the increased visibility and presumed quality of 2ndQuadrant’s work, there are a couple of orders of magnitude fewer eyes on that code. Even assuming they manage to merge everything before 9.6 drops, will they be able to keep up with core afterwards? Citus has its limitations, but in the end, it is an extension anyone can install. Anything that isn’t an extension risks falling woefully behind or being abandoned.

Is there another way? If we treated Postgres backends like dumb containers the way MongoDB does, things suddenly change quite a bit. MongoDB has a process called mongos which is basically just a glorified proxy that manages shard metadata and forks out a balancing job that ushers data around to ensure shard content isn’t lopsided. That’s the “Manager / Balancer” in my chart. What if it also included a transaction manager?

Postgres has supported background workers since 9.3. In theory, it could adopt management of shard metadata, chunk migration, or global transaction roles where appropriate. This is essentially what Postgres-XL (and similar) is doing. Where MongoDB wrote a glue process to bring everything together, Postgres-XL opted to directly patch the Postgres source, and all the pain that entails.

I’m in danger of being too dismissive of implementation details, of course. Yet there are massive potential benefits to taking a cluster approach by default. Such a distribution proxy could be database agnostic, compatible with any engine that speaks SQL. Consider too that Postgres foreign data wrappers imply it could be the communication layer.

Last Thoughts

Postgres is tantalizingly close to being a cluster system simply through the communication protocols it provides. It just needs that final piece. I’m fully convinced that someone with the right skillset (definitely not me) could whip up a proof of concept for this in a few days. This is, after all, how most application-level sharding approaches work anyway. In the end, that’s all MongoDB, and several other NoSQL solutions ultimately provide. Take one part integrated RAFT process to always present a writable primary, and mix in a data distribution and balancing proxy. Done.

If they so desired, the MongoDB developers could theoretically even use Postgres as a back-end storage mechanism. In fact, I highly encourage them to do so! Given a natural shard split and extensions like cstore_fdw, suddenly NoSQL becomes a scalable column-store.

Barring that, corporate use of Postgres will be necessarily limited. There’s only so much you can do in a single instance, and without integral scaling features, it becomes migration time. I don’t want to tell our devs to convert all of their ORMs and queries to JSON, give up easy joins, or forgo NULL handling, but I will. I have to. This 50TB is only growing every day, and without a straightforward and reliable migration path, we need to retool.

I disagree with too many of the data compromises MongoDB makes to use it for important data, but the availability approach is fundamentally sound. Mongo’s implementation is refreshing and innovative, and one which many NoSQL engines appear to share. Is it any wonder there has been a silent transition to these systems, when everything has several billion rows, and applications fully expect to reach well beyond Terabytes of storage into Petabyte territory?

No single instance can absorb that kind of volume, and not everyone has time or inclination to work around that limitation. We need a common middle-ground with the NoSQL world. We have the better storage engine, and they have the better distribution and availability mechanisms. Can’t there be a place where both of those things work in tandem?

I, for one, welcome our cluster overlords.


