PG Phriday: Reducing Writes With Unlogged Tables

Last week, I covered how MVCC, Postgres’s storage system, works at a very high level. Near the end, I also mentioned that it doesn’t quite lend itself well to certain use cases, such as rapidly mutating session storage. Well, there is one caveat to that statement that I’d forgotten about because of its relatively limited utility: unlogged tables.

Here’s what the Postgres documentation has to say about unlogged tables:

Data written to unlogged tables is not written to the write-ahead log, which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers.

But what does this actually mean in the back end? That comes down to how databases interact with the underlying filesystem. In the case of Postgres, every write to the database is first recorded in the write-ahead log (WAL), a series of binary files kept separate from the actual data files. Then a periodic event called a checkpoint commits those accumulated changes to the target table files. This way, crashes do not damage the database, the transaction logs can be used to replicate to other instances, and backups can be rolled forward in time by applying transaction logs manually.
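To see a sliver of this machinery in action, Postgres exposes its current position in the transaction log, and a checkpoint can even be requested manually. This is just an illustrative peek; the function names below are the ones used by the 9.x releases assumed throughout this article, and were later renamed into the pg_wal family:

-- Show the current write position in the transaction log, and the
-- segment file that position lives in. Run it before and after some
-- writes to watch the location advance.
SELECT pg_current_xlog_location(),
       pg_xlogfile_name(pg_current_xlog_location());

-- Force a checkpoint so accumulated changes are flushed to the
-- actual table files. Requires superuser.
CHECKPOINT;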

This is why unlogged tables have absolutely no crash protection, will not be copied to the replication stream, and get truncated if there’s any doubt regarding the contents. There’s no way to verify the data they contain, and they’ve been pruned from the very infrastructure that gives Postgres its durability. So what are they good for, then?
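As an aside, Postgres records this choice in its system catalogs, so it’s always possible to check whether an existing table is unlogged. The query below references the table we’ll create in a moment:

-- relpersistence is 'u' for unlogged, 'p' for an ordinary permanent
-- table, and 't' for a temporary table.
SELECT relname, relpersistence
  FROM pg_class
 WHERE relname = 'sys_session_unlogged';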

Let’s run a very short experiment. I commonly recommend any PostgreSQL server be set up with these three options in postgresql.conf:

wal_level = hot_standby
archive_mode = on
archive_command = 'exit 0'

Since archive mode can only be changed by restarting the server, it’s a good idea to just leave it enabled, but neutralized. With these settings, Postgres doesn’t actually archive anything, just as if the defaults were still in place. However, by modifying archive_command and reloading the config files, we can easily enable archival later without disrupting the database. Let’s create a junk drawer, set up a basic copy command, and reload:

mkdir /tmp/xlog

# Set archive_command in postgresql.conf
archive_command = 'cp -f %p /tmp/xlog/%f'

# Reload Postgres
sudo service postgresql reload

This will copy each transaction log file the database no longer needs to the directory we created in /tmp; the %p placeholder expands to the path of the file being archived, and %f to its bare file name. Now let’s create two tables that might benefit from an unlogged approach:

\timing

CREATE UNLOGGED TABLE sys_session_unlogged
(
    session_id    SERIAL     NOT NULL,
    session_data  TEXT       NOT NULL,
    modified_dt   TIMESTAMP  NOT NULL DEFAULT now()
);

INSERT INTO sys_session_unlogged (session_data)
SELECT repeat(chr((random() * 1000)::INT % 94 + 33), 32)
  FROM generate_series(1, 5000000) a(id);

-- Pause here to check the /tmp/xlog directory for its contents.
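-- As an optional extra check on servers running 9.4 or newer, the
-- pg_stat_archiver view reports how many files the archive_command
-- has shipped so far, and which one went most recently.
SELECT archived_count, last_archived_wal, last_archived_time
  FROM pg_stat_archiver;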

CREATE TABLE sys_session
(
    session_id    SERIAL     NOT NULL,
    session_data  TEXT       NOT NULL,
    modified_dt   TIMESTAMP  NOT NULL DEFAULT now()
);

INSERT INTO sys_session (session_data)
SELECT repeat(chr((random() * 1000)::INT % 94 + 33), 32)
  FROM generate_series(1, 5000000) a(id);

Why five million rows? It acts as a great amplifier for timings, making any benefits that much more obvious. In my test VM, the unlogged version took about 16 seconds to load, while the regular table required around 20. What might be more important, however, is what happened in the /tmp/xlog directory. After filling the unlogged table, there were two files there, suggesting some behind-the-scenes accounting means the table is not entirely absent from the transaction log stream. For some perspective, the regular table contributed 35.
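Each transaction log segment is 16MB by default, so a quick glance at our junk drawer puts those file counts into byte terms as well. Something along these lines, assuming the /tmp/xlog location chosen earlier:

# Count the archived segments and total their size after each INSERT.
ls /tmp/xlog | wc -l
du -sh /tmp/xlog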

Now, a 25% performance benefit is probably not worth discarding crash protection and other safeguards, but avoiding needless writes just might be. All of the other aspects of MVCC still apply, so I still wouldn’t suggest using something like this for user session tables, but what can it be used for? Perhaps as a replacement for temporary tables.

Temporary tables have one major shortcoming: they’re invisible to anything but the current connection. If an application consists of several disjointed components that need to manipulate the same in-transit data, there is no option but to create a real table for this collaborative work. In this context, a database crash or other disruption is not a disaster, as the process can be restarted and the table rebuilt. With a temporary table, on the other hand, an application crash takes its connection, and therefore all of its previously inserted data, along with it. Unlogged tables neatly sidestep both of these problems.
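To illustrate, imagine a scratch area several worker processes share while loading a batch. The table and column names here are invented for the sake of the example, but the shape is the point: it’s visible to every connection, yet still exempt from transaction logging:

CREATE UNLOGGED TABLE work_queue_staging
(
    item_id     SERIAL     NOT NULL,
    payload     TEXT       NOT NULL,
    claimed_by  TEXT       NULL,
    created_dt  TIMESTAMP  NOT NULL DEFAULT now()
);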

They also make a great dumping ground for all kinds of volatile data that doesn’t need long-term protection, such as a web user activity log. Such tables can collect a period’s worth of clicks, API calls, logins, and so on, before being analyzed and stored permanently in aggregate form at a much lower volume. Essentially, any use case that doesn’t need a bullet-proof durability guarantee is a good candidate for conversion to an unlogged table.
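A rough sketch of that idea, again with invented names: raw events land in an unlogged table, get rolled up into a permanent summary on some schedule, and the scratch data is simply truncated afterward.

CREATE UNLOGGED TABLE activity_log
(
    user_id    INT        NOT NULL,
    action     TEXT       NOT NULL,
    action_dt  TIMESTAMP  NOT NULL DEFAULT now()
);

CREATE TABLE activity_summary
(
    action        TEXT       NOT NULL,
    action_count  BIGINT     NOT NULL,
    period_start  TIMESTAMP  NOT NULL
);

-- Periodic rollup into the durable summary, followed by a purge of
-- the volatile raw data.
INSERT INTO activity_summary (action, action_count, period_start)
SELECT action, count(*), date_trunc('hour', action_dt)
  FROM activity_log
 GROUP BY action, date_trunc('hour', action_dt);

TRUNCATE activity_log;

With concurrent writers, a DELETE bounded by action_dt would be safer than a blanket TRUNCATE, but the principle is the same.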

As a DBA, I find it somewhat difficult to view data this way. Purposefully creating a table and explicitly disabling data persistence seems both foolhardy and dangerous at first glance. But there are use cases, as mentioned above, where we don’t need to jealously loom over the table with a loaded shotgun, suspiciously glaring at anyone in the vicinity. Some data is simply meant to be regularly destroyed, so why not accommodate that?

In fact, I have a developer or two I need to talk into incorporating these into a multi-phase ETL system. Wish me luck!