PG Phriday: Bodacious Benchmarks

When it comes to putting Postgres through its paces, we often turn to benchmarks to absolutely bury it under a torrent of oppressive activity. It’s a great way to obtain maximum performance metrics and also observe how Postgres reacts and breaks down under such pressure. But these kinds of tests aren’t really practical, are they? After all, many such simulated workloads amount to little more than bragging rights, measured against previous Postgres releases or competing hardware. Yet while their functionality beyond the defaults is often overlooked, tools like pgbench are actually critical to the development process.

I’ve used pgbench frequently in the past for examples in these articles. It’s much like Apache’s own ab for bombarding a server with various workloads, except pgbench must bootstrap by creating a set of test tables. Or does it? What if we have our own existing tables from a project that we want to measure, either for a hardware migration, or for scalability concerns?

Let’s see how we can utilize pgbench with our trusty sensor_log table. It’s not really designed to experience any updates, so we need something else as well. Let’s say the sensor log is an endpoint capture for a very busy sensor table that is only updated when sensors detect a change in the environment. That should provide us with plenty of opportunities for contention!

CREATE TABLE sensor (
  sensor_id      SERIAL PRIMARY KEY NOT NULL,
  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  modified_date  TIMESTAMP NOT NULL
);

INSERT INTO sensor (location, reading, modified_date)
SELECT s.id, s.id % 100,
       CURRENT_DATE + INTERVAL '8h' 
                    - (s.id % 60 || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);

CREATE INDEX idx_sensor_location
    ON sensor (location);

ANALYZE sensor;

CREATE TABLE sensor_log (
  sensor_id     BIGINT NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

CREATE UNIQUE INDEX udx_sensor_log_sensor_id_reading_date
    ON sensor_log (reading_date, sensor_id);

CREATE OR REPLACE FUNCTION f_enforce_modified()
RETURNS TRIGGER 
LANGUAGE plpgsql AS
$$
BEGIN
  NEW.modified_date = now();
  RETURN NEW;
END;
$$;

CREATE TRIGGER t_sensor_timestamp_b_iu
BEFORE INSERT OR UPDATE
    ON sensor
   FOR EACH ROW
       EXECUTE PROCEDURE f_enforce_modified();

CREATE OR REPLACE FUNCTION f_capture_sensor_changes()
RETURNS TRIGGER 
LANGUAGE plpgsql AS
$$
BEGIN
  INSERT INTO sensor_log (sensor_id, location, reading, reading_date)
  VALUES (NEW.*);

  RETURN NULL;
END;
$$;

CREATE TRIGGER t_sensor_changes_a_iu
 AFTER INSERT OR UPDATE
    ON sensor
   FOR EACH ROW
       EXECUTE PROCEDURE f_capture_sensor_changes();
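
Before throwing pgbench at this, it doesn’t hurt to verify the triggers behave as intended. A quick manual check against an arbitrary sensor should show a refreshed modified_date on the base row and a matching entry in the log:

UPDATE sensor SET reading = 42
 WHERE sensor_id = 1;

SELECT * FROM sensor
 WHERE sensor_id = 1;

SELECT * FROM sensor_log
 WHERE sensor_id = 1;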

We’ve bootstrapped the sensor table with a relatively substantial amount of data. Five million rows isn’t really much these days, but it’s more than enough for demonstration purposes on a laptop or VM. With that said, there are a few things we can test with this basic table structure. But what to analyze?

Let’s begin with a basic custom script that just emulates a simple update schedule, with sensor readings coming in constantly from random locations. It’s a good place to start and gives us something to tweak further. Note that this script only works with pgbench 9.6 and above, though older versions have an equivalent, as we’ll see shortly.

\set sid random(1, 100000 * :scale)
\set val random(1, 100)

SELECT * FROM sensor
 WHERE sensor_id = :sid;

UPDATE sensor SET reading = :val
 WHERE sensor_id = :sid;

SELECT max(reading) FROM sensor_log
 WHERE sensor_id = :sid
   AND reading_date > CURRENT_DATE - INTERVAL '5 minutes';
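
As an aside, the random() function within \set is new in pgbench 9.6. On older versions, a rough equivalent of the two variable assignments (a sketch using the legacy \setrandom syntax) would look like this:

\set max_sid 100000 * :scale
\setrandom sid 1 :max_sid
\setrandom val 1 100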

This script does a few things for us. Taken as a whole, we grab a random entry from the sensor table, record a new reading for it, and then check whether that affected the aggregate high over the last five minutes. This is something an application might do as part of an analysis cycle. The real magic starts when we actually execute the script with pgbench.

The output we get back is actually pretty cool, and full of valuable information. We’re also going to utilize a trick so we don’t taint our existing data. It’s not something recommended for production systems, but it’s perfect for QA or development environments.

createdb sensor_test -T sensors
pgbench --scale 50 --report-latencies --no-vacuum \
        --file sensor-bench-1.sql --time 30 sensor_test

transaction type: sensor-bench-1.sql
scaling factor: 50
query mode: simple
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 9200
latency average: 3.261 ms
tps = 306.636514 (including connections establishing)
tps = 306.672074 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.003  \set sid random(1, 100000 * :scale)
         0.001  \set val random(1, 100)
         0.291  SELECT * FROM sensor
         1.245  UPDATE sensor SET reading = :val
         1.713  SELECT max(reading) FROM sensor_log

The first command creates a new database named sensor_test, using the existing sensors database (where we built our sample tables) as a template, so the copy includes all of the tables and data already there. This means we can run as many tests as we want, throw everything away, and start fresh as many times as we need. Copying the data takes a bit of time, so there’s probably a size beyond which this becomes difficult to justify, but small and medium data sets fit this model just fine.
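
Incidentally, createdb with -T is just a thin wrapper around the equivalent SQL, so the same copy can be made directly from psql if preferred:

CREATE DATABASE sensor_test TEMPLATE sensors;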

The pgbench command itself deserves some explanation, though we did opt for the long-format parameters to help convey what we’re actually doing.

Let’s start with the scale. By default, pgbench uses 100,000 rows per scale factor, and our script retains that convention because why not? With a scale of 50, that covers our five million rows. What’s interesting about this is that we can actually increase potential contention by reducing the scale, so that modifications focus on a smaller portion of the data. We’ll probably find this useful later.
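
For instance, since the script multiplies :scale by 100,000 when picking sensor IDs, dropping the scale to 2 would confine every read and write to the first 200,000 sensors. We could run something like this:

pgbench --scale 2 --report-latencies --no-vacuum \
        --file sensor-bench-1.sql --time 30 sensor_test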

We disable vacuuming because pgbench only knows how to vacuum its own tables, which aren’t involved when we use custom scripts. The script statistics at the end of the report are why we asked for the latency report. With that information, we can focus on whichever part of the script is causing slowdowns. Assuming we base our scripts on real application samples, this is an invaluable source of performance metrics.

Aside from the custom file we’re using for the test itself, we also set the test duration to thirty seconds. For any real kind of analysis, we’d want to use a much larger value, on the order of tens of minutes or even hours. That would give us a much better idea of average latencies after accounting for other system activity, including write throughput to the base data files during checkpoints.
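
For a longer run like that, it can also help to ask pgbench for interim progress reports so we can watch throughput drift over time. Something like this would print a status line every minute during a 30-minute run:

pgbench --client 4 --no-vacuum --progress 60 \
        --file sensor-bench-1.sql --time 1800 sensor_test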

Let’s see what happens if we crank down the scale to create a “hot spot” in our table, and increase the client concurrency to force more contention:

dropdb sensor_test
createdb sensor_test -T sensors
pgbench --client 4 --report-latencies --no-vacuum \
        --log --aggregate-interval 10 \
        --file sensor-bench-1.sql --time 30 sensor_test

transaction type: sensor-bench-1.sql
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 1
duration: 30 s
number of transactions actually processed: 32425
latency average: 3.701 ms
tps = 1080.246903 (including connections establishing)
tps = 1080.360307 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.002  \set sid random(1, 100000 * :scale)
         0.001  \set val random(1, 100)
         0.654  SELECT * FROM sensor
         1.702  UPDATE sensor SET reading = :val
         1.332  SELECT max(reading) FROM sensor_log

cat $(ls -t pgbench_log.* | head -n 1)

1474658287 13068 36026916 122319506636 1060 17761
1474658297 10252 39970824 203678243092 1568 32993
1474658307 8371 39968669 270707139663 1910 82191

There isn’t a whole lot that has changed here, but we can see a couple of important bits of information. First, the update statement is notably slower than before. With four clients on a 2-CPU VM, that’s expected, though incoming activity eclipsing the CPU count is fairly common on real systems anyway. Despite this, the aggregate maximum did not get slower, and is in fact slightly faster. Clearly the update is the issue, which is expected since we saddled it with two triggers.

That last bit of output is the result of enabling transaction logging and setting an aggregate interval so pgbench summarizes instead of reporting every transaction. The log file name incorporates the PID of the pgbench command, so we cheated a bit and used cat to print the file from the most recent benchmark. Each column has a meaning, but the last two represent minimum and maximum latency in microseconds. Based on this, we can see that latency has some pretty wild swings, from a mere 1ms to over 80ms.
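
Since this version’s aggregate format is interval start, transaction count, latency sum, latency sum of squares, and then the minimum and maximum (all latencies in microseconds), a quick awk one-liner can turn the raw file into per-interval averages. This is just a sketch against the default column layout:

awk '{printf "%s  avg=%.2fms  min=%.2fms  max=%.2fms\n",
      $1, $3 / $2 / 1000, $5 / 1000, $6 / 1000}' \
    $(ls -t pgbench_log.* | head -n 1)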

This tells us a couple things, actually. Our previous models never included the sensor table, just sensor_log. We can always pull the most recent reading from a sensor log, so having a lot of updates and two triggers on the base table is a lot of unnecessary overhead. What if we flipped the model slightly and emulated the current state with a view instead?

CREATE TABLE sensor_log (
  sensor_id     SERIAL NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL DEFAULT now()
);

INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id, s.id % 100,
       CURRENT_DATE + INTERVAL '8h' 
                    - (s.id % 60 || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);

CREATE UNIQUE INDEX udx_sensor_log_sensor_id_reading_date
    ON sensor_log (sensor_id, reading_date DESC);

ANALYZE sensor_log;

CREATE VIEW v_sensor AS
SELECT DISTINCT ON (sensor_id) *
  FROM sensor_log
 ORDER BY sensor_id, reading_date DESC;

This comes with a corresponding update to our script:

\set sid random(1, 100000 * :scale)
\set val random(1, 100)

SELECT * FROM v_sensor
 WHERE sensor_id = :sid;

INSERT INTO sensor_log (sensor_id, location, reading)
VALUES (:sid, :sid, :val);

SELECT max(reading) FROM sensor_log
 WHERE sensor_id = :sid
   AND reading_date > CURRENT_DATE - INTERVAL '5 minutes';

And finally, we run the altered benchmark:

dropdb sensor_test
createdb sensor_test -T sensors
pgbench --client 4 --report-latencies --no-vacuum \
        --file sensor-bench-2.sql --time 30 sensor_test

transaction type: sensor-bench-2.sql
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 1
duration: 30 s
number of transactions actually processed: 52091
latency average: 2.304 ms
tps = 1736.124362 (including connections establishing)
tps = 1736.281647 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.002  \set sid random(1, 100000 * :scale)
         0.001  \set val random(1, 100)
         0.463  SELECT * FROM v_sensor
         1.347  INSERT INTO sensor_log (sensor_id, location, reading)
         0.480  SELECT max(reading) FROM sensor_log

Maybe it comes as no surprise, but these modifications improved throughput by roughly 60% over the previous run! A good DBA probably would have said something about the sensor table for the same reasons we listed, but it’s better to have numbers to prove it. It’s certainly logical enough to keep a table of current values plus an audit log of changes, but we found a better model if we need to increase throughput.

Keep in mind that this is a very simple example. In a real cluster, our benchmark script would likely be much more involved, and we’d probably have several more of them for various read/write percentage scenarios. This is the kind of tooling necessary to really squeeze the database for every last drop of performance, or otherwise reveal points of contention or excessive latency swings.
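
Version 9.6 makes maintaining several scenarios especially convenient, since multiple script files can be passed in a single run with relative weights. A hypothetical 90/10 read/write mix (the file names here are made up) might look like this:

pgbench --client 4 --no-vacuum --time 1800 \
        --file sensor-reads.sql@9 \
        --file sensor-writes.sql@1 sensor_test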

If you’re not already using it, pgbench is a powerful addition to any database developer or DBA arsenal. It’s much more than its defaults might suggest!