PG Phriday: EXCEPTIONal Performance

Like any programming language, the PL/pgSQL Postgres procedural language has plenty of handy control structures. Among those thankfully modern accoutrements is the humble EXCEPTION block. While it isn't the more prevalent try/catch methodology, we can start an embedded BEGIN ... END block anywhere and hang an exception handler off of it for the same effect.
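
As a quick illustration of the syntax, here's a minimal anonymous block that traps a division by zero without derailing anything around it:

DO $$
BEGIN
  -- Any group of statements can be wrapped in its own block
  -- with an attached exception handler.
  BEGIN
    PERFORM 1 / 0;
  EXCEPTION
    WHEN division_by_zero THEN
      RAISE NOTICE 'Caught it; the outer block carries on normally.';
  END;
END;
$$ LANGUAGE plpgsql;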

Knowing this is the case, what are the costs of actually using them? Postgres is fairly streamlined, and as such, can take several shortcuts when there are no exception blocks within a chunk of PL/pgSQL; a block with a handler effectively has to set up a subtransaction it may later need to roll back. Thus, it’s not uncommon for members of mailing lists to mutter about ‘performance costs’ and suggest that the worst performance hit is felt by even breathing the word ‘exception’ in the presence of a Postgres instance. The implication is that, once that upfront price is paid, anything we actually do with the exception is incremental by comparison.

But is that the case? And if not, what are the actual metrics observed while using exceptions in different contexts? Let’s go beyond nebulous hand waving and take a look, starting with a basic table to hold data:

CREATE TABLE sensor_log (
  id            SERIAL NOT NULL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

CREATE INDEX idx_sensor_log_date
    ON sensor_log (reading_date);

The above table is one we’ve used often before. It helps us illustrate inserting data through a function, an entry vector commonly associated with projects like PL/Proxy. Really, though, this applies to any application designed to interact through a database-side API of predefined functions.

A very basic example of this is a simple insert function:

CREATE OR REPLACE FUNCTION f_capture_log(
  sensor_loc VARCHAR,
  sensor_val BIGINT
)
RETURNS BOOLEAN AS
$$
BEGIN
  INSERT INTO sensor_log (location, reading, reading_date)
  VALUES (sensor_loc, sensor_val, now());

  RETURN true;
END;
$$ LANGUAGE plpgsql;

\o /dev/null

SELECT f_capture_log('x' || s.id % 1000, s.id % 100)
  FROM generate_series(1, 100000) s(id);

Time: 2567.990 ms

There’s nothing too strange here. All we have is a function that wraps our insert and stamps the reading with the time of the call; the \o /dev/null simply discards the 100,000 result rows so printing them doesn’t pollute the timing. We used 100,000 rows to push enough data that our timings remain mostly consistent, and by and large we were successful: any variance in run times is generally within 5% on our otherwise idle test VM.

Knowing that, we can really start having fun now. We came here to examine exceptions, so let’s add a handler that does nothing but return false instead of true when the insert fails. Keep in mind that this exception will never actually fire, since we’re relying on a sequence to provide conflict-free primary key IDs.

CREATE OR REPLACE FUNCTION f_capture_log_except(
  sensor_loc VARCHAR,
  sensor_val BIGINT
)
RETURNS BOOLEAN AS
$$
BEGIN
  INSERT INTO sensor_log (location, reading, reading_date)
  VALUES (sensor_loc, sensor_val, now());
  RETURN true;
EXCEPTION
  WHEN OTHERS THEN
    RETURN false;
END;
$$ LANGUAGE plpgsql;

\o /dev/null

SELECT f_capture_log_except('x' || s.id % 1000, s.id % 100)
  FROM generate_series(1, 100000) s(id);

Time: 3211.702 ms

It looks like we have a 25% performance hit simply by including the exception block. Indeed, that’s rather substantial for an exception that literally does nothing and will never actually run. Does it justify the argument that there is a significant upfront cost associated with exceptions, and that once we cross that threshold, everything is smooth sailing? Maybe, or maybe not. Whatever we find, at least we have some numbers to work with!
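
For the record, that 25% falls straight out of the two timings above:

SELECT round((3211.702 / 2567.990 - 1) * 100, 1) AS pct_slower;
-- Roughly 25.1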

So let’s screw up our function a little, and make it work for its dinner.

CREATE OR REPLACE FUNCTION f_capture_log_broken(
  sensor_loc VARCHAR,
  sensor_val BIGINT
)
RETURNS BOOLEAN AS
$$
BEGIN
  INSERT INTO sensor_log (id, location, reading, reading_date)
  VALUES (sensor_val, sensor_loc, sensor_val, now());
  RETURN true;
EXCEPTION
  WHEN OTHERS THEN
    RETURN false;
END;
$$ LANGUAGE plpgsql;

\o /dev/null

SELECT f_capture_log_broken('x' || s.id % 1000, s.id % 100)
  FROM generate_series(1, 100000) s(id);

Time: 5235.468 ms

Ouch! Our function is now about twice as slow as the original incarnation that didn’t use an exception block. Still, this is a worst-case scenario where literally every invocation of the function causes an exception. If we truncate the sensor_log table and reduce the amount of overlap, run times are affected proportionally: if only 5% of the calls caused a key violation, we’d only absorb about 5% of that extra cost.
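
As a sketch of what such a partial-overlap run might look like (these particular values weren’t part of the timed tests above), forcing roughly 5% of the calls to collide is as simple as reusing one id every twentieth call:

TRUNCATE sensor_log;

\o /dev/null

-- Every 20th call reuses id 1, so about 5% of the inserts trip
-- the primary key and fall into the exception handler.
SELECT f_capture_log_broken('x' || s.id % 1000,
         CASE WHEN s.id % 20 = 0 THEN 1 ELSE s.id END)
  FROM generate_series(1, 100000) s(id);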

So far, everyone has been correct. The initial investment cost from using exceptions is still the highest we’ve paid, barring ridiculous examples like this one. Does that change if we catch the actual error that’s occurring so there are two conditionals? Let’s see:

CREATE OR REPLACE FUNCTION f_capture_log_extra_broken(
  sensor_loc VARCHAR,
  sensor_val BIGINT
)
RETURNS BOOLEAN AS
$$
BEGIN
  INSERT INTO sensor_log (id, location, reading, reading_date)
  VALUES (sensor_val, sensor_loc, sensor_val, now());
  RETURN true;
EXCEPTION
  WHEN unique_violation THEN
    RETURN false;
  WHEN OTHERS THEN
    RETURN false;
END;
$$ LANGUAGE plpgsql;

\o /dev/null

SELECT f_capture_log_extra_broken('x' || s.id % 1000, s.id % 100)
  FROM generate_series(1, 100000) s(id);

Time: 5370.879 ms

It turns out that the answer is no; the difference from listing further exception conditions isn’t statistically significant. There’s probably a small amount of overhead in the underlying jump logic, but that’s an expected element of any language. Still, this is boring and pointless; our exceptions do nothing but tell us the insert failed.

What happens if we modify the handlers to RAISE a message of their own that the caller can process? So long as the level we use is below EXCEPTION, it won’t abort the calling query, so we get to feel the full cost of every message we raise. We’ll also silence them client-side so 100,000 warnings don’t flood the console.

CREATE OR REPLACE FUNCTION f_capture_log_super_broken(
  sensor_loc VARCHAR,
  sensor_val BIGINT
)
RETURNS BOOLEAN AS
$$
BEGIN
  INSERT INTO sensor_log (id, location, reading, reading_date)
  VALUES (sensor_val, sensor_loc, sensor_val, now());
  RETURN true;
EXCEPTION
  WHEN unique_violation THEN
    RAISE WARNING 'This sucks!';
    RETURN false;
  WHEN OTHERS THEN
    RAISE WARNING 'Ditto!';
    RETURN false;
END;
$$ LANGUAGE plpgsql;

\o /dev/null
SET client_min_messages TO error;

SELECT f_capture_log_super_broken('x' || s.id % 1000, s.id % 100)
  FROM generate_series(1, 100000) s(id);

Time: 6974.891 ms

Almost three times slower than the original? Now we’re talking! Again, this is a worst-case scenario, but it’s also proportional. Since the new function is about 30% slower than simply catching the key violation, we can extrapolate that there’s roughly a 30% cost associated with the RAISE itself. In that case, if our data had a 10% overlap, we’d pay the initial 10% from the exception logic, and another 3% for raising our own messages afterwards.
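
Those figures, again, come straight from the timings:

SELECT round((6974.891 / 5370.879 - 1) * 100, 1) AS pct_raise_cost,
       round((6974.891 / 5370.879 - 1) * 100 * 0.10, 1) AS pct_at_10pct_overlap;
-- Roughly 29.9 and 3.0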

This is in addition to the initial 25% cost associated with even using exceptions in the first place. Again, the upfront price of having an exception block outweighs the cost of actually exercising anything within it. Even in our highly contrived example, we’d need to call our function on data that had roughly 20% duplicates before the activity inside the block started to eclipse the initial exception cost.

How likely is that, really? In most cases where APIs are used like this, there will be an occasional race condition every few hundred thousand or million rows. At that rate, the full run time of even our most broken function would be indistinguishable from the variant whose exception block never fires at all. In the end, that’s what we’re likely to see in real-world scenarios.

At least with Postgres 9.5 on an idle VM, it would appear that exceptions add a 25% performance penalty. Whether that is enough to justify a different approach, or to bar exceptions outright, depends completely on the expected use case. Indeed, we’re already paying a hefty price by using a function instead of a direct INSERT statement anyway, so it could be a moot argument. A highly sensitive transactional database would probably avoid the overhead, while a more common day-to-day system wouldn’t even notice.
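
For comparison, bypassing the function layer entirely would look something like this (not timed as part of this experiment), trading the per-call API for a single set-based statement:

\o /dev/null

-- The same data, inserted directly without any function wrapper.
INSERT INTO sensor_log (location, reading, reading_date)
SELECT 'x' || s.id % 1000, s.id % 100, now()
  FROM generate_series(1, 100000) s(id);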

Just keep that number in mind: 25%. It’s the cost of doing business with exceptions, but hardly a deal breaker.