PG Phriday: PostgreSQL Select Filters
Long have CASE
statements been a double-edged sword in the database world. They’re functional, diverse, adaptive, and simple. Unfortunately they’re also somewhat bulky, and when it comes to using them to categorize aggregates, something of a hack. This is why I wanted to cry with joy when I found out that PostgreSQL 9.4 introduced a feature I’ve always wanted, but found difficult to express as a need. I mean, CASE
statements are fine, right? Well, yes they are, but now we have something better. Now, we have the FILTER
aggregate expression.
I always like working with examples, so let’s create some test data to illustrate just what I’m talking about.
CREATE TABLE sys_order
(
order_id SERIAL NOT NULL,
product_id INT NOT NULL,
item_count INT NOT NULL,
order_dt TIMESTAMP NOT NULL DEFAULT now()
);
INSERT INTO sys_order (product_id, item_count)
SELECT (a.id % 100) + 1, (random()*10)::INT + 1
FROM generate_series(1, 1000000) a(id);
ALTER TABLE sys_order ADD CONSTRAINT pk_order_order_id
PRIMARY KEY (order_id);
We now have a table for tracking fake orders, using 100 nonexistent products. I added the primary key after loading the table as a well known DBA trick. Doing this after data loading means the index can be created as a single step, which is much more efficient than repeatedly extending an existing index.
With that out of the way, let’s do a basic product order count, since that’s something many people are already familiar with:
SELECT sum(item_count) AS total
FROM sys_order;
-[ RECORD 1 ]--
total | 1000000
No surprises here. But what happens when Jeff from Accounting wants to know how many people ordered five specific products as a column list? In the old days, we might do something like this:
SELECT sum(CASE WHEN product_id = 1 THEN item_count ELSE 0 END) AS horse_mask_count,
sum(CASE WHEN product_id = 7 THEN item_count ELSE 0 END) AS eyeball_count,
sum(CASE WHEN product_id = 13 THEN item_count ELSE 0 END) AS badger_count,
sum(CASE WHEN product_id = 29 THEN item_count ELSE 0 END) AS orb_count,
sum(CASE WHEN product_id = 73 THEN item_count ELSE 0 END) AS memebox_count
FROM sys_order;
horse_mask_count | eyeball_count | badger_count | orb_count | memebox_count
------------------+---------------+--------------+-----------+---------------
59870 | 59951 | 59601 | 59887 | 60189
Gross.
As a DBA, I’ve seen more of these than I can reasonably stand, and hate them every single time. It’s not the use of the CASE
statement that is so irksome, but the micromanaging methodology necessary to reduce the count to zero for unwanted items. With FILTER
however, this query changes quite a bit:
SELECT sum(item_count) FILTER (WHERE product_id = 1) AS horse_mask_count,
sum(item_count) FILTER (WHERE product_id = 7) AS eyeball_count,
sum(item_count) FILTER (WHERE product_id = 13) AS badger_count,
sum(item_count) FILTER (WHERE product_id = 29) AS orb_count,
sum(item_count) FILTER (WHERE product_id = 73) AS memebox_count
FROM sys_order;
horse_mask_count | eyeball_count | badger_count | orb_count | memebox_count
------------------+---------------+--------------+-----------+---------------
59870 | 59951 | 59601 | 59887 | 60189
The query itself isn’t much shorter, but semantically, it’s far easier to understand what we’re trying to accomplish. It’s clear that we want the item count for each specific product, and nothing else. Further, since this is built-in functionality instead of a gross hack, it’s much faster. On the system we used for testing, after ten runs, the average time for the CASE
variant was about 500ms; the FILTER
version was about 300ms. In both cases, the query execution plan is identical. Internally however, invoking hundreds of thousands of CASE
statements causes an immense CPU impact, where FILTER
can utilize set grouping or another efficient stratagem based on the filter criteria. For large OLAP databases, this is a significant improvement in both query simplicity and performance.
This is good stuff we are getting in these new releases, and I encourage everyone to enjoy all of the new toys we get every year. Some of them are much more than mere window-dressing.