PG Phriday: 10 Ways to Ruin Performance: Cast Away
Unlike a lot of programming languages that have loose typing, databases are extremely serious about data types. So serious in fact, many patently refuse to perform automatic type conversions in case the data being compared is not exactly equivalent afterwards. This is the case with Postgres (PostgreSQL) and surprisingly often, queries will suffer as a result. Fortunately this is something we can address!
There are easier ways to demonstrate this, but in the interests of making this more of an adventure, let’s use a join between two tables:
CREATE TABLE sys_order ( order_id SERIAL NOT NULL, product_id INT NOT NULL, item_count INT NOT NULL, order_dt TIMESTAMPTZ NOT NULL DEFAULT now(), valid_dt TIMESTAMPTZ NULL ); INSERT INTO sys_order (product_id, item_count, order_dt, valid_dt) SELECT (a.id % 100000) + 1, (a.id % 100) + 1, now() - (id % 1000 || 'd')::INTERVAL, CASE WHEN a.id % 499 = 0 THEN NULL ELSE now() - (id % 999 || 'd')::INTERVAL END FROM generate_series(1, 1000000) a(id); ALTER TABLE sys_order ADD CONSTRAINT pk_order_id PRIMARY KEY (order_id); CREATE TABLE sys_exchange_map ( exchange_map_id SERIAL NOT NULL, vendor_id INT NOT NULL, ext_order_code TEXT NOT NULL, map_message TEXT NOT NULL, xmit_dt TIMESTAMPTZ NOT NULL DEFAULT now() ); INSERT INTO sys_exchange_map ( exchange_map_id, vendor_id, ext_order_code, map_message, xmit_dt ) SELECT a.id, (a.id % 10) + 1, ((a.id % 10000) + 1) || ':' || substring(random()::TEXT, 3), 'This is a bunch of junk from an external vendor queue!', now() - (id % 30 || 'd')::INTERVAL FROM generate_series(1, 100000) a(id); ALTER TABLE sys_exchange_map ADD CONSTRAINT pk_exchange_map_id PRIMARY KEY (exchange_map_id); CREATE INDEX idx_exchange_map_xmit_dt ON sys_exchange_map (xmit_dt); ANALYZE sys_order; ANALYZE sys_exchange_map;
This structure simulates something I’ve seen somewhat often in real systems. In this case, our order table is normal, but there’s also an exchange mapping table that maps our orders to external vendor orders. Imagine this table is part of the queue engine running the application, so the various codes and message columns are describing the interactions within the platform.
Given this structure, it’s not uncommon to encounter a query that manually parses one or more of these codes and tries to find matching orders. This is what that might look like:
SELECT o.* FROM sys_exchange_map m JOIN sys_order o ON ( o.order_id = substring(m.ext_order_code, '(.*):')::NUMERIC ) WHERE m.xmit_dt > CURRENT_DATE;
Unfortunately on my test VM, this query takes almost four seconds! Experienced developers and DBAs probably already know what the problem is, but let’s take a gander at two segments of the
EXPLAIN ANALYZE output and see if there are any obvious clues.
Merge Join (cost=142800.24..568726.04 rows=16935000 width=28) (actual time=4844.280..4875.651 rows=3333 loops=1) [ SNIP ] -> Seq Scan on sys_order o (cost=0.00..17353.00 rows=1000000 width=28) (actual time=0.036..884.354 rows=1000000 loops=1) Planning time: 0.240 ms Execution time: 3626.783 ms
We should be able to immediately see two potential problems. Thinking back on our discussions regarding estimates matching actual, notice that the database thinks it will be returning almost 17M rows, when there are really only around 3300. Further, it’s scanning the entire
sys_order table instead of using the primary key. What happened, here?
This may or may not come as a surprise, but
NUMERIC is not compatible with standard integer types in Postgres. For those who have worked with other databases such as Oracle, it may seem natural to use
NUMERIC as a catch-all for any number. This is especially true for developers who work with different data types in a myriad of languages, some of which feature multiple compatible numeric types.
With Postgres, casting is controlled through explicit declaration of type compatibility. We can look at
NUMERIC and see which types it will convert to with this query:
SELECT s.typname AS source_type, t.typname AS target_type, CASE c.castcontext WHEN 'a' THEN 'Assignment' WHEN 'i' THEN 'Implicit' WHEN 'e' THEN 'Explicit' END AS context FROM pg_cast c JOIN pg_type s ON (s.oid = c.castsource) JOIN pg_type t ON (t.oid = c.casttarget) WHERE s.typname = 'numeric'; source_type | target_type | context -------------+-------------+------------ numeric | int8 | Assignment numeric | int2 | Assignment numeric | int4 | Assignment numeric | float4 | Implicit numeric | float8 | Implicit numeric | money | Assignment numeric | numeric | Implicit
From this chart, it would appear that
NUMERIC are compatible, but only through assignment. This means we can use them interchangeably in calculations, for example, or save an integer to a numeric column or the reverse. This does not mean they’ll be automatically converted or treated as compatible for equality tests. Since the types are not implicitly compatible, Postgres will not use the
INT index for the
Unable to use the proper index, the planner does its best to come up with a working plan. Unfortunately the type incompatibility wreaks havoc during that phase as well. What we end up with is something of a mess, with completely incorrect row estimates, an unnecessary sequence scan, and an expensive in-memory merge. And all it would take to fix all of this, is to make one tiny change to our query:
SELECT o.* FROM sys_exchange_map m JOIN sys_order o ON ( o.order_id = substring(m.ext_order_code, '(.*):')::INT ) WHERE m.xmit_dt > CURRENT_DATE; QUERY PLAN -------------------------------------------------------------------- Nested Loop (cost=66.98..22648.71 rows=3387 width=28) (actual time=1.058..42.084 rows=3333 loops=1) [ SNIP ] -> Index Scan using pk_order_id on sys_order o (cost=0.43..6.19 rows=1 width=28) (actual time=0.003..0.003 rows=1 loops=3333) Planning time: 1.176 ms Execution time: 43.635 ms
That’s a pretty massive difference! In this particular case, it’s 80 times faster, but I’ve run into situations where a single miscast caused the query to be almost 80,000 times slower. It all depends on the scale of the data involved, and how badly everything compounds at each step of the execution process.
The message here is to pay particular attention to data types and casting. If a query is performing badly when it should be much faster, make sure types are compatible across the board. Some languages are particularly strict with regard data type usage, so consider Postgres among them.