PG Phriday: Moving to 9.5

Friday June 10, 2016

Page content

There comes a day in every young database’s life that it’s time to move on. I’m sorry 9.4, but the day has come that we must say goodbye. It’s not like we haven’t had our good times. While I truly appreciate everything you’ve done for me, we must part ways. I’m far too needy, and I can’t demand so much of you in good conscience. May your future patches make you and your other suitors happy!

In all seriousness, Postgres 9.5 has been out since January 7th. Despite that, I’ve never really been an advocate of deploying dot-0 versions in production environments. Since the Postgres dev team is notoriously fanatically responsible, this is probably excessively paranoid. Still, I persist that some day, my certifiable anxiety regarding fresh releases will pay off. Until then, it’s time to examine a few reasons to start upgrading the 120+ Postgres instances littering our server fabric. Especially now that 9.5.3 is out, making it practically rock solid in terms of field testing.

JSONBetter

A common complaint about JSON and JSONB up to 9.4, was that there was no easy way to modify existing JSON objects. Adding fields was an exercise at text conversion, unpacking, concatenation, and re-casting. It was a mess. As a consequence, I recommended using it as a mere container column, or relying on PL/V8 or PL/Python to actually manipulate JSON data.

That’s no longer the case with 9.5. Not only are several standard operators overloaded to support JSONB, but they introduced the jsonb_set function for more direct manipulation. I covered this more extensively in the past, but here are a couple quick examples for reference:

-- Concatenation

SELECT '{"Hairy": true}'::JSONB || '{"Wobbly": false}'::JSONB;
 
             ?COLUMN?             
----------------------------------
 {"Hairy": TRUE, "Wobbly": FALSE}

-- Subtraction 

SELECT '{"Hairy": true, "Excited": false}'::JSONB - 'Excited';
 
    ?COLUMN?     
-----------------
 {"Hairy": TRUE}

-- Explicit jsonb_set

SELECT jsonb_set(
  '{"Hairy": true, "Status": {"Burning": true}}'::JSONB,
  '{Status,Burning}',
  'false'
);

                   jsonb_set                   
-----------------------------------------------
 {"Hairy": true, "Status": {"Burning": false}}

This is no less an exciting advancement than it was back in December, and could be a singular reason to upgrade depending on JSON usage saturation.

INSERT += UPDATE

One feature the MySQL camp has lorded over Postgres is the REPLACE INTO syntax. For those who don’t already know, it operates like an INSERT unless it finds an existing row for that primary key, upon which it acts more like an UPDATE. This has been a… point of contention for more than a decade.

I could find similar links until the end of time, but stopped at 2002 because this article needs to end eventually. Nary a month has gone by without someone asking about it in the Postgres mailing lists, being dissatisfied with the answer, and going back to MySQL. The only equivalent we had was a convoluted function that made use of exceptions and a loop in order to avoid a race condition. No more. Now, this is trivial:

CREATE TABLE upsert_me (
  id     INT PRIMARY KEY,
  stuff  TEXT
);

INSERT INTO upsert_me VALUES (1, 'Woo!');
INSERT INTO upsert_me VALUES (1, 'Woohoo!');

ERROR:  duplicate key value violates unique constraint "upsert_me_pkey"

INSERT INTO upsert_me VALUES (1, 'Woohoo!')
    ON CONFLICT (id) DO UPDATE
   SET stuff = EXCLUDED.stuff;

SELECT * FROM upsert_me;

 id |  stuff  
----+---------
  1 | Woohoo!

The Postgres implementation is actually safer, given the explicit control over when the replacement actually happens, under what criteria, and which columns are included. This kind of solution is exactly what the community has come to expect; the Postgres dev team never simply adds a new feature without careful and meticulous consideration. We get the same functionality, with a far more flexible and less fragile implementation.

Block Range Indexes

Otherwise known as a BRIN index, this type of indexing is geared toward VLDB installations with gargantuan tables that are generally INSERT only. They work by storing a page offset followed by a range of values represented there. This is extremely lossy, but for large sequentially organized tables, can help Postgres limit page scans to certain table regions immediately. Unlike a regular BTREE index which might require a random read per match, a BRIN index will match several potentially unnecessary pages and read them all. The primary difference here is between a random or a sequential read operation.

In the end, this requires more filtering to remove unwanted matches for values pulled in from the whole range. Yet this can be far cheaper in aggregate depending on the granularity of the ranges. Let’s try an example with 10M rows:

CREATE TABLE sensor_log (
  id            INT PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

INSERT INTO sensor_log (id, location, reading, reading_date)
SELECT s.id, s.id % 1000, s.id % 100,
       '2016-12-31'::DATE - (s.id || 's')::INTERVAL
  FROM generate_series(1, 10000000) s(id);

CREATE INDEX idx_sensor_log_date_brin
    ON sensor_log USING BRIN (reading_date);

ANALYZE sensor_log;

\timing on

SELECT count(*)
  FROM sensor_log
 WHERE reading_date >= '2016-12-01'
   AND reading_date < '2016-12-02';

Time: 20.805 ms

How does the sample query compare to using a regular BTREE index? Tests put the performance at about 10% faster when all results are in memory, otherwise the BRIN index was 3-5x faster. But there’s another benefit that isn’t quite as obvious. Due to the lossy nature of BRIN structure, these indexes are also orders of magnitude smaller. Let’s compare just how much:

CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);

SELECT relname, pg_size_pretty(pg_relation_size(oid))
  FROM pg_class
 WHERE relname LIKE '%sensor_log%';

         relname          | pg_size_pretty 
--------------------------+----------------
 sensor_log               | 498 MB
 sensor_log_pkey          | 214 MB
 idx_sensor_log_date_brin | 32 kB
 idx_sensor_log_date      | 214 MB

This means an index that’s nearly 7000x larger is only 10% faster for this particular case under ideal circumstances. On a moderately busy warehouse system that mostly sees reporting queries spanning segments of multiple TB-size tables that would never fit in memory, the BRIN index would actually be a better use of resources. In cases where the query planner would take random page cost into account for a large result set and revert to a sequential scan, this is even more relevant. We may end up reading a small fraction of a vast table instead of the entire thing.

Thus BRIN indexes definitely have a niche role, but given the number of larger enterprises making use of Postgres for VLDB installations, that niche might be larger than we think.

Moving On

In the end, there are quite a few more features I will probably cover in more depth later. The Wiki is more than complete in addressing them, and includes cases I glossed over or skipped entirely. These are the ones I wanted to highlight due to their potential to directly impact instances I manage, or empower end-users under my purview. Any one of them would be an excellent justification to upgrade to 9.5, and taken together, they practically demand it.

Regardless, this is old news to anyone drooling over 9.6, including myself. However, following my own standards for not immediately deploying new versions, it’ll be a while before 9.6 will reach our production systems. To that end, I can rest easy knowing I’ll be writing a similar article about 9.5 when 9.6 reaches maturity. I’m glad that’s the one constant I’ve observed regarding Postgres over the years: every single upgrade is worth the trouble, often to a hilarious degree.

That’s not something you can always say in the software world, and I appreciate it.

JSONBetter

INSERT += UPDATE

Block Range Indexes

Moving On

See Also