PG Phriday: Functions and Addressing JSON Data

Fairly recently, a friend of mine presented a problem he wanted to solve with some JSON he had in a table. After he presented the end result he was trying to reach, I made the assumption that this would be pretty easy to do. But then I looked at the JSON Functions to try and find that quick fix. Though I read extensively and used rather liberal interpretations of the functions, there’s no way to directly manipulate JSON object contents with PostgreSQL.

Wait! Before you start yelling at me for being an idiot, I know what you’re going to say. I thought the same thing… at first. Go ahead and look, though. As of PostgreSQL 9.4, there is no built-in functionality to add or remove JSON elements without one or more intermediate transformation steps through PostgreSQL arrays or records. But that isn’t necessarily unexpected. Why?

Because PostgreSQL is a database. Its primary purpose is to store data and subsequently extract and view it. From this perspective, there’s no reason for PostgreSQL to have an entire library of JSON modification functions or operators. Regardless of this however, actions such as data merges and bulk updates still need to be possible. Yet all other fields allow a single update statement to append information, or otherwise perform a native calculation to replace the value in-line. There must be a way to do this with JSON too, without jumping through too many burning hoops.

Luckily there is, but it does require some preliminary work. Let’s start with a simple JSON document, as seen by PostgreSQL:

SELECT '{"Hairy": true, "Smelly": false}'::JSON;

               json
----------------------------------
 {"Hairy": true, "Smelly": false}

Ok. Now, how would I add an attribute named “Wobbly”? Well, I could pull the data into an external application, add it, and store the result. But suppose this was in a table of millions of records? That’s probably the least efficient way to modify them. This could be parallelized to a certain extent, but that requires a lot of scaffolding code and is way too much work for something so simple.

Instead, let’s create a function to do it for us. We’ve already established that PostgreSQL JSON manipulation is extremely limited, so what other option do we have? Here’s a python function:

CREATE or REPLACE FUNCTION json_update(data JSON, key TEXT, value JSON)
RETURNS JSON
AS $$

    if not key:
        return data

    from json import loads, dumps
    js = loads(data)
    js[key] = loads(value)
    return dumps(js)

$$ LANGUAGE plpythonu;

Now we could add the field with ease:

SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON);

                    json_update                    
---------------------------------------------------
 {"Hairy": true, "Smelly": false, "Wobbly": false}

And if we want to get really fancy, there’s always PLV8:

CREATE or REPLACE FUNCTION json_update(data JSON, key TEXT, value JSON)
RETURNS JSON
AS $$
    if (key)
        data[key] = value;
    return data;

$$ LANGUAGE plv8;

SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON);

                 json_update                  
----------------------------------------------
 {"Hairy":true,"Smelly":false,"Wobbly":false}

Though with PLV8, there are a couple of relatively minor caveats.

  1. PLV8 doesn’t work with JSONB yet, which is why all of these examples are in JSON instead.
  2. You might notice that it stripped all the extraneous whitespace, which may not be desirable.

Either way, both of these variants do something that PostgreSQL can’t do on its own. This is one of the reasons PostgreSQL is my favorite database; it’s so easy to extend and enhance.

Just as a thought experiment, which of these functional variants is faster? I didn’t use the IMMUTABLE or STRICT decorators, so it would be easy to run a loop of a few thousand iterations and see what the final run-time is. Here’s a modification of the test query:

EXPLAIN ANALYZE
SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON)
  FROM generate_series(1, 100000);

On my test VM, the python function took around four seconds, while the PLV8 version only needed a second and a half. Clearly PLV8’s native handling of its own datatype helps here, and python having to repeatedly import the json library hurts its own execution. By adding IMMUTABLE, both fly through all 100-thousand iterations in less than 200ms.

Don’t be afraid to stray from SQL when using PostgreSQL. In fact, this might be a good case for thinking about PostgreSQL in an entirely different light. I might start calling it Postgres from now on, simply to honor its roots and its primary functionality. SQL is no longer the Alpha and Omega when it comes to its capabilities these days. So I feel it’s only right to adapt along with it.

Here’s to the future of Postgres!