Fairly recently, a friend of mine presented a problem he wanted to solve with some JSON he had in a table. After he presented the end result he was trying to reach, I made the assumption that this would be pretty easy to do. But then I looked at the JSON Functions to try and find that quick fix. Though I read extensively and used rather liberal interpretations of the functions, there’s no way to directly manipulate JSON object contents with PostgreSQL.

Wait! Before you start yelling at me for being an idiot, I know what you’re going to say. I thought the same thing… at first. Go ahead and look, though. As of PostgreSQL 9.4, there is no built-in functionality to add or remove JSON elements without one or more intermediate transformation steps through PostgreSQL arrays or records. But that isn’t necessarily unexpected. Why?

Because PostgreSQL is a database. Its primary purpose is to store data and subsequently extract and view it. From this perspective, there’s no reason for PostgreSQL to have an entire library of JSON modification functions or operators. Regardless of this however, actions such as data merges and bulk updates still need to be possible. Yet all other fields allow a single update statement to append information, or otherwise perform a native calculation to replace the value in-line. There must be a way to do this with JSON too, without jumping through too many burning hoops.

Luckily there is, but it does require some preliminary work. Let’s start with a simple JSON document, as seen by PostgreSQL:

SELECT '{"Hairy": true, "Smelly": false}'::JSON;

               json
----------------------------------
 {"Hairy": true, "Smelly": false}

Ok. Now, how would I add an attribute named “Wobbly”? Well, I could pull the data into an external application, add it, and store the result. But suppose this was in a table of millions of records? That’s probably the least efficient way to modify them. This could be parallelized to a certain extent, but that requires a lot of scaffolding code and is way too much work for something so simple.

Instead, let’s create a function to do it for us. We’ve already established that PostgreSQL JSON manipulation is extremely limited, so what other option do we have? Here’s a python function:

CREATE or REPLACE FUNCTION json_update(data JSON, key TEXT, value JSON)
RETURNS JSON
AS $$

    if not key:
        return data

    from json import loads, dumps
    js = loads(data)
    js[key] = loads(value)
    return dumps(js)

$$ LANGUAGE plpythonu;

Now we could add the field with ease:

SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON);

                    json_update                    
---------------------------------------------------
 {"Hairy": true, "Smelly": false, "Wobbly": false}

And if we want to get really fancy, there’s always PLV8:

CREATE or REPLACE FUNCTION json_update(data JSON, key TEXT, value JSON)
RETURNS JSON
AS $$
    if (key)
        data[key] = value;
    return data;

$$ LANGUAGE plv8;

SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON);

                 json_update                  
----------------------------------------------
 {"Hairy":true,"Smelly":false,"Wobbly":false}

Though with PLV8, there are a couple of relatively minor caveats.

  1. PLV8 doesn’t work with JSONB yet, which is why all of these examples are in JSON instead.
  2. You might notice that it stripped all the extraneous whitespace, which may not be desirable.

Either way, both of these variants do something that PostgreSQL can’t do on its own. This is one of the reasons PostgreSQL is my favorite database; it’s so easy to extend and enhance.

Just as a thought experiment, which of these functional variants is faster? I didn’t use the IMMUTABLE or STRICT decorators, so it would be easy to run a loop of a few thousand iterations and see what the final run-time is. Here’s a modification of the test query:

EXPLAIN ANALYZE
SELECT json_update('{"Hairy": true, "Smelly": false}'::JSON,
       'Wobbly', 'false'::JSON)
  FROM generate_series(1, 100000);

On my test VM, the python function took around four seconds, while the PLV8 version only needed a second and a half. Clearly PLV8’s native handling of its own datatype helps here, and python having to repeatedly import the json library hurts its own execution. By adding IMMUTABLE, both fly through all 100-thousand iterations in less than 200ms.

Don’t be afraid to stray from SQL when using PostgreSQL. In fact, this might be a good case for thinking about PostgreSQL in an entirely different light. I might start calling it PGDB from now on, simply to honor its roots and its primary functionality. SQL is no longer the Alpha and Omega when it comes to its capabilities these days. So I feel it’s only right to adapt along with it.

Here’s to the future of PGDB!

PG Phriday: Functions and Addressing JSON Data
Tagged on:                 

6 thoughts on “PG Phriday: Functions and Addressing JSON Data

  • The current mainline PLV8 is based on a quite old legacy V8 version which is out of support for a long time and the PostgreSQL side of the implementation does not support jsonb.

    Unfortunately plv8 development seems currently stalled and not very responsive towards community contributions — the pull request on github containing modifications to support jsonb and an up-to-date V8 version including ES6 features is lying around uncommented and ignored now for over three weeks…

    https://github.com/plv8/plv8/pull/115

    IMHO a more vivid PLV8 extension would help PostgreSQL quite a bit for not-only-SQL and jsonb use cases. The processing inside V8 is really fast, while converting to and from PostgreSQL json and jsonb types introduces some overhead. So complex modifications of json or jsonb data could benefit from a V8 procedure in the database, while simpler tasks on a large number of tuples would be more efficient using native psql json functions.

    Another opportunity using PLV8 is the ability to share logic, for example validations, between the Web client and the server side implementation.

    1. That’s really a shame about PLV8. That kinda makes PostgreSQL look bad. It sounds like someone should fork the repo and effectively take it away from the unresponsive dev(s).

      1. To fork the repo is what’s actually required to contribute and create a pull request.

        Given the potential of PLV8, the whole project and the user base is surprisingly calm. See the low traffic at the google groups forum (https://groups.google.com/forum/#!forum/plv8js) — or the few sporadic posts in pgsql-hackers mailing list.

        I think what’s actually missing are more users, perhaps also more maintainers of some binary packages to make the whole thing more accessible to a pg user. Especially the V8 compilation is quite involved and inconvenient compared to the simple configure/make/make install procedure required for pg itself.

        So I would not be that hard with the devs, PLV8 just needs more momentum. Advocacy blog posts, like this one here, may help to point some more json(b) users in the right direction. 😉

  • The current options for manipulating json values in postgres are definitely awkward, but when dealing with immutable data the some sort of intermediate transformation step will always be necessary. In your JS/Python functions that intermediate transformation step is the mutable JS/Python representation.

    A pure sql alternative to your function extended to work with two objects rather than a single key/value pair is:

    CREATE or REPLACE FUNCTION json_merge(obj1 JSON, obj2 JSON)
    RETURNS JSON
    AS $$
        SELECT json_object_agg(deduped.key, deduped.value) FROM (
            SELECT DISTINCT ON (key) key, value FROM (
                SELECT key, value, 1 as precedence FROM json_each(obj1) UNION ALL
                SELECT key, value, 2 as precedence FROM json_each(obj2)
            ) as combined ORDER  BY key, precedence DESC
        ) as deduped;
    $$ LANGUAGE sql IMMUTABLE;
    

    Example usage:

    postgres=# select json_merge('{"Hairy": true, "Smelly": false}'::JSON, '{"Wobble": false}'::JSON);
                          json_merge
    { "Hairy" : true, "Smelly" : false, "Wobble" : false }
    (1 row)
    

    In 9.5 there will be jsonb_object_agg so a jsonb version becomes possible too.

    Ideally I’d like to see out of the box functions for a variadic version of json_merge and a json_remove to return an object without some keys.

    1. Yeah, sadly I’m not familiar enough with the built-in JSON functions because they’re so (to me) weird. Though looking at your quick solution, I’m a bit embarrassed I didn’t try that. 🙂 Still though, just for the sake of usability, PGDB does need these functions. I’ve said in past posts that devs are not DBAs, and by not having these functions, we’re encouraging them to come up with (usually worse) workarounds.

      Thanks for the SQL version of the functions!

  • For purists, I’d rather suggest the following, that can be reused and does not need to store a function:

    UPDATE "my_table" SET "my_column" = ( SELECT concat('{', string_agg(to_json("key") || ':' || "value", ','), '}') FROM (

        -- all the properties (key:value) from my_column that have not Wobbly as key
        SELECT *
        FROM json_each("my_column")
        WHERE "key" <> 'Wobbly'

    UNION ALL
    
    -- will overwrite Woobly if existed before
    SELECT 'Wobbly', false
    

    ) AS "fields"

    )::json

Comments are closed.