PG Phriday: Getting It Sorted

When it comes to reordering the items in a list, databases have long had a kind of Faustian Bargain to accomplish the task. Nobody really liked any of the more common solutions, least of all the poor database tasked with serving up the inevitable resulting hack.

Postgres is no different in this regard. Consider a list_item table like this, demonstrating five items in a to-do list:

  id   | list_id | item_name  | sort_order 
-------+---------+------------+------------
     1 |       1 | Gotta do 1 |          1
     2 |       1 | Gotta do 2 |          2
     3 |       1 | Gotta do 3 |          3
     4 |       1 | Gotta do 4 |          4
     5 |       1 | Gotta do 5 |          5

Adding an item to the end of the list is trivial. But what if we want to add one to the top? Or the middle? Well that changes everything for the worse! One of the best analyses I’ve seen for this problem came from Joe Nelson in his post on User-defined Order in SQL. It’s a fantastic treatise on the problem, and he proposes a lot of awesome workarounds.

And I hate every last one of them. All of those elegant numeric solutions are as beautiful as they are terrible. It all comes down to the fact our solution must rely on some kind of extension, mathematical approximation, or other imprecise approach. This is Postgres! We shouldn’t have to rely on that kind of thing here!

So what can we do instead? I can’t remember where I saw this, or if I came up with the idea myself, but it’s the true solution to this particular SQL wart. The insight here is that item order should not be an attribute of the item itself, but of the list. Tying item order to other items in the list breaks a fundamental rule of data normalization, as each tuple is now intrinsically linked to other tuples. In theory, we can just use an item attribute such as order_position, but in practice, we must derive and assign, and possibly reorder other items to fill that attribute.

If we change our perspective such that item order is now an attribute of the containing list, things change substantially. Consider these tables:

CREATE TABLE todo_list (
  id          INT NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  list_name   TEXT NOT NULL,
  item_order  INT[] NULL
);

CREATE INDEX idx_todo_list_name ON todo_list (list_name);

CREATE TABLE todo_list_item (
  id          INT NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  list_id     INT NOT NULL REFERENCES TODO_LIST,
  item_name   TEXT NOT NULL
);

There’s nothing particularly innovative here; it’s just a list table, and the items held in that list. But note the item_order column shown as an array. That’s going to be the key to our solution. It’s declared as an array of integers for this example, to represent the IDs of the list items in the order in which they should appear.

Technically that’s all we need. The code could theoretically maintain that array from this point on, and we’re done. But this is Postgres, and as such, there’s so much more we can do!

First things first, let’s create a trigger to capture all item inserts and deletes and plug those values into the array for the corresponding list. This should ensure that the default order is identical to the item insert order. This also keeps the array in sync in case we delete items later.

CREATE OR REPLACE FUNCTION f_mutate_list_item()
RETURNS TRIGGER AS
$$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE todo_list
       SET item_order = array_append(item_order, NEW.id)
     WHERE id = NEW.list_id;
  ELSEIF TG_OP = 'DELETE' THEN
    UPDATE todo_list
       SET item_order = array_append(item_order, OLD.id)
     WHERE id = OLD.list_id;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_update_list_item_order
 AFTER INSERT OR DELETE
    ON todo_list_item
   FOR EACH ROW EXECUTE FUNCTION f_mutate_list_item();

One trigger is all it takes, and now every list has a default sort order for all of its items. Can you see where this is going? If not, don’t feel bad, because the end result here depends on two features not reflected in every database engine.

Next, let’s populate the tables with a basic subset of data. 1000 lists with five items each should do it:

INSERT INTO todo_list (list_name)
SELECT 'List ' || id FROM generate_series(1, 1000) a(id);

INSERT INTO todo_list_item (list_id, item_name)
SELECT a.id, 'Gotta do ' || b.id
  FROM generate_series(1, 1000) a(id),
       generate_series(1, 5) b(id);

Now we need to pick on one list in particular. Let’s use List #421.

SELECT * FROM todo_list WHERE id = 421;

 id  | list_name |        item_order         
-----+-----------+---------------------------
 421 | List 421  | {421,1421,2421,3421,4421}

SELECT * FROM todo_list_item WHERE list_id = 421;

  id   | list_id | item_name  
-------+---------+------------
   421 |     421 | Gotta do 1
 1,421 |     421 | Gotta do 2
 2,421 |     421 | Gotta do 3
 3,421 |     421 | Gotta do 4
 4,421 |     421 | Gotta do 5

Great! Now let’s take the role of the application. It has fetched the current order of the list items and is displaying them to the user, allowing them to rearrange to their heart’s content. After some time, the user is happy and clicks ‘Submit’. Then the application sends a request like this:

UPDATE todo_list
   SET item_order = ARRAY[1421, 4421, 421, 2421, 3421]
 WHERE list_name = 'List 421';

So now the items are in a new order, which should correspond with an item order of 2, 5, 1, 3, 4. But how do we display them that way? Sure, the application could just fetch all the list items and iterate through the item_order array and simply display the list items in that order. That would actually be a fine solution. But again, this is Postgres! Is there a SQL solution here?

It turns out there is, and it’s deceptively simple. Let’s take it in parts. First of all, many Postgres users know about the unnest() array function for transforming an array into rows. Let’s apply that to our list first and see what happens:

SELECT l.list_name, a.item_id
  FROM todo_list l, unnest(item_order) a (item_id)
 WHERE id = 421;

 list_name | item_id 
-----------+---------
 List 421  |   1,421
 List 421  |   4,421
 List 421  |     421
 List 421  |   2,421
 List 421  |   3,421

That’s interesting. Now we’ve got our items in the exact order they should appear. Will that help us with our join somehow? Yes and no. Yes, because in many cases, a join using the values here will sort “properly” in the order shown, but not always. It’s incredibly fragile when combined with joins, sorts, and other compounding operations.

That’s where the WITH ORDINALITY table expression comes into play. When a function returns multiple rows, it can enumerate them. Let’s see how that works:

SELECT l.list_name, a.item_id, a.sort_order
  FROM todo_list l, unnest(item_order) WITH ORDINALITY a (item_id, sort_order)
 WHERE id = 421;

 list_name | item_id | sort_order 
-----------+---------+------------
 List 421  |   1,421 |          1
 List 421  |   4,421 |          2
 List 421  |     421 |          3
 List 421  |   2,421 |          4
 List 421  |   3,421 |          5

While the IDs themselves aren’t of much use to us, the simulated ordinal column is exceptionally handy. We’ve now translated the array into the contained IDs and the intended order with a single statement. Now all we have to do is put everything together to get our list items in the intended order:

SELECT l.list_name, i.item_name, a.sort_order
  FROM todo_list l
 CROSS JOIN unnest(item_order) WITH ORDINALITY a (item_id, sort_order)
  JOIN todo_list_item i on (i.id = a.item_id AND i.list_id = l.id)
 WHERE l.list_name = 'List 421'
 ORDER BY a.sort_order;

 list_name | item_name  | sort_order 
-----------+------------+------------
 List 421  | Gotta do 2 |          1
 List 421  | Gotta do 5 |          2
 List 421  | Gotta do 1 |          3
 List 421  | Gotta do 3 |          4
 List 421  | Gotta do 4 |          5

Now our application merely needs to fetch and iterate through the rows as they’re returned. No special lookups or separate row fetches necessary. From this point forward, a single update to the item_order column for any list will be enough to reorder every item in the list. We don’t need to know the position of the other items. We don’t need special mathematical tricks. We don’t need new column types to store real fractions. We don’t need linked lists, or any other trickery.

Just. One. Update. For. Every. Item.

That? That’s magic right there. All of the previous methods only solved for inserting an item into an arbitrary position. This technique allows us to reorder the entire list all at once. If our list is 1000 items and the user scrambled them all into an arbitrary order, that’s no longer 1000 updates to assign the new positions.

In the business, we call this Strategic Denormalization, and it’s an incredibly powerful technique. Keep it in your toolbelt for later, because you may just find a scenario where it’ll deliver huge dividends. As always, Postgres takes it to the next level.

Hope you have a happy Phriday!