PG Phriday: Replacing Business Logic

Friday February 19, 2016

Back in 2005, I started a new job with a company that did work for other businesses. Their data model was designed by developers and they had no DBA, as is pretty common in smaller organizations. A critical part of our main application relied on an event log that captured customer activity and relayed their final decision to the client for reconciliation. One day someone noticed that there was a bug in the system that resolved the final decision from the event stack, and panic ensued. I put a lot of thought into it that night and had a fix for it the next morning. As usual, my approach relied on Postgres, and carried a fun lesson.

As it turns out, some business logic belongs in the database. Imagine an existing data model with business logic that must adhere to these criteria:

All front-end events are captured to an event log table.
Depending on customer activity, an event chain can be initiated.
Within this event chain, the final decision must be captured.
Decisions are ranked based on client pay scale for customer progress through the interface.

Nothing proprietary or complicated—just a basic ranking algorithm. Unfortunately in this case, whatever system maintained that ranking had some subtle flaw we needed to fix, and that included past data for the current billing cycle. We needed something that would fix the decision tree, and then prevent it from getting broken again in the future.

We can represent the data with a pretty simple model:

CREATE TABLE event_type
(
  event_type_id  SERIAL PRIMARY KEY,
  event_name     VARCHAR NOT NULL,
  score          INT NOT NULL,
  can_init       BOOLEAN NOT NULL DEFAULT FALSE  
);

CREATE TABLE event_log
(
  event_id       SERIAL PRIMARY KEY,
  session_id     UUID NOT NULL,
  event_type_id  INT NOT NULL REFERENCES event_type (event_type_id),
  event_data     VARCHAR NOT NULL,
  created_dt     TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now()
);

CREATE TABLE decision_log
(
  session_id     UUID PRIMARY KEY,
  init_event_id  INT NOT NULL REFERENCES event_log (event_id),
  decision_id    INT NULL REFERENCES event_log (event_id)
);

INSERT INTO event_type (event_name, score, can_init)
VALUES ('start', 0, TRUE);

INSERT INTO event_type (event_name, score)
VALUES ('pending', 10), ('order', 20), ('cancel', 30);

The business logic requires something more, however. To a DBA, it sounds like a job for at least one trigger. But how do we organize them, and how should they fire? Going back to the criteria, we know that all events are captured to the event_log table, so it needs a trigger. Since it’s a log of events, there are no updates or deletes. If we make the assumption that decision_log can receive inserts from another vector, it should have a separate trigger to control ranking.

So we need one trigger after inserts on event_log and one before updates on decision_log. Here’s what that might resemble:

CREATE OR REPLACE FUNCTION sp_capture_event()
RETURNS TRIGGER AS
$$
BEGIN
    PERFORM event_type_id FROM event_type
      WHERE event_type_id = NEW.event_type_id
        AND can_init;

    IF FOUND THEN
        INSERT INTO decision_log (session_id, init_event_id)
        VALUES (NEW.session_id, NEW.event_id);
    ELSE
        UPDATE decision_log
           SET decision_id = NEW.event_id
         WHERE session_id = NEW.session_id;
    END IF;

    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_capture_events_a_i
 AFTER INSERT ON event_log
   FOR EACH ROW EXECUTE PROCEDURE sp_capture_event();

CREATE OR REPLACE FUNCTION sp_update_decision()
RETURNS TRIGGER AS
$$
BEGIN
    IF OLD.decision_id IS NULL THEN
        RETURN NEW;
    END IF;

    PERFORM n.event_id
       FROM event_log o
       JOIN event_type ot USING (event_type_id),
            event_log n
       JOIN event_type nt USING (event_type_id)
      WHERE o.event_id = OLD.decision_id
        AND n.event_id = NEW.decision_id
        AND nt.score > ot.score;

    IF FOUND THEN
        RETURN NEW;
    END IF;
    
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_control_decisions_b_u
BEFORE UPDATE ON decision_log
   FOR EACH ROW EXECUTE PROCEDURE sp_update_decision();

The trigger on event_log can issue an INSERT or UPDATE based on event type since only some event types can initiate a new chain, and customers can only have one chain per session. It’s job is a very simple one. The trigger on decision_log isn’t actually very complex either, and there were several ways we could have implemented it. The query shown basically just checks if the new event outranks the old one, and only allows the update when it does.

We could have independently selected the ranks into variables and compared with an IF statement, or used DISTINCT ON to return the top-ranked event, and only allow the update if it was the same event, and so on. It’s been over ten years, so I have no idea how I did it back then, but the solution above relies on the assumption we only need to fetch information for two events. Using an IF statement would mean variables and the associated overhead, and the DISTINCT ON approach could theoretically match tens or hundreds of rows before producing the final result.

In any case, we need to test this arrangement. Let’s use the pgcrypto extension, the Postgres 9.2+ ability to set arbitrary variables, and a prepared statement to create some event traffic:

CREATE EXTENSION pgcrypto;
SELECT set_config('local.sess', gen_random_uuid()::text, false);

PREPARE event_ins(INT, VARCHAR) AS
INSERT INTO event_log (session_id, event_type_id, event_data)
SELECT current_setting('local.sess')::UUID, $1, $2;

EXECUTE event_ins(1, 'starting up...');
EXECUTE event_ins(2, 'ordering stuff...');
EXECUTE event_ins(3, 'Ordered!');

SELECT init_event_id, decision_id
  FROM decision_log;

 init_event_id | decision_id 
---------------+-------------
             1 |           3

Not bad. By this point, we’ve created a few basic events in the event log, and the decision log appears to be working as intended. But we have one more test to verify: will it reject outranked events? That’s important, because that’s what got is into trouble in the first place. If we can prove the trigger will directly reject bad changes, the integrity of the decision tree is preserved in a way the application can’t circumvent.

Let’s see what happens if the customer tries to screw up the process:

EXECUTE event_ins(2, 'No... wait...');

SELECT init_event_id, decision_id
  FROM decision_log;

 init_event_id | decision_id 
---------------+-------------
             1 |           3

EXECUTE event_ins(4, 'I changed my mind.');

SELECT init_event_id, decision_id
  FROM decision_log;

 init_event_id | decision_id 
---------------+-------------
             1 |           5

Pending events have less priority than order events, so the decision tree did not change when the customer went back in the chain. The events are still logged and stamped so we can see the entire session, but the final state which the client pays for is still preserved. In this particular instance, the customer decided they didn’t want whatever service was offered, and the client got a free pass.

That solved the conundrum of preventing further corruption of our data, but we still had to fix the existing billing cycle. Luckily, the DISTINCT ON approach that isn’t quite correct for the trigger, works great on rebuilding the ideal rows. A fix would go something like this:

WITH ranked AS (
    SELECT DISTINCT ON (session_id)
           e.session_id, e.event_id AS decision_id
      FROM event_log e
      JOIN event_type t USING (event_type_id)
     WHERE e.created_dt >= date_trunc('month', now())
       AND NOT t.can_init
     ORDER BY e.session_id, t.score DESC
)
UPDATE decision_log d
   SET decision_id = r.decision_id
  FROM ranked r
 WHERE d.session_id = r.session_id
   AND d.decision_id != r.decision_id;

That single update statement finds all sessions for the current billing cycle and gets the proper ranked decision event. Then it only changes the decisions that were wrong due to the bug. Even if that fix contains millions of rows, it’s a one-time operation. Our trigger ensures only valid entries will make it into the decision table from now on, thus ending the chaos. The only remaining task was to retire the code that previously maintained the decision stack.

I came up with that fix soon after starting at the company, and it remained in place until at least the day I left almost five years later. It wasn’t a particularly difficult solution, and it would be trivial to argue that more complicated processes should be implemented at the application level instead. But for simple integrity-enforcing rules, some business logic works best when it can’t be circumvented, even accidentally.

In the end, as basic as this approach was, it uses several Postgres features and SQL syntax. Triggers, functions, DISTINCT ON, and UPDATE ... FROM are only the beginning. Being able to use all available tools greatly enhances developer agility and understanding, as well as the quality of the final result. So don’t shy away from putting some code in the database; sometimes, that’s where it belongs.

See Also