One of the things Postgres has been “missing” for a while is logical replication based on activity replay. Until fairly recently, in order to replicate single tables from one database to another, we had to encumber the table with performance-robbing triggers coupled to a third party daemon to manage transport. Those days might finally be behind us thanks to pglogical.

But is it easy to use? Let’s give it a try on our trusty sensor_log table. First, we’ll need to create everything on the donor node:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE - (s.id || 's')::INTERVAL
  FROM generate_series(1, 1000000) s(id);
 
CREATE EXTENSION pglogical;
 
SELECT pglogical.create_node(
    node_name := 'prod_sensors',
    dsn := 'host=localhost port=5432 dbname=postgres'
);
 
SELECT pglogical.create_replication_set(
    set_name := 'logging',
    replicate_insert := TRUE, replicate_update := FALSE,
    replicate_delete := FALSE, replicate_truncate := FALSE
);
 
SELECT pglogical.replication_set_add_table(
    set_name := 'logging', relation := 'sensor_log', 
    synchronize_data := TRUE
);

After all of that, we have a replication set containing a single table representing one million rows of sensor data. At this point, any number of subscribers could connect to the replication set and request its contents. After that, all inserts would also be replayed on the subscriber once they are detected in the Postgres transaction log.
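If we want to verify the set contents before any subscribers attach, pglogical stores its configuration in ordinary catalog tables within its own schema. This is just a sketch; the replication_set and replication_set_table catalogs and their column names reflect pglogical's internal layout and may differ between versions:

-- Which replication sets exist, and what do they replicate?
SELECT set_name, replicate_insert, replicate_update,
       replicate_delete, replicate_truncate
  FROM pglogical.replication_set;
 
-- Which tables belong to each set?
SELECT s.set_name, t.set_reloid::regclass AS table_name
  FROM pglogical.replication_set s
  JOIN pglogical.replication_set_table t USING (set_id);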

In a second Postgres instance, we would do this to “consume” the table:

CREATE TABLE sensor_log (
  id            INT PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
CREATE EXTENSION pglogical;
 
SELECT pglogical.create_node(
    node_name := 'sensor_warehouse',
    dsn := 'host=localhost port=5999 dbname=postgres'
);
 
SELECT pglogical.create_subscription(
    subscription_name := 'wh_sensor_data',
    replication_sets := array['logging'],
    provider_dsn := 'host=localhost port=5432 dbname=postgres'
);
 
SELECT pg_sleep(5);
 
SELECT COUNT(*) FROM sensor_log;
 
  COUNT  
---------
 1000000

We actually did a few things here, so let’s break it down. First we created the table structure itself to act as a container for the incoming data. Then we created a node just as we did on the origin, and finally subscribed to the provider.

We should note that creating the table beforehand wasn’t strictly required. The create_subscription function has a parameter called synchronize_structure that uses pg_dump to obtain table DDL during the subscription phase and replay it on the subscriber. Unfortunately, the dump is not restricted to the tables in the replication set, so any conflicting objects in the existing schema will cause the structure import to fail.
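For an empty subscriber database where no conflicts are possible, letting pglogical copy the DDL itself would look something like this (a sketch; note the caveat above about the dump covering more than just the replication set):

SELECT pglogical.create_subscription(
    subscription_name := 'wh_sensor_data',
    replication_sets := array['logging'],
    provider_dsn := 'host=localhost port=5432 dbname=postgres',
    synchronize_structure := TRUE
);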

Once we’ve established the subscription, we merely need to wait a few seconds for the initial data copy to complete. This is only one million rows, so we don’t have to wait very long. After that, all subsequent inserts should also show up on this subscriber.
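Rather than sleeping for an arbitrary interval, we can also poll the subscription itself. The show_subscription_status function reports the subscription as initializing during the copy and replicating once the stream is live (the exact status strings may vary by pglogical version):

SELECT subscription_name, status
  FROM pglogical.show_subscription_status('wh_sensor_data');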

One really cool thing about pglogical is that it takes advantage of Postgres 9.4+ background workers. This means there’s no external daemon sitting around watching a queue, or the transaction logs, or any other source. There’s an actual Postgres backend supervising the Postgres replication stream, and it will capture applicable content for the sensor_log table. The extension becomes the management daemon as if it were a native Postgres feature.
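We can see this machinery at work on the provider, where pglogical registers an ordinary logical replication slot and a walsender process, both visible through standard system views (the slot and plugin names are derived by pglogical from the node and subscription names):

-- On the provider: the logical slot pglogical created
SELECT slot_name, plugin, active
  FROM pg_replication_slots;
 
-- And the walsender streaming changes to the subscriber
SELECT application_name, state
  FROM pg_stat_replication;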

And it gets even better. For those of us that rely on large warehouses that might accumulate data from one or more active production locations for several years, this is something of a golden fleece. Trigger-based replication falls flat here because synchronization means synchronization. If we need a table to just sit and accumulate data, it was ETL, ad-hoc copy scripts, or nothing. But what if we set up our sensor_log table on the subscriber just a bit differently?

SELECT pglogical.drop_subscription(
    subscription_name := 'wh_sensor_data'
);
 
TRUNCATE TABLE sensor_log;
 
CREATE TABLE sensor_log_part (
    LIKE sensor_log INCLUDING INDEXES
) INHERITS (sensor_log);
 
CREATE OR REPLACE FUNCTION f_redirect_sensor_log()
RETURNS TRIGGER 
AS $$
BEGIN
  INSERT INTO sensor_log_part VALUES (NEW.*);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER t_redirect_sensor_log_bi
BEFORE INSERT ON sensor_log
   FOR EACH ROW
       EXECUTE PROCEDURE f_redirect_sensor_log();
 
ALTER TABLE sensor_log
      ENABLE ALWAYS TRIGGER t_redirect_sensor_log_bi;
 
SELECT pglogical.create_subscription(
    subscription_name := 'wh_sensor_data',
    replication_sets := array['logging'],
    provider_dsn := 'host=localhost port=5432 dbname=postgres'
);
 
SELECT pg_sleep(5);
 
SELECT COUNT(*) FROM ONLY sensor_log;
 
 COUNT 
-------
     0
 
SELECT COUNT(*) FROM sensor_log;
 
  COUNT  
---------
 1000000

This is where the magic really resides. Instead of inserting data into the table itself, we were able to redirect it into a table partition. This is important, because some of the larger Postgres warehouses rely on partitions to distribute maintenance and risk. Instead of one single 4TB table, we might have a dozen 350GB tables.

This proof of concept suggests pglogical is compatible with existing partition systems. And that is important if we’re trying to simplify our stack by removing complex ETL processes. Since we created the replication set to only capture insert activity, we can purge the origin of old data as frequently as we wish, and it will remain in the warehouse indefinitely.
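Since deletes are excluded from the set, we can sketch that retention behavior directly: purging old rows on the provider leaves the warehouse copy untouched.

-- On the provider: drop everything older than 90 days
DELETE FROM sensor_log
 WHERE reading_date < CURRENT_DATE - INTERVAL '90 days';
 
-- On the subscriber: the full history remains, since the
-- 'logging' set was created with replicate_delete := FALSE
SELECT COUNT(*) FROM sensor_log;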

This comes with one caveat, though. Notice that last ALTER TABLE statement where we marked our trigger to always fire? This is necessary because pglogical uses copy mechanisms that otherwise circumvent trigger logic. If we didn’t include that statement, the sync would have dumped all of the rows into the base sensor_log table instead of our target partition.

This last detail raises an important question: are common partition management extensions like pg_partman really compatible with this use case? Automated partition management libraries must regularly modify the underlying trigger to ensure current data is redirected into the appropriate partition. In this case, the answer depends heavily on how the triggers are redefined.

If a partition management system simply replaces the underlying function, all is safe: the trigger itself is never dropped and recreated, so the ALWAYS modification survives. This is also the preferred method, since it doesn’t require a lock on the table to redefine the trigger. Unfortunately we can only assume that’s what happens, and even the “correct” approach doesn’t remove the extra step of manually marking the triggers as ALWAYS after they’re created.

For pg_partman, we’d do something like this after initial partition setup on the warehouse node:

DO $$
DECLARE
    table_name VARCHAR;
    trigger_name VARCHAR;
BEGIN
    FOR table_name, trigger_name IN
        SELECT tgrelid::regclass::text, tgname
          FROM pg_trigger
         WHERE tgname LIKE '%\_part\_trig'
    LOOP
        EXECUTE 'ALTER TABLE ' || table_name ||
                ' ENABLE ALWAYS TRIGGER ' || trigger_name;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

Gross. This also precludes new tables from being added to the subscription set without this kind of interception. It would be better if popular partition libraries offered a parameter for always enforcing triggers, or perhaps a more universal approach like a post-definition hook that executes an arbitrary function to handle advanced use cases. I promise I’m not picking on pg_partman! This applies even to not-so-popular partition extensions like my own tab_tier.

It’s a crazy world out there in extension land. Apparently we have to keep our eyes peeled for exciting new functionality like pglogical’s, along with any ensuing implications for potentially related extensions. Either way, our warehouses are happier for it!

PG Phriday: Perfectly Logical

6 thoughts on “PG Phriday: Perfectly Logical”

  • This is awesome! Question: if I wanted to make an eventually consistent ‘view’ table that stored things like aggregations, etc. could I (and should I..) create a logical replication slot back into the same DB?

    Essentially, trying to recreate functionality that an asynchronous trigger might give us.

    1. You probably can, but not with pglogical. So far as I can tell, the table names in the provider and subscriber have to match. So if you somehow managed to get a pglogical node to subscribe to itself for the same subscription set, you’d just end up with a geometric recursive loop. You couldn’t get the inserts redirected into another table within the same database.

      That’s a cool idea, though. I wonder if they would add recipient table name as an option somewhere.

  • Could we do the same thing for updates re partitions? I created a similar trigger to the example above except I partition by month. If the incoming row does not have a matching partition for its date, the trigger will dynamically create one. However, I just realized that my table requires UPDATEs as well. Update to a table that’s been partitioned doesn’t work out of the box with pglogical (tuple not found). Is it possible to write a “trigger before update” to intercept the update, parse the date from the incoming row, and then update the correct partition? I’m going to try it today to see but I’m curious if anyone had been able to work something like this. Pglogical seems to use COPY for INSERT data, but I have no idea what mechanism it uses to replay UPDATEs… I’m hoping that it just replays them as UPDATES but UPDATE on a table that has descendants (partitions) should work…
