PG Phriday: Extended Elections

Friday November 25, 2016

One of the best features Postgres boasts is the ability to adapt. Any schmo off the street can write an extension and bolt it onto Postgres with nary a second glance. As proof, I’m going to whip one up really quick. That should be enough to convince anyone that it takes no skill at all to add functionality to Postgres.

Just so our extension actually does something, let’s start off with the instant-runoff code we wrote a few weeks ago. Except this time, we also want to transform it into a kind of utility to justify the fact it’s an extension. If it’s just a schema, a few tables and associated functions, it’s just another application.

Let’s start with the preamble in our SQL file:

\echo Use "CREATE EXTENSION irv;" to load this file. \quit

SET client_min_messages = warning;

--------------------------------------
-- CREATE EXTENSION USER
--------------------------------------

DO $$
BEGIN
  PERFORM 1
    FROM pg_roles
   WHERE rolname = 'irv_role';

  IF NOT FOUND THEN
    EXECUTE 'CREATE ROLE irv_role';
    EXECUTE 'GRANT USAGE ON SCHEMA @extschema@ TO irv_role';
  END IF;
END;
$$ LANGUAGE plpgsql;

It’s best to begin with a line that prevents the extension from being executed as a simple SQL script. After that, we set the extension to report any warnings during installation.

The giant DO block is an old trick some extension authors use. When an extension is installed, it is generally owned only by the user who installed it. In most cases, this is a superuser of some kind, and that means nobody else can use the extension. Here, we’re creating a role that will have all necessary grants to call extension functions or view table contents. Then we can just grant the role to any users that should have access to the extension.

Next we should create our tables:


CREATE TABLE election
(
  election_id    SERIAL PRIMARY KEY,
  election_name  VARCHAR NOT NULL,
  schema_name    VARCHAR NOT NULL UNIQUE
);

CREATE INDEX idx_election_election_name
    ON election (election_name);

SELECT pg_catalog.pg_extension_config_dump('election', '');

CREATE TABLE candidate
(
  candidate_id    SERIAL PRIMARY KEY,
  election_id     INT NOT NULL REFERENCES election,
  candidate_name  VARCHAR NOT NULL
);

CREATE INDEX idx_candidate_candidate_name
    ON candidate (candidate_name);

SELECT pg_catalog.pg_extension_config_dump('candidate', '');

CREATE TABLE vote
(
  election_id     INT NOT NULL REFERENCES election,
  candidate_list  INT[] NOT NULL
);

CREATE INDEX idx_vote_election_id
    ON vote (election_id);

CREATE TABLE scratch
(
  candidate_name  VARCHAR,
  votes           BIGINT,
  percent         NUMERIC,
  my_round        INT
);

By and large, these are the same as before. But notice that we’re invoking pg_extension_config_dump after two of the tables. This is because the election and candidate tables contain election metadata, and we want pg_dump to retain that information. Normally extensions don’t contain user data, so the presumption is that recreating the extension during database population is sufficient. Well, if we want to dump and then restore our database contents and retain the elections and associated candidates, we need to register those tables with Postgres.

But what about the vote table? This is where our extension earns its name. The thing about elections is that there are a lot of them, and millions of people vote. To prevent having a gigantic vote table, we’ve elected to “shard” the election vote results into separate tables in a schema named after the election itself. This means the vote table here is just scaffolding to help define the table which gets created by the extension.

Now let’s talk about how elections are started:

CREATE OR REPLACE FUNCTION start_election(
  elect VARCHAR
)
RETURNS VOID AS
$$
DECLARE
  use_schema VARCHAR := substring(
    regexp_replace(elect, '\W', '', 'g'),
    1, 32
  );
BEGIN
  INSERT INTO @extschema@.election (election_name, schema_name)
  VALUES (elect, use_schema);

  EXECUTE 'CREATE SCHEMA ' || use_schema;

  PERFORM set_config('search_path', use_schema, true);

  CREATE TABLE irv_vote (
    vote_id BIGSERIAL PRIMARY KEY,
    LIKE @extschema@.vote INCLUDING ALL
  );
END;
$$ LANGUAGE PLPGSQL;

We start by removing all non-word characters from the election name, and then extracting the first 32 characters to create a valid schema name. Then we capture the election information as we did before. After that, we create the schema that will contain our votes, and the vote table itself. We’ve prefixed the vote table with “irv” to note that the extension created it.

We’ve also set the search_path to default to the new schema we just created. This way, we don’t need to build a SQL statement string and use EXECUTE to invoke it. There’s also the @extschema@ syntax. Some extensions allow users to specify a schema during installation. By using this syntax, Postgres will substitute @extschema@ for the schema where extension objects will be stored. Normally the schema is set by the extension author.

The rest of the functions have similar modifications; a preamble to set the vote schema, and @extschema@ replacements where necessary. Here’s what they look like:

CREATE OR REPLACE FUNCTION no_losers(orig INT[], remove INT[])
RETURNS INT[] AS
$$
DECLARE
  item INT; 
  ret_arr INT[] := orig;
BEGIN
  IF array_length(remove, 1) IS NULL THEN
    RETURN ret_arr;
  END IF;

  FOR item IN 1 .. array_length(remove, 1) LOOP
    ret_arr := array_remove(ret_arr, remove[item]);
  END LOOP;

  RETURN ret_arr;
END;
$$ LANGUAGE PLPGSQL IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION register_candidate(
  elect VARCHAR,
  cand VARCHAR
)
RETURNS VOID AS
$$
  INSERT INTO @extschema@.candidate (election_id, candidate_name)
  SELECT election_id, cand
    FROM @extschema@.election
   WHERE election_name = elect;
$$ LANGUAGE SQL;

CREATE OR REPLACE FUNCTION register_vote(
  elect VARCHAR,
  candidates VARCHAR[]
)
RETURNS VOID AS
$$
DECLARE
  use_schema VARCHAR;
BEGIN
  SELECT schema_name INTO use_schema
    FROM @extschema@.election
   WHERE election_name = elect;

  PERFORM set_config('search_path', use_schema, true);

  WITH ranked AS (
    SELECT candidate_name, row_number() OVER () AS rank
      FROM unnest(candidates) candidate_name
  ),
  conv AS (
    SELECT c.election_id, c.candidate_id, r.rank
      FROM @extschema@.candidate c
      JOIN ranked r USING (candidate_name)
     ORDER BY rank
  )
  INSERT INTO irv_vote (election_id, candidate_list)
  SELECT election_id, array_agg(candidate_id)
    FROM conv
   GROUP BY 1;
END;
$$ LANGUAGE PLPGSQL;

CREATE OR REPLACE FUNCTION tabulate_votes(
  elect VARCHAR
)
RETURNS VOID AS
$$
DECLARE
  use_schema VARCHAR;
  num_candidates INT;
  num_votes NUMERIC;
  vote_round INT;
  losers INT[] := '{}'::INT[]; -- Non-null empty array.
  winner VARCHAR;
  results RECORD;
BEGIN
  SELECT schema_name INTO use_schema
    FROM @extschema@.election
   WHERE election_name = elect;

  PERFORM set_config('search_path', use_schema, true);

  DROP TABLE IF EXISTS working_round;
  CREATE TABLE working_round (LIKE @extschema@.scratch);

  -- Given the number of candidates in this election, we can
  -- reduce calculations.

  SELECT count(*) INTO num_candidates
    FROM @extschema@.candidate c
    JOIN @extschema@.election e USING (election_id)
   WHERE e.election_name = elect;

  -- We need the total number of votes cast for display
  -- percentages.

  SELECT count(*) INTO num_votes
    FROM irv_vote v
    JOIN @extschema@.election e USING (election_id)
   WHERE e.election_name = elect;

  -- Record each round of the IRV process. At the end of each round,
  -- eliminate the worst performer and try again. Do this until we
  -- reach > 50% for one candidate.

  FOR vote_round IN 1 .. (num_candidates - 1) LOOP
    RAISE NOTICE '== Round % ==', vote_round;

    INSERT INTO working_round
    SELECT c.candidate_name, count(v.vote_id) AS votes,
           round(count(v.vote_id) / 
                 num_votes * 100.0, 2) AS percent,
           vote_round AS my_round
      FROM irv_vote v
      JOIN @extschema@.candidate c ON (
             c.candidate_id = (
               @extschema@.no_losers(v.candidate_list, losers)
             )[1]
           )
      JOIN @extschema@.election e ON (e.election_id = v.election_id)
     WHERE e.election_name = elect
     GROUP BY c.candidate_name;

    -- Print the results of the round for spot-checking.

    FOR results IN
        SELECT * FROM working_round
         WHERE my_round = vote_round
    LOOP
      RAISE NOTICE '%: % (%)', results.candidate_name,
        results.votes, results.percent;
    END LOOP;

    -- If this round has a winner, short-circuit so we can
    -- just report the results.

    SELECT candidate_name INTO winner
      FROM working_round
     WHERE percent > 50
       AND my_round = vote_round;

    IF winner IS NOT NULL THEN
      RAISE NOTICE 'Winner of % is %!', elect, winner;
      EXIT;
    END IF;

    -- This is where we determine the loser of this round.
    -- It's just the lowest ranked result.

    SELECT array_append(losers, c.candidate_id) INTO losers
      FROM working_round w
      JOIN @extschema@.candidate c USING (candidate_name)
     WHERE my_round = vote_round
     ORDER BY w.votes
     LIMIT 1;

  END LOOP;

END;
$$ LANGUAGE PLPGSQL;

But how to install it? First we need a file named irv.control that will contain some parameters that tell Postgres how to install and identify the extension. It looks something like this:

comment = 'Extension for managing instant-runoff elections.'
default_version = '1.0'
relocatable = false
schema = irv

Don’t worry about that relocatable line. That just means we don’t want the extension to be moved with ALTER EXTENSION after initial installation. Our schema contains data instead of simply a collection of utility functions, so it’s not good practice to relocate willy nilly.

All Postgres expects is that this control file exists along with the SQL file in its configured extension directory. The easiest way to find this is to install the Postgres dev libraries so we have access to pg_config. Extensions are normally found in the extension subdirectory of the location reported by pg_config --sharedir.

Then we just need to create our SQL file there, along with the control file. Postgres does have a naming scheme we need to follow, however. We need to take the extension name and version, and separate them by a double dash. So in our case, the SQL file should be named irv--1.0.sql.

Of course, we can’t forget the grants we spoke about in the beginning:

REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA @extschema@ FROM PUBLIC;
GRANT ALL ON ALL TABLES IN SCHEMA @extschema@ TO tab_tier_role;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA @extschema@ TO tab_tier_role;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA @extschema@ TO tab_tier_role;

If we combine all of the SQL from this article, and save it as irv--1.0.sql along with the specified control file, we’ve successfully written an extension. But one question remains: does it work?

Well, let’s see! Votes have been coming in for weeks now, so we can even use the new totals under the same assumptions as before.

CREATE EXTENSION irv;

SELECT irv.start_election('Pres 2016');

SELECT irv.register_candidate('Pres 2016', 'Clinton');
SELECT irv.register_candidate('Pres 2016', 'Trump');
SELECT irv.register_candidate('Pres 2016', 'Stein');
SELECT irv.register_candidate('Pres 2016', 'Johnson');

SELECT irv.register_vote('Pres 2016', '{Clinton, Stein}')
  FROM generate_series(1, 64433);

SELECT irv.register_vote('Pres 2016', '{Stein, Clinton}')
  FROM generate_series(1, 1395);

SELECT irv.register_vote('Pres 2016', '{Trump, Johnson}')
  FROM generate_series(1, 62337);

SELECT irv.register_vote('Pres 2016', '{Johnson, Trump}')
  FROM generate_series(1, 4418);

SELECT irv.tabulate_votes('Pres 2016');

NOTICE:  == Round 1 ==
NOTICE:  Stein: 1395 (1.05)
NOTICE:  Trump: 62337 (47.02)
NOTICE:  Clinton: 64433 (48.60)
NOTICE:  Johnson: 4418 (3.33)
NOTICE:  == Round 2 ==
NOTICE:  Trump: 62337 (47.02)
NOTICE:  Clinton: 65828 (49.65)
NOTICE:  Johnson: 4418 (3.33)
NOTICE:  == Round 3 ==
NOTICE:  Trump: 66755 (50.35)
NOTICE:  Clinton: 65828 (49.65)
NOTICE:  Winner of Pres 2016 is Trump!

We can even see the tables where all of the data exists for this particular election. This pattern will persist for any election controlled by this extension.

SELECT schemaname, tablename
  FROM pg_tables
 WHERE schemaname = 'pres2016';

 schemaname |   tablename   
------------+---------------
 pres2016   | working_round
 pres2016   | irv_vote

Regardless of how you might feel about the US 2016 election, it’s great to see the extension working as expected!

And isn’t that all that really matters?

See Also