PG Phriday: Extended Elections
One of the best features Postgres boasts is the ability to adapt. Any schmo off the street can write an extension and bolt it onto Postgres with nary a second glance. As proof, I’m going to whip one up really quick. That should be enough to convince anyone that it takes no skill at all to add functionality to Postgres.
Just so our extension actually does something, let’s start off with the instant-runoff code we wrote a few weeks ago. Except this time, we also want to transform it into a kind of utility to justify the fact it’s an extension. If it’s just a schema, a few tables and associated functions, it’s just another application.
Let’s start with the preamble in our SQL file:
\echo Use "CREATE EXTENSION irv;" to load this file. \quit
SET client_min_messages = warning;
--------------------------------------
-- CREATE EXTENSION USER
--------------------------------------
DO $$
BEGIN
PERFORM 1
FROM pg_roles
WHERE rolname = 'irv_role';
IF NOT FOUND THEN
EXECUTE 'CREATE ROLE irv_role';
EXECUTE 'GRANT USAGE ON SCHEMA @extschema@ TO irv_role';
END IF;
END;
$$ LANGUAGE plpgsql;
It’s best to begin with a line that prevents the extension from being executed as a simple SQL script. After that, we set the extension to report any warnings during installation.
The giant DO
block is an old trick some extension authors use. When an extension is installed, it is generally owned only by the user who installed it. In most cases, this is a superuser of some kind, and that means nobody else can use the extension. Here, we’re creating a role that will have all necessary grants to call extension functions or view table contents. Then we can just grant the role to any users that should have access to the extension.
Next we should create our tables:
CREATE TABLE election
(
election_id SERIAL PRIMARY KEY,
election_name VARCHAR NOT NULL,
schema_name VARCHAR NOT NULL UNIQUE
);
CREATE INDEX idx_election_election_name
ON election (election_name);
SELECT pg_catalog.pg_extension_config_dump('election', '');
CREATE TABLE candidate
(
candidate_id SERIAL PRIMARY KEY,
election_id INT NOT NULL REFERENCES election,
candidate_name VARCHAR NOT NULL
);
CREATE INDEX idx_candidate_candidate_name
ON candidate (candidate_name);
SELECT pg_catalog.pg_extension_config_dump('candidate', '');
CREATE TABLE vote
(
election_id INT NOT NULL REFERENCES election,
candidate_list INT[] NOT NULL
);
CREATE INDEX idx_vote_election_id
ON vote (election_id);
CREATE TABLE scratch
(
candidate_name VARCHAR,
votes BIGINT,
percent NUMERIC,
my_round INT
);
By and large, these are the same as before. But notice that we’re invoking pg_extension_config_dump
after two of the tables. This is because the election
and candidate
tables contain election metadata, and we want pg_dump
to retain that information. Normally extensions don’t contain user data, so the presumption is that recreating the extension during database population is sufficient. Well, if we want to dump and then restore our database contents and retain the elections and associated candidates, we need to register those tables with Postgres.
But what about the vote table? This is where our extension earns its name. The thing about elections is that there are a lot of them, and millions of people vote. To prevent having a gigantic vote table, we’ve elected to “shard” the election vote results into separate tables in a schema named after the election itself. This means the vote
table here is just scaffolding to help define the table which gets created by the extension.
Now let’s talk about how elections are started:
CREATE OR REPLACE FUNCTION start_election(
elect VARCHAR
)
RETURNS VOID AS
$$
DECLARE
use_schema VARCHAR := substring(
regexp_replace(elect, '\W', '', 'g'),
1, 32
);
BEGIN
INSERT INTO @extschema@.election (election_name, schema_name)
VALUES (elect, use_schema);
EXECUTE 'CREATE SCHEMA ' || use_schema;
PERFORM set_config('search_path', use_schema, true);
CREATE TABLE irv_vote (
vote_id BIGSERIAL PRIMARY KEY,
LIKE @extschema@.vote INCLUDING ALL
);
END;
$$ LANGUAGE PLPGSQL;
We start by removing all non-word characters from the election name, and then extracting the first 32 characters to create a valid schema name. Then we capture the election information as we did before. After that, we create the schema that will contain our votes, and the vote table itself. We’ve prefixed the vote table with “irv” to note that the extension created it.
We’ve also set the search_path
to default to the new schema we just created. This way, we don’t need to build a SQL statement string and use EXECUTE
to invoke it. There’s also the @extschema@
syntax. Some extensions allow users to specify a schema during installation. By using this syntax, Postgres will substitute @extschema@
for the schema where extension objects will be stored. Normally the schema is set by the extension author.
The rest of the functions have similar modifications; a preamble to set the vote schema, and @extschema@
replacements where necessary. Here’s what they look like:
CREATE OR REPLACE FUNCTION no_losers(orig INT[], remove INT[])
RETURNS INT[] AS
$$
DECLARE
item INT;
ret_arr INT[] := orig;
BEGIN
IF array_length(remove, 1) IS NULL THEN
RETURN ret_arr;
END IF;
FOR item IN 1 .. array_length(remove, 1) LOOP
ret_arr := array_remove(ret_arr, remove[item]);
END LOOP;
RETURN ret_arr;
END;
$$ LANGUAGE PLPGSQL IMMUTABLE STRICT;
CREATE OR REPLACE FUNCTION register_candidate(
elect VARCHAR,
cand VARCHAR
)
RETURNS VOID AS
$$
INSERT INTO @extschema@.candidate (election_id, candidate_name)
SELECT election_id, cand
FROM @extschema@.election
WHERE election_name = elect;
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION register_vote(
elect VARCHAR,
candidates VARCHAR[]
)
RETURNS VOID AS
$$
DECLARE
use_schema VARCHAR;
BEGIN
SELECT schema_name INTO use_schema
FROM @extschema@.election
WHERE election_name = elect;
PERFORM set_config('search_path', use_schema, true);
WITH ranked AS (
SELECT candidate_name, row_number() OVER () AS rank
FROM unnest(candidates) candidate_name
),
conv AS (
SELECT c.election_id, c.candidate_id, r.rank
FROM @extschema@.candidate c
JOIN ranked r USING (candidate_name)
ORDER BY rank
)
INSERT INTO irv_vote (election_id, candidate_list)
SELECT election_id, array_agg(candidate_id)
FROM conv
GROUP BY 1;
END;
$$ LANGUAGE PLPGSQL;
CREATE OR REPLACE FUNCTION tabulate_votes(
elect VARCHAR
)
RETURNS VOID AS
$$
DECLARE
use_schema VARCHAR;
num_candidates INT;
num_votes NUMERIC;
vote_round INT;
losers INT[] := '{}'::INT[]; -- Non-null empty array.
winner VARCHAR;
results RECORD;
BEGIN
SELECT schema_name INTO use_schema
FROM @extschema@.election
WHERE election_name = elect;
PERFORM set_config('search_path', use_schema, true);
DROP TABLE IF EXISTS working_round;
CREATE TABLE working_round (LIKE @extschema@.scratch);
-- Given the number of candidates in this election, we can
-- reduce calculations.
SELECT count(*) INTO num_candidates
FROM @extschema@.candidate c
JOIN @extschema@.election e USING (election_id)
WHERE e.election_name = elect;
-- We need the total number of votes cast for display
-- percentages.
SELECT count(*) INTO num_votes
FROM irv_vote v
JOIN @extschema@.election e USING (election_id)
WHERE e.election_name = elect;
-- Record each round of the IRV process. At the end of each round,
-- eliminate the worst performer and try again. Do this until we
-- reach > 50% for one candidate.
FOR vote_round IN 1 .. (num_candidates - 1) LOOP
RAISE NOTICE '== Round % ==', vote_round;
INSERT INTO working_round
SELECT c.candidate_name, count(v.vote_id) AS votes,
round(count(v.vote_id) /
num_votes * 100.0, 2) AS percent,
vote_round AS my_round
FROM irv_vote v
JOIN @extschema@.candidate c ON (
c.candidate_id = (
@extschema@.no_losers(v.candidate_list, losers)
)[1]
)
JOIN @extschema@.election e ON (e.election_id = v.election_id)
WHERE e.election_name = elect
GROUP BY c.candidate_name;
-- Print the results of the round for spot-checking.
FOR results IN
SELECT * FROM working_round
WHERE my_round = vote_round
LOOP
RAISE NOTICE '%: % (%)', results.candidate_name,
results.votes, results.percent;
END LOOP;
-- If this round has a winner, short-circuit so we can
-- just report the results.
SELECT candidate_name INTO winner
FROM working_round
WHERE percent > 50
AND my_round = vote_round;
IF winner IS NOT NULL THEN
RAISE NOTICE 'Winner of % is %!', elect, winner;
EXIT;
END IF;
-- This is where we determine the loser of this round.
-- It's just the lowest ranked result.
SELECT array_append(losers, c.candidate_id) INTO losers
FROM working_round w
JOIN @extschema@.candidate c USING (candidate_name)
WHERE my_round = vote_round
ORDER BY w.votes
LIMIT 1;
END LOOP;
END;
$$ LANGUAGE PLPGSQL;
But how to install it? First we need a file named irv.control
that will contain some parameters that tell Postgres how to install and identify the extension. It looks something like this:
comment = 'Extension for managing instant-runoff elections.'
default_version = '1.0'
relocatable = false
schema = irv
Don’t worry about that relocatable
line. That just means we don’t want the extension to be moved with ALTER EXTENSION
after initial installation. Our schema contains data instead of simply a collection of utility functions, so it’s not good practice to relocate willy nilly.
All Postgres expects is that this control file exists along with the SQL file in its configured extension directory. The easiest way to find this is to install the Postgres dev libraries so we have access to pg_config
. Extensions are normally found in the extension
subdirectory of the location reported by pg_config --sharedir
.
Then we just need to create our SQL file there, along with the control file. Postgres does have a naming scheme we need to follow, however. We need to take the extension name and version, and separate them by a double dash. So in our case, the SQL file should be named irv--1.0.sql
.
Of course, we can’t forget the grants we spoke about in the beginning:
REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA @extschema@ FROM PUBLIC;
GRANT ALL ON ALL TABLES IN SCHEMA @extschema@ TO tab_tier_role;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA @extschema@ TO tab_tier_role;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA @extschema@ TO tab_tier_role;
If we combine all of the SQL from this article, and save it as irv--1.0.sql
along with the specified control file, we’ve successfully written an extension. But one question remains: does it work?
Well, let’s see! Votes have been coming in for weeks now, so we can even use the new totals under the same assumptions as before.
CREATE EXTENSION irv;
SELECT irv.start_election('Pres 2016');
SELECT irv.register_candidate('Pres 2016', 'Clinton');
SELECT irv.register_candidate('Pres 2016', 'Trump');
SELECT irv.register_candidate('Pres 2016', 'Stein');
SELECT irv.register_candidate('Pres 2016', 'Johnson');
SELECT irv.register_vote('Pres 2016', '{Clinton, Stein}')
FROM generate_series(1, 64433);
SELECT irv.register_vote('Pres 2016', '{Stein, Clinton}')
FROM generate_series(1, 1395);
SELECT irv.register_vote('Pres 2016', '{Trump, Johnson}')
FROM generate_series(1, 62337);
SELECT irv.register_vote('Pres 2016', '{Johnson, Trump}')
FROM generate_series(1, 4418);
SELECT irv.tabulate_votes('Pres 2016');
NOTICE: == Round 1 ==
NOTICE: Stein: 1395 (1.05)
NOTICE: Trump: 62337 (47.02)
NOTICE: Clinton: 64433 (48.60)
NOTICE: Johnson: 4418 (3.33)
NOTICE: == Round 2 ==
NOTICE: Trump: 62337 (47.02)
NOTICE: Clinton: 65828 (49.65)
NOTICE: Johnson: 4418 (3.33)
NOTICE: == Round 3 ==
NOTICE: Trump: 66755 (50.35)
NOTICE: Clinton: 65828 (49.65)
NOTICE: Winner of Pres 2016 is Trump!
We can even see the tables where all of the data exists for this particular election. This pattern will persist for any election controlled by this extension.
SELECT schemaname, tablename
FROM pg_tables
WHERE schemaname = 'pres2016';
schemaname | tablename
------------+---------------
pres2016 | working_round
pres2016 | irv_vote
Regardless of how you might feel about the US 2016 election, it’s great to see the extension working as expected!
And isn’t that all that really matters?