PG Phriday: Extended Elections

November 25th, 2016 | Published in Database, Tech Talk | No Comments


One of the best features Postgres boasts is the ability to adapt. Any schmo off the street can write an extension and bolt it onto Postgres with nary a second glance. As proof, I’m going to whip one up really quick. That should be enough to convince anyone that it takes no skill at all to add functionality to Postgres.

Just so our extension actually does something, let’s start off with the instant-runoff code we wrote a few weeks ago. Except this time, we also want to transform it into a kind of utility to justify the fact it’s an extension. If it’s just a schema, a few tables and associated functions, it’s just another application.

Let’s start with the preamble in our SQL file:

\echo Use "CREATE EXTENSION irv;" to load this file. \quit
 
SET client_min_messages = warning;
 
--------------------------------------
-- CREATE EXTENSION USER
--------------------------------------
 
DO $$
BEGIN
  PERFORM 1
    FROM pg_roles
   WHERE rolname = 'irv_role';
 
  IF NOT FOUND THEN
    EXECUTE 'CREATE ROLE irv_role';
    EXECUTE 'GRANT USAGE ON SCHEMA @extschema@ TO irv_role';
  END IF;
END;
$$ LANGUAGE plpgsql;

It’s best to begin with a line that prevents the extension from being executed as a simple SQL script. After that, we set client_min_messages so that only warnings and above are reported during installation, keeping the output free of noise.

The giant DO block is an old trick some extension authors use. When an extension is installed, it is generally owned only by the user who installed it. In most cases, this is a superuser of some kind, and that means nobody else can use the extension. Here, we’re creating a role that will have all necessary grants to call extension functions or view table contents. Then we can just grant the role to any users that should have access to the extension.
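As a quick sketch of how that plays out after installation, granting access becomes a one-liner per person. The user here is purely hypothetical:

-- Hypothetical analyst account; all extension access flows through irv_role.
CREATE USER some_analyst WITH PASSWORD 'changeme';
GRANT irv_role TO some_analyst;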

Next we should create our tables:

 
CREATE TABLE election
(
  election_id    SERIAL PRIMARY KEY,
  election_name  VARCHAR NOT NULL,
  schema_name    VARCHAR NOT NULL UNIQUE
);
 
CREATE INDEX idx_election_election_name
    ON election (election_name);
 
SELECT pg_catalog.pg_extension_config_dump('election', '');
 
CREATE TABLE candidate
(
  candidate_id    SERIAL PRIMARY KEY,
  election_id     INT NOT NULL REFERENCES election,
  candidate_name  VARCHAR NOT NULL
);
 
CREATE INDEX idx_candidate_candidate_name
    ON candidate (candidate_name);
 
SELECT pg_catalog.pg_extension_config_dump('candidate', '');
 
CREATE TABLE vote
(
  election_id     INT NOT NULL REFERENCES election,
  candidate_list  INT[] NOT NULL
);
 
CREATE INDEX idx_vote_election_id
    ON vote (election_id);
 
CREATE TABLE scratch
(
  candidate_name  VARCHAR,
  votes           BIGINT,
  percent         NUMERIC,
  my_round        INT
);

By and large, these are the same as before. But notice that we’re invoking pg_extension_config_dump after two of the tables. This is because the election and candidate tables contain election metadata, and we want pg_dump to retain that information. Normally extensions don’t contain user data, so the presumption is that recreating the extension during database population is sufficient. Well, if we want to dump and then restore our database contents and retain the elections and associated candidates, we need to register those tables with Postgres.
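If we ever want to double-check which tables were registered this way, the pg_extension catalog tracks them in its extconfig column. A quick sanity-check query might look like this:

-- List the tables pg_dump will treat as user data for this extension.
SELECT unnest(extconfig)::REGCLASS AS dumped_table
  FROM pg_extension
 WHERE extname = 'irv';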

But what about the vote table? This is where our extension earns its name. The thing about elections is that there are a lot of them, and millions of people vote. To prevent having a gigantic vote table, we’ve elected to “shard” the election vote results into separate tables in a schema named after the election itself. This means the vote table here is just scaffolding to help define the table which gets created by the extension.

Now let’s talk about how elections are started:

CREATE OR REPLACE FUNCTION start_election(
  elect VARCHAR
)
RETURNS VOID AS
$$
DECLARE
  use_schema VARCHAR := SUBSTRING(
    regexp_replace(elect, '\W', '', 'g'),
    1, 32
  );
BEGIN
  INSERT INTO @extschema@.election (election_name, schema_name)
  VALUES (elect, use_schema);
 
  EXECUTE 'CREATE SCHEMA ' || use_schema;
 
  PERFORM set_config('search_path', use_schema, TRUE);
 
  CREATE TABLE irv_vote (
    vote_id BIGSERIAL PRIMARY KEY,
    LIKE @extschema@.vote INCLUDING ALL
  );
END;
$$ LANGUAGE PLPGSQL;

We start by removing all non-word characters from the election name, and then extracting the first 32 characters to create a valid schema name. Then we capture the election information as we did before. After that, we create the schema that will contain our votes, and the vote table itself. We’ve prefixed the vote table with “irv” to note that the extension created it.
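To see that transformation in isolation, here’s what the expression does to our sample election name. Since the resulting schema name is used unquoted, Postgres will also fold it to lower case when the schema is actually created:

-- 'Pres 2016' becomes 'Pres2016', which CREATE SCHEMA folds to pres2016.
SELECT SUBSTRING(
         regexp_replace('Pres 2016', '\W', '', 'g'),
         1, 32
       ) AS schema_name;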

We’ve also set the search_path to default to the new schema we just created. This way, we don’t need to build a SQL statement string and use EXECUTE to invoke it. There’s also the @extschema@ syntax. Some extensions allow users to specify a schema during installation, and Postgres replaces every occurrence of @extschema@ with the schema where the extension objects actually live. In our case that schema is fixed by the extension author, as we’ll see in the control file.
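One detail worth calling out: the third argument to set_config marks the change as transaction-local, so the caller’s original search_path comes back once the surrounding transaction ends. A minimal illustration outside the extension:

-- TRUE = local to the current transaction only.
BEGIN;
SELECT set_config('search_path', 'pres2016', TRUE);
SHOW search_path;   -- pres2016
COMMIT;
SHOW search_path;   -- back to the previous value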

The rest of the functions have similar modifications; a preamble to set the vote schema, and @extschema@ replacements where necessary. Here’s what they look like:

CREATE OR REPLACE FUNCTION no_losers(orig INT[], remove INT[])
RETURNS INT[] AS
$$
DECLARE
  item INT; 
  ret_arr INT[] := orig;
BEGIN
  IF array_length(remove, 1) IS NULL THEN
    RETURN ret_arr;
  END IF;
 
  FOR item IN 1 .. array_length(remove, 1) LOOP
    ret_arr := array_remove(ret_arr, remove[item]);
  END LOOP;
 
  RETURN ret_arr;
END;
$$ LANGUAGE PLPGSQL IMMUTABLE STRICT;
 
CREATE OR REPLACE FUNCTION register_candidate(
  elect VARCHAR,
  cand VARCHAR
)
RETURNS VOID AS
$$
  INSERT INTO @extschema@.candidate (election_id, candidate_name)
  SELECT election_id, cand
    FROM @extschema@.election
   WHERE election_name = elect;
$$ LANGUAGE SQL;
 
CREATE OR REPLACE FUNCTION register_vote(
  elect VARCHAR,
  candidates VARCHAR[]
)
RETURNS VOID AS
$$
DECLARE
  use_schema VARCHAR;
BEGIN
  SELECT schema_name INTO use_schema
    FROM @extschema@.election
   WHERE election_name = elect;
 
  PERFORM set_config('search_path', use_schema, TRUE);
 
  WITH ranked AS (
    SELECT candidate_name, ROW_NUMBER() OVER () AS rank
      FROM unnest(candidates) candidate_name
  ),
  conv AS (
    SELECT c.election_id, c.candidate_id, r.rank
      FROM @extschema@.candidate c
      JOIN ranked r USING (candidate_name)
     ORDER BY rank
  )
  INSERT INTO irv_vote (election_id, candidate_list)
  SELECT election_id, array_agg(candidate_id)
    FROM conv
   GROUP BY 1;
END;
$$ LANGUAGE PLPGSQL;
 
CREATE OR REPLACE FUNCTION tabulate_votes(
  elect VARCHAR
)
RETURNS VOID AS
$$
DECLARE
  use_schema VARCHAR;
  num_candidates INT;
  num_votes NUMERIC;
  vote_round INT;
  losers INT[] := '{}'::INT[]; -- Non-null empty array.
  winner VARCHAR;
  results RECORD;
BEGIN
  SELECT schema_name INTO use_schema
    FROM @extschema@.election
   WHERE election_name = elect;
 
  PERFORM set_config('search_path', use_schema, TRUE);
 
  DROP TABLE IF EXISTS working_round;
  CREATE TABLE working_round (LIKE @extschema@.scratch);
 
  -- Given the number of candidates in this election, we can
  -- reduce calculations.
 
  SELECT COUNT(*) INTO num_candidates
    FROM @extschema@.candidate c
    JOIN @extschema@.election e USING (election_id)
   WHERE e.election_name = elect;
 
  -- We need the total number of votes cast for display
  -- percentages.
 
  SELECT COUNT(*) INTO num_votes
    FROM irv_vote v
    JOIN @extschema@.election e USING (election_id)
   WHERE e.election_name = elect;
 
  -- Record each round of the IRV process. At the end of each round,
  -- eliminate the worst performer and try again. Do this until we
  -- reach > 50% for one candidate.
 
  FOR vote_round IN 1 .. (num_candidates - 1) LOOP
    RAISE NOTICE '== Round % ==', vote_round;
 
    INSERT INTO working_round
    SELECT c.candidate_name, COUNT(v.vote_id) AS votes,
           round(COUNT(v.vote_id) / 
                 num_votes * 100.0, 2) AS percent,
           vote_round AS my_round
      FROM irv_vote v
      JOIN @extschema@.candidate c ON (
             c.candidate_id = (
               @extschema@.no_losers(v.candidate_list, losers)
             )[1]
           )
      JOIN @extschema@.election e ON (e.election_id = v.election_id)
     WHERE e.election_name = elect
     GROUP BY c.candidate_name;
 
    -- Print the results of the round for spot-checking.
 
    FOR results IN
        SELECT * FROM working_round
         WHERE my_round = vote_round
    LOOP
      RAISE NOTICE '%: % (%)', results.candidate_name,
        results.votes, results.percent;
    END LOOP;
 
    -- If this round has a winner, short-circuit so we can
    -- just report the results.
 
    SELECT candidate_name INTO winner
      FROM working_round
     WHERE percent > 50
       AND my_round = vote_round;
 
    IF winner IS NOT NULL THEN
      RAISE NOTICE 'Winner of % is %!', elect, winner;
      EXIT;
    END IF;
 
    -- This is where we determine the loser of this round.
    -- It's just the lowest ranked result.
 
    SELECT array_append(losers, c.candidate_id) INTO losers
      FROM working_round w
      JOIN @extschema@.candidate c USING (candidate_name)
     WHERE my_round = vote_round
     ORDER BY w.votes
     LIMIT 1;
 
  END LOOP;
 
END;
$$ LANGUAGE PLPGSQL;

But how to install it? First we need a file named irv.control that will contain some parameters that tell Postgres how to install and identify the extension. It looks something like this:

comment = 'Extension for managing instant-runoff elections.'
default_version = '1.0'
relocatable = false
schema = irv

Don’t worry about that relocatable line. It just means we don’t want the extension moved to another schema with ALTER EXTENSION ... SET SCHEMA after initial installation. Our schema contains data instead of simply a collection of utility functions, so it’s not good practice to relocate it willy-nilly.

All Postgres expects is that this control file exists along with the SQL file in its configured extension directory. The easiest way to find this is to install the Postgres dev libraries so we have access to pg_config. Extensions are normally found in the extension subdirectory of the location reported by pg_config --sharedir.
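For anyone who would rather not install the dev package, Postgres 9.6 and later also expose the same information through the pg_config system view, so a quick query works just as well:

-- Same answer as `pg_config --sharedir`, but from inside the database.
SELECT setting || '/extension' AS extension_dir
  FROM pg_config
 WHERE name = 'SHAREDIR';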

Then we just need to create our SQL file there, along with the control file. Postgres does have a naming scheme we need to follow, however. We need to take the extension name and version, and separate them by a double dash. So in our case, the SQL file should be named irv--1.0.sql.

Of course, we can’t forget the grants we spoke about in the beginning:

REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA @extschema@ FROM PUBLIC;
GRANT ALL ON ALL TABLES IN SCHEMA @extschema@ TO irv_role;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA @extschema@ TO irv_role;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA @extschema@ TO irv_role;

If we combine all of the SQL from this article, and save it as irv--1.0.sql along with the specified control file, we’ve successfully written an extension. But one question remains: does it work?

Well, let’s see! Votes have been coming in for weeks now, so we can even use the new totals under the same assumptions as before.

CREATE EXTENSION irv;
 
SELECT irv.start_election('Pres 2016');
 
SELECT irv.register_candidate('Pres 2016', 'Clinton');
SELECT irv.register_candidate('Pres 2016', 'Trump');
SELECT irv.register_candidate('Pres 2016', 'Stein');
SELECT irv.register_candidate('Pres 2016', 'Johnson');
 
SELECT irv.register_vote('Pres 2016', '{Clinton, Stein}')
  FROM generate_series(1, 64433);
 
SELECT irv.register_vote('Pres 2016', '{Stein, Clinton}')
  FROM generate_series(1, 1395);
 
SELECT irv.register_vote('Pres 2016', '{Trump, Johnson}')
  FROM generate_series(1, 62337);
 
SELECT irv.register_vote('Pres 2016', '{Johnson, Trump}')
  FROM generate_series(1, 4418);
 
SELECT irv.tabulate_votes('Pres 2016');
 
NOTICE:  == Round 1 ==
NOTICE:  Stein: 1395 (1.05)
NOTICE:  Trump: 62337 (47.02)
NOTICE:  Clinton: 64433 (48.60)
NOTICE:  Johnson: 4418 (3.33)
NOTICE:  == Round 2 ==
NOTICE:  Trump: 62337 (47.02)
NOTICE:  Clinton: 65828 (49.65)
NOTICE:  Johnson: 4418 (3.33)
NOTICE:  == Round 3 ==
NOTICE:  Trump: 66755 (50.35)
NOTICE:  Clinton: 65828 (49.65)
NOTICE:  Winner of Pres 2016 is Trump!

We can even see the tables where all of the data exists for this particular election. This pattern will persist for any election controlled by this extension.

SELECT schemaname, tablename
  FROM pg_tables
 WHERE schemaname = 'pres2016';
 
 schemaname |   tablename   
------------+---------------
 pres2016   | working_round
 pres2016   | irv_vote

Regardless of how you might feel about the US 2016 election, it’s great to see the extension working as expected!

And isn’t that all that really matters?



PG Phriday: Primal Planner Prep

November 18th, 2016 | Published in Database, Tech Talk | No Comments


The Postgres query planner is a house of cards built upon the ever-shifting sand of our data. It has the utterly impossible mission of converting our ridiculous and inane requests into a logical series of fetch, filter, sort, join, and other instructions. Then the resulting steps must be ruthlessly efficient or the execution phase could very well saturate every hardware resource available; Set Theory isn’t very forgiving.

Forewarned is forearmed is very apt when applied to database query planners. Without proper statistics, they are reduced to assumptions that make adequate first approximations. But scale is the utter enemy of imprecision, as multiplicative effects quickly overwhelm reality. This allows seemingly simple report scripts to fall endlessly into a pit of smoldering system resources.

To perhaps translate that analogy a bit, let’s start with a very basic schema:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE - (s.id || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);
 
CREATE INDEX idx_sensor_log_reading_date
    ON sensor_log (reading_date DESC);
 
ANALYZE sensor_log;

It’s just a basic sensor log table with a mere five million rows. There are only one thousand distinct locations, readings fall into one hundred possible values, and a new row arrives every second. There’s nothing ground-breaking here.

Given how unassuming this structure appears, imagine we have a regularly scheduled script that executes in multiple steps. Each step produces an intermediate UNLOGGED TABLE, because there’s a chance the script is reentrant and we taught our users that valuable trick.

But there’s a problem. Execution time, disk IO, and CPU usage are all much higher than we’d like. If we investigated the first two parts of the script, we might see something like this:

CREATE UNLOGGED TABLE recent_info AS
SELECT * FROM sensor_log
 WHERE reading_date >= CURRENT_DATE - INTERVAL '5 day';
 
EXPLAIN ANALYZE
SELECT * 
  FROM sensor_log
 WHERE id IN (
         SELECT id FROM recent_info
          WHERE reading BETWEEN 10 AND 50
       );
 
                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=6966.50..6977.01 rows=2500000 width=23)
              (actual time=225.389..2151.500 rows=177120 loops=1)
   ->  HashAggregate  (cost=6966.07..6966.09 rows=2 width=4)
                      (actual time=225.342..334.923 rows=177120 loops=1)
         Group Key: r.id
         ->  Seq Scan on recent_info r
                 (cost=0.00..6962.56 rows=1404 width=4)
                 (actual time=0.030..135.195 rows=177120 loops=1)
               Filter: ((reading >= 10) AND (reading <= 50))
               Rows Removed by Filter: 254880
   ->  Index Scan using sensor_log_pkey on sensor_log s
           (cost=0.43..5.45 rows=1 width=23)
           (actual time=0.009..0.010 rows=1 loops=177120)
         Index Cond: (id = r.id)
 
 Planning time: 0.290 ms
 Execution time: 2164.980 ms

The first step creates an unlogged table to store some small fraction of the source data. That in itself is extremely common. The problems start immediately when we try to use that intermediate table as the basis for further tables. If we examine the query that builds the second table, it’s readily apparent something is horribly wrong.

Don’t worry if you’ve never learned to read EXPLAIN output. In all of this output, the first set of parentheses is the planner estimate for that particular step, while the second shows what actually happened. If we just compare the expected and actual row counts, the estimate of matched rows in recent_info was off by two orders of magnitude. Due to this drastic underestimation, the planner figured it would be faster to loop through the expected 1404 rows and find corresponding matches in sensor_log.

Well, there’s a dramatic difference between looping over roughly 1,400 matches and looping over nearly 200,000. Our example isn’t terrible thanks to its small scale, but a real system likely dwarfs ours by a factor of 100 at minimum. Errors scale, and not always linearly.

Postgres has a mechanism for fixing bad estimates like this called ANALYZE. There’s even a series of background workers with the sole duty of analyzing tables as data accumulates. If we inserted an arbitrary pause in the script, it’s possible one of those workers would eventually accumulate the missing statistics and transform the planner assumptions into hard facts. Or we could force the issue and analyze the table ourselves.

Let’s see how that changes the execution plan:

ANALYZE recent_info;
 
EXPLAIN ANALYZE
SELECT * 
  FROM sensor_log
 WHERE id IN (
         SELECT id FROM recent_info
          WHERE reading BETWEEN 10 AND 50
       );
 
                              QUERY PLAN
------------------------------------------------------------------------
 Merge Semi Join  (cost=26533.87..42842.33 rows=176707 width=23)
                  (actual time=195.161..419.301 rows=177120 loops=1)
   Merge Cond: (s.id = r.id)
   ->  Index Scan using sensor_log_pkey on sensor_log s
           (cost=0.43..141129.93 rows=5000000 width=23)
           (actual time=0.010..92.786 rows=431951 loops=1)
   ->  Materialize  (cost=26533.15..27416.68 rows=176707 width=4)
                    (actual time=195.130..237.153 rows=177120 loops=1)
         ->  Sort  (cost=26533.15..26974.92 rows=176707 width=4)
                   (actual time=195.125..218.283 rows=177120 loops=1)
               Sort Key: r.id
               Sort Method: external sort  Disk: 2424kB
               ->  Seq Scan on recent_info r
                       (cost=0.00..9232.00 rows=176707 width=4)
                       (actual time=0.018..93.871 rows=177120 loops=1)
                     Filter: ((reading >= 10) AND (reading <= 50))
                     Rows Removed by Filter: 254880
 
 Planning time: 0.305 ms
 Execution time: 426.620 ms

The degree of difference here really illustrates how well the planner adapts. Instead of a nested loop, it opted to fetch, filter, and sort the rows from recent_info and merge that into the primary key for sensor_log to find the intersection. All of our row estimates are much better, too. Why such a radical departure from the original plan?

To find that answer, we need to examine the Postgres catalog. This is where Postgres maintains everything it knows about tables, indexes, and other objects that reside within its confines. Of particular interest to us are the pg_stats view and pg_class table. This is a small part of what they contain after we analyzed recent_info:

SELECT reltuples, relpages
  FROM pg_class
 WHERE relname = 'recent_info';
 
 reltuples | relpages
-----------+----------
    432000 |     2752
 
SELECT attname, n_distinct
  FROM pg_stats
 WHERE tablename = 'recent_info';
 
   attname    | n_distinct 
--------------+------------
 id           |         -1
 location     |       1000
 reading      |        100
 reading_date |         -1

If we ran these same two queries immediately after creating recent_info, the first would report zero tuples, and the second would show no matches at all. At that point, Postgres knew next to nothing about the table, and that is reflected in the row estimates and planner decisions.

After we analyzed recent_info, Postgres garnered a vast smorgasbord of pertinent statistics. Not only does it have an approximate row count, it also knows how many data pages the table occupies. This lets Postgres calculate expense related to hardware interaction; how much work is involved with fetching and processing these rows.

What Postgres gains from the contents of pg_stats is altogether different. When Postgres analyzes a table, it performs a heuristic statistical sampling of its physical contents. This includes facts such as the most frequent values for each column, the average size of column data, the number of distinct values per column, and so on. As we can see, Postgres did a great job of sampling the table contents, as it identified the exact number of distinct values for our location and reading data.

Negative values in n_distinct denote a ratio between the number of distinct values for that column and the total row count of the table. From that, we can derive that there’s basically one unique id or reading_date per row. Neat! For us mundane humans, this shows us the best columns to index. For Postgres, it can combine value histograms and frequencies to produce much more accurate row estimates. Better estimates almost always result in improved query execution times.
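To make that ratio concrete, here’s a rough sketch that converts a negative n_distinct back into an estimated distinct count using the row total from pg_class:

-- For negative values, estimated distinct count = -n_distinct * reltuples.
SELECT s.attname,
       CASE WHEN s.n_distinct < 0
            THEN round(-s.n_distinct * c.reltuples)
            ELSE s.n_distinct
       END AS est_distinct
  FROM pg_stats s
  JOIN pg_class c ON (c.relname = s.tablename)
 WHERE s.tablename = 'recent_info';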

Without table analysis, Postgres is effectively operating blindly. Adding an ANALYZE statement after initializing every temporary or unlogged table is a critical element to script performance. Yet it’s also a step that’s often omitted. Inexperienced users aren’t aware of Postgres internals, and may not even know the ANALYZE command exists.

There is, of course, a way to cheat and prevent our users from having to become Postgres experts. Postgres added event triggers in version 9.3. These triggers can activate any time DDL is detected, and that means we can detect new tables right when they’re created.

Watch this:

CREATE OR REPLACE FUNCTION analyze_new_table()
RETURNS event_trigger AS
$$
DECLARE
  tab_name TEXT;
BEGIN
  IF tg_tag IN ('CREATE TABLE AS', 'SELECT INTO') THEN
    FOR tab_name IN
      SELECT objid::REGCLASS::TEXT
        FROM pg_event_trigger_ddl_commands()
    LOOP
      EXECUTE 'ANALYZE ' || tab_name;
    END LOOP;
  END IF;
END;
$$ LANGUAGE plpgsql;
 
CREATE EVENT TRIGGER t_analyze_new
    ON ddl_command_end 
       EXECUTE PROCEDURE analyze_new_table();

Now any time a table is created with CREATE TABLE AS or SELECT INTO, Postgres will instantly analyze the contents. Depending on the size of the table and the granularity of the default settings, this may be a relatively demanding operation. However, considering how greatly statistics determine performance, such a trigger may actually be a requirement in some settings.
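If the default sample really is too heavy for throwaway tables, one possible compromise, sketched here rather than baked into the trigger, is to shrink the per-column statistics target before analyzing:

-- Sample 10 buckets instead of the default 100 for a disposable table.
ALTER TABLE recent_info ALTER COLUMN reading SET STATISTICS 10;
ANALYZE recent_info;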

The alternative is allowing illiteracy of Postgres internals or accidentally forgetting the ANALYZE clause. The consequences of which could spell the difference between a report script executing in a matter of minutes, or over the course of several grueling hours.

I know which scenario I’d prefer!



PG Phriday: Instant Runoff Through SQL

November 11th, 2016 | Published in Database, Tech Talk | No Comments


The United States held an election recently, and there has been some … mild controversy regarding the results. Many raised issues about this before the election itself, but what if we had used instant-runoff voting instead? More importantly, can we implement it with Postgres?

Well, the answer to the last question is a strong affirmative. So long as we don’t break the results down into voting districts, and make wild unsupported assumptions regarding rankings, that is. But what’s a little inaccuracy in a demo?

Let’s start with the base tables:

CREATE TABLE irv_election
(
  election_id    SERIAL PRIMARY KEY,
  election_name  VARCHAR NOT NULL
);
 
CREATE INDEX idx_election_election_name
    ON irv_election (election_name);
 
CREATE TABLE irv_candidate
(
  candidate_id    SERIAL PRIMARY KEY,
  election_id     INT NOT NULL REFERENCES irv_election,
  candidate_name  VARCHAR NOT NULL
);
 
CREATE INDEX idx_candidate_candidate_name
    ON irv_candidate (candidate_name);
 
CREATE TABLE irv_vote
(
  vote_id         SERIAL PRIMARY KEY,
  election_id     INT NOT NULL REFERENCES irv_election,
  candidate_list  INT[] NOT NULL
);
 
CREATE INDEX idx_vote_election_id
    ON irv_vote (election_id);
 
CREATE TABLE irv_scratch
(
  candidate_name  VARCHAR,
  votes           BIGINT,
  percent         NUMERIC,
  my_round        INT
);

Technically we could use this voting engine for the entire election, but we’d need to register each voting district separately. Ain’t nobody got time for that. Instead, we’ll just assume the popular vote works for the whole thing. These are the latest election results for the four major candidates while this post was being written:

  • Donald Trump – 60,071,650
  • Hillary Clinton – 60,467,245
  • Gary Johnson – 4,123,062
  • Jill Stein – 1,237,116

Given that our lame tabulation system hasn’t integrated the Electoral College, it’s winner-take-all, and therefore Clinton is the next President! Never mind that the margin of error results in a statistical tie; math doesn’t matter. Yet we still need an API for data entry to simplify everything.

To do that, we need three functions. One to register the election, one for candidates, and one for incoming votes. Here’s what those might look like:

CREATE OR REPLACE FUNCTION start_election(
  election VARCHAR
)
RETURNS VOID AS
$$
  INSERT INTO irv_election (election_name)
  VALUES (election);
$$ LANGUAGE SQL;
 
 
CREATE OR REPLACE FUNCTION register_candidate(
  election VARCHAR,
  candidate VARCHAR
)
RETURNS VOID AS
$$
  INSERT INTO irv_candidate (election_id, candidate_name)
  SELECT election_id, candidate
    FROM irv_election
   WHERE election_name = election;
$$ LANGUAGE SQL;
 
 
CREATE OR REPLACE FUNCTION register_vote(
  election VARCHAR,
  candidates VARCHAR[]
)
RETURNS VOID AS
$$
  WITH ranked AS (
    SELECT candidate_name, ROW_NUMBER() OVER () AS rank
      FROM unnest(candidates) candidate_name
  ),
  conv AS (
    SELECT c.election_id, c.candidate_id, r.rank
      FROM irv_candidate c
      JOIN ranked r USING (candidate_name)
     ORDER BY rank
  )
  INSERT INTO irv_vote (election_id, candidate_list)
  SELECT election_id, array_agg(candidate_id)
    FROM conv
   GROUP BY 1;
$$ LANGUAGE SQL;

Fancy! The start_election and register_candidate functions are pretty obvious, but what on Earth is going on in the register_vote routine?

Since we decided to use an array for the candidate vote ranking, we need to decode the candidate names into their corresponding IDs while maintaining the original ordering. There may be a better way to do this, but we started with a CTE to rank the candidates, another to convert the IDs, and a final aggregate to recombine the translated values. We could have avoided all of this complexity by simply using the text provided by the voter, but that’s too easy.
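To visualize what that first CTE hands to the rest of the statement, here it is run against a lone sample ballot:

-- Rank positions come from the original array ordering.
SELECT candidate_name, ROW_NUMBER() OVER () AS rank
  FROM unnest('{Clinton, Stein}'::VARCHAR[]) candidate_name;
 
 candidate_name | rank
----------------+------
 Clinton        |    1
 Stein          |    2

On 9.4 and above, unnest ... WITH ORDINALITY would make that ordering guarantee explicit instead of leaning on ROW_NUMBER() over an unordered window.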

With the standard API out of the way, we still need to tabulate the votes themselves. IRV is a simple concept, and pretty easy to implement. It’s just a bit ugly:

CREATE OR REPLACE FUNCTION no_losers(orig INT[], remove INT[])
RETURNS INT[] AS
$$
DECLARE
  item INT; 
  ret_arr INT[] := orig;
BEGIN
  IF array_length(remove, 1) IS NULL THEN
    RETURN ret_arr;
  END IF;
 
  FOR item IN 1 .. array_length(remove, 1) LOOP
    ret_arr := array_remove(ret_arr, remove[item]);
  END LOOP;
 
  RETURN ret_arr;
END;
$$ LANGUAGE PLPGSQL IMMUTABLE STRICT;
 
 
CREATE OR REPLACE FUNCTION tabulate_votes(
  election VARCHAR
)
RETURNS VOID AS
$$
DECLARE
  num_candidates INT;
  num_votes NUMERIC;
  vote_round INT;
  losers INT[] := '{}'::INT[]; -- Non-null empty array.
  winner VARCHAR;
  results RECORD;
BEGIN
 
  DROP TABLE IF EXISTS working_round;
  CREATE TABLE working_round (LIKE irv_scratch);
 
  -- Given the number of candidates in this election, we can
  -- reduce calculations.
 
  SELECT COUNT(*) INTO num_candidates
    FROM irv_candidate c
    JOIN irv_election e USING (election_id)
   WHERE e.election_name = election;
 
  -- We need the total number of votes cast for display
  -- percentages.
 
  SELECT COUNT(*) INTO num_votes
    FROM irv_vote v
    JOIN irv_election e USING (election_id)
   WHERE e.election_name = election;
 
  -- Record each round of the IRV process. At the end of each round,
  -- eliminate the worst performer and try again. Do this until we
  -- reach > 50% for one candidate.
 
  FOR vote_round IN 1 .. (num_candidates - 1) LOOP
    RAISE NOTICE '== Round % ==', vote_round;
 
    INSERT INTO working_round
    SELECT c.candidate_name, COUNT(v.vote_id) AS votes,
           round(COUNT(v.vote_id) / 
                 num_votes * 100.0, 2) AS percent,
           vote_round AS my_round
      FROM irv_vote v
      JOIN irv_candidate c ON (
             c.candidate_id = (no_losers(v.candidate_list, losers))[1]
           )
      JOIN irv_election e ON (e.election_id = v.election_id)
     WHERE e.election_name = election
     GROUP BY c.candidate_name;
 
    -- Print the results of the round for spot-checking.
 
    FOR results IN
        SELECT * FROM working_round
         WHERE my_round = vote_round
    LOOP
      RAISE NOTICE '%: % (%)', results.candidate_name,
        results.votes, results.percent;
    END LOOP;
 
    -- If this round has a winner, short-circuit so we can
    -- just report the results.
 
    SELECT candidate_name INTO winner
      FROM working_round
     WHERE percent > 50
       AND my_round = vote_round;
 
    IF winner IS NOT NULL THEN
      RAISE NOTICE 'Winner of % is %!', election, winner;
      EXIT;
    END IF;
 
    -- This is where we determine the loser of this round.
    -- It's just the lowest ranked result.
 
    SELECT array_append(losers, c.candidate_id) INTO losers
      FROM working_round w
      JOIN irv_candidate c USING (candidate_name)
     WHERE my_round = vote_round
     ORDER BY w.votes
     LIMIT 1;
 
  END LOOP;
 
END;
$$ LANGUAGE PLPGSQL;

The no_losers function is necessary due to a small issue with Postgres array handling. Primarily, we can’t subtract an array from another array. Imagine we maintain a list of losers for each subsequent IRV round. Totals would normally be a simple matter of obtaining the first element of the ranked candidates minus the list of losers. Even the intarray extension won’t help us, as it reorders the results, defeating the purpose of ranking them. Our function doesn’t have that problem, but it could probably be more efficient. We marked it as IMMUTABLE and STRICT as optimization hints, since the same parameters will always produce the same results.

The tabulate_votes function is pretty hefty, but it does everything we need. This is where the “instant” in instant-runoff occurs. Since the candidates are pre-ranked, we can just use the data we already have in order to obtain a majority winner. Really we’re just running a plurality vote over and over again after slowly killing off the losers; it’s a macabre kind of mathematical evolution.

Now we just need some data. It’s probably possible to download the results from some website somewhere, but we’re already pretty inaccurate. It’s not possible to derive voter rank intent from that data in any case. Instead, we’ll just take the vote totals and reduce them by a factor of 1000 to make the election quick.

Regarding the rankings, we have to make a few decisions that are fairly coarse. For now, let’s just take a liberal versus conservative approach. To do that, we’ll assume Stein voters would choose Clinton as their second choice, and vice-versa. Trump and Johnson are the same story. This is a really polarized election, so neither group would debase themselves by humoring the other candidates.

SELECT start_election('Pres 2016');
 
SELECT register_candidate('Pres 2016', 'Clinton');
SELECT register_candidate('Pres 2016', 'Trump');
SELECT register_candidate('Pres 2016', 'Stein');
SELECT register_candidate('Pres 2016', 'Johnson');
 
SELECT register_vote('Pres 2016', '{Clinton, Stein}')
  FROM generate_series(1, 60467);
 
SELECT register_vote('Pres 2016', '{Stein, Clinton}')
  FROM generate_series(1, 1237);
 
SELECT register_vote('Pres 2016', '{Trump, Johnson}')
  FROM generate_series(1, 60071);
 
SELECT register_vote('Pres 2016', '{Johnson, Trump}')
  FROM generate_series(1, 4123);
 
SELECT tabulate_votes('Pres 2016');
 
NOTICE:  == Round 1 ==
NOTICE:  Clinton: 60467 (48.03)
NOTICE:  Stein: 1237 (0.98)
NOTICE:  Trump: 60071 (47.71)
NOTICE:  Johnson: 4123 (3.27)
NOTICE:  == Round 2 ==
NOTICE:  Clinton: 61704 (49.01)
NOTICE:  Trump: 60071 (47.71)
NOTICE:  Johnson: 4123 (3.27)
NOTICE:  == Round 3 ==
NOTICE:  Clinton: 61704 (49.01)
NOTICE:  Trump: 64194 (50.99)
NOTICE:  Winner of Pres 2016 is Trump!

And there you have it. Using completely invalid assumptions for a voting system we will never implement, Trump is the winner of the popular vote!

The important lesson in all of this is partially one of scale. If we really had entered all of the data for every district into this system, we would have over 125 million rows. We could transfer all of that data to an external reporting system for tabulation, or we can just handle that process locally. The caller of tabulate_votes can check the working_round table to directly observe the results of each round while handling orders of magnitude less data.
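For example, pulling up any round after the fact is just a query against the scratch table, at least until the next tabulation run drops and recreates it:

-- Review round 2 without re-tabulating anything.
SELECT candidate_name, votes, percent
  FROM working_round
 WHERE my_round = 2
 ORDER BY votes DESC;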

We also used a pretty crazy range of techniques here. CTEs, window functions, functional hints, array manipulation, debug notices, and temporary tables are just a tiny fragment of what’s really available.

In the end we can all agree that the true winner of the election was Postgres.



How Donald Trump Happened

November 9th, 2016 | Published in Rant | No Comments


I know a lot of people watched the election results in disbelief last night, or woke up this morning and thought something like this:

You maniacs! You blew it up!

There’s a bit of sad truth there. But the real problem is how we reached the point where this was even possible. The amount of incredulity on display here is actually quite shocking to anyone that was paying attention. I knew Trump had some small chance given the political environment in America right now, yet I never thought he could actually win. The reason behind that misapprehension should shock some sense into every single one of us.

The Echo Chamber

One of Trump’s primary criticisms regarding coverage of his campaign and of himself is media corruption. Can we really claim this was a mischaracterization? Back in March, the Washington Post ran 16 negative articles about Bernie Sanders in a single day. Considering the DNC itself was against his nomination, that’s hardly a surprise. Donna Brazile, the replacement DNC Chair (and former CNN correspondent) after Debbie Wasserman Schultz resigned (to work for Hillary), colluded with CNN for early access to debate questions.

But it’s not just direct DNC involvement. A ridiculous 91 percent of Trump news coverage from July to October has been negative. New York Times? Bought. ABC? Paid for. NBC? In the bag. It just goes on and on like this. Not even The Onion is safe, having been acquired by Univision, which just happens to be chaired by a top Clinton donor. Let’s not forget Google. Add in naked propaganda organizations like Correct The Record specifically designed to target social media, and you have a perfect storm of constant negative coverage.

It worked against Sanders, so I guess the theory was that Trump would suffer the same result. The problem is all of this positive Clinton news and negative Trump bias caused a feedback loop. Media was faced with the problem of selling their new narrative. Polling is acknowledged as an inexact science, so a little Democratic oversampling is easy to dismiss or ignore. Despite the real numbers, it would appear as though Hillary had the upper hand. This may or may not serve to discourage Republican voters, but the inherent danger here is that it does the opposite. Secure in their knowledge that Trump is a joke and has no chance at winning, Democrats might actually be more likely to skip the vote. Especially for a candidate they don’t enthusiastically support.

It became an endless litany on the news, in the papers, in social media, on forums, and everywhere else: Trump is a joke. Trump is sexist. Trump is evil incarnate and eats babies while pairing them with the wrong wine. The bottom line is that Trump has no chance, regardless of how much support he might actually have. By ignoring objectivity, media failed to identify legitimate voter concerns that might result in support for Trump.

We live in a time where media garners advertiser revenue through endless editorializing. They play one side against the other for views just as shamelessly and odiously as clickbait headlines. That our political landscape is a Coke versus Pepsi analogy—with all of the associated emotional investment—should be no surprise to anyone. So when Trump claims the media is biased, everyone believes him. As a result, little to nothing the media says about Trump could ever be taken seriously.

Naive, Not Evil

So who would vote for Donald Trump, that racist, misogynist, insecure, shameless, huckster clown? Well, over 59 million people did just that, and though mentally expedient, considering all of them to be misogynist racists is a mere application of othering. Framing opposing viewpoints as unworthy or even repulsive is a favorite pastime of Humanity. Maybe Trump supporters have concerns besides identity politics.

Michael Moore recently called Trump a “human Molotov cocktail“. Yet why might someone feel that they want to burn the system down in the first place? Well, we’ve lost 5 million manufacturing jobs since 2000 for one. Health care costs have tripled since 2001. Rents have quickly outpaced inflation while median income is stagnating. Meanwhile, our infrastructure is crumbling around us.

People most affected by these problems are scared and angry. If not redress, they demand and deserve recognition. Instead, they received disdain and open derision. These are the people Hillary described as a “basket of deplorables“. That kind of tone-deaf rhetoric has consequences, and is no way to ingratiate yourself to a voting demographic.

Trump on the other hand, had a different message. He wants to make America “great” again. Whether or not such a thing is possible, that’s the mantra. Which candidate would you vote for, given only one will even acknowledge your existence? Will Trump deliver on any of his promises? Of course not; the system doesn’t work that way. But as a technique for garnering support, it’s a flawless and battle-tested approach.

There’s also the constant undercurrent of cognitive dissonance this election. Many Republicans repudiated Trump by voting for Hillary this year, but not everyone is so willing. For better or worse, Trump was the GOP nominee for 2016, and as such, receives a lot of automatic base support. Tradition is something of a double-edged sword. An average Republican voter faced with Trump as a candidate must make a decision. Has the GOP, the party they’ve supported all of their lives, the party their friends and family have vouched for and rallied behind for generations, been wrong? Or is the media just being its usual disingenuous self?

No, it is the voters who are wrong!

Given how much hyperbole has defined this election, it’s probably easier to believe Trump is misquoted, or has basic human flaws that are being exaggerated. There’s a reason Trump has been so ruthless in lambasting all forms of media and journalism; he knows nobody trusts them anymore and wants to reinforce that message. It’s just dirty politics trying to keep your guy out! How shameless can they get?!

A Flawed Candidate

I already wrote a long diatribe mostly centered on Hillary. The main defense I’ve heard is that Hillary only looks bad because the Republicans have been maligning her for 30 years. Well, so what? Even if that were the case and all of her scandals are conspiracy fever dreams, that’s the public perception. You know, the people who vote? Both Hillary and Trump are among the least favorable candidates in history. Nominating her was a critical mistake by the DNC. Despite all of the owed favors that led to the situation, this should have never happened in the first place. If it was so important to nominate a woman for 2016, why not someone with orders of magnitude less political baggage?

Consider the average undecided or even Republican voter. To them, Hillary is essentially riding Bill’s coattails and represents another family dynasty. As if George Bush I and II weren’t bad enough. She gives $200k speeches to the same soulless banks that are ruining the economy. She’s openly calling for a no-fly zone in Syria, which could cause another cold war with Russia. What is this, 1950? She deleted evidence after receiving a subpoena, and was never held accountable. Remember when Martha Stewart went to prison for merely lying to the FBI?

To the general public, she’s an unaccountable, corrupt, establishment candidate. Instead of the guy who regularly drew record breaking crowds, we got stuck with a widely hated demagogue because it was “her turn”. With a choice between Hillary and the embodiment of a narcissistic scam artist, why not vote to burn the whole thing down? It certainly sends a message.

Republicans hated her. Independents weren’t very enthusiastic for her. Even long-time Democrats begrudgingly accepted her as some Republicans probably accepted Trump. And still I believed she would win. She would be able to nominate at least one Supreme Court seat, after all. Given her health problems, it’s possible Tim Kaine would take over, and he’s about as unoffensive as possible. That would have been a much safer result than setting fire to the whole establishment by voting Trump. As the results started rolling in, I went from surprised to dismayed, just like too many others.

That means even I was taken in by the media narrative. I went into the election knowing everything I just wrote, and yet I assumed Trump was enough of a caricature that it wouldn’t matter. This is the true and terrible aftermath of ignoring inconvenient realities. The media never sufficiently did their job, instead electing to butter our asses with sensationalism and melodrama. This is the result. We now have a reality TV star who mugs like a 1930’s mob boss as our Commander in Chief.

Are you fucking kidding me?

Thanks to the bite-sized inoffensive pap the media has been feeding us for the last couple decades, we’re now Idiocracy. Thanks, assholes.



PG Phriday: MySQL Mingle

October 28th, 2016 | Published in Database, Tech Talk | No Comments


Through the wonderful magic of corporate agreements, I’ve been pulled back into (hopefully temporarily) managing a small army of MySQL servers. No! Why can’t this just be a terrible nightmare?! Does anyone deserve such debasement?

Side effects of using MySQL may include…

Hyperbole? Maybe a little. If MySQL were really that terrible, it wouldn’t be in such widespread use. However, as a Postgres DBA for so many years, I’ve come to appreciate what really sets Postgres apart from engines and development approaches like those showcased in MySQL.

Let’s explore a few of these.

Privileges

MySQL user management is an unholy abomination. As ridiculous as it sounds, this is no exaggeration.

This is how to create a superuser in Postgres:

CREATE USER some_bozo WITH PASSWORD 'cowsrule' SUPERUSER;

And here’s the same thing in MySQL:

CREATE USER 'some_bozo'@'%' IDENTIFIED BY 'cowsrule';
GRANT ALL ON *.* TO 'some_bozo'@'%';

That’s not so bad, right? Syntactically no, but the implication of the @ symbol can’t be ignored. Consider these two users:

CREATE USER 'just_a_girl'@'127.0.0.1' IDENTIFIED BY 'justwhy';
CREATE USER 'just_a_girl'@'192.168.56.10' IDENTIFIED BY 'you suck';

Yes, that’s two distinct users. It just so happens that the second IP address is for the local VM running MySQL itself. Depending on the host specified in the connection string, we would need to supply a different password to authenticate. In older MySQL installations, it’s not uncommon to accumulate dozens of entries for the same user. Occasionally one or more of these users will have a different password from the rest.

What about roles? Well, MySQL doesn’t have them. Postgres DBAs generally recommend managing grants through role assignment. Users can be transient, so granting direct access to database objects isn’t very transferable. Imagine we have two tables out of one hundred, meant especially for reporting. We could do this in Postgres:

CREATE ROLE bi_role;
CREATE USER angie WITH PASSWORD 'sillygoose';
CREATE USER fred WITH PASSWORD 'dinotopia';
 
GRANT SELECT ON tab_a TO bi_role;
GRANT SELECT ON tab_b TO bi_role;
 
GRANT bi_role TO angie;
GRANT bi_role TO fred;

The bi_role role can be granted to anyone who takes on that job title, be it a single person, or a whole team. They all have access to the same objects, and we don’t have to micro-manage the grants to each individual user. Here’s the same thing in MySQL:

CREATE USER angie@'%' IDENTIFIED BY 'sillygoose';
CREATE USER fred@'%' IDENTIFIED BY 'dinotopia';
 
GRANT SELECT ON test.tab_a TO angie@'%';
GRANT SELECT ON test.tab_b TO angie@'%';
GRANT SELECT ON test.tab_a TO fred@'%';
GRANT SELECT ON test.tab_b TO fred@'%';

Not so bad with two tables, but this compounds geometrically as the count of tables or team-members increases. As a consequence, it’s often easier to simply grant access to everything in a database, regardless of how appropriate that may be.
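And that’s before considering new tables. On the Postgres side, even future reporting tables can be folded into the role automatically, a trick with no MySQL analogue. A sketch, assuming the reporting objects live in the public schema:

-- Any table created in public from now on is readable by the BI team.
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO bi_role;

By default this only covers tables created by whoever runs it; adding FOR ROLE extends the behavior to objects created by other roles.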

Beta Bumps

Quick! Can anyone figure out when the GTID feature was added to MySQL? It’s a huge improvement and makes things like multi-master possible, as well as enabling an entire tool chain that imparts functionality like automated failovers. It’s 5.6, right?

Wrong. It’s 5.6.5.

Let that sink in. Really let it simmer. I was outright incredulous when I realized what happened here. A critical feature introduced in a dot release, something normally reserved for bug fixes and security vulnerabilities. There’s a good reason most software projects do it that way, and the question that introduced this section is the first clue.

According to the MySQL documents, 5.6.5 is a Milestone release, and this is what they say about Milestones:

This is a milestone release, for use at your own risk. Significant development changes take place in milestone releases and you may encounter compatibility issues, such as data format changes that require attention in addition to the usual procedure of running mysql_upgrade. For example, you may find it necessary to dump your data with mysqldump before the upgrade and reload it afterward.

What they don’t say is that Milestones are part of their alpha, beta, and release candidate process. The first “real” version of 5.6 was really 5.6.10. Surprise! Hopefully nobody has a legacy system running 5.6.8 somewhere, thinking it was a general release.

What version is the first version of Postgres 9.6? 9.6.0. That’s important. There’s no ambiguity in that release designation.

Am I overreacting? Probably. I’m still in a bit of shock, and first reactions aren’t always sane.

Replication

This is one place MySQL may still maintain a slight advantage over Postgres. Postgres unequivocally has more accurate and stable replication with its binary transaction log stream. A replica really is a replica in every respect, right down to which page data is written to. Except… well, there’s a question of utility.

MySQL replication is more like database-wide logical replication. It’s as if pglogical were an integral part of Postgres. Don’t want to replicate a specific database in the instance? Exclude it. Don’t need certain tables? Ignore them. This is one reason MySQL is so popular for sharding. Using this kind of partial replication, shards can be distributed across a whole fabric of MySQL servers, with designated alternate locations.

Combine that with replication chaining, and you get a circular architecture of failover candidate shard masters. It’s multi-master without an explicit need for a global transaction ID. Of course by adding such functionality in 5.6, these types of replica arrangements become more robust.

Can Postgres do anything like that? Kinda. The citus extension is probably the closest approximation regarding distribution model. Tables can be selectively broadcast to various candidate servers, and shards can have a configurable amount of spare copies spread through the cluster. But all queries have to go through the central master, as it retains the full mapping of active shard locations. It’s a way to distribute data and parallelize queries, not to multi-master.

There’s the BDR extension, but it’s a replication strategy instead of a distribution technique. Every master must contain all data in the cluster. Short of using UNLOGGED tables, there’s no exclusion process. Yes it’s multi-master, but instead of spreading 5TB across 10 nodes, we end up with 10 5TB nodes. Yuck. BDR is also incompatible with foreign data wrappers, a hallmark feature of Postgres since 9.3. In fact, there’s a whole boatload of restrictions that drastically reduce Postgres’ capabilities.

This also ignores DDL. Not a single Postgres logical replication system will properly replicate a ‘CREATE TABLE’ statement, for example. Considering these statements are in the transaction log, as they must be in order to appear on the primary system, I’m not entirely sure why this is the case. Regardless, there’s no direct equivalent to MySQL’s replication system and its relative flexibility at this time. Even though MySQL’s replication is basically just a logical replay mechanism, being tightly interwoven into the core makes a massive difference in potential.

Sunny Side

Is there any way Postgres can save me from this torment? Is there anything I wish Postgres had? Oddly enough, the answers are no and yes respectively. This is something of a surprise, as I was fully expecting to hate every aspect of MySQL after giving it up back in the 3.23 days. It still has a lot of the old gotchas, but others have naturally evolved with the platform.

User management on MySQL systems is an utter nightmare I wouldn’t wish on my worst enemy. Due to MySQL’s replication system being logical, many tools don’t work without the GTID feature. In some cases, this may mean across-the-board upgrades before reliable high availability is even an option. On extremely old platforms, it’s generally better for existing database instances to continue as MySQL systems. There’s a reason why WordPress is still officially MySQL only. So I’m stuck with these servers for the duration.

On the other hand, MySQL replication is something I wish Postgres had an equivalent for. Not because the Postgres binary approach is bad or incomplete, but because of the flexibility a fully logical approach bestows. It would be a chore to deploy and probably maintain, but it’s entirely possible to create a five node cluster and distribute ten shards across them at two shards each. Add in replication and one shard from each node could exist on two additional nodes. That’s an impressive level of redundancy. And each shard is fully read/write on their active tables, while all the replicas are read capable.

It would require a responsible access layer to decode active shard locations depending on the current mapping, but those already exist. This is definitely an area where Postgres could grow. A Postgres extension could doubtlessly make it happen if there was enough demand and backing. Or maybe there’s already something in the works that hasn’t yet been released.

Either way, I plan on getting these new MySQL servers whipped into shape so they can be left to their own devices with little further intervention. I’m still a proud Postgres DBA, after all.


