PG Phriday: RESTing in the Corn

Last week we explored using Postgres as a central communication nexus between several data sources. At the time, I made a fairly hand-wavy allusion to REST interfaces and, not having explored further, assumed PLV8 could use node.js or other similar libraries to invoke HTTP APIs. Of course, as a trusted language, PLV8 isn’t allowed to do that. It’s more of a language for easily manipulating JSON and JSONB objects within Postgres.

Only slightly less violent than JSON

So we need some other way of calling out to an external HTTP resource, wrapping it in column trimmings, and sending the data to Postgres. As demonstrated last week, we could do this directly with a Python (or some other language) function. Alternatively, we can use an extension that has access to Python. Then all we need to do is write a Python library, and suddenly we can interact with the REST interface as if it were an actual Postgres table. This is where Multicorn comes in.

One of the easiest ways to install Multicorn is through PGXN. It goes something like this:

sudo pgxn install multicorn

Otherwise, users of Red Hat variants can turn to the PGDG yum repository, and Debian/Ubuntu users can rely on the apt repository. Multicorn is popular enough to be included in both of these resources, which is very handy for us.

In any case, we need a basic REST service to poll. In the interest of keeping things simple, let’s use Flask to whip up an API for fetching all the words in a UNIX dictionary file. Since the jsonify method is extremely slow, it’s also a good idea to cache the output. The end result might look something like this:

from flask import Flask, jsonify

app = Flask(__name__)

counter = 1
dictionary = []
cached = None

# Read the system dictionary into a list of row-like dicts,
# assigning each word a sequential id.
words = open('/usr/share/dict/words', 'r')

for line in words:
    dictionary.append({'id': counter, 'word': line.strip()})
    counter += 1

words.close()

@app.route('/')
def index():
    global cached

    # jsonify is slow enough that we only want to encode the word
    # list once; cache the response and reuse it for every request.
    if not cached:
        cached = jsonify({'records': dictionary})

    return cached

if __name__ == '__main__':
    app.run(debug=True, port=9999)

Our REST interface replies to only one URL and accepts no parameters. It’s not exciting, but it does the job of transmitting data to HTTP requests. With this “service” running in the background somewhere, we can continue with our task of turning that output into a Postgres table.
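
Assuming the service is running on port 9999 as configured above, a quick sanity check from the shell should return the beginning of one very large JSON document:

curl -s http://localhost:9999/ | head -c 80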

The next thing we need to do is make use of the Multicorn Python libraries. In this arena, Python setuptools are our friend. With them, we can write a Python library that imports Multicorn and it will automatically install everything in a compatible location. Here’s a bare-bones setup.py file that might work:

from setuptools import setup
 
setup(
  name='dictionaryfdw',
  version='0.0.1',
  author='Shaun Thomas',
  license='Postgresql',
  packages=['fdw']
)

That takes care of actually installing the project library files. Next we need to actually use the Multicorn API. As a pure foreign data wrapper API, it has a lot more capability than we actually need. By default it can represent direct access to CSV files, filesystem objects, RSS feeds, and a bevy of other cute implementations.

Unfortunately none of these use cases apply to wrapping up /usr/share/dict/words. Multicorn can interact with authentication, pass predicates to help pre-optimize matches, and has several other capabilities we simply won’t leverage here. We just need one table from one file, with very little in the way of parsing.

To that end, we only need to define a single execute method in a class derived from ForeignDataWrapper. With that in place, Python’s very capable requests library can handle the actual interaction with our REST API. It provides everything we need to fetch the data and convert the HTTP response from JSON into Python dicts that represent each row of the dictionary contents.

There isn’t much exciting here:

import requests
from multicorn import ForeignDataWrapper

class DictionaryFDW(ForeignDataWrapper):

    def execute(self, quals, columns):
        # Fetch the entire dictionary from the REST service and
        # decode the JSON payload.
        response = requests.get('http://localhost:9999/')
        data = response.json()

        # Each record is a dict keyed by column name, which is
        # exactly what Multicorn expects us to yield per row.
        for row in data['records']:
            yield row

To install it, we merely invoke setup.py:

sudo python setup.py install

Everything is now in place. Since Multicorn is itself a Python library, it has access to any other Python tools installed on the system. This means the primary option passed to the wrapper is simply a named Python class. So long as that class is compatible, we end up with a functional Postgres extension.

It just so happens that Multicorn implements this at the SERVER level. This makes sense as each server generally has its own defined interaction model. The CSV server would take filenames as parameters for foreign tables, RSS feeds would need the URL for each external resource, etc. Our wrapper is somewhat less… sophisticated. It takes no parameters and offers no advanced functionality. As such, once we define the server that invokes the library, we’re basically done.

Here’s the code that would create our dictionary table and a sample invocation:

CREATE EXTENSION multicorn;
CREATE SERVER multicorn_srv
       FOREIGN DATA WRAPPER multicorn
       OPTIONS (wrapper 'fdw.DictionaryFDW');
 
CREATE FOREIGN TABLE dictionary (
  id    INT,
  word  VARCHAR
) SERVER multicorn_srv;
 
\timing on
 
SELECT * FROM dictionary
 LIMIT 5 OFFSET 9000;
 
  id  |    word    
------+------------
 9001 | Lubbock
 9002 | Lubumbashi
 9003 | Lucas
 9004 | Luce
 9005 | Luce's
(5 rows)
 
Time: 167.939 ms

It works! Beyond that, notice how slow the results are. Though there are only about 100k rows, even fetching a mere handful takes an exorbitant amount of time. This happens for a few reasons.

First, consider that this is an external API call. Every query must trigger an HTTP request, wait for the service to do its work, parse the results, and then encode and return each row. While we know what our REST process does, other services may not be so forthcoming and are essentially black boxes, so at least some of the delay comes from whatever work the REST service performs on its end. Then we have the unfortunate fact that, bless its heart, Python is generally a dog’s breakfast when it comes to performance.

Yum!

We could fix some of these problems. Multicorn does have the ability to pass along predicates after all. A more advanced library could implement basic regular expression, equality, or other logic to pass along WHERE clauses like these:

EXPLAIN ANALYZE
SELECT * FROM dictionary
 WHERE word LIKE '%house%';
 
                         QUERY PLAN
------------------------------------------------------------
 Foreign Scan on dictionary
     (cost=20.00..20000000000.00 rows=100000000 width=36)
     (actual time=164.616..326.678 rows=189 loops=1)
   Filter: ((word)::text ~~ '%house%'::text)
   Rows Removed by Filter: 98982
 Planning time: 0.248 ms
 Execution time: 326.896 ms

While in our case Postgres manually removed all inapplicable matches, a more advanced wrapper could transform those predicates into parameters for the API call itself. In such a case, the REST call might return a much smaller (and faster) row subset.
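
Multicorn hands our execute method everything required for that: quals arrives as a list of objects carrying field_name, operator, and value attributes. Here’s a minimal sketch of that kind of pushdown, assuming our Flask service grew a hypothetical search parameter it knows how to apply:

import requests
from multicorn import ForeignDataWrapper

class DictionaryFDW(ForeignDataWrapper):

    def execute(self, quals, columns):
        params = {}

        # Translate a LIKE predicate on the word column into the
        # API's hypothetical 'search' parameter. Postgres re-checks
        # the filter against whatever we return, so sending back a
        # few extra matches is harmless.
        for qual in quals:
            if qual.field_name == 'word' and qual.operator == '~~':
                params['search'] = qual.value.strip('%')

        response = requests.get('http://localhost:9999/', params=params)

        for row in response.json()['records']:
            yield row

Barring that kind of effort, there’s another way to cheat the system: materialized views!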

We have, after all, already used these to represent regularly updated external snapshots. If we treat the API like a rolling data window, we can just refresh a materialized view with its output and enjoy direct manipulation of the instantiated rows. That means indexes and all the trimmings! A major benefit of this approach over spending all of our time optimizing the wrapper is that we don’t have to duplicate existing Postgres functionality.

We would naturally want to eventually fix the more obvious inefficiencies in our wrapper interface. Yet until that happens—and even afterwards, since performance enhancements are cumulative—we can do something like this:

CREATE EXTENSION pg_trgm;
 
CREATE MATERIALIZED VIEW mv_dictionary AS
SELECT * FROM dictionary;
 
ANALYZE mv_dictionary;
 
CREATE INDEX dictionary_word_trigram
    ON mv_dictionary USING GIST (word gist_trgm_ops);
 
EXPLAIN ANALYZE
SELECT * FROM mv_dictionary
 WHERE word LIKE '%house%';
 
                         QUERY PLAN
------------------------------------------------------------
 Bitmap Heap Scan on mv_dictionary
     (cost=3.36..30.78 rows=10 width=13)
     (actual time=6.454..6.628 rows=189 loops=1)
   Recheck Cond: ((word)::text ~~ '%house%'::text)
   Rows Removed by Index Recheck: 1
   Heap Blocks: exact=41
   ->  Bitmap Index Scan on dictionary_word_trigram
           (cost=0.00..3.35 rows=10 width=0)
           (actual time=6.435..6.435 rows=190 loops=1)
         Index Cond: ((word)::text ~~ '%house%'::text)
 Planning time: 0.152 ms
 Execution time: 6.783 ms

One cool trick Postgres has up its sleeve is that it can leverage quirky index types. One of these is based on word trigrams. It’s a lossy format that will inevitably return more matches than we actually want on the first pass, but remember that Postgres applies a post-filter to remove excess matches. We can see that in the “Rows Removed by Index Recheck” line from the execution plan above, which yanked one row that didn’t match our LIKE clause. That’s a small price to pay for reducing a 326ms runtime to under 7ms!

Implementing trigram logic in our Python library or REST service would be a huge headache, as would duplicating LIKE handling. Since word searches are most likely to be inexact and fuzzy, we may want to avoid the whole mess. With a materialized view, we can let Postgres do all of the complicated data manipulation while the REST API focuses on bridging the gap to our external resource.
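
Keeping the cached copy current is then just a matter of re-running the refresh on whatever schedule the data demands, perhaps from cron. As a sketch, if we first add a unique index, we can even refresh without locking out readers:

CREATE UNIQUE INDEX udx_mv_dictionary_id ON mv_dictionary (id);
 
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_dictionary;

The CONCURRENTLY variant requires that unique index, but it lets queries keep using the old contents while the new snapshot is assembled.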

Once again, we’re left with a database that contains none of our own data. Tables we create by wrapping and caching external contents give us a transient scratch zone and staging area for actual analysis.

In a real-world scenario, we could only hope the REST interface and its corresponding foreign data wrapper component wouldn’t be as lazy as the ones presented here. Perhaps a better alternative would be to write a full REST-specific foreign data wrapper in Go. In such a world, feed sources and parameters, column decoding, caching layers, and any number of other optimizations would be efficient and native to web-driven content. One wrapper and server combination could represent any number of foreign tables, instead of ours, which only works with a single service.

The options, like Postgres, are endless and varied. That’s the world foreign data wrappers give us.

PG Phriday: Stuck in the Middle with Postgres

Earlier this year, I implied Postgres was some kind of super middleware for dragging data out of every external resource it could locate. But that example only used the Postgres foreign data wrapper to contact another Postgres server. Why be so unimaginative? The future is as unlimited as it is terrifying.

Meet the new Postgres mascot

Let’s start with a few prerequisites. We want to see the combined functionality of a few different Postgres capabilities, so let’s try to combine data from MySQL, Postgres, and JavaScript via V8. To do that, we need a couple of easily obtained resources. Debian-based systems can do something like this:

sudo apt-get install postgresql-9.6-plv8 postgresql-9.6-mysql-fdw

Otherwise, packages are available on the Postgres site, possibly under some other name in your favorite distribution. All we need for this is some kind of MySQL install, plus the MySQL and V8 Postgres extensions. More ambitious readers can try applying these techniques to SQL Server as well.

Now imagine we have a marketing infrastructure with data strewn all over the world, in a vast assortment of incompatible or legacy apps, each with its own portion of the data we want to see. Not an ideal situation really, and one normally solved by ETL to pull all of the data into a single location for perusal. But transformation takes time, and we’re exceptionally lazy. Instead of all that extract, transform, and load action, let’s just query the data directly.

We can get started by creating a basic market MySQL table with two measly rows for demonstration purposes:

CREATE TABLE market_data (
  ext_root   VARCHAR(10) PRIMARY KEY,
  trader     VARCHAR(20) NOT NULL,
  stock      VARCHAR(10) NOT NULL,
  action     CHAR(1) NOT NULL,
  price      NUMERIC(10,4) NOT NULL,
  xact_date  TIMESTAMP
);
 
INSERT INTO market_data (ext_root, trader, stock, action, price)
       VALUES ('IZ724JJ5:3', 'bsmith', 'GOOG', 'b', 800);
INSERT INTO market_data (ext_root, trader, stock, action, price)
       VALUES ('IZ724JJ5:4', 'bsmith', 'GOOG', 's', 842);

Easy, right? The rows automatically receive a timestamp as if they’re part of a rapidly accumulating feed, since MySQL historically gives the first TIMESTAMP column in a table an implicit default of CURRENT_TIMESTAMP. This data isn’t normalized because that would greatly complicate the data model, and for the purposes of this activity, it’s not really necessary. If it helps, imagine this is just a view that sits on top of all the relevant tables and flattens the data into something an external resource can easily consume.
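
MySQL’s implicit timestamp behavior depends on its explicit_defaults_for_timestamp setting, so if we’d rather not lean on it, we can always declare the default outright:

ALTER TABLE market_data
MODIFY xact_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;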

The next thing we need is some data in a Postgres database, because why not? Consider this on some server in a far-away land filled with unicorns and butterflies cavorting with innocent children in a field of daisies. It’s definitely not in our datacenter!

CREATE TABLE trade_calc (
  ext_root  VARCHAR(10) PRIMARY KEY,
  stdev     NUMERIC NOT NULL,
  mean      NUMERIC NOT NULL,
  delta     NUMERIC NOT NULL
);
 
INSERT INTO trade_calc VALUES ('IZ724JJ5:3', 13.7, 825, 11.3);
INSERT INTO trade_calc VALUES ('IZ724JJ5:4', 8.5, 832, 1.5);

In this case, the table contains supplementary information regarding the trade. Maybe an application is generating various calculations derived from the trade and storing it in a local resource for later consumption. Regardless, we’ve retained the external root identifier used to tie the rows together, which is a common feature of market data.

The next step is to introduce some kind of magical binding substance.

Why not?

We’ll be installing all of these wonderful resources in a Postgres database dedicated specifically for that purpose. One with user accounts and grants in a centralized location where everyone and his pet fish can access and aggregate data stored there. The MySQL portion of our tentacle would look and function something like this:

CREATE EXTENSION mysql_fdw;
 
CREATE SERVER mysql_market
  FOREIGN DATA WRAPPER mysql_fdw
  OPTIONS (host 'localhost');
 
CREATE USER MAPPING FOR postgres
  SERVER mysql_market
  OPTIONS (username 'root');
 
CREATE SCHEMA mysql;
 
IMPORT FOREIGN SCHEMA market LIMIT TO (market_data)
  FROM SERVER mysql_market
  INTO mysql;
 
SELECT * FROM mysql.market_data;
 
  ext_root  | trader | stock | action |  price   |      xact_date      
------------+--------+-------+--------+----------+---------------------
 IZ724JJ5:3 | bsmith | GOOG  | b      | 800.0000 | 2017-03-10 15:09:32
 IZ724JJ5:4 | bsmith | GOOG  | s      | 842.0000 | 2017-03-10 15:09:45

One foreign MySQL resource allocated and tested. Check!

Without a readily available node.js or REST interface to contact, we can simulate polling some kind of JSON resource with the PLV8 language. This is a good way to demonstrate tying elements to Postgres using something other than SQL. It also gives us a way to grab data in situations where no foreign data wrapper exists and we’re not confident enough to create one.

In this case, we’re calling a V8 function which gets the data through some dubious means and sends it back to us:

CREATE EXTENSION plv8;
 
CREATE SCHEMA js;
SET search_path TO js;
 
CREATE TYPE clearing_data AS (
  ext_root   VARCHAR(10),
  market     VARCHAR(10),
  bank       VARCHAR(10),
  is_funded  BOOLEAN
);
 
CREATE OR REPLACE FUNCTION get_clearing_info()
RETURNS SETOF clearing_data AS
$$
  plv8.return_next({
    "ext_root": "IZ724JJ5:3", "market": "NASDAQ",
    "bank": "Chase", "is_funded": false
  });
  plv8.return_next({
    "ext_root": "IZ724JJ5:4", "market": "NASDAQ",
    "bank": "Citi", "is_funded": true
  });
$$ LANGUAGE plv8;
 
CREATE VIEW all_clearing_data AS
SELECT * FROM get_clearing_info();
 
SELECT * FROM js.all_clearing_data;
 
  ext_root  | market | bank  | is_funded 
------------+--------+-------+-----------
 IZ724JJ5:3 | NASDAQ | Chase | f
 IZ724JJ5:4 | NASDAQ | Citi  | t

Success! Hard-coded data isn’t exactly robust, but demos are open season, and hey, look over there! In a real scenario, we’d probably have a function that accepted parameters and performed a pull from some external resource. The view is just a way of hiding the function call from users whose heads might explode upon learning they can select rows from a function.
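
As an illustration only, a parameterized variant might look like the sketch below, with hard-coded rows still standing in for whatever external pull actually happens. PLV8 conveniently exposes function arguments as JavaScript variables named after the parameters:

CREATE OR REPLACE FUNCTION get_clearing_info(p_root VARCHAR)
RETURNS SETOF clearing_data AS
$$
  // Stand-in for the real feed; imagine this arrived from elsewhere.
  var rows = [
    {"ext_root": "IZ724JJ5:3", "market": "NASDAQ",
     "bank": "Chase", "is_funded": false},
    {"ext_root": "IZ724JJ5:4", "market": "NASDAQ",
     "bank": "Citi", "is_funded": true}
  ];

  // Only emit rows matching the requested root.
  rows.forEach(function (row) {
    if (row.ext_root === p_root) {
      plv8.return_next(row);
    }
  });
$$ LANGUAGE plv8;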

Lastly we need to contact that Postgres server out in the magical field of unicorns, butterflies, and candy canes. What, I didn’t mention candy last time? Well there’s candy now. Deal with it.

CREATE EXTENSION postgres_fdw;
 
CREATE SERVER pgsql_market
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (dbname 'market', host 'localhost');
 
CREATE USER MAPPING FOR postgres
  SERVER pgsql_market
  OPTIONS (user 'postgres');
 
CREATE SCHEMA pgsql;
 
IMPORT FOREIGN SCHEMA public LIMIT TO (trade_calc)
  FROM SERVER pgsql_market
  INTO pgsql;
 
SELECT * FROM pgsql.trade_calc;
 
  ext_root  | stdev | mean | delta 
------------+-------+------+-------
 IZ724JJ5:3 |  13.7 |  825 |  11.3
 IZ724JJ5:4 |   8.5 |  832 |   1.5

Essentially that was the same process we used with MySQL, just with the Postgres FDW this time. Either way, it works and returns data exactly as expected.

The final step is to slap on the duct tape and invoke everything at once:

SELECT m.*, t.*
  FROM mysql.market_data m
  JOIN pgsql.trade_calc t USING (ext_root)
  JOIN js.all_clearing_data c USING (ext_root);
 
  ext_root  | trader | stock | action |  price   |      xact_date      |  ext_root  | stdev | mean | delta 
------------+--------+-------+--------+----------+---------------------+------------+-------+------+-------
 IZ724JJ5:3 | bsmith | GOOG  | b      | 800.0000 | 2017-03-10 15:09:32 | IZ724JJ5:3 |  13.7 |  825 |  11.3
 IZ724JJ5:4 | bsmith | GOOG  | s      | 842.0000 | 2017-03-10 15:09:45 | IZ724JJ5:4 |   8.5 |  832 |   1.5

The real power here is in combining this with the external resource gathering techniques we explored in the previously mentioned article. Is the external REST call way too slow? Slap on a materialized view and index the crap out of it. Have a super stable API for an internally developed resource? Write a foreign data wrapper so Postgres can invoke it directly. Then add a materialized view for good measure. Add a refresh schedule for any resources that adversely affect performance for direct inquiries.

What we end up with is a container database that represents 0% of its own data. It’s a resource we can truncate at will, may crash with no consequences, and might exist anywhere since it’s nothing but an interface layer. A Postgres instance designed this way becomes a SQL API to access assorted data feeds through a standardized execution model. Heck, everyone can have their very own.

Were we to attempt this with our favorite language or application stack, we would probably need to write our own data joining techniques, implement set theory, incorporate hashing and sorting algorithms, and so on. In the end, we’d have written a tiny database engine to handle data combination and aggregation, without most capabilities of even a mediocre out-of-the-box solution. In this scenario, Postgres bridges that gap for us.

If Postgres has already done all the hard work, why replicate it? And isn’t that what being lazy is all about?

No coding, only nap

PG Phriday: Through the Window

Now that we know how Postgres window functions work, why not play with them a bit to get a better understanding of their capabilities? So long as we understand window functions are applied after data gathering and aggregation steps, much of their mystery and complexity is defanged. Let’s start actually using them for stuff!

Captain Murphy is tired of your nonsense

(Note: I’m a bit under the weather today, so this Phriday will probably be a bit truncated and potentially incoherent thanks to the drugs. Apologies in advance.)

Let’s start off with the same example data as last week with one tiny alteration. To properly illustrate some of these concepts, we need some actual data variance, so we’ll be using random numbers for readings instead of modulo math.

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, round(random() * 100),
       CURRENT_DATE + INTERVAL '1d' - ((s.id * 10) || 'm')::INTERVAL
  FROM generate_series(1, 5000) s(id);
 
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
 
ANALYZE sensor_log;

First things first: ordering matters. Window functions are applied over the whole data set or some partition of it. When the window is ordered, they’re also cumulative in nature: the default frame runs from the start of the partition through the current row, so we can change the aggregation results by altering the window itself. To better see what this means, consider these two sums:

SELECT location, reading, SUM(reading) OVER ()
  FROM sensor_log
 WHERE reading_date >= CURRENT_DATE
 ORDER BY location
 LIMIT 10;
 
 location | reading | sum  
----------+---------+------
 1        |      12 | 7058
 10       |      26 | 7058
 100      |      98 | 7058
 101      |      99 | 7058
 102      |      46 | 7058
 103      |      84 | 7058
 104      |      60 | 7058
 105      |      35 | 7058
 106      |      58 | 7058
 107      |       6 | 7058
 
SELECT location, reading, SUM(reading) OVER (ORDER BY location)
  FROM sensor_log
 WHERE reading_date >= CURRENT_DATE
 ORDER BY location
 LIMIT 10;
 
 location | reading | sum 
----------+---------+-----
 1        |      12 |  12
 10       |      26 |  38
 100      |      98 | 136
 101      |      99 | 235
 102      |      46 | 281
 103      |      84 | 365
 104      |      60 | 425
 105      |      35 | 460
 106      |      58 | 518
 107      |       6 | 524

Interestingly, we actually learned two things here. First, window functions ignore LIMIT clauses; that sum reflects every reading in the table for today, not just the ten rows we fetched. Second, ordering by the sensor location produced a cumulative sum, recalculated at each row. This applies to any aggregate used as a window function: a maximum would reflect the current maximum at each row until a higher value replaces it, an average is a running average as values are processed, and so on.
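
We can watch that cumulative behavior with any of them. For instance, swapping in MAX produces a running maximum over the same window:

SELECT location, reading,
       MAX(reading) OVER (ORDER BY location) AS running_max
  FROM sensor_log
 WHERE reading_date >= CURRENT_DATE
 ORDER BY location
 LIMIT 10;

Since our readings are random this time around, the actual values will vary, but the running_max column can only ever climb as rows accumulate.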

This isn’t particularly useful for sums, but what if we wanted to watch our data converge upon the overall average?

SELECT location, reading,
       round(avg(reading) OVER (ORDER BY location), 2) AS running_avg,
       round(reading - 
           avg(reading) OVER (ORDER BY location), 2) AS variance
  FROM sensor_log
 WHERE reading_date >= CURRENT_DATE
 ORDER BY location
 LIMIT 10;
 
 location | reading | running_avg | variance 
----------+---------+-------------+----------
 1        |      12 |       12.00 |     0.00
 10       |      26 |       19.00 |     7.00
 100      |      98 |       45.33 |    52.67
 101      |      99 |       58.75 |    40.25
 102      |      46 |       56.20 |   -10.20
 103      |      84 |       60.83 |    23.17
 104      |      60 |       60.71 |    -0.71
 105      |      35 |       57.50 |   -22.50
 106      |      58 |       57.56 |     0.44
 107      |       6 |       52.40 |   -46.40

If we monitored those kinds of results second by second, we could flag any kind of appreciable jitter from the average as currently represented by the data. It may be completely expected for values to increase through the day, so the total average for all data is meaningless to us. Ordering allows us to control the aggregate’s construction in a way that isn’t really possible using other methods, barring some kind of recursive CTE magic.

Postgres’ new slogan

With ordering out of the way, there’s the small detail of data partitioning as well. The manual goes into much more detail about this, but it’s just a fancy way of saying GROUP BY within the window itself, without collapsing the rows.

SELECT location, reading,
       SUM(reading) OVER (PARTITION BY location) AS all_readings,
       rank() OVER (PARTITION BY location ORDER BY reading)
  FROM sensor_log
 WHERE location::INT <= 10
 LIMIT 10;
 
 location | reading | all_readings | rank 
----------+---------+--------------+------
 0        |      32 |          260 |    1
 0        |      35 |          260 |    2
 0        |      37 |          260 |    3
 0        |      71 |          260 |    4
 0        |      85 |          260 |    5
 1        |      12 |          173 |    1
 1        |      14 |          173 |    2
 1        |      17 |          173 |    3
 1        |      44 |          173 |    4
 1        |      86 |          173 |    5

Again, we can learn a few different things from these results. First is that the window results are restricted to the partition we declared. We set the partition to limit sums, ranks, and other window functions to the domain of the location. Postgres will apply window aggregates specifically to each location as it appears in the query output.

Next consider that we partitioned the sum, but did not order it. In case it wasn’t obvious already, this tells us that PARTITION and ORDER BY clauses are independent, as are the window definitions. In this case, we want the reading total for each location, followed by the rank of each reading within that group. If we had ordered the sum window, we would have gotten a running total instead of an overall value. On the other hand, we want the rank to behave in an ordered manner.

Separate windows, separate effects. Of course, we may not actually want that to happen. If we end up using the same window over and over again, it doesn’t make sense to declare it for each column. Thankfully Postgres has a shorthand for that:

SELECT location, reading,
       SUM(reading) OVER (PARTITION BY location) AS all_readings,
       SUM(reading) OVER locs AS running_total,
       rank() OVER locs
  FROM sensor_log
 WHERE location::INT <= 10
WINDOW locs AS (PARTITION BY location ORDER BY reading)
 LIMIT 10;
 
 location | reading | all_readings | running_total | rank 
----------+---------+--------------+---------------+------
 0        |      32 |          260 |            32 |    1
 0        |      35 |          260 |            67 |    2
 0        |      37 |          260 |           104 |    3
 0        |      71 |          260 |           175 |    4
 0        |      85 |          260 |           260 |    5
 1        |      12 |          173 |            12 |    1
 1        |      14 |          173 |            26 |    2
 1        |      17 |          173 |            43 |    3
 1        |      44 |          173 |            87 |    4
 1        |      86 |          173 |           173 |    5

Perfect! Not only did we use the same window twice, but we were able to mix it with another direct declaration within the same query. That isn’t exactly ground-breaking territory, but it’s reassuring when things work as expected.

Keep in mind that the same window function limitations apply to this syntax as well. Since windows are evaluated after WHERE, GROUP BY, or HAVING clauses, the WINDOW portion of a query must appear after them as well.

Now go forth and leverage window functions to produce reports without awkward subqueries and CTEs muddying up the works! The only better way to use a window is to cool pies.

Mmmmm… pie!

PG Phriday: In the Window

I’ll be the first to admit that I found Postgres window functions fantastically confusing when I first encountered them. They’re a powerful and versatile tool for building reports and summaries, but that functionality hides behind a fairly steep learning curve. One of the ways to combat their inherent complexity is to fully explore how they work, instead of just trying to wrap our heads around the expected results.

Window doggies have gotten decidedly smug

To that end, let’s set up a quick set of data in the customary fashion:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE + INTERVAL '1d' - ((s.id * 10) || 'm')::INTERVAL
  FROM generate_series(1, 5000) s(id);
 
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
 
ANALYZE sensor_log;

Yes, it’s the trusty sensor_log table once again. This time around, we only really care about demonstration in lieu of volume, so we’ve elected for five thousand rows in place of the usual five million. Our data represents one thousand sensors taking turns recording readings at ten-minute intervals. This should provide enough overlap to easily demonstrate what’s going on behind the scenes.

Let’s start with probably the easiest window function of the lot: row_number. All it does is number rows in the result set so we have a kind of counter that’s useful in a lot of different contexts. How do the first ten rows for today look?

SELECT location, reading, ROW_NUMBER() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;
 
 location | reading | row_number 
----------+---------+------------
 143      |      43 |          1
 142      |      42 |          2
 141      |      41 |          3
 140      |      40 |          4
 139      |      39 |          5
 138      |      38 |          6
 137      |      37 |          7
 136      |      36 |          8
 135      |      35 |          9
 134      |      34 |         10

Window functions must be called on some kind of data window. An empty set of () represents the entire data set, with no ordering, groups, or other shenanigans involved. We’re just numbering the results, and the output would have been no different if we removed the window function. This kind of use is very similar to Oracle’s ROWNUM pseudo-column.

Yet a row number by itself isn’t that interesting. Let’s number the rows in order of sensor reading, and fetch the first ten rows of those results:

SELECT location, reading, ROW_NUMBER() OVER (ORDER BY reading)
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;
 
 location | reading | row_number 
----------+---------+------------
 100      |       0 |          1
 101      |       1 |          2
 1        |       1 |          3
 102      |       2 |          4
 2        |       2 |          5
 103      |       3 |          6
 3        |       3 |          7
 4        |       4 |          8
 104      |       4 |          9
 105      |       5 |         10

If () was the window representing all rows, then (ORDER BY reading) is that same content after being sorted by the reading column. Not only did Postgres sort our results, but it numbered them in the post-sorted order. This is a very fine distinction! Consider what happens when we move the ORDER BY clause into the query proper.

SELECT location, reading, ROW_NUMBER() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 ORDER BY reading
 LIMIT 10;
 
 location | reading | row_number 
----------+---------+------------
 100      |       0 |         44
 1        |       1 |        143
 101      |       1 |         43
 2        |       2 |        142
 102      |       2 |         42
 3        |       3 |        141
 103      |       3 |         41
 104      |       4 |         40
 4        |       4 |        140
 105      |       5 |         39

What the heck happened here? The data looks exactly the same, but the artificial row numbers are seemingly arbitrary. Indeed they are! By design. This is part of the reason window functions are so difficult to explain and comprehend. The fact of the matter is that each window is a virtual and separate manifestation of the plain query results.

Anyone who has struggled with pointers in C or C++ knows that abstracted structures introduce certain pitfalls into obtaining desired results.

Window functions and you!

To help unravel the mystery a bit, let’s look at the natural state of the results without any window function nonsense. Of course we must also shift the data by 40 rows so we can see some of the same information the window received.

SELECT location, reading
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10 OFFSET 40;
 
 location | reading 
----------+---------
 103      |       3
 102      |       2
 101      |       1
 100      |       0
 99       |      99
 98       |      98
 97       |      97
 96       |      96
 95       |      95
 94       |      94

This output represents the table rows as sorted by the index we created on reading_date. Since these results are not artificially sorted in any way, this what the window function is actually seeing without its own specific sort operation. We shifted the results by 40 rows and as expected, row 44 has the value of 0. The window function gave us exactly what we requested, but it numbered the rows before Postgres sorted them.

We can actually watch this in action by looking at the two query plans:

EXPLAIN
SELECT location, reading, ROW_NUMBER() OVER (ORDER BY reading)
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;
 
                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=15.91..16.08 rows=10 width=11)
   ->  WindowAgg  (cost=15.91..18.41 rows=143 width=11)
         ->  Sort  (cost=15.91..16.27 rows=143 width=11)
               Sort Key: reading
               ->  Index Scan using idx_sensor_log_date on sensor_log
                     (cost=0.29..10.79 rows=143 width=11)
                   Index Cond: (reading_date > ('now'::cstring)::date)
 
EXPLAIN
SELECT location, reading, ROW_NUMBER() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 ORDER BY reading
 LIMIT 10;
 
                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=15.67..15.69 rows=10 width=11)
   ->  Sort  (cost=15.67..16.03 rows=143 width=11)
         Sort Key: reading
         ->  WindowAgg  (cost=0.29..12.58 rows=143 width=11)
               ->  Index Scan using idx_sensor_log_date on sensor_log
                     (cost=0.29..10.79 rows=143 width=11)
                   Index Cond: (reading_date > ('now'::cstring)::date)

Note that the WindowAgg step occurs at the end of query execution in the first example, taking place directly after a sort operation. The second query sorts after the WindowAgg, indicating the window only had access to unsorted rows. The key detail is that window functions only see the rows of the query as if it had executed without them. They’re also evaluated after regular aggregates, meaning a window function can consume aggregated results, but a regular aggregate can never reference window output.
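
That evaluation order means a window function can happily operate on top of grouped results, even though the reverse is impossible. This is perfectly legal, for example, because the window sum is applied to the already-aggregated counts:

SELECT location, COUNT(*) AS readings,
       SUM(COUNT(*)) OVER () AS total_readings
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 GROUP BY location;

Each output row carries its own group count alongside the total across all groups, something a plain aggregate couldn’t provide without a subquery.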

The easiest way to comprehend how a window function works is to run the query without them. That’s the data the window has access to, regardless of how we slice and dice them within the window itself. It also explains why we’re unable to refer to window function elements in other query clauses. They’re unusable in predicates and we can’t leverage their calculations to group or limit results.

So imagine for a moment we don’t have the standard Postgres LIMIT clause. This is how we would snag the top ten results of our location readings:

SELECT *
  FROM (SELECT location, reading,
               ROW_NUMBER() OVER (ORDER BY reading)
          FROM sensor_log
         WHERE reading_date > CURRENT_DATE
       ) sub
 WHERE row_number <= 10;
 
 location | reading | row_number 
----------+---------+------------
 100      |       0 |          1
 101      |       1 |          2
 1        |       1 |          3
 102      |       2 |          4
 2        |       2 |          5
 103      |       3 |          6
 3        |       3 |          7
 4        |       4 |          8
 104      |       4 |          9
 105      |       5 |         10

This is clearly silly when using row_number, but the trick works the same with other window functions. Here’s how we’d obtain the 10th ranked readings for today’s data:

SELECT *
  FROM (SELECT location, reading,
               dense_rank() OVER (ORDER BY reading)
          FROM sensor_log
         WHERE reading_date > CURRENT_DATE
       ) sub
 WHERE dense_rank = 10;
 
 location | reading | dense_rank 
----------+---------+------------
 109      |       9 |         10
 9        |       9 |         10

Keep in mind that Postgres must fetch the full results internally to materialize them for the window functions. This is true whether there are 143 rows, as with our example, or 143 million.

The more advanced use cases for window functions are a topic for another day. Consider this a very high-level introduction to how they work and their inherent limitations instead of a comprehensive guide. There’s a lot of material here that deserves closer inspection, so there’s no need to rush. Either way, don’t let window functions confuse you more than necessary. Like any independent agent, you just need to know what they’re doing behind the scenes.

Like this, but with slightly fewer bombs

PG Phriday: Getting Assertive

There are a lot of languages available for authoring Postgres functions, but there’s nothing quite like the classic PL/pgSQL. It’s SQL! It’s not SQL! It’s a kind of horrifying mutant crossbreed suitable only for terrifying small children and generating complex reports from a cavalcade of dubious sources! And deep within its twisted entrails is an often overlooked feature usually only available in far more mature entities.

Just wait until they add sharding

That’s right, it’s obvious we’re referring to the ASSERT statement.

When we say “often overlooked”, we’re not kidding. The pertinent documentation offers a meager overview following a far more comprehensive summary of the RAISE statement. We thought it deserved better.

Let’s begin with a simple table example that can form the basis of a more complex function:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE - ((s.id * 10) || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);
 
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
 
ANALYZE sensor_log;

What can we do with five million rows of simulated data from 1000 sensors that collect data every ten seconds? A good place to start is a summary or report table of some kind. We’re only doing cursory analysis, so we should begin with aggregating the daily average, minimum, and maximum. If we get adventurous later, we can add standard deviations, jitter, deltas, and other fancy logic report owners might find useful.

Since this type of data is easy to tabulate, we require one single table and a function to manage its contents.

CREATE TABLE sensor_report (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading_date  DATE NOT NULL,
  avg_reading   BIGINT NOT NULL,
  min_reading   BIGINT NOT NULL,
  max_reading   BIGINT NOT NULL
);
 
CREATE INDEX idx_sensor_report_date ON sensor_report (reading_date);
 
CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.
 
  SELECT INTO last_update MAX(reading_date)
    FROM sensor_report;
 
  -- Bootstrap the report table if it's currently empty.
 
  IF last_update IS NULL THEN
    SELECT INTO last_update MIN(reading_date)
      FROM sensor_log;
  END IF;
 
  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.
 
  DELETE FROM sensor_report
   WHERE reading_date >= last_update;
 
  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.
 
  WITH capture AS (
    INSERT INTO sensor_report (
             location, reading_date, avg_reading,
             min_reading, max_reading
           )
    SELECT location, reading_date::DATE,
           avg(reading), MIN(reading), MAX(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
     RETURNING *
  )
  SELECT COUNT(*) INTO insert_count
    FROM capture;
 
   RETURN insert_count;
END;
$$ LANGUAGE plpgsql;
 
SELECT sp_trickle_report();
 
 sp_trickle_report 
-------------------
            579000
 
ANALYZE sensor_report;

Thankfully this is a very simple (and somewhat naive) reporting function. Imagine this is our first iteration and we want to debug some of its activity. If we were using Python, for example, we might pepper the code with assert statements or craft specific tests to accomplish that same goal. Well, PL/pgSQL allows us to do just that.
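
For comparison, the Python incarnation of such a check is a single statement with an optional message; the insert_count variable here is purely hypothetical:

# Fails loudly (when assertions are enabled) if the invariant breaks.
assert insert_count >= 2000, 'Not enough current sensor activity!'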

For the second iteration, let’s assume we want to avoid bootstrapping the report table if it’s empty. After all, bootstrapping only needs to be done once, so why pollute the function with that edge case? Besides, it’s entirely possible we might require a separate procedure to initialize the reporting table, and isolated functionality is generally less buggy.

If the table isn’t bootstrapped, it’s probably because we’re in a testing environment that is missing data. In this case, it makes sense to complain so the missing data is loaded, even though it’s not strictly an erroneous scenario. So let’s replace the bootstrap with an assertion. While we’re at it, let’s say a new requirement from the upstream code suggests a minimum of 2000 sensors are required for meaningful output. In such a case, we might need to revise our fixture data in the development and QA environments.

Here’s how that might look:

CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.
 
  SELECT INTO last_update MAX(reading_date)
    FROM sensor_report;
 
  ASSERT last_update IS NOT NULL, 'Need to bootstrap report table.';
 
  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.
 
  DELETE FROM sensor_report
   WHERE reading_date >= last_update;
 
  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.
 
  WITH capture AS (
    INSERT INTO sensor_report (
             location, reading_date, avg_reading,
             min_reading, max_reading
           )
    SELECT location, reading_date::DATE,
           avg(reading), MIN(reading), MAX(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
     RETURNING *
  )
  SELECT COUNT(*) INTO insert_count
    FROM capture;
 
  ASSERT insert_count >= 2000, 'Not enough current sensor activity!';
 
  RETURN insert_count;
END;
$$ LANGUAGE plpgsql;

The function itself isn’t that much different; the important part is the result of our changes. The only truly meaningful modification is that the function no longer initializes the report table if it was empty. It complains loudly instead.

Whether assertions fire at all is governed by the plpgsql.check_asserts configuration parameter, which defaults to on. Since they’re primarily intended for debugging purposes, we can disable them wholesale with an entry in postgresql.conf, and every ASSERT becomes a harmless no-op. To be certain our new assertions are active, we can also set the parameter within our current session.

Let’s use our new assertions:

SET plpgsql.check_asserts = TRUE;
 
TRUNCATE TABLE sensor_report;
 
SELECT sp_trickle_report();
 
ERROR:  Need to bootstrap report table.
CONTEXT:  PL/pgSQL function sp_trickle_report() line 12 at ASSERT
 
INSERT INTO sensor_report (
         location, reading_date, avg_reading,
         min_reading, max_reading
       )
SELECT location, reading_date::DATE,
       avg(reading), MIN(reading), MAX(reading)
  FROM sensor_log
 GROUP BY location, reading_date::DATE;
 
ANALYZE sensor_report;
 
SELECT sp_trickle_report();
 
ERROR:  Not enough current sensor activity!
CONTEXT:  PL/pgSQL function sp_trickle_report() line 38 at ASSERT
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 5000, s.id % 100,
       CURRENT_DATE - ((s.id * 1) || 's')::INTERVAL
  FROM generate_series(1, 50000) s(id);
 
SELECT sp_trickle_report();
 
 sp_trickle_report 
-------------------
              5000

We begin by truncating the report table so we can witness the first assertion. As we can see, Postgres correctly complains that the table is empty when it shouldn’t be. Since this is a true ERROR, execution immediately halts where the assertion failed. We see that error and use some alternative method for filling the report table, and try the function again.

Stop! Assert time!

The second execution fails because the sensor_log table only contains data based on 1000 sensors, yet we expect 2000. Besides telling us to update our fixture data to account for more sensors, we also learn what the function expects. So we decide to insert some fake data from 5000 sensors and try again. The last attempt works as expected, and we’re free to move on with our lives.

It seems odd initially that a mere debugging statement gets escalated all the way to a fatal error, doesn’t it? But how else should assertions work? If we just needed information, we could have simply used RAISE NOTICE with the relevant details. The power of ASSERT is that we don’t need to remove the statements to deactivate them.

The reason assertions are fatal is because they raise an ASSERT_FAILURE exception. We can actually take advantage of that in debugging. We can catch that exception just like any other, meaning we can ignore it or escalate as necessary. Consider this example:

CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.
 
  SELECT INTO last_update MAX(reading_date)
    FROM sensor_report;
 
  ASSERT last_update IS NOT NULL, 'Need to bootstrap report table.';
 
  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.
 
  DELETE FROM sensor_report
   WHERE reading_date >= last_update;
 
  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.
 
  WITH capture AS (
    INSERT INTO sensor_report (
             location, reading_date, avg_reading,
             min_reading, max_reading
           )
    SELECT location, reading_date::DATE,
           avg(reading), MIN(reading), MAX(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
     RETURNING *
  )
  SELECT COUNT(*) INTO insert_count
    FROM capture;
 
  ASSERT insert_count >= 2000, 'Not enough current sensor activity!';
 
  RETURN insert_count;
 
EXCEPTION
  WHEN ASSERT_FAILURE THEN
    RAISE NOTICE 'Tripped over debugging code: %', SQLERRM;
    RETURN 0;
END;
$$ LANGUAGE plpgsql;

The only thing we added was a single exception block to catch any exceptions thrown within the function. Then we toned down the error slightly while maintaining the original assertion message, and return 0 from the function since it didn’t process any rows due to the exception.

We can see this for ourselves by truncating the report table again:

SET plpgsql.check_asserts = TRUE;
 
TRUNCATE TABLE sensor_report;
 
SELECT sp_trickle_report();
 
NOTICE:  Tripped over debugging code: Need to bootstrap report table.
 sp_trickle_report 
-------------------
                 0

Now the function itself doesn’t fail or introduce an error into any transaction contexts we might have open, but we can still see something is wrong. The function claims to have inserted no rows, and we see a notice specifically from our own debugging system. In the production environment, this code would be completely silent and the function would operate without the overly oppressive assertions.
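
Silencing them in production is just a matter of flipping that same parameter, either per session or for the whole server:

SET plpgsql.check_asserts = FALSE;
 
-- Or instance-wide, via an entry in postgresql.conf:
-- plpgsql.check_asserts = off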

Is ASSERT overlooked because there are more suitable approaches to solve these problems? Or is it because the feature was added in 9.5, and it’s extremely easy to miss in the deluge of other headline-grabbing advancements? Honestly it’s probably a combination of those two and a myriad of other causes. The important thing is that we know about it now.

And as we are all well aware…

G.I. Joe!