PG Phriday: Bodacious Benchmarks

September 23rd, 2016 | Published in Database, Tech Talk | No Comments


When it comes to putting Postgres through its paces, we often turn to benchmarks to absolutely bury it under a torrent of oppressive activity. It’s a great way to obtain maximum performance metrics and also observe how Postgres reacts and breaks down under such pressure. But these kinds of tests aren’t really practical, are they? After all, many such simulated workloads are nothing but bragging rights measured against previous Postgres releases, or for hardware comparisons. But while functionality beyond defaults is often overlooked, tools like pgbench are actually critical to the development process.

I’ve used pgbench frequently in the past for examples in these articles. It’s much like Apache’s own ab for bombarding a server with various workloads, except pgbench must bootstrap by creating a set of test tables. Or does it? What if we have our own existing tables from a project that we want to measure, either for a hardware migration, or for scalability concerns?

Let’s see how we can utilize pgbench with our trusty sensor_log table. It’s not really designed to experience any updates, so we need something else as well. Let’s say the sensor log is an endpoint capture for a very busy sensor table that is only updated when sensors detect a change in the environment. That should provide us with plenty of opportunities for contention!

CREATE TABLE sensor (
  sensor_id      SERIAL PRIMARY KEY NOT NULL,
  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  modified_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor (location, reading, modified_date)
SELECT s.id, s.id % 100,
       CURRENT_DATE + INTERVAL '8h' 
                    - (s.id % 60 || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);
 
CREATE INDEX idx_sensor_location
    ON sensor (location);
 
ANALYZE sensor;
 
CREATE TABLE sensor_log (
  sensor_id     BIGINT NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
CREATE UNIQUE INDEX udx_sensor_log_sensor_id_reading_date
    ON sensor_log (reading_date, sensor_id);
 
CREATE OR REPLACE FUNCTION f_enforce_modified()
RETURNS TRIGGER 
LANGUAGE plpgsql AS
$$
BEGIN
  NEW.modified_date = now();
  RETURN NEW;
END;
$$;
 
CREATE TRIGGER t_sensor_timestamp_b_iu
BEFORE INSERT OR UPDATE
    ON sensor
   FOR EACH ROW
       EXECUTE PROCEDURE f_enforce_modified();
 
CREATE OR REPLACE FUNCTION f_capture_sensor_changes()
RETURNS TRIGGER 
LANGUAGE plpgsql AS
$$
BEGIN
  INSERT INTO sensor_log (sensor_id, location, reading, reading_date)
  VALUES (NEW.*);
 
  RETURN NULL;
END;
$$;
 
CREATE TRIGGER t_sensor_changes_a_iu
 AFTER INSERT OR UPDATE
    ON sensor
   FOR EACH ROW
       EXECUTE PROCEDURE f_capture_sensor_changes();

We’ve bootstrapped the sensor table with a relatively substantial amount of data. Five million rows isn’t really much these days, but it’s more than enough for demonstration purposes on a laptop or VM. With that said, there are a few things we can test with this basic stable structure. But what to analyze?

Let’s begin with a basic custom script that just emulates a simple update schedule with sensor readings coming in constantly from any random location. It’s a good place to start and allows us to tweak further. Note that this script only works with 9.6 and above, but previous iterations have an equivalent.

sensor-bench-1.sql
\set sid random(1, 100000 * :scale)
\set val random(1, 100)
 
SELECT * FROM sensor
 WHERE sensor_id = :sid;
 
UPDATE sensor SET reading = :val
 WHERE sensor_id = :sid;
 
SELECT MAX(reading) FROM sensor_log
 WHERE sensor_id = :sid
   AND reading_date > CURRENT_DATE - INTERVAL '5 minutes';
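
As an aside, for anyone stuck on 9.5 or earlier, the variable setup at the top of the script has a rough equivalent in the older \setrandom syntax. Here’s a quick sketch of just those lines; the SQL statements themselves don’t change:

\set maxid 100000 * :scale
\setrandom sid 1 :maxid
\setrandom val 1 100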

The sensor-bench-1.sql script does a few things for us. Taken as a whole, we grab a random entry from the sensor table, record a new reading for it, and then check whether that changed the aggregate high over the last five minutes. This is something an application might do as part of an analysis cycle. The real magic starts when we actually execute the script with pgbench.

This is actually pretty cool, and full of valuable information. We’re also going to utilize a trick so we don’t taint our existing data. It’s not something recommended for production systems, but it’s perfect for QA or development environments.

createdb sensor_test -T sensors
pgbench --scale 50 --report-latencies --no-vacuum \
        --file sensor-bench-1.sql --time 30 sensor_test
 
transaction type: sensor-bench-1.sql
scaling factor: 50
query mode: simple
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 9200
latency average: 3.261 ms
tps = 306.636514 (including connections establishing)
tps = 306.672074 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.003  \set sid random(1, 100000 * :scale)
         0.001  \set val random(1, 100)
         0.291  SELECT * FROM sensor
         1.245  UPDATE sensor SET reading = :val
         1.713  SELECT max(reading) FROM sensor_log

The first command created a new database named sensor_test based on the sensors database, which contained both of our sample tables along with all of their existing data. This means we can run as many tests as we want, throw everything away, and start fresh as many times as we need. It takes a bit of time to copy data, so there is probably a point where this becomes difficult to justify, but small and medium data sets fit this model just fine.

The pgbench command itself deserves some explanation, though we did opt for the long-format parameters to help convey what we’re actually doing.

Let’s start with the scale. By default, pgbench uses 100,000 rows as the base, and our script retains that because why not? With a scale of 50, that’s our five-million rows. What’s interesting about this is that we can actually increase possible contention by reducing the scale so it focuses modifications on a smaller portion of the data. We’ll probably find this useful later.

We disable vacuuming because pgbench only knows how to vacuum its own tables, which don’t exist when we use custom scripts against our own schema. The script statistics at the end of the output are the reason we asked for the latency report. With this information, we can focus on the part of the script causing any slowdowns. Assuming we base our scripts on real application samples, this is an invaluable source of performance metrics.

Aside from the custom file we’re using for the test itself, we also set the test duration to thirty seconds. For any real kind of analysis, we’d want to use a much larger value, on the order of tens of minutes or even hours. That would give us a much better idea of average latencies after accounting for other system activity, including write throughput to the base data files during checkpoints.

Let’s see what happens if we crank down the scale to create a “hot spot” in our table, and increase the client concurrency to force more contention:

dropdb sensor_test
createdb sensor_test -T sensors
pgbench --client 4 --report-latencies --no-vacuum \
        --log --aggregate-interval 10 \
        --file sensor-bench-1.sql --time 30 sensor_test
 
transaction type: sensor-bench-1.sql
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 1
duration: 30 s
number of transactions actually processed: 32425
latency average: 3.701 ms
tps = 1080.246903 (including connections establishing)
tps = 1080.360307 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.002  \set sid random(1, 100000 * :scale)
         0.001  \set val random(1, 100)
         0.654  SELECT * FROM sensor
         1.702  UPDATE sensor SET reading = :val
         1.332  SELECT max(reading) FROM sensor_log
 
cat $(ls -t pgbench_log.* | head -n 1)
 
1474658287 13068 36026916 122319506636 1060 17761
1474658297 10252 39970824 203678243092 1568 32993
1474658307 8371 39968669 270707139663 1910 82191

There isn’t a whole lot that has changed here, but we can see a couple important bits of information. First, the update statement is notably slower than before. This is a 2-CPU VM, so that’s expected, though incoming activity eclipsing CPU count on a system is somewhat common. Despite this, the aggregate maximum did not get slower, and in fact is actually slightly faster. Clearly the update is the issue, which is expected since we saddled it with two triggers.

That last bit of output is the result of including output logging and tweaking our aggregate interval to summarize instead of reporting the results of every transaction. This data gets logged based on the current PID of the pgbench command, so we cheated a bit and used cat to print the file for the most recent benchmark. Each column has a meaning, but the last two represent minimum and maximum latency in microseconds. Based on this, we can see that latency has some pretty wild swings from a mere 1ms to over 80ms.
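
Since the aggregate format is just whitespace-delimited numbers, nothing stops us from pulling the log back into Postgres for a closer look. Here’s a minimal sketch in psql, assuming the 9.6 aggregate layout of interval start, transaction count, latency sum, latency sum of squares, and then minimum and maximum latency, all in microseconds:

CREATE TEMP TABLE bench_agg (
  interval_start  BIGINT,
  xact_count      BIGINT,
  latency_sum     BIGINT,
  latency_sum_sq  NUMERIC,
  min_latency     BIGINT,
  max_latency     BIGINT
);
 
-- Adjust the glob if more than one benchmark left logs behind.
\copy bench_agg from program 'cat pgbench_log.*' with (format text, delimiter ' ')
 
SELECT to_timestamp(interval_start) AS interval_start,
       round(latency_sum / xact_count::NUMERIC, 1) AS avg_latency_us,
       min_latency, max_latency
  FROM bench_agg
 ORDER BY 1;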

These results tell us a couple of things, actually. Our previous models never included the sensor table, just sensor_log. We can always pull the most recent reading from a sensor log, so having a lot of updates and two triggers on the base table is a lot of unnecessary overhead. What if we flipped the model slightly and emulated the current state with a view instead?

CREATE TABLE sensor_log (
  sensor_id     SERIAL NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL DEFAULT now()
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id, s.id % 100,
       CURRENT_DATE + INTERVAL '8h' 
                    - (s.id % 60 || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);
 
CREATE UNIQUE INDEX udx_sensor_log_sensor_id_reading_date
    ON sensor_log (sensor_id, reading_date DESC);
 
ANALYZE sensor_log;
 
CREATE VIEW v_sensor AS
SELECT DISTINCT ON (sensor_id) *
  FROM sensor_log
 ORDER BY sensor_id, reading_date DESC;

This comes with a corresponding update to our script:

sensor-bench-2.sql
\set sid random(1, 100000 * :scale)
\set val random(1, 100)
 
SELECT * FROM v_sensor
 WHERE sensor_id = :sid;
 
INSERT INTO sensor_log (sensor_id, location, reading)
VALUES (:sid, :sid, :val);
 
SELECT MAX(reading) FROM sensor_log
 WHERE sensor_id = :sid
   AND reading_date > CURRENT_DATE - INTERVAL '5 minutes';

And finally, we run the altered benchmark:

dropdb sensor_test
createdb sensor_test -T sensors
pgbench --client 4 --report-latencies --no-vacuum \
        --file sensor-bench-2.sql --time 30 sensor_test
 
transaction type: sensor-bench-2.sql
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 1
duration: 30 s
number of transactions actually processed: 52091
latency average: 2.304 ms
tps = 1736.124362 (including connections establishing)
tps = 1736.281647 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.002  \set sid random(1, 100000 * :scale)
         0.001  \set val random(1, 100)
         0.463  SELECT * FROM v_sensor
         1.347  INSERT INTO sensor_log (sensor_id, location, reading)
         0.480  SELECT max(reading) FROM sensor_log

Maybe it comes as no surprise, but these modifications improved performance by over 60%! A good DBA probably would have said something about the sensor table for the same reasons we listed, but it’s better to have numbers to prove it. It’s certainly logical to keep a table of current values alongside an audit log of changes, but we found a better model if we need to increase throughput.

Keep in mind that this is a very simple example. In a real cluster, our benchmark script would likely be much more involved, and we’d probably have several more of them for various read/write percentage scenarios. This is the kind of tooling necessary to really squeeze the database for every last drop of performance, or otherwise reveal points of contention or excessive latency swings.
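
Speaking of mixed workloads, 9.6 also lets pgbench weight multiple script files in a single run. Here’s a sketch of a 90/10 read/write mix, assuming a hypothetical read-only companion script named sensor-bench-read.sql:

pgbench --client 4 --report-latencies --no-vacuum \
        --file sensor-bench-read.sql@9 \
        --file sensor-bench-2.sql@1 \
        --time 30 sensor_test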

If you’re not already using it, pgbench is a powerful addition to any database developer or DBA arsenal. It’s much more than its defaults might suggest!



PG Phriday: Working Together

September 16th, 2016 | Published in Database, Tech Talk | No Comments


There seem to be quite a few popular Postgres conferences peppering the globe these days. This year, Simon Riggs of 2ndQuadrant gave the sponsored keynote at Postgres Open. I’m not entirely sure it was intentional since it wasn’t the title of his presentation, but he uttered the words “working together to make Postgres better for everyone” at one point. The phrase “Working Together” really stood out, because that’s a significant part of what makes Postgres so great. It resonated acutely with the impetus behind the Unconference track that remains a regular fixture at PGCon.

Then Simon dropped another bomb that shocked everyone in attendance. Everywhere I looked were visions of absolute disbelief and awe. He suggested that somewhere in the twisting catacombs of 2ndQuadrant was a beta version of WordPress running on Postgres instead of MySQL. Considering its roots in PHP before there were readily available database abstraction layers, and the fact many extensions call the MySQL functions outright, this is clearly an impossibility. It must be a cruel hoax, and I demand proof that this particular unicorn exists, for reasons completely unrelated to exorcising MySQL from my own website.

Perhaps it’s primarily accidental, but I’ve been a regular fixture at Postgres Open since it began in 2011. Most of this is because it started in Chicago, which made it extremely convenient to attend for the first four years of its existence. Now I’m just operating on pure momentum. So what did I learn this time around? I’ve been doing this too long to learn much regarding Postgres, and short of diving into writing extensions in C, that isn’t going to change. But maybe I can offer a useful perspective on the conference itself and the talks I visited.

PGLogical – the logical replication for PostgreSQL

First on the agenda was a talk on PGLogical by Petr Jelinek, yet another 2ndQuadrant contribution. It works by interpreting the Postgres WAL stream and decoding activity into generic SQL statements that can be replayed on a remote database. Because of this decoupling from the binary contents of the transaction log, there are numerous advantages and newly available replication options:

  • Multiple data sources can transmit content to a single aggregate table. This is great for data warehouses and centralized reporting systems.
  • Stripped of version-specific binary details, replication can span Postgres versions, and thus be used to link disparate instances, or facilitate upgrades.
  • Replication is normally an all-or-nothing exercise. In parsed form, replication can be filtered by intent or target.
  • Interpreting WAL contents removes trigger overhead common to logical replication solutions.

Unfortunately, PGLogical does retain some of the limitations of being content-driven. DDL is not included, for example, so table modifications must be explicitly duplicated on targets, or routed through a companion function that passes the changes on to replicas. This particular limitation is something I don’t quite understand. Postgres is ACID compliant, and as such, table modifications must be reflected in the transaction log somewhere. I wonder what makes them resistant to automatic decoding.

Either way, extensions like this help deprecate existing solutions like Slony, Londiste, or Bucardo. They had their time, but it’s time to move on.

A look at the Elephants trunk – PostgreSQL 9.6

In this talk, Magnus Hagander spent some time ruminating on the upcoming 9.6 features he thought were particularly noteworthy. And of course, I acted as a secondary filter. Many of these deserve much closer attention, but that’s for a future article. So what’s new in 9.6?

  • Tired of “idle in transaction” connections blocking things up? Well there’s finally a timeout to kill them. Huzzah!
  • VACUUM now skips already “frozen” pages. This makes it much faster on primarily read-based tables, because unchanged data isn’t checked again. Data warehouses will rejoice.
  • Database replicas will now actually show which primary they’re following. This used to only be available by manually checking recovery.conf.
  • Synchronous replication will now allow multiple synchronous standby servers. This means it’s possible to create a pool of five replicas, and allow any two of them to acknowledge writes, for example. This is not only safer in the long run, but much less likely to disrupt the primary unless several replicas become unavailable. I might actually start using the feature now! (There’s a quick configuration sketch after this list.)
  • There are a lot of lock-based, date/time, allocation, and memory management performance improvements. This includes tweaks to kernel page flushing that should dramatically reduce overwhelming storage with write spikes.
  • The Postgres foreign data wrapper can now push JOIN and ORDER BY to the remote database. Previously, data would be fetched locally and then joined and sorted. This should make an incredible difference to foreign query performance.
  • And of course, who could forget parallelism of various operations? It’s still pretty new and somewhat experimental, but now that 9.6 is a release candidate, it should see a lot more field testing.
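
As promised above, here’s roughly how a couple of those items surface as configuration in 9.6. This is just a sketch using ALTER SYSTEM, with the standby names invented for illustration:

-- Kill any session that sits idle in a transaction for over ten minutes.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
 
-- Any two of these five standbys must acknowledge each commit.
ALTER SYSTEM SET synchronous_standby_names = '2 (pg1, pg2, pg3, pg4, pg5)';
 
SELECT pg_reload_conf();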

The Elephant Meets the Whale – Bringing PostgreSQL to Production on Docker

Phil Vacca may not be a Docker expert, but he was certainly thorough in examining the “can we” versus “should we” debate regarding deploying Postgres on Docker containers. Docker is one of those things I didn’t know much about aside from its existence and general usage patterns. But once Phil related a Docker container to a glorified chroot environment, everything clicked.

It turns out that Postgres being a data-driven service makes it much harder to run in a container. There’s another session that goes into more depth in this regard, but short of overloading the Postgres container itself, there must be an entire constellation of related containers churning along with it. Each one of these will need a slew of configuration file overlays, environment variables, extension layers, permissions, and volume mappings.

I get really uncomfortable around so many moving parts, or otherwise fragile implementations. It would seem that, short of including a heavy stack management infrastructure, Postgres and Docker are not a good production environment match. This is a quickly evolving area though, so that statement is likely to expire soon.

Perhaps unrelated, but can anyone tell me why everyone is chasing Docker instead of LXC? A versioned, privately directed, arbitrarily controlled, quickly deprecated moving target seems like a bad thing to depend on. Well, unless you’re the developers of Docker. I must be missing something, here.

All the Big Data Sciency stuff you never knew Postgres could do

If there’s ever a notorious speaker, it’s Jim Nasby. This talk actually seemed a bit tame compared to his previous forays. Though he did write a data type designed to handle Python NumPy ndarray data structures, so it’s definitely topical! In other news, Postgres now supports a Python NumPy data type.

This talk actually featured a pretty good summary of all the different ways to use Postgres to manage scientific or complex analysis:

  • JSON & associated functionality
  • FDWs
  • Extensible types (such as numpy ndarray)
  • Various replication features, including logical replication and BDR
  • Parallel queries
  • Distributed approaches like citusdata and Postgres-XL
  • Column store approach (cstore_fdw)
  • GPU-driven support (PGStrom)
  • A REST API in PostgREST
  • ToroDB, a MongoDB API on top of Postgres. It basically transforms JSON into multiple Postgres tables. Theoretically this means access is available via JSON or SQL. Though MongoDB sharding isn’t supported yet, so scalability is limited.
  • MADlib, a machine learning algo system via SQL. This one is especially intriguing and I need to run it past our business intelligence team. If our data can support this kind of analysis, we have several TB of data to dump on it.

Just more evidence that Postgres is becoming more of a data middleware than simply a database engine.

Locked Up: Advances in Postgres Data Encryption

Vibhor Kumar brings us pgcrypto, the presentation. Beyond just covering the basics of using the encryption extension, he provided some encouraging performance results of various approaches such as:

  • Encrypting the data itself with pgcrypto functions.
  • Using disk-based encryption.
  • Both of these with and without SSL.

What’s particularly interesting here, and something I never considered, is that column-based encryption completely breaks indexes. In retrospect, this should have been fairly obvious. Once data is transformed, indexes only operate on the resulting encoded information, making them effectively meaningless. Short of intricate and magical mathematical analysis of data leakage through the encryption model, this is unfixable. Which suggests that the real conclusion of this talk is: use disk-based encryption and force SSL when data must be protected. It’s hard to argue with that.

Bitemporal Data Model: making it happened in Postgres

This is one of the only sessions with two presenters. Chad Slaughter and Henrietta Dombrovskaya’s usage of the Postgres interval type to track historical events is compelling and insightful. It turns out, simply having an event log for bookkeeping is problematic because that makes obtaining the current summary convoluted. So they combined an event log with intervals to track modification history for a rolling summary. It basically boils down to this:

  • Start with an effective interval to reflect when data should be included in reports.
  • Add an assertive interval to track when the modifications were asserted, or applied.

This effectively decouples modifications from presentation, yet retains both. Through the use of intervals, there are definite boundaries reports can target, and through the assertive interval, we can immediately tell from a single data point how long a modification remained valid.

Maintaining this does require a lot of custom enforcement mechanics, though. Postgres constraints can’t incorporate that kind of logic automatically, so it falls to triggers and helper functions to step in. Still, the concept of a composite current and historical relation is an interesting theory, and definitely worthy of further exploration.

Administering Postgres using Docker and Openshift

Jeff McCormick over at Crunchy Data gave the second Docker talk I alluded to earlier. A pretty good summary of the presentation boils down to their Docker stack. While Phil presented the why, here we got to see how. Given the rather unwieldy nature of the suite itself, there really needs to be a management element. Apparently either OpenShift or another Kubernetes-based system can fill that role, but they’re hardly free in themselves.

That’s a ton of overhead to “just” run Postgres. Of course, that kind of setup is more of a dynamic allocation model. Given a reliable backing storage source, Postgres could reliably churn along in such an abstraction layer. I like that it prevents relying on administration of the underlying OS. Platform as a Service is something I’d like to see more in practice. It’s just not something I’m going to touch with a ten foot pole; I’ll let our infrastructure department worry about coordinating such a monstrosity.

Amazon RDS for PostgreSQL: New Features and Lessons Learned

I hate to say Grant McAlister presented an hour-long commercial for Amazon RDS, but I’d be lying if I didn’t. Though as one of the primary architects, he’d probably be remiss if he didn’t.

What I came away with, is that RDS has come a long way recently. There are a lot more available metrics and tracking, performance levels, resource credit balancing, and Postgres extensions. Filesystem and snapshot encryption are standard, and encrypted snapshots can even be shared, provided they have an associated custom key. Heck, they’ve even integrated logical replication already.

For organizations that don’t need custom everything and can budget for the possibility of per-minute pricing, it seems like a pretty good approach.

Elephant Herd as a Service: Managing Hundreds of Postgres Instances

My own talk was on ElepHaaS, a Postgres instance wrangler if there ever was one. I designed it to manage over 100 instances spread across multiple data centers, environments, and servers. At one point, it served a crucial role in testing our disaster recovery readiness. All I had to do was click a checkbox, hit “go”, wait a couple minutes, and about a dozen herds had failed over to the most up-to-date replica. The tool allowed me to repurpose the old primary instances as new replicas, and repeat the process in reverse. All without SSHing to each individual server and carefully invoking the process.

After the presentation, Sean Chittenden (of Groupon fame) brought up Consul as a way to make the ele_tools service companion better at Postgres instance auto-discovery. Or maybe replace it entirely. I won’t know until I spend some time looking at what it does and how it works. Either way, it seems to be a nice way of making ElepHaaS better.

ElepHaaS could scale to thousands of instances, but we don’t have that many. Someone does, though. Pull requests welcome!

Big Data with PostgreSQL

Again with Simon Riggs, eh? Sure! This time around, the focus was on handling very large data sources. During this, he brought up a major shortcoming of sharding data on a primary key: secondary keys. In essence, related data in separate tables is likely to be sharded on different columns, forcing non-local data federation that adversely affects performance due to cross-communication requirements. This is why so many existing solutions require N-N node communication topologies.

There are basically two ways to circumvent this:

  1. Find a way to keep associated data together on the secondary key by following the primary as a principal hash. This doesn’t always work, however, as the principal shard element may require further subdivision for larger secondary distributions.
  2. Implement a feedback system that pre-fetches applicable data into a front-end layer for local aggregation. There were a few slides showing pgpredict and 2UDA partially filling these roles.

And then of course, there was the requisite hawking of Postgres-XL. We’re still trying to get this one to work, but just recently ran into some kind of bug that corrupted a couple of system catalog tables in our development environment. That’s going to be fun to fix! But I’m still pushing it, because this approach really is the way to go in the long run for extremely large warehouses. Simon emphasized that they want to move as much of the Postgres-XL patches to Postgres core as possible, and then make the remaining functionality available as an extension. Maintaining such a divergence long-term isn’t really viable.

Oh, and he also mentioned that Postgres has a 32TB per table size limit. I immediately ran to our 50TB system and frantically checked the biggest tables before remembering we implemented partitioning specifically to avoid maintenance issues caused by vast monolithic tables. That of course, solved this problem as well. Otherwise, our largest table would have been 8TB. Not bad! These limitations are obviously documented, but it’s easy to forget.

Non-Relational Postgres

As the last talk for the conference, Bruce Momjian spent some time covering non-relational data types supported by Postgres. Starting with arrays, he continued through all of the greats like ranges, geometric types, XML, JSON and JSONB, composite row types, and even characters, since they can be interpreted any number of ways depending on tokenization contexts.

One particularly interesting point he raised was that range types index better for data that often follows a start/end dating model. This is because Postgres only really uses one index at a time per sub-clause. In addition, independent columns also have separately tabulated statistics. When these statistics are combined, they can lead to underestimations of row counts, which can lead to poor performance due to inefficient query plans.
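
As a minimal illustration of the start/end pattern he described (the promotion table here is invented purely for the example), a single range column carries both boundaries and takes one GiST index:

CREATE TABLE promotion (
  promo_id       SERIAL PRIMARY KEY,
  active_during  TSRANGE NOT NULL
);
 
CREATE INDEX idx_promotion_active_during
    ON promotion USING GIST (active_during);
 
-- Everything active right now, answered by a single index.
SELECT *
  FROM promotion
 WHERE active_during @> now()::TIMESTAMP;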

That’s an important point that applies to many of the composite non-relational types. Keeping related data together in many cases is an optimization, even when a single column contains multiple data points. So it’s not just to reduce query complexity!

Bruce always keeps his presentations (including this one) on his website. I highly recommend perusing any that look interesting.

Closing Notes

As always, most of the presenters eventually link their slide decks on the Postgres Wiki, though I personally wish the Postgres Open site itself would publish them on the abstract pages for each talk. In either case, videos of all of these will eventually be on YouTube, so for any I didn’t cover, or any that sound good enough to see first-hand, keep an eye out for them.

Of the big names that normally attend, Robert Haas and Josh Berkus were conspicuously absent. For now, I’ll just blame the newly formed Postgres Vision for scheduling too closely to Postgres Open. It’s not the first conference to cannibalize another for speakers, and it won’t be the last.

In the end, I think this was a pretty good summary of Postgres past, present, and future. As always, I lament the general lack of end-to-end solution demonstrations like some of those we’ve seen in the past. That’s the risk of being a Postgres advocate: everyone wants to talk about some cool new method they’ve developed, but there isn’t a lot of real-world application.

Either way, there was definitely enough material to inspire a bit of intellectual exploration of database techniques. If you couldn’t attend, there are a couple more Postgres conferences this year, and a slew for 2017. Don’t be a stranger!



Unexpected Existentialism

September 13th, 2016 | Published in Contemplation | No Comments


There’s a bit of loneliness in the world, I think.

But not the kind we’ve all come to recognize. Not the feeling that we are alone, unknowable, or otherwise separated from our peers. It’s something I never expected to encounter, and yet that’s exactly what makes it so penetrating. It’s a kind of emotional nostalgia, and the realization that the novelty of life itself is fleeting. I used to wonder what adults thought to themselves as they watched us play and grow, forever discovering, always surprised and delighted or perturbed. Now that it’s been about 20 years since I graduated from high school, I think I know.

It’s a kind of sad envy. Once we’ve accumulated a few years and all of the new and unpredictable has long since become practiced and deciphered. Once we finally understand that there’s nothing new under the sun. No longer children ourselves, we can only live vicariously through those who still are. We nod in understanding to their plight, as we’ve been there before. We tell them: this too, shall pass. We know that one day, even terrible experiences of youth will pale to the looming specter of knowledge.

I can’t speak for anyone else in this, but I still feel like the teenager I once was. It’s as if I’ve awoken 20 years later, shocked by the time that’s passed. Yet I also remember going to my first Anime Iowa in 2000 when I was 23. While an incomprehensible whirlwind of events, I did indeed get married in 2008. Then I finished Rabbit Rue. Then I wrote a Postgres Book. With each of these, my impostor syndrome grew. The more experienced and accomplished at being alive I become, the less real everything seems.

Yet the truth is inescapable. I remember all of those things and the myriad of occasions in-between because it all happened. I did those things, and many more besides. It was all new then, but no longer. There will be no more crying over skinned knees. No more crab-walking while staring at the ceiling and giggling at how weird everything looked. No timid apprehension of a first kiss. Gone are the days of adolescent self-discovery. No more immaturity.

We’ve all heard the refrain: if only I could go back and do it all over again. But that’s an empty conceit, isn’t it? Even assuming that were possible, what could we do “again”? The same things, but re-experience the magic of naiveté? How would we know we were repeating old experiences, then? Different things, armed with the knowledge we now wield and the implications that brings? Playing life in God Mode would be an amusing diversion, but it would hardly replicate the feeling of newness simply because we were young again.

Because that part of us is gone forever. It’s something I miss quite distinctly now that I’ve noticed it’s missing. Yet more than that, it reminds me of how I occasionally wonder about the lives of others. What was the first movie he saw? Did she cry the first time she saw a dead animal? Billions of people with their own perspective on life and all the events that contributed to its formation. A universe of firsts, evolving through time as fads and technology flavor them to something only tangentially recognizable to previous generations.

And we all look back in contemplation, fond or otherwise. Maybe we wish we could recapture the spark of a first love, or the trepidation of the first day of school. In the back of my mind, it’s always been there. A wish that I could do something again, knowing I never could, and then feeling a deep and inexplicable sadness. Kids of today will never feel the thrill of opening a Michael Jackson single on vinyl, just as I’ll never know what it’s like to be born into a world where instantaneous lookup of any information via a tiny pocket computer is a reality.

Though I wish I could. Our humanity is so profoundly limiting, I no longer question why we embrace a consumer culture. We can never truly relate to one another, so we distract ourselves with widgets and geegaws, entertainment of all description, and every little scandal and noble cause du jour. Then maybe, just maybe, we can forget for a little longer, that we’re mildly more jaded than we were the day before. That we’re always a little lonely because there’s nobody that’s really like us, despite how much we have in common.

Of course, I could just be projecting. I’ve never denied the fact I don’t relate to the world or anyone in it, and never really have. But there’s always something new to me just over the horizon. None of it will ever really carry the impact of those earlier experiences, but I’d never trade them for anything. Still, not a day goes by that I don’t wish Andy Weir’s The Egg was more than just a short story. We could be so much more if we didn’t each reside in our own tiny microcosm, separated even from our own past.



PG Phriday: Irrelevant Inclinations

September 9th, 2016 | Published in Database, Tech Talk | No Comments


Say hi to Princess Kittybutt. She’ll be our mascot (and subject) for today. We’ll get to her in a minute.

Princess Kittybutt

When the only tool you have is a hammer, everything becomes a nail, right? With Postgres becoming more of an environment than simply a database engine, this colloquialism is starting to resemble reality. Of course, that’s not necessarily a bad thing! As Postgres accumulates copious and varied extensions, its role as an adaptive middleware solidifies. When Postgres can do something for itself, the need for sprawling harnesses of support scripts steadily decreases.

Yet there’s also the accusation that’s often levied at Emacs regarding its role as a UNIX kitchen sink. Extensions aid in preventing Postgres from descending into bloated madness. So for now at least, we’re safe to contemplate the tools available before they gain sentience and destroy humanity. For instance, can we build a game using only Postgres?

Not some contrived simulation where every interactive function calls a maintenance routine to advance the game world and respond to input. Can Postgres be the game engine, with an independently operating universe that churns along regardless of a player’s presence? Can we do that without a mature LISTEN / NOTIFY external resource that does all of the heavy lifting in some other language? Do we have all of the parts?

A new extension named pg_cron recently hit the web which makes this much easier. Postgres has had job scheduling for a while thanks to pgAgent, but it always ran as an external daemon which just happened to use Postgres tables to store schedules. Now we have something a bit more native that leverages the Postgres 9.3+ background worker functionality to integrate scheduling more directly. True to its name, Postgres now has a cron service.

Does it work?

CREATE EXTENSION cron;
 
CREATE TABLE cron_test (message TEXT);
 
SELECT cron.schedule(
  '* * * * *',
  $$INSERT INTO cron_test VALUES ('Hello World')$$
);
 
\x
 
SELECT * FROM cron.job;
 
-[ RECORD 1 ]------------------------------------------
jobid    | 1
schedule | * * * * *
command  | INSERT INTO cron_test VALUES ('Hello World')
nodename | localhost
nodeport | 5432
database | postgres
username | postgres
 
SELECT pg_sleep(60);
 
SELECT * FROM cron_test;
 
   message   
-------------
 Hello World

Success! That job will insert “Hello World” every minute until we remove it. We won’t go through the installation process, as the README covers that well enough. However, we’d be extremely remiss if we neglected to thoroughly abuse it.
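
Incidentally, removing a job should be just as simple; cron.unschedule takes the jobid we saw in cron.job:

SELECT cron.unschedule(1);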

We came here to make a game, and that’s what we’re going to do! Well, a “game” anyway. We’re well aware that it’s possible to build a universe using mostly Postgres tools. We highly recommend checking out Schemaverse, by the way; it’s a great way to learn Postgres features by example and practice.

Unfortunately, it would be impossible to cram all of that into a reasonable length article or ten. No, we’re going to give birth to a centralized virtual pet system. Pets exist when we aren’t around, need to be fed and played with, and might die if we neglect either for too long. Most importantly, it’s brain-numbingly easy to code. Perfect!

Let’s start with a schema.

CREATE TABLE db_pet (
  pet_id     SERIAL   PRIMARY KEY,
  player     VARCHAR  NOT NULL UNIQUE,
  pet_name   VARCHAR  NOT NULL,
  mood       INT      NOT NULL DEFAULT 24,
  food       INT      NOT NULL DEFAULT 24,
  is_dead    BOOLEAN  NOT NULL DEFAULT FALSE
);
 
CREATE OR REPLACE VIEW my_pet AS
SELECT pet_name,
       CASE WHEN food < 8 THEN 'STARVING'
            WHEN food < 16 THEN 'HUNGRY'
            ELSE 'FULL'
       END AS appetite,
       CASE WHEN mood < 8 THEN 'DEPRESSED'
            WHEN mood < 16 THEN 'BORED'
            ELSE 'HAPPY'
       END AS spirits,
       CASE WHEN is_dead THEN 'DEAD (you monster!)'
            WHEN food < 5 OR mood < 5 THEN 'SICK'
            WHEN food < 13 OR mood < 13 THEN 'OK'
            ELSE 'GREAT'
       END AS health
  FROM db_pet
 WHERE player = SESSION_USER;
 
GRANT SELECT ON my_pet TO PUBLIC;

Remember when we said this would be stupidly easy to code? We have one table to store the pets, and one view to display their current status to the owner. In this world, each clock tick is worth one hour. Our view suggests pets must be fed and trained three times a day or they’ll starve to death or die from boredom. Further, we slapped a unique index on the player so we can truly punish neglectful owners. One pet is all you get, and if it dies, too bad! Maybe a less cruel future version will allow players to have multiple pets and move the unique key to player and pet name. For now, let’s be awful sadists.

We spoke a bit about functions recently, and we’ll need a few to facilitate gameplay. In particular, we’ll need only three user functions:

  • One to create and personalize the pet. We want some emotional investment, after all!
  • One to feed the pet. This function should stave off starvation for 8 hours. That’ll give us roughly three feedings per day to keep it healthy.
  • One to train the pet. Let’s assume the pet is more resilient to boredom than starvation and each training is worth 12 hours of entertainment.

Here’s how they might look:

CREATE OR REPLACE FUNCTION new_pet(new_name VARCHAR)
RETURNS SETOF my_pet
LANGUAGE plpgsql AS
$$
DECLARE
  pet_state my_pet;
BEGIN
 
  SELECT * INTO pet_state FROM my_pet;
 
  -- If the user tries to create another pet, tell them no.
  -- If their pet died, that's too darn bad!
 
  BEGIN
    INSERT INTO db_pet (player, pet_name)
    VALUES (SESSION_USER, new_name);
  EXCEPTION
    WHEN unique_violation THEN
      IF pet_state.health ~ 'DEAD' THEN
        RAISE NOTICE 'Your pet is dead forever. Murderer.';
      ELSE
        RAISE NOTICE 'You can only ever have one pet!';
      END IF;
  END;
 
  SELECT * INTO pet_state FROM my_pet;
  RETURN NEXT pet_state;
END;
$$ SECURITY DEFINER;
 
CREATE OR REPLACE FUNCTION feed_pet()
RETURNS SETOF my_pet
LANGUAGE plpgsql AS
$$
DECLARE
  pet_state my_pet;
BEGIN
  -- Every feeding is good for 8 hours. Don't let the player
  -- gorge their pet beyond one day of food.
 
  UPDATE db_pet
     SET food = CASE WHEN food >= 16 THEN 24 ELSE food + 8 END
   WHERE player = SESSION_USER
     AND NOT is_dead;
 
  SELECT * INTO pet_state FROM my_pet;
  RETURN NEXT pet_state;
END;
$$ SECURITY DEFINER;
 
CREATE OR REPLACE FUNCTION train_pet()
RETURNS SETOF my_pet
LANGUAGE plpgsql AS
$$
DECLARE
  pet_state my_pet;
BEGIN
  -- Playing with a pet will keep it entertained for another
  -- 12 hours, up to a 24-hour limit.
 
  UPDATE db_pet
     SET mood = CASE WHEN mood >= 12 THEN 24 ELSE mood + 12 END
   WHERE player = SESSION_USER
     AND NOT is_dead;
 
  SELECT * INTO pet_state FROM my_pet;
  RETURN NEXT pet_state;
END;
$$ SECURITY DEFINER;

We created each function to always return the current status of the pet. Returning row sets is just too convenient not to invoke, especially when interaction is a key part of game feedback. We also had to declare each of the functions as security definers so players could modify the state of their own pet without having access to the underlying table.

Beyond letting the user do things to their pet, we also need a game loop to push all the pets toward an untimely demise unless their owner intervenes. Then we need to actually schedule it with pg_cron to activate everything.

CREATE OR REPLACE FUNCTION pet_game_loop()
RETURNS VOID
LANGUAGE plpgsql AS
$$
BEGIN
  UPDATE db_pet
     SET food = food - 1,
         mood = mood - 1
   WHERE NOT is_dead;
 
  UPDATE db_pet
     SET is_dead = TRUE
   WHERE (mood < 1 OR food < 1)
     AND NOT is_dead;
END;
$$;
 
SELECT cron.schedule(
  '* * * * *', 'SELECT pet_game_loop()'
);

Eagle-eyed cron users will immediately notice we set the game loop to run once per minute instead of once per hour. Hey, this is an article and we have a hard deadline, here. We can’t just wait several hours for Princess Kittybutt to die just to demonstrate the game loop!

With that out of the way, let’s all welcome Princess Kittybutt!

SELECT * FROM new_pet('Princess Kittybutt');
 
      pet_name      | appetite | spirits | health 
--------------------+----------+---------+--------
 Princess Kittybutt | FULL     | HAPPY   | GREAT

Ah, the addition of a new pet to the family! What more could anyone ask for? Princess Kittybutt is happy, healthy, and content with us as owners. Truly, it’s stupendous to be Princess Kittybutt on this fine evening. So long as we play with her at least twice a day, and feed her three times, she will continue to live on in our database, successfully occupying many bytes and not doing much else.

Despite the uh… enhancement to the game timing, let’s check back on her in about ten minutes and see how things are going.

SELECT * FROM my_pet;
 
      pet_name      | appetite | spirits | health 
--------------------+----------+---------+--------
 Princess Kittybutt | HUNGRY   | BORED   | GREAT
 
SELECT * FROM feed_pet();
 
      pet_name      | appetite | spirits | health 
--------------------+----------+---------+--------
 Princess Kittybutt | FULL     | BORED   | GREAT

We noticed Princess was feeling a tad peckish, so we fed her a bit and everything is right as rain. She’s a little bored, but that’s hardly the end of the world, is it? Let’s check back in another ten minutes, shall we?

SELECT * FROM my_pet;
 
      pet_name      | appetite |  spirits  | health 
--------------------+----------+-----------+--------
 Princess Kittybutt | HUNGRY   | DEPRESSED | SICK

Well, Princess Kittybutt is slightly under the weather, but we need to run a quick errand. That’s OK, right? We’re busy professionals after all, and to be honest, the life of a cat is pretty cushy. It’s not like she only has about five minutes to live or anything.

Back in five minutes! Promise!

SELECT * FROM my_pet;
 
      pet_name      | appetite |  spirits  |       health        
--------------------+----------+-----------+---------------------
 Princess Kittybutt | STARVING | DEPRESSED | DEAD (you monster!)

Dead Princess Kittybutt

W-what?! Princess Kittybutt! No! What have we done!? When the kids get home from school, they’re going to be utterly devastated! There has to be something we can do… Wait! We can just get another cat that looks just like her, right? Right? They’ll never know the difference.

Just a little bit of SQL and…

SELECT * FROM new_pet('Princess Kittybutt 2');
 
NOTICE:  Your pet is dead forever. Murderer.
 
      pet_name      | appetite |  spirits  |       health        
--------------------+----------+-----------+---------------------
 Princess Kittybutt | HUNGRY   | DEPRESSED | DEAD (you monster!)

Why!? Poor Princess Kittybutt, you hardly had a chance to live. Oh, what cruel fate!

Anyway, what practical use is there for an integral job scheduling system in Postgres? For one, few production databases have a 1-1 ratio between database users and OS users. As a consequence, it’s not always easy or convenient to schedule recurring events without using pgAgent or overloading OS user crontabs. Beyond that, do we really want decentralized scheduled events spread across multiple OS users anyway?

Of particular interest are data loading and manipulation operations. While most ETL systems have scheduling and transformation engines or suggest robust examples of such, sometimes it’s easier to frequently process new data internally. An active data logger or order processing system might have an army of fact tables and dimensions summarizing details at multiple levels of granularity. It’s not uncommon to keep these topped-up with functions that regularly aggregate everything to accelerate report generation.
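
As a sketch of that pattern, the schedule itself is a single call; f_update_sensor_rollups here is a hypothetical aggregation function standing in for whatever does the real work:

-- Refresh the reporting rollups at the top of every hour.
SELECT cron.schedule(
  '0 * * * *', 'SELECT f_update_sensor_rollups()'
);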

Or of course, we could write a real game. Like Schemaverse, we might want multiple events to occur per clock tick. We can even create multiple game loop functions so each category of event follows a distinct schedule. It’s common for a game engine to assign independent jobs to available CPUs, giving the illusion of parallel activity. Well, that’s certainly a potential avenue here as well. As long as we don’t mind inhabiting a universe that only advances at a maximum rate of once per minute, that is.

That’s the only real downfall of pg_cron. Instead of being a generic scheduler, it married itself to cron’s limited syntax and granularity; it’s in the name, after all. For events that must occur more frequently than once per minute, we still need to rely on an external invocation routine of some description.

Considering how long cron has survived with that particular shortcoming, I doubt that will hold pg_cron back. It’s certainly a welcome addition to the Postgres extension family. Even though it murdered poor Princess Kittybutt.



PG Phriday: Cult of Functionality

September 2nd, 2016 | Published in Database, Tech Talk | 1 Comment


It’s no surprise Postgres does more than merely store data; you can’t do everything with SQL. Often, it’s more beneficial to process data locally without transmitting it to a client interface. Local manipulation can save hours in network traffic alone, let alone client-side allocation and per-row processing. Databases like Postgres are built specifically for bulk data operations, so why not take advantage?

But doing that requires functions—anonymous or otherwise—and a Turing-complete language to write them with. For data-hungry jobs that benefit from local evaluation, Postgres has a veritable smorgasbord of procedural languages ready to sate almost any coding appetite. But where do we start?

Tightly integrating Postgres with a project is getting easier with each release. So easy, in fact, that we’re liable to forget Postgres has its own fairly capable language in PL/pgSQL. Let’s work with Postgres’ built-in language a bit and explore its functionality, shall we?

Just so we have something to manipulate, let’s start by defining a basic table and associated data:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 100, s.id % 100,
       CURRENT_DATE + INTERVAL '8h' 
                    - (s.id || 's')::INTERVAL
  FROM generate_series(1, 500) s(id);
 
ANALYZE sensor_log;

With that out of the way, let’s start with a short history lesson. It’s hard to ignore the blocky nature of SQL statement organization. This is almost a natural consequence of a language saddled with such high verbosity and numerous ubiquitous keywords. Each SQL statement becomes a series of high-level directives led by grouped operations, making contextual break-points for each. So we end up seeing SELECT ... FROM ... WHERE ... GROUP BY ... operational blocks.

Some language nerds might recognize this cadence. To make it a bit more obvious, here’s a very basic Postgres PL/pgSQL function:

CREATE OR REPLACE FUNCTION f_add_nums(x INT, y INT)
RETURNS INT 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    ret_val INT;
BEGIN
    ret_val := x + y;
    RETURN ret_val;
END;
$BODY$;

It’s no coincidence that this highly resembles Oracle’s PL/SQL. Oracle’s procedural language was in turn inspired by Ada, a language developed for use by the US government. Ada itself is highly influenced by Pascal. With this kind of pedigree and a very similar block-style approach, it shouldn’t be surprising that so many database languages feel similar.

With that in mind, Postgres does take advantage of many of its own syntax choices within this realm, either to simplify statements or provide better integration with the core. For example, there are essentially two ways to loop through a SQL statement within a procedure:

-- First, using an explicit cursor:
 
CREATE OR REPLACE FUNCTION f_process_log()
RETURNS INT 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    ret_val BIGINT := 0;
    log_record RECORD;
 
    log_results CURSOR FOR
        SELECT reading
          FROM sensor_log;
BEGIN
    FOR log_record IN log_results LOOP
      ret_val := ret_val + log_record.reading;
    END LOOP;
 
    RETURN ret_val;
END;
$BODY$;
 
-- Next, with an implicit cursor as a shortcut:
 
CREATE OR REPLACE FUNCTION f_process_log()
RETURNS INT 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    ret_val BIGINT := 0;
    read_value BIGINT;
BEGIN
    FOR read_value IN
        SELECT reading
          FROM sensor_log
    LOOP
      ret_val := ret_val + read_value;
    END LOOP;
 
    RETURN ret_val;
END;
$BODY$;

Of these two approaches, the second is likely preferred by actual human beings. The cursor is still there, but exists ephemerally, doing its magic in the background. In using the explicit cursor, we have to ensure we haven’t opened it previously in the function, and addressing the results is a bit more cumbersome.

What about conditionals based on query results? Again, there are both hard and easy ways of going about this:

-- First, using a variable as an outright result:
 
CREATE OR REPLACE FUNCTION f_check_sensor(sensor_loc TEXT)
RETURNS BOOLEAN
LANGUAGE plpgsql AS
$BODY$
DECLARE
    ret_val BIGINT;
BEGIN
    ret_val := (SELECT id
                  FROM sensor_log
                 WHERE location = sensor_loc
                 LIMIT 1);
 
    IF ret_val IS NULL THEN
        RETURN FALSE;
    END IF;
 
    RETURN TRUE;
END;
$BODY$;
 
-- Next, using built-in syntax:
 
CREATE OR REPLACE FUNCTION f_check_sensor(sensor_loc TEXT)
RETURNS BOOLEAN 
LANGUAGE plpgsql AS
$BODY$
BEGIN
    PERFORM id
       FROM sensor_log
      WHERE location = sensor_loc
      LIMIT 1;
 
    IF NOT FOUND THEN
        RETURN FALSE;
    END IF;
 
    RETURN TRUE;
END;
$BODY$;

This time we used the PERFORM syntax instead of SELECT. It’s one of many shortcuts available in PL/pgSQL, and it’s meant for times when we don’t explicitly require query results, but still need to execute a query. Once again, we leveraged implicit-style syntax to simplify the code (and our thought process).
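
For what it’s worth, when all we really need is the boolean itself, the entire check can collapse into a single RETURN of an EXISTS expression. Here’s a minimal sketch; f_check_sensor_exists is just an illustrative name:

CREATE OR REPLACE FUNCTION f_check_sensor_exists(sensor_loc TEXT)
RETURNS BOOLEAN 
LANGUAGE plpgsql AS
$BODY$
BEGIN
    RETURN EXISTS (
        SELECT 1
          FROM sensor_log
         WHERE location = sensor_loc
    );
END;
$BODY$;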

This also extends into exception handling. We’ve already covered exception performance considerations, but there are also cool syntax tricks available. Imagine we want to process all log readings for the current day, raising a warning when certain boundaries are violated.

Watch what happens if we just use a naked exception:

CREATE OR REPLACE FUNCTION f_process_current_log()
RETURNS VOID 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    log_rec RECORD;
BEGIN
    FOR log_rec IN
        SELECT location, reading, reading_date
          FROM sensor_log
         WHERE reading_date >= CURRENT_DATE
    LOOP
        -- Imagine this is some complicated procedure. Maybe we
        -- call a function, for instance, and consider a bad
        -- result or a passed exception as fatal.
 
        IF log_rec.reading > 98 THEN
            RAISE EXCEPTION 'location % reading % invalid at %!',
                  log_rec.location, log_rec.reading,
                  log_rec.reading_date;
        END IF;
    END LOOP;
END;
$BODY$;
 
SELECT f_process_current_log();
 
ERROR:  location 99 reading 99 invalid at 2016-09-02 07:58:21!

Well, we built our fake data, so we know for a fact that there should be five errors returned, but processing stopped right away. We could reduce the exception to a warning to prevent this, yet we can’t always count on downstream functions being so forgiving. If this function invokes several other sub-functions, we’ll eventually get a fatal exception that will derail the entire processing loop. How can we prevent this?

Perhaps not so obvious, but Postgres blocks can go pretty much anywhere. Here’s the same function without its fatal processing issue:

CREATE OR REPLACE FUNCTION f_process_current_log()
RETURNS VOID 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    log_rec RECORD;
BEGIN
    FOR log_rec IN
        SELECT location, reading, reading_date
          FROM sensor_log
         WHERE reading_date >= CURRENT_DATE
    LOOP
        -- Imagine this is some complicated procedure. Maybe we
        -- call a function, for instance, and consider a bad
        -- result or a passed exception as fatal.
 
        BEGIN
            IF log_rec.reading > 98 THEN
                RAISE EXCEPTION 'location % reading % invalid at %!',
                      log_rec.location, log_rec.reading,
                      log_rec.reading_date;
            END IF;
        EXCEPTION
            WHEN OTHERS THEN
                RAISE NOTICE '%', SQLERRM;
        END;
    END LOOP;
END;
$BODY$;
 
SELECT f_process_current_log();
 
NOTICE:  location 99 reading 99 invalid at 2016-09-02 07:58:21!
NOTICE:  location 99 reading 99 invalid at 2016-09-02 07:56:41!
NOTICE:  location 99 reading 99 invalid at 2016-09-02 07:55:01!
NOTICE:  location 99 reading 99 invalid at 2016-09-02 07:53:21!
NOTICE:  location 99 reading 99 invalid at 2016-09-02 07:51:41!

By wrapping the “dangerous” external call in another block, we were able to trap and override the exception it raised. This is effectively the same as a try/catch block in other languages, but may be potentially confusing because of its similarity to the outer block syntax. Regardless, PL/pgSQL has most of the trappings of a modern language, even if some appear a bit antiquated by current standards.

The idea with PL/pgSQL is that it highly resembles the SQL language itself, and allows directly embedding SQL statements. Within a database context, this is a massive time saver for native SQL speakers. Consider the mechanism for enabling a function to return row results:

CREATE OR REPLACE FUNCTION f_generate_and_return(new_count INT)
RETURNS SETOF sensor_log 
LANGUAGE plpgsql AS
$BODY$
DECLARE
    n_count INT := 0;
    new_row RECORD;
BEGIN
    FOR n_count IN 1..new_count
    LOOP
        INSERT INTO sensor_log (location, reading, reading_date)
        SELECT n_count, n_count % 100, now()
        RETURNING * INTO new_row;
 
        RETURN NEXT new_row;
    END LOOP;
END;
$BODY$;
 
SELECT * FROM f_generate_and_return(5);
 
 id  | location | reading |        reading_date        
-----+----------+---------+----------------------------
 501 | 1        |       1 | 2016-09-02 13:39:33.566244
 502 | 2        |       2 | 2016-09-02 13:39:33.566244
 503 | 3        |       3 | 2016-09-02 13:39:33.566244
 504 | 4        |       4 | 2016-09-02 13:39:33.566244
 505 | 5        |       5 | 2016-09-02 13:39:33.566244

How about that? Not only were we able to insert new rows, but we fetched them to immediately verify their presence. By using the SETOF decorator, we told Postgres that the function would return the declared result set. We could do this for any existing table or view within the database, or we could define our own type for special cases. In essence, we are transforming our function into a reentrant version with a persistent stack, and all that implies. Yet everything still resembles SQL, not some other language that happens to support SQL through function calls or other convoluted contortions.
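
To sketch the custom type route (sensor_summary and f_location_summaries are names invented for this example), RETURN QUERY can also hand back an entire result set without any explicit loop:

CREATE TYPE sensor_summary AS (
  location     VARCHAR,
  avg_reading  NUMERIC
);
 
CREATE OR REPLACE FUNCTION f_location_summaries()
RETURNS SETOF sensor_summary
LANGUAGE plpgsql AS
$BODY$
BEGIN
    RETURN QUERY
    SELECT location, round(avg(reading), 2)
      FROM sensor_log
     GROUP BY location;
END;
$BODY$;
 
SELECT * FROM f_location_summaries() LIMIT 3;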

Now, these are extremely trivial applications of this capability. In fact, we highly recommend against looping inserts like the one in f_generate_and_return, but some complicated business logic or external function calls might demand that, or a similar level of expensive processing. Try to consider these demonstrations rather than a bible for best practices.

In any case, we hope PL/pgSQL becomes part of your everyday lexicon!

