PG Phriday: JSON and JSONB Revisited

December 18th, 2015 | Published in Database, Tech Talk | 5 Comments


With Postgres 9.5 on the horizon, I figured it’s a good time to see how things have advanced since my last dive into that particular ocean. This is particularly relevant since even MongoDB, a JSON-driven NoSQL database, is now partially powered by Postgres. A lot of people found that particular revelation quite shocking, but maybe they shouldn’t have, given the advancements embedded within the last couple of Postgres releases.

As it turns out, there are quite a few advancements that really make JSONB a powerful addition to Postgres. Note that this does not apply to JSON. While JSONB accumulates several new elements of functionality, JSON itself remains almost completely unchanged as of Postgres 9.5rc1.

As an example of this, let’s take a look at the new concatenation operator with regards to JSON and JSONB:

SELECT '{"Hairy": true}'::JSONB || '{"Wobbly": false}'::JSONB;
 
             ?column?             
----------------------------------
 {"Hairy": true, "Wobbly": false}
 
SELECT '{"Hairy": true}'::JSON || '{"Wobbly": false}'::JSON;
 
ERROR:  operator does not exist: json || json

It would probably be best to forget JSON even exists at this point. When interacting with JSON in a Postgres database, JSONB is clearly the way forward. Use it for table columns, use it for casting, use it for making delicious casseroles. But wait… didn’t we just do something that once required a PL/Python or PL/V8 function? Yes we did, and that’s only the beginning; the concatenation operator will also overwrite existing fields:

SELECT '{"Hairy": true, "Excited": false}'::JSONB || 
       '{"Excited": true, "Wobbly": true}'::JSONB;
 
                     ?column?                     
--------------------------------------------------
 {"Hairy": true, "Wobbly": true, "Excited": true}

It’s nice to see that we don’t have to separate the operations, either. Replacing a value and adding a new one can both be done simultaneously. Alternatively, if we want to remove keys, the ‘-‘ operator will do that directly. For example:

SELECT '{"Hairy": true, "Excited": false}'::JSONB - 'Excited';
 
    ?column?     
-----------------
 {"Hairy": true}

Note that this kind of interaction works with keys, not actual JSON documents. That might seem counterintuitive at first, since ‘||’ and ‘-‘ are not complementary operations. But neither are the actions taking place. The first is really a merge, which arguably makes ‘||’ a slight abuse of notation. The second subtracts a key (or, as we’ll see shortly, a path) from existing JSON, and that’s an important distinction. This isn’t a ‘+’ vs. ‘-‘ situation, and can’t really be construed as one.
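
Though not part of the examples above, the same ‘-‘ operator also accepts an integer in 9.5 to delete an array element by position, which reinforces that it operates on keys and indexes rather than on JSON documents. A quick sketch, with the expected result noted in a comment rather than captured from an actual session:

SELECT '["Hairy", "Wobbly", "Excited"]'::JSONB - 1;
 
-- Should return: ["Hairy", "Excited"]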

We also gain the ability to subtract a full path to a nested JSON element. Since keys can nest multiple levels deep, we would need an array to represent a path with a depth greater than one. We’d also probably want a different operator so we don’t mix keys with key paths. Thus 9.5 also introduces the ‘#-‘ operator:

SELECT '{"Hairy": true, "Status": {
         "Excited": true, "Burning": true}
        }'::JSONB #- '{Status,Burning}'::TEXT[];
 
                   ?column?                   
----------------------------------------------
 {"Hairy": true, "Status": {"Excited": true}}
 
-- This also works
 
SELECT '{"Hairy": true, "Status": {
         "Excited": true, "Burning": true}
        }'::JSONB #- '{Status,Burning}';
 
                   ?column?                   
----------------------------------------------
 {"Hairy": true, "Status": {"Excited": true}}

It’s nice that the TEXT array casting is optional, since that usage is somewhat esoteric compared to the others. Interestingly, the opposite operation for ‘#-‘ isn’t ‘#+’, but a function call. Again, this may seem odd at first glance, but makes more sense upon further examination. To add JSON, we need a path telling us where to put the data, and then the data itself. Operators only get two operands to work with, so we need something that can accept more.

And that’s exactly what the new jsonb_set function does. Here it is in action:

SELECT jsonb_set('{"Hairy": true, "Status": {
         "Excited": true, "Burning": true}
        }'::JSONB,
        '{Status,Burning}',
        'false'
       );
 
                           jsonb_set                            
----------------------------------------------------------------
 {"Hairy": true, "Status": {"Burning": false, "Excited": true}}

The full documentation for JSON functions provides a lot more information about what jsonb_set does and how it works. I highly encourage reading up on it before doing a lot of extensive JSON work.
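
One detail worth calling out: jsonb_set isn’t limited to overwriting existing values. It also accepts an optional fourth create_missing parameter (true by default), so pointing it at a path that doesn’t exist yet should add the element instead of replacing one. A minimal sketch, with the expected result noted as a comment:

SELECT jsonb_set('{"Hairy": true}'::JSONB, '{Wobbly}', 'false');
 
-- Should return: {"Hairy": true, "Wobbly": false}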

And finally, Postgres now has the ability to make JSON output human readable. This is especially important for JSONB, which discards the original formatting because it’s stored in Postgres as a binary representation. Deeply nested JSON can be quite ugly and resist mental parsing, so it’s great to see this addition. How does our hairy, burning, excited, yet wobbly data look?

SELECT jsonb_pretty('{"Hairy": true, "Status": {
         "Excited": true, "Burning": true},
         "Wobbly": false
        }');
 
       jsonb_pretty       
--------------------------
 {
     "Hairy": true,
     "Status": {
         "Burning": true,
         "Excited": true
     },
     "Wobbly": false
 }

Given everything we’ve seen here, I almost want to suggest the developers rip out the existing JSON engine and replace it entirely with JSONB such that it’s merely an alias. I can’t think of a single reason to retain a less functional datatype that only serves to confuse new users. JSON is slower, uses more disk space, is harder to interact with, and has a mere fragment of JSONB’s capabilities.

Well, I can think of one reason: compatibility. Due to the different storage requirements, any existing tables using JSON columns would be incompatible with the JSONB engine. This is an unfortunate truth of JSON being a precursor to JSONB in the Postgres timeline. This reminds me of what happened to Oracle regarding VARCHAR vs VARCHAR2. Oracle’s recommendation is to always use VARCHAR2 for various reasons, and at this point, the same holds for Postgres and JSONB. I think a note to this effect in the Postgres documentation would be in our best interests.
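
For anyone stuck with legacy JSON columns, the conversion itself is at least mechanical, though it rewrites the table and holds an exclusive lock for the duration. A hedged sketch, assuming a hypothetical app_data table with a JSON column named doc:

ALTER TABLE app_data
  ALTER COLUMN doc TYPE JSONB
  USING doc::JSONB;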

In any case, this is an exciting time for Postgres in general. JSON seems to be replacing XML as an application communication format, so having such powerful internal compatibility with it is an important step forward. 9.5 can’t come soon enough!



PG Phriday: Postgres-XL and Horizontal Scaling

December 11th, 2015 | Published in Database, Tech Talk | 2 Comments


With all of the upheaval in the Postgres world thanks to advancements in extensions, foreign data wrappers, and background workers, it’s getting pretty difficult to keep track of everything! One of these rapidly moving targets is Postgres-XL and its role in helping Postgres scale outward. Large warehouses have a critical need for horizontal scaling, as the very laws of physics make it effectively impossible to perform aggregate queries on tables consisting of several billion rows. Even furiously indexed, partitioned, and clustered tables, on over-provisioned servers loaded with the best NVRAM-based storage, 64 CPUs, and 1TB of RAM, become lumbering beasts when someone wants a cumulative summary of last year’s activity.

In my previous analysis of PMPP, I wondered how we could use it to possibly rebalance one of our larger databases so that it’s actually usable in linear time. But that also got me thinking about past Postgres projects such as Postgres-XC and its ilk. I’d long since forgotten about them due to how far they lagged behind Postgres core patches. But desperate times call for desperate measures, so I looked again, remembering how Postgres-XL had taken up the mantle of open-source big-data Postgres.

What I found was initially discouraging. Postgres-XL is currently stable on the 9.2 branch even though 9.4 has been out for over a year. But the devs claim to be feverishly merging 9.5 into Postgres-XL, so assuming they eventually complete that effort, this could be something great. As it turns out—though it’s a bit of a juggling act—Postgres-XL may actually be an option for our 30+TB instance.

Of course, first we have to install it. We don’t use RPM systems where I work, so I grabbed the source tarball and installed with configure and make so that I could simply use the binaries normally. After that, the documentation suggests using pgxc_ctl to provision everything. Unfortunately the sample configuration file for that tool doesn’t adequately describe all of the variables, and the defaults refer to node names and a stack that aren’t well defined. So let’s just do this the “hard” way. One benefit to this approach is a more complete understanding of what’s going on.
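
For reference, the build itself was nothing exotic; roughly the standard source dance shown below, where the prefix is an assumption rather than a recommendation:

./configure --prefix=/usr/local
make
sudo make install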

Everything starts with GTM, the Global Transaction Manager that ensures transactions commit safely across the cluster, since we are working with several independent Postgres instances. We also need a data coordinator, and a few data nodes. The test system has eight CPUs, so it makes sense to allocate four data nodes to avoid saturating the system. GTM can be extremely active, and the coordinator does its own work, so we should assume this uses six of our eight CPUs.

initgtm -Z gtm -D /data/pgsql/pgxl/gtm
gtm_ctl -Z gtm -D /data/pgsql/pgxl/gtm start
 
initdb -D /data/pgsql/pgxl/coord1 --nodename coord1
initdb -D /data/pgsql/pgxl/data1 --nodename data1
initdb -D /data/pgsql/pgxl/data2 --nodename data2
initdb -D /data/pgsql/pgxl/data3 --nodename data3
initdb -D /data/pgsql/pgxl/data4 --nodename data4

We started GTM at this point, because there’s no reason not to. It’s a very small and light service, and we’ll need it later anyway. We can’t start the cluster itself just yet however, because we need to make a bunch of configuration changes. Note that we’ve only differentiated the node roles by name at this point, and all of the ports are still the default 5432 for any Postgres service. So the minimum to get all of this running is to change:

  • The port setting for each node. To keep things simple, let’s use 6543 for the coordinator, and 6601 to 6604 for the data nodes.
  • Set pooler_port to unique values. This is because any instance is configured by default as a coordinator, and coordinators start up an internal connection pool on a separate port. Without changing this value, none of the data nodes will start once the coordinator is running. Let’s use 6743 for the coordinator, and 6701 to 6704 for the data nodes. We’ll be able to fix this later.
  • Set logging_collector to on. Unless we want our console flooded with output, it’s a good start. A sample of these postgresql.conf tweaks appears just below this list.
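
To make that concrete, here is roughly what those settings might look like for the first data node, using the port assumptions above:

# /data/pgsql/pgxl/data1/postgresql.conf
port = 6601                # unique port for this data node
pooler_port = 6701         # unique pooler port, at least until the roles are sorted out
logging_collector = on     # keep log output out of the console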

Next, we can actually start the cluster by launching each node:

pg_ctl -D /data/pgsql/pgxl/coord1 -Z coordinator start
pg_ctl -D /data/pgsql/pgxl/data1 -Z datanode start
pg_ctl -D /data/pgsql/pgxl/data2 -Z datanode start
pg_ctl -D /data/pgsql/pgxl/data3 -Z datanode start
pg_ctl -D /data/pgsql/pgxl/data4 -Z datanode start

At this point, our cluster still isn’t usable. Postgres-XL adds a NODE primitive, but a freshly initialized instance is only aware of itself, considers itself a coordinator, and assumes the default port of 5432. This means none of the nodes can actually communicate with each other. To fix that, we have to connect to each node and inform it of its actual role:

cat <<EOF | psql -p 6543
ALTER NODE coord1 WITH (PORT = 6543);
 
CREATE NODE data1 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6601);
CREATE NODE data2 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6602);
CREATE NODE data3 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6603);
CREATE NODE data4 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6604);
 
SELECT pgxc_pool_reload();
EOF
 
cat <<EOF | psql -p 6601
ALTER NODE data1 WITH (port = 6601, TYPE = 'datanode');
 
CREATE NODE coord1 WITH (TYPE = 'coordinator', HOST = 'localhost', PORT = 6543);
CREATE NODE data2 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6602);
CREATE NODE data3 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6603);
CREATE NODE data4 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6604);
 
SELECT pgxc_pool_reload();
EOF
 
cat <<EOF | psql -p 6602
ALTER NODE data2 WITH (port = 6602, TYPE = 'datanode');
 
CREATE NODE coord1 WITH (TYPE = 'coordinator', HOST = 'localhost', PORT = 6543);
CREATE NODE data1 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6601);
CREATE NODE data3 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6603);
CREATE NODE data4 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6604);
 
SELECT pgxc_pool_reload();
EOF
 
cat <<EOF | psql -p 6603
ALTER NODE data3 WITH (port = 6603, TYPE = 'datanode');
 
CREATE NODE coord1 WITH (TYPE = 'coordinator', HOST = 'localhost', PORT = 6543);
CREATE NODE data1 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6601);
CREATE NODE data2 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6602);
CREATE NODE data4 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6604);
 
SELECT pgxc_pool_reload();
EOF
 
cat <<EOF | psql -p 6604
ALTER NODE data4 WITH (port = 6604, TYPE = 'datanode');
 
CREATE NODE coord1 WITH (TYPE = 'coordinator', HOST = 'localhost', PORT = 6543);
CREATE NODE data1 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6601);
CREATE NODE data2 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6602);
CREATE NODE data3 WITH (TYPE = 'datanode', HOST = 'localhost', PORT = 6603);
 
SELECT pgxc_pool_reload();
EOF

And then we can restart all of the data nodes to get rid of the extra pools:

pg_ctl -D /data/pgsql/pgxl/data1 -Z datanode restart
pg_ctl -D /data/pgsql/pgxl/data2 -Z datanode restart
pg_ctl -D /data/pgsql/pgxl/data3 -Z datanode restart
pg_ctl -D /data/pgsql/pgxl/data4 -Z datanode restart

Finally. Finally the cluster is usable. And this is the minimum setup necessary to get a working Postgres-XL stack. With that out of the way, let’s revisit our trusty sensor_log table:

CREATE TABLE sensor_log (
  sensor_log_id  SERIAL PRIMARY KEY,
  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  reading_date   TIMESTAMP NOT NULL
 
) DISTRIBUTE BY HASH (sensor_log_id);
 
CREATE INDEX idx_sensor_log_location ON sensor_log (location);
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL
  FROM generate_series(1, 4000000) s(id);

This is very similar to the setup we used for testing the PMPP extension. We didn’t have to manually allocate the shards, but we do have to tell Postgres-XL how to distribute the table contents across our data nodes. We should also mention here that the INSERT will be extremely slow. The primary reason for this is that we’re relying on GTM to control the sequence values, and it seems to get progressively slower as bulk insert size increases. For instance, inserting 1000 rows is 5x slower than on a regular Postgres instance, while the full 4M is 25x slower. This would suggest sequence use either needs work, or should be avoided in favor of pre-generated values.
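
To sidestep the GTM-managed sequence entirely during a bulk load, we could supply the IDs ourselves. A minimal sketch, untested here, and note that the backing sequence would need a setval afterward before relying on the column default again:

INSERT INTO sensor_log (sensor_log_id, location, reading, reading_date)
SELECT s.id, s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL
  FROM generate_series(1, 4000000) s(id);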

In any case, here are a couple queries we can use to test against a standard Postgres 9.4 install:

-- Query 1
 
SELECT COUNT(*)
  FROM sensor_log
 WHERE reading_date::TIME >= '14:00'
   AND reading_date::TIME < '15:00';
 
-- Query 2
 
SELECT COUNT(*)
  FROM sensor_log
 WHERE reading_date >= CURRENT_DATE - INTERVAL '1 day';
 Query   Postgres 9.4 (ms)   Postgres-XL (ms)
 -----   -----------------   ----------------
 1       977                 303
 2       46                  12

We should probably eventually run some more comprehensive tests containing joins to see how it handles cross-node data comparisons. But this preliminary testing seems to indicate nearly linear scaling for simple data access. Given a wide enough cluster, we could reduce query times by two or three orders of magnitude on larger tables. Combine this with fact tables, and we could potentially provide real-time analysis over wide aggregate ranges.
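
When those join tests do happen, they will probably look something like the sketch below: a small replicated dimension table joined against the distributed fact table. The sensor_location table here is purely hypothetical, and the resulting plans would need scrutiny before drawing any conclusions:

CREATE TABLE sensor_location (
  location  VARCHAR PRIMARY KEY,
  region    VARCHAR NOT NULL
) DISTRIBUTE BY REPLICATION;
 
SELECT l.region, count(*)
  FROM sensor_log s
  JOIN sensor_location l USING (location)
 GROUP BY l.region;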

My only real concern, aside from the performance of sequences, is that every node has to be aware of every other node. This suggests Postgres-XL uses an N-squared network mesh to facilitate node communication. This is fine for small values of N, but I suspect severely diminishing returns at higher node counts due to communication overhead. Multiple-CPU computational clusters abandoned this approach decades ago in favor of architectures that use more efficient pathways. Given this limitation, extreme-scale clusters might not be possible with Postgres-XL in its current state.

If someone out there has a 32+ node cluster using Postgres-XC or Postgres-XL, I’d love to hear from them. Solutions like this are currently one of the few ways of efficiently accessing several billion rows. At the very least, there’s a lot of potential here.



PG Phriday: Displaced Durability

December 4th, 2015 | Published in Database, Tech Talk | 3 Comments


A lot of DBAs are quite adamant regarding ACID compliance. I count myself among them. But unlike the other parts of the acronym, there are times when data durability isn’t actually a high priority. Data staging areas, temporary tables that need visibility across sessions, and other transient information don’t require zealous protection. As a DBA it feels weird saying it, but there’s just some data we simply don’t care about losing.

Which is why Postgres 9.1 and every version since include the ability to create unlogged tables. Why would anyone even want such a feature? Believe it or not, enforcing durability comes with a lot of overhead. There’s a whole array of mechanisms that strive to guarantee committed transactions are permanent.

One of these, the write ahead log (WAL), is also part of the replication system. Due to their importance, WAL files are often archived as part of standard recovery procedure. That means there are potentially three redundancy levels for every table in our database! That is a vast amount of unnecessary disk writing and network activity for ultimately ephemeral data. Durability is great, except when it isn’t.

Here’s a quick demonstration at how these excess resources translate to time consumption:

\timing
 
CREATE TABLE regular_table AS
SELECT * FROM generate_series(1, 5000000);
 
Time: 7423.529 ms
 
CREATE UNLOGGED TABLE unlogged_table AS
SELECT * FROM generate_series(1, 5000000);
 
Time: 2843.311 ms

These timings varied somewhat, but the trend persisted through several iterations. This was on a test Postgres instance with one replica and no WAL archival. This means a production system utilizing WAL archival would potentially produce an even larger divergence. Taking the ‘D’ out of ACID has some very real performance benefits.

Of course, these advantages come with a gargantuan fire-breathing caveat. Let’s simulate a Postgres crash:

pkill -9 postmaster
pg_ctlcluster 9.4 main start

And then examine the aftermath:

SELECT COUNT(1) FROM regular_table;
 
  count  
---------
 5000000
 
SELECT COUNT(1) FROM unlogged_table;
 
 count 
-------
     0

Oh. Well, then. Because unlogged tables are not durable, they can’t be recovered. Following any unexpected shutdown, Postgres will truncate any unlogged tables upon the next startup. In addition, since unlogged tables are not part of the WAL system at all, they’re not streamed to any existing replicas. Replicas retain the table structure itself, but it remains entirely unusable there.

This is what our clone has to say about the situation:

SELECT * FROM unlogged_table;
 
ERROR:  cannot access temporary or unlogged relations during recovery

Given these constraints, unlogged tables are only useful on the primary server for transient data. This makes them great for use cases we mentioned earlier: data staging areas, persistent temp tables, raw COPY targets, etc. Used judiciously, they can greatly accelerate an ETL stack or multiple-step data import process.
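
As a contrived sketch of that staging pattern, assuming a durable sensor_log table like the one from earlier articles and an export file whose path is invented here:

CREATE UNLOGGED TABLE sensor_stage (LIKE sensor_log);
 
COPY sensor_stage FROM '/tmp/sensor_dump.csv' WITH (FORMAT csv);
 
INSERT INTO sensor_log
SELECT * FROM sensor_stage
 WHERE reading IS NOT NULL;  -- whatever validation applies
 
DROP TABLE sensor_stage;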

They’re also good for application stacks that require data flushing. Certain types of queues leverage notifications or live messaging, neither of which want stale data. Some applications bootstrap one or more tables from a tertiary data source upon starting. Web apps could even store local session data in an unlogged Postgres table on each host, pending aggregation on a reporting server. There are many forms of application where phantom data is exactly what we want.

In the end, the primary difference between Postgres and a strictly ACID database is that Postgres actively facilitates circumventing unnecessary durability overhead. But it does so safely, limiting the effect to tables specifically created for that purpose. Why pay for solid reality, when all you want is smoke and mirrors?



PG Phriday: Cluster Control

November 20th, 2015 | Published in Database, Tech Talk | No Comments


It has occurred to me that I may have been spending a bit too much time being excited about new Postgres features and developments in the community. Part of the intent of this weekly article series is education, so this week, let’s get back to basics. To that end, the topic for this week boils down to the tools available for managing Postgres instances, and how to use them. Surprisingly, it’s not as straightforward as you might think.

Having used Postgres for almost 15 years now, it’s easy to forget (or mentally block!) the early struggles. With new tools and wrappers always being developed, it’s not always clear what the best practices for managing a Postgres cluster actually are. Indeed, it often depends on how Postgres is installed.

Let’s start with a basic source-code installation. I won’t cover that process, but we can assume that after the smoke clears, the binaries are somewhere in /usr/bin or /usr/local/bin, and are thus in our execution path. Given that, let’s say we have an available mount point at /data and want to create a new cluster there. Here’s how that might go:

sudo mkdir /data/pgsql
sudo chown postgres:postgres /data/pgsql
sudo su - postgres
 
initdb -D /data/pgsql/my_db
pg_ctl -D /data/pgsql/my_db start

We now have an active Postgres instance at /data/pgsql/my_db. It’s extremely common to use the postgres OS user for this kind of thing, hence all of our sudo commands to prepare. It’s entirely possible to do this as a regular user, but I usually don’t recommend that approach.

In any case, this type of installation essentially depends on the pg_ctl command-line tool, which handles everything related to controlling a Postgres instance. But it’s annoying to type the -D parameter every time we use a Postgres tool, so there are environment variables, such as PGDATA, that can do the job instead. This lets us prime our environment with .bashrc, for example. Let’s stop the instance:

export PGDATA=/data/pgsql/my_db
 
pg_ctl stop -m fast

Why the “-m fast” part? By default, Postgres is exceedingly polite. If we had simply asked it to stop, it would patiently wait for any pending transactions to complete before stopping. But if there are several users connected, some may leave transactions idle, or a really long process may be running, or maybe we’re just in a hurry. Using a fast shutdown tells Postgres to abort any transactions and stop as soon as it can—safely, of course. There’s no data loss, but a few users or applications may grumble a bit.
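
For reference, pg_ctl recognizes three shutdown modes, so the choice is really about how patient we want to be:

pg_ctl stop -m smart      # wait for all clients to disconnect on their own
pg_ctl stop -m fast       # roll back open transactions and disconnect clients
pg_ctl stop -m immediate  # bail out without a clean shutdown; crash recovery runs on the next start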

At the system level, it’s very common for a global service manager to wrap much of this process. To start or stop Postgres on CentOS or RedHat, we’d do something like this:

sudo service postgresql-9.4 start
sudo service postgresql-9.4 stop

On a Debian or Ubuntu system, it would be this:

sudo service postgresql start
sudo service postgresql stop

And this is where things start to get complicated. These two major Linux flavors can’t even agree on what to name the service control mechanism, and both have very different views on handling multiple versions. Both, however, go to great lengths to “hide” the Postgres-provided binaries so that only the known wrappers can access them, unless we cheat and add the binary location directly to our PATH variable.

By and large, that isn’t actually necessary, but it does add a major complication: standardization. As in, there isn’t any. Depending on distribution, wrappers will take different parameters, install Postgres in varying locations, and have divergent control methods. Let’s dive a little deeper into Debian/Ubuntu and their slightly more comprehensive wrappers.

Ubuntu assumes any number of Postgres versions may be installed at any one time. As such, starting and stopping the postgresql service on an Ubuntu system will cascade that operation to any configured instances hosted on the machine. What if we only want to manage a single instance? First, we need to know which instances are even available. This is where the cluster tools come in.

Because a single Postgres instance can represent several individual databases, it’s not uncommon to refer to it as a cluster. Debian, and thus Ubuntu, took that concept and ran with it, producing something that’s actually pretty handy. Let’s get a look at a test system with multiple running clusters:

sudo su - postgres
 
pg_lsclusters
 
Ver Cluster Port Status Owner    Data directory               Log file
9.4 main    5432 online postgres /var/lib/postgresql/9.4/main /var/log/postgresql/postgresql-9.4-main.log

Neat, eh? This tool gives us a lot of information that’s immediately useful. With multiple clusters running on one system, we need to know which port each is running on. Knowing where the data resides, and where to find logs has obvious merit. And to differentiate each, the clusters are named. Let’s repeat that instance creation from above using a couple more wrapper tools:

pg_createcluster 9.4 my_db -D /data/pgsql/my_db
pg_ctlcluster 9.4 my_db start
pg_lsclusters
 
Ver Cluster Port Status Owner    Data directory               Log file
9.4 main    5432 online postgres /var/lib/postgresql/9.4/main /var/log/postgresql/postgresql-9.4-main.log
9.4 my_db   5433 online postgres /data/pgsql/my_db            /var/log/postgresql/postgresql-9.4-my_db.log

The pg_createcluster tool takes the place of initdb on Debian-based systems like Ubuntu. It’s aware of the other wrappers and recommended settings, locations, and so on. It also did us the favor of automatically incrementing the port so the clusters can run simultaneously. And pg_ctlcluster supplants pg_ctl for similar reasons. It should be fairly clear the tools are inherently designed such that the version is a required parameter. They’re pretty dedicated to the concept of hosting multiple clusters and Postgres versions on a single server.

Since they’re wrappers, they also accept many of the same parameters. If we wanted to stop the my_db cluster quickly, we could use a similar command as before:

pg_ctlcluster 9.4 my_db stop -m fast

Note that there’s one major omission here: we didn’t specify the data directory. Of course, the wrappers know where all of the clusters live, so we don’t need to. The version and name uniquely identify all clusters under this architecture, removing the necessity of memorizing cluster locations or hacking our environment with variables.
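
The rest of the wrapper family follows the same version-plus-name convention. When we’re done experimenting, for example, the test cluster can be removed entirely; be careful, as this deletes the data directory as well:

pg_dropcluster --stop 9.4 my_db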

All of this comes with something of a caveat related to configuration. Normally, Postgres configuration files reside within the data directory itself, making the database a self-contained entity that is not tied to any one server. If the data is on a SAN for instance, it could be mounted on a replacement server and started with no modifications. But for the wrappers to work, they need a standard location to find information regarding the clusters they can control. On systems that use these tools, that’s the /etc/postgresql directory.

Here’s what the configuration directories look like on our test system:

find /etc/postgresql -type d
 
/etc/postgresql
/etc/postgresql/9.4
/etc/postgresql/9.4/my_db
/etc/postgresql/9.4/main

It’s a simple organization to remember, and it closely resembles the version + name format all of the tools require. All of the configuration files the Postgres documentation or online articles refer to actually live here instead. Most UNIX systems expect configuration files somewhere in the /etc directory, so that’s not unexpected. But it does introduce a rather unusual complication.

Some Postgres tools unrelated to this organization expect configuration files to be in the Postgres data directory. If we were to restore a backup of the data directory on an unrelated server, we would be unable to start it, and we would have no idea who should be allowed to connect. To compound the issue, the cluster tools themselves would have no way to administer it, because the /etc files would be missing.

The easy solution would be to include the corresponding /etc/postgresql folder in the backup process itself. We could (and should) also use configuration management to distribute these files to replacement or restoration servers. But that doesn’t fix external tools that look for config files in the data directory, so I tend to also do a bit of file linking. For example:

for x in /etc/postgresql/9.4/my_db/*.conf; do
  ln -s $x /data/pgsql/my_db
done

This way we don’t have to worry; the necessary files are all there and we don’t lose any benefits provided by the cluster tools.

In any case, I hope any developers reading this are now better equipped to experiment with local instances. It’s unfortunate there’s not more standardization across the distributions in this regard, but at least now you have a starting point and some insight into how it got this way. Enjoy!



PG Phriday: Parallel Sequence Scans

November 13th, 2015 | Published in Database, Tech Talk | 2 Comments


A couple days ago, Robert Haas announced that he checked in the first iteration of parallel sequence scans in the Postgres 9.6 branch. And no, that’s not a typo. One of the great things about the Postgres devs is that they have a very regimented system of feature freezes to help ensure timely releases. Thus even though 9.5 just released its second beta, they’re already working on 9.6.

So what is a sequence scan, and why does this matter? Past articles have covered this, but Postgres uses sequence scans when it needs to read the entire contents of a table. For larger entities consisting of tens or hundreds of millions of rows, this is a time-consuming process. Even the best processor can only handle a few million rows per second, so as scale increases vertically, there’s currently no way to address these larger data volumes efficiently without significant effort.

As slow as a sequence scan is, it’s also a relatively simple process, and a great starting point for the long road to a parallel Postgres engine. Postgres knows how big the table files are and obviously knows its own storage format, so it merely needs to set scan ranges and farm them out to workers that report the results back to a coordinator process. In theory, that’s a clean transition that avoids complicated aggregation rules or clause pushdown headaches.

But how well does it work? To see this for myself, I did something I’ve actually never done before now: download the Postgres git tree. Then I set up a test case:

CREATE TABLE test_tab AS
SELECT a.id, repeat(' ', 20) AS junk
  FROM generate_series(1, 20000000) a(id);
 
ANALYZE test_tab;

With this test table, there are 20-million rows of empty junk and no indexes, so we’re effectively forcing Postgres to use a sequence scan for any query. Then we have to enable the feature with the max_parallel_degree parameter. And finally we invoke a query with a naive WHERE clause applied to every row so the engine has to actually do some work.

SET max_parallel_degree TO 1;
 
EXPLAIN ANALYZE
SELECT *
  FROM test_tab
 WHERE junk LIKE '%s%';
 
                             QUERY PLAN
--------------------------------------------------------------------
 Gather  (cost=1000.00..265706.30 rows=1 width=25)
         (actual time=1832.456..1832.456 rows=0 loops=1)
   Number of Workers: 1
   ->  Parallel Seq Scan on test_tab
         (cost=0.00..264706.20 rows=1 width=25)
         (actual time=1828.830..5489.878 rows=0 loops=1)
         Filter: (junk ~~ '%s%'::text)
         Rows Removed by Filter: 29594528
 Planning time: 0.062 ms
 Execution time: 1834.332 ms

There’s a lot going on here. First, we need to talk about how many processes actually worked on this query. The max_parallel_degree parameter controls how many background workers assist the main process, so there are actually two Postgres processes performing independent sequence scans. Some parallel systems use the parent as a mere coordinator, so we might expect the number of workers to be greater than 1 before actual parallel operation occurs. That isn’t the case with this implementation.

The query itself simply asks for something that doesn’t exist in our sample set. This helps us get a best-case scenario where no results are handed between the processes, but time is still consumed scanning table data. And the resulting plan from that query is rather different from a standard scan. We can see how many extra workers were involved and that the results were “gathered” by an extra execution layer.

The only odd detail is that 29-million rows were removed from the results of a 20-million row table. We’ll just chalk that up as an implementation quirk considering this is pre-alpha code. Otherwise, this patch appears to scale in a relatively linear manner. Let’s check out a few different variants of max_parallel_degree.

 Workers   Avg Time (s)
 -------   ------------
 0         3.8
 1         1.9
 2         1.3
 3         1.1
 4         0.9

There’s a bit of jitter in the timings on our test system, but the trend is fairly clear. With no extra workers, one process can scan a 20M row table in about four seconds. With three extra workers, those four processes can perform the same task in about one second.

This iteration of the patch takes the size of the table into account, possibly to compensate for implicit worker management overhead. With 20M rows, we couldn’t get more than five dedicated workers, while 100M rows used seven workers to saturate all eight of our CPU cores. Beyond that are a couple important caveats:

  1. Aggregates are not currently handled.
  2. Certain clauses are not pushed down into the workers.

The first can actually be circumvented rather easily. For example:

\timing
 
SET max_parallel_degree TO 3;
 
SELECT COUNT(*) FROM (
  SELECT *
    FROM test_tab
   WHERE junk LIKE '%s%'
) s;
 
 count 
-------
     0
(1 row)
 
Time: 1111.904 ms

The second is a natural result of the patch’s immaturity. We need to have parallel functionality before it can be optimized. I’m perfectly content waiting for it to be done right. In the meantime, all of the infrastructure required to make this possible is already in place. After 9.4 added background workers, new Postgres extensions began leveraging them. And now 9.6 will probably use them for their original purpose, based on how stable the patch appears so far.

It’s exciting to see that Postgres will finally be able to scale vertically in a way that can handle the large tables some organizations have been accumulating. We have a 30TB database with tens of billions of rows. Even though we’ll be sharding that in the future, looming parallel features imply the shards themselves will scale as we cross over into trillions of rows.

It’s an exciting time to be a Postgres end-user.


