PG Phriday: Date Based Partition Constraints

March 20th, 2015 | Published in Database, Tech Talk | 3 Comments

PostgreSQL has provided table partitions for a long time. In fact, one might say it has always had partitioning. The functionality and performance of table inheritance have improved over the years, and there are innumerable arguments for using it, especially for larger tables consisting of hundreds of millions of rows. So I want to discuss a quirk that often catches developers off guard. In fact, it can render partitioning almost useless or counter-productive.

PostgreSQL has a very good overview in its partitioning documentation. And the pg_partman extension at PGXN follows the standard partitioning model to automate many of the pesky tasks involved in maintaining a partition set. With modules like this, there’s no need to manually manage new partitions, constraint maintenance, or even some aspects of data movement and archival.

However, plenty of partition sets already exist, and not everyone knows about extensions like this; many shops have developed in-house systems instead. Here’s something I encountered recently:

CREATE TABLE sys_order
(
    order_id     SERIAL       PRIMARY KEY,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
);

CREATE TABLE sys_order_part_201502 ()
       INHERITS (sys_order);

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::DATE AND
             order_dt < '2015-02-01'::DATE + INTERVAL '1 mon');

This looks innocuous enough, but PostgreSQL veterans are already shaking their heads. The documentation alludes to how this could be a problem:

Keep the partitioning constraints simple, else the planner may not be able to prove that partitions don’t need to be visited.

The issue in this case is that adding an interval of one month turns the right boundary of this range constraint into a dynamic expression, and PostgreSQL will not use dynamic values when evaluating check constraints for constraint exclusion. Here’s a query plan from PostgreSQL 9.4.1, which is the most recent release as of this writing:

SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
 Append  (cost=0.00..30.38 rows=9 width=20)
   ->  Seq Scan on sys_order  ...
   ->  Seq Scan on sys_order_part_201502  ...

Well, it looks like the PostgreSQL planner wants to check both tables, even though the constraint we added to the child clearly excludes the requested date. This isn’t a bug per se, but it is certainly counter-intuitive. Let’s replace the constraint with one that does not use a dynamic value and try again:

ALTER TABLE sys_order_part_201502
 DROP CONSTRAINT chk_order_part_201502;

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::DATE AND
             order_dt < '2015-03-01'::DATE);

SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
 Append  (cost=0.00..30.38 rows=9 width=20)
   ->  Seq Scan on sys_order  ...
   ->  Seq Scan on sys_order_part_201502  ...

Wait a minute… what happened here? There are no dynamic values; the constraint is a simple pair of static dates. Yet PostgreSQL still wants to check both tables. Well, this was a trick question of sorts, because the real answer lies in the data types used in the constraint. The column is a TIMESTAMP WITH TIME ZONE, and that type, you see, is not directly interchangeable with DATE. Since the time zone is preserved, the actual date and time can vary depending on how the value is cast.
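A quick demonstration makes the danger concrete. The same instant can land on two different dates depending on nothing more than the session’s time zone setting (the zone names here are just examples):

```sql
-- One instant in time, two different dates, depending on the session setting
SET timezone = 'UTC';
SELECT '2015-02-01 00:00:00 UTC'::TIMESTAMPTZ::DATE;  -- 2015-02-01

SET timezone = 'America/Chicago';
SELECT '2015-02-01 00:00:00 UTC'::TIMESTAMPTZ::DATE;  -- 2015-01-31
```

Since the planner can’t know what time zone will be in effect, it can’t trust a DATE-based boundary against a TIMESTAMPTZ column.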

Watch what happens when we change the constraint to match the column type used for order_dt:

ALTER TABLE sys_order_part_201502
 DROP CONSTRAINT chk_order_part_201502;

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::TIMESTAMPTZ AND
             order_dt < '2015-03-01'::TIMESTAMPTZ);

SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
 Append  (cost=0.00..0.00 rows=1 width=20)
   ->  Seq Scan on sys_order  ...

Now all of the types are directly compatible, removing any possibility of time zones being cast to a different date than the constraint uses. This is an extremely subtle type mismatch, as many developers and DBAs alike consider these types interchangeable. It’s further complicated by the fact that DATE seems like the best type to use for the constraint, since time isn’t relevant to the desired boundaries.

It’s important to understand that even experienced developers and DBAs can get types wrong, especially when including information like the time zone appears completely innocent. In fact, TIMESTAMPTZ is the default PostgreSQL datetime type for a very good reason: time zones change. Without the time zone, data in the column is bound to whatever time zone the server happens to be running in. That this applies to dates as well can come as a bit of a surprise.

The lesson here is to always watch your types. PostgreSQL removed a lot of automatic casting in 8.3, and received no small amount of backlash for doing so. However, we can see how subtly incompatible types can cause major issues down the line. In the case of partitioning, a type mismatch can be the difference between reading ten thousand rows or ten billion.
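One way to audit an existing partition set for this mistake is to ask PostgreSQL how it actually stored each constraint; any implicit casts the parser added become visible in the deparsed definition:

```sql
-- Deparse the check constraints on a partition; a boundary stored as
-- a DATE literal (rather than timestamptz) is the red flag.
SELECT conname, pg_get_constraintdef(oid)
  FROM pg_constraint
 WHERE conrelid = 'sys_order_part_201502'::REGCLASS
   AND contype = 'c';
```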


PG Phriday: Interacting with JSON and JSONB

March 13th, 2015 | Published in Database, Tech Talk | No Comments

With the release of PostgreSQL 9.4 comes the ability to use binary JSON objects. This internal representation is faster and more capable than the original JSON type included in 9.3. But how do we actually interact with JSON and JSONB in a database connection context? The answer is actually a little complicated, and somewhat surprising.

Casting. Casting Everywhere.

Despite their inclusion as internal types, PostgreSQL maintains its longstanding position of encouraging explicit casting to avoid the bugs inherent in magic type conversions. Unfortunately, JSON blurs several lines in this regard, and this can lead to confusion on several fronts.

Let’s take a look at JSON first. Here are three very basic JSON documents for illustration:

{ "name": "cow-man" }
{ "weight": 389.4 }
{ "alive": true }

Nothing crazy. We have a string, a number, and a boolean. The PostgreSQL JSON type documentation suggests it handles these internally, which we can see for ourselves.

SELECT '{ "name": "cow-man" }'::JSON;

 { "name": "cow-man" }

SELECT '{ "weight": 389.4 }'::JSON;

 { "weight": 389.4 }

SELECT '{ "alive": true }'::JSON;

 { "alive": true }

Great! We can see the string, the number, and the boolean preserved in PostgreSQL’s encoding. Things start to go a bit sideways when we pull fields, though:

SELECT '{ "name": "cow-man" }'::JSON->'name';

 "cow-man"

So far, so good. The PostgreSQL JSON documentation for functions and operators says that the -> operator returns a JSON object. And indeed, we can re-cast this string to JSON:

SELECT '"cow-man"'::JSON;

 "cow-man"

What happens when we try to compare two JSON objects, though?

SELECT '{ "name": "cow-man" }'::JSON->'name' = '"cow-man"'::JSON;

ERROR:  operator does not exist: json = json

Wait… what? Hmm. Let’s try the same thing with JSONB:

SELECT '{ "name": "cow-man" }'::JSONB->'name' = '"cow-man"'::JSONB;

 t

That’s something of a surprise, isn’t it? It’s pretty clear from this that JSON and JSONB are much more than simply how the data gets encoded and stored. It also drastically affects how it’s possible to interact with the data itself.

Don’t relax yet, though! JSON and JSONB casts only succeed from TEXT, VARCHAR, and similar types. For example, these don’t work:

SELECT 389.4::NUMERIC::JSON;
SELECT TRUE::BOOLEAN::JSON;

But these do:

SELECT 389.4::TEXT::JSON;
SELECT TRUE::TEXT::JSON;

So even though PostgreSQL acknowledges JSON datatypes, it can’t convert between them and its own internal types. A PostgreSQL NUMERIC is similar to a JSON NUMBER, but they’re not interchangeable, and can’t even be cast without first going through some kind of TEXT type. The same is true for boolean values. The only type that is treated natively is a string-based value.

While it may seem inconvenient to always use another type as an intermediary when interacting with JSON, that’s the current reality.
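For the native-to-JSON direction, the supported route in 9.3 and 9.4 is the to_json() function rather than a direct cast:

```sql
-- to_json() accepts any known type and produces a proper JSON scalar
SELECT to_json(389.4);            -- 389.4
SELECT to_json(TRUE);             -- true
SELECT to_json('cow-man'::TEXT);  -- "cow-man"
```

Note that a bare string literal must be cast to TEXT first, since to_json() can’t accept a value of unknown type.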

Just use TEXT and JSONB

If we reexamine the JSON type documentation, we also see the ->> operator. This not only pulls the indicated field, but automatically casts it to text. This means that we can turn this ugly monstrosity:

SELECT ('{ "field": "value" }'::JSON->'field')::TEXT;

Into this:

SELECT '{ "field": "value" }'::JSON->>'field';

From here, we can perform any action normally possible with a text-based value. This is the only way to pull a JSON or JSONB field directly into a PostgreSQL native type.
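From there, any native cast applies. For example, the number from the earlier weight document can finally behave like a NUMERIC:

```sql
-- ->> produces text, which can then be cast to a native numeric type
SELECT ('{ "weight": 389.4 }'::JSON->>'weight')::NUMERIC * 2;  -- 778.8
```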

All of this would suggest that the safest way to work with JSON or JSONB is through text. Ironically, text is also the only way to compare JSON and JSONB values to each other. Observe:

SELECT '"moo"'::JSON = '"moo"'::JSONB;

ERROR:  operator does not exist: json = jsonb

And yet:

SELECT '"moo"'::JSON::TEXT = '"moo"'::JSONB::TEXT;

 t

Well, then. What this means is pretty clear: convert at the last minute, and always use some kind of text value when dealing with JSON and JSONB.

While I’m here, I’d also like to point out a somewhat amusing side-effect of how JSONB works as opposed to JSON. Textual data gets converted to JSONB automatically when JSONB appears on one side of the equality. What does that mean? All of these are valid; note that I’m quoting everything so it’s treated as text:

SELECT '"moo"'::JSONB = '"moo"';
SELECT '365'::JSONB = '365';
SELECT 'true'::JSONB = 'true';

But all of these produce an error:

SELECT '"moo"'::JSON = '"moo"';
SELECT '365'::JSON = '365';
SELECT 'true'::JSON = 'true';

This alone suggests that the lack of interoperability between JSON and JSONB is more of an oversight, and that JSON is missing some casting rules. Hopefully, that means 9.5 will carry some corrections in this regard. It’s hard to imagine PostgreSQL will leave JSON as a lame-duck datatype that was only really useful for 9.3 while JSONB was being developed.

If not, I guess that means I don’t have to revisit this topic in the future. Everyone knows I love being lazy.


A Short Examination of pg_shard

March 12th, 2015 | Published in Database, Tech Talk | 5 Comments

For part of today, I’ve been experimenting with the new-ish pg_shard extension contributed by CitusData. I had pretty high hopes for this module and was extremely excited to try it out. After screwing around with it for a while, I can say it has a lot of potential. Yet I can’t reasonably recommend it in its current form. The README file suggests quite a few understandable caveats, but it’s the ones they don’t mention that hurt a lot more.

Here’s what I encountered while experimenting:

  • No support for transactions.
  • No support for the EXPLAIN command to produce query plans.
  • No support for COPY for bulk loading.
  • A bug that causes worker nodes to reject use of CURRENT_DATE in queries.

The first two are probably the worst, and the third is hardly trivial. I’m pretty long-winded, but here’s my view on potential impact.

By itself, lacking transaction support makes pg_shard more of a toy in my opinion. This breaks the A in ACID, and as such, reduces PostgreSQL from a legitimate database to a fun experiment in assuming bad data never makes it into a sharded table. I would never, in good conscience, deploy such a thing into a production environment.

By not providing EXPLAIN support, it is not possible to see what a query might do on a sharded cluster. This is not only incredibly dangerous, but makes it impossible to troubleshoot or optimize queries. Which shards would the query run on? How much data came from each candidate shard? There’s no way to know. It is possible to load the auto_explain module on each worker node to examine what it did, but there’s no way to check the query plan beforehand.

And what about COPY? The documentation states that INSERT is the only way to get data into a table. Outside of a transaction, individual inserts are incredibly slow: each one is its own transaction, with its own network round trip and fsync delay, and the list goes on. I created a VM and threw a measly 100k individual inserts at a regular, unsharded table, and the whole job took over a minute. Replaying the script in a single transaction cut that time down to ten seconds. On the pg_shard copy of the table with two worker nodes, the same inserts required two minutes and twenty seconds. For 100k records. Presumably this could be improved by utilizing several loader threads in parallel, but I haven’t tested that yet.
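For reference, the ten-second figure on the regular table came from nothing fancier than wrapping the generated inserts in a single transaction, along these lines:

```sql
-- One transaction means one commit (and one fsync) for the whole batch,
-- instead of one hundred thousand of them.
BEGIN;

INSERT INTO sys_order (product_id, item_count) VALUES (1, 5);
INSERT INTO sys_order (product_id, item_count) VALUES (2, 3);
-- ... 99,998 more single-row inserts ...

COMMIT;
```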

The primary reason sharding might be used is to horizontally scale a very large table. Based on the times I saw, it would take 15 days to load a table with one billion rows. The sample table I used had only four columns and no indexes to slow the loading process. Yet a COPY statement needed only 300ms for the same 100k records, and could load one billion rows of that table in under an hour. So even if I ignored the lack of transaction and EXPLAIN support, getting our 20 billion rows of data into pg_shard simply wouldn’t be feasible.

I really, really wanted to consider pg_shard on one of the large multi-TB instances I administer. I still do. But for now, I’m going to watch the project and check in on it occasionally and see if they eventually work out these kinks. It’s a great prototype, and having CitusData behind it suggests it’ll eventually become something spectacular.

Of course, there’s always the possibility that an as yet unnamed project somewhere out there already is. If so, please point me to it; pg_shard teased my salivary glands, and I want more.


PG Phriday: Materialized Views, Revisited

March 6th, 2015 | Published in Database, Tech Talk | 3 Comments

Materialized views are a great improvement to performance in many cases. Introduced in PostgreSQL 9.3, they finally added an easy method for turning a view into a transient table that could be indexed, mined for statistics for better planner performance, and easily rebuilt. Unfortunately, refreshing a materialized view in PostgreSQL 9.3 caused a full exclusive lock, blocking any use until the process was complete. In 9.4, this can finally be done concurrently, though there are still a couple caveats.

Fake data is always best to illustrate, so let’s create a very basic table and a materialized view based on its contents. Heck, why not use the table from last week’s PG Phriday? For the purposes of this article, we’ll modify it slightly so there’s more than one day of orders.

CREATE TABLE sys_order
(
    order_id     SERIAL     NOT NULL,
    product_id   INT        NOT NULL,
    item_count   INT        NOT NULL,
    order_dt     TIMESTAMP  NOT NULL DEFAULT now()
);

INSERT INTO sys_order (product_id, item_count, order_dt)
SELECT (a.id % 100) + 1, (random()*10)::INT + 1,
       CURRENT_DATE - ((random()*30)::INT || 'days')::INTERVAL
  FROM generate_series(1, 1000000) a(id);

ALTER TABLE sys_order ADD CONSTRAINT pk_order_order_id
      PRIMARY KEY (order_id);

ANALYZE sys_order;

Now, how would we create a materialized view from this? There are a million rows constructed of 100 products, with varying order totals. A good way to use materialized views is to collect the underlying data into some kind of aggregate. How about product order totals for the entire day?

CREATE MATERIALIZED VIEW mv_daily_orders AS
SELECT order_dt, product_id, sum(item_count) AS item_total
  FROM sys_order
 GROUP BY order_dt, product_id;

CREATE UNIQUE INDEX udx_daily_orders_order_dt
    ON mv_daily_orders (order_dt, product_id);

ANALYZE mv_daily_orders;

Note that we added a unique index to the order date and product ID. This is a subtle but required element to concurrently refresh a materialized view. Without it, we would get this error trying a concurrent refresh:

ERROR:  cannot refresh materialized view "public.mv_daily_orders" concurrently
HINT:  Create a unique index with no WHERE clause on one or more columns of the materialized view.

In addition, we also analyzed the table. It’s always good practice to ANALYZE a table after significant modifications so statistics are fresh; the planner and your query run times will thank you. There is an automatic daemon that will analyze tables after a certain threshold of changes, but for extremely large tables on the receiving end of nightly insert jobs, that threshold may not be crossed quickly enough.
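For such tables, the analyze threshold can be tightened per table with a storage parameter, so the daemon reacts to a smaller fraction of changed rows:

```sql
-- Trigger an automatic ANALYZE after roughly 1% of rows change,
-- instead of the default 10%.
ALTER TABLE sys_order
  SET (autovacuum_analyze_scale_factor = 0.01);
```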

The materialized view is much smaller than the original, making it theoretically a better choice for reports and summaries. Let’s take a look at the last five days of sales for product ID number 1:

SELECT order_dt, item_total
  FROM mv_daily_orders
 WHERE order_dt >= CURRENT_DATE - INTERVAL '4 days'
   AND product_id = 1
 ORDER BY order_dt;

      order_dt       | item_total
 2015-03-02 00:00:00 |       1875
 2015-03-03 00:00:00 |       1977
 2015-03-04 00:00:00 |       2150
 2015-03-05 00:00:00 |       1859
 2015-03-06 00:00:00 |       1003

Great! The 6th isn’t over yet, so let’s insert some more orders.

INSERT INTO sys_order (product_id, item_count, order_dt)
SELECT (a.id % 100) + 1, (random()*10)::INT + 1, CURRENT_DATE
  FROM generate_series(1, 1000) a(id);

Materialized views do not update when their parent table is modified. This means we have to update the view ourselves by calling REFRESH MATERIALIZED VIEW. To illustrate that 9.4 doesn’t lock the view during a refresh, we’ll send these commands to the first connection:

BEGIN;
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_orders;

Notice that we didn’t end the transaction? That will preserve any locks PostgreSQL allocates while we attempt to access data in the materialized view. Next, we need a second connection where we try to use the view:

SELECT count(1) FROM mv_daily_orders;


If this were a 9.3 database, we couldn’t use the CONCURRENTLY keyword, and the second connection would have hung until the first connection issued a COMMIT statement. Otherwise, we have normal transaction isolation controls. The view contents would look as if they had not been refreshed until the transaction is committed. Let’s look at those totals again:

      order_dt       | item_total
 2015-03-02 00:00:00 |       1875
 2015-03-03 00:00:00 |       1977
 2015-03-04 00:00:00 |       2150
 2015-03-05 00:00:00 |       1859
 2015-03-06 00:00:00 |       1067

Now we have 64 more sales for product number 1.

But what about those caveats? First of all, this is from the PostgreSQL Wiki regarding the concurrent update process:

Instead of locking the materialized view up, it instead creates a temporary updated version of it, compares the two versions, then applies INSERTs and DELETEs against the materialized view to apply the difference. This means queries can still use the materialized view while it’s being updated.

This means a concurrent update needs to re-run the definition query, compare it against the existing rows of the view, and then merge in the changes using INSERT and DELETE statements. Note that in our case, it would have been much more efficient to simply ‘top off’ the view by computing the totals for the current day and replacing them manually. A “real” sys_order table would cover much more than a short 30-day window, and at such volumes, rebuilding everything imposes a profound performance impact.
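If mv_daily_orders were an ordinary table, that ‘top off’ might look something like this sketch:

```sql
-- Replace only the current day's aggregates instead of rebuilding a month
DELETE FROM mv_daily_orders
 WHERE order_dt = CURRENT_DATE;

INSERT INTO mv_daily_orders (order_dt, product_id, item_total)
SELECT order_dt, product_id, sum(item_count)
  FROM sys_order
 WHERE order_dt = CURRENT_DATE
 GROUP BY order_dt, product_id;
```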

But we can’t do that with the built-in materialized view structure. All manual attempts to modify the view produce this error:

ERROR:  cannot change materialized view "mv_daily_orders"

Drat. Granted, such a shortcut would not be reflected in our view definition, and so wouldn’t generally be recommended. Still, it would be nice if we could define a function and bind it as a callback invoked by REFRESH MATERIALIZED VIEW, especially when rebuilding the entire data set from scratch is slow and inefficient. The assumption in such a case would be that the function leaves the view in an accurate state when complete.

A concurrent refresh does not lock the view for use, but it can require significant resources and time to rebuild and merge. Why was this approach chosen? Consider that the PostgreSQL TRUNCATE command is also transaction safe and entirely atomic: a new, empty file is created based on the definition of the current table, and it replaces the old one once no transactions refer to it any longer. The same process could have been used for concurrent view updates, considering the definition query has to be re-run anyway. There are probably some complicated internals that would make this difficult, but I still wonder.

Which brings us to the second caveat: VACUUM. Since the contents of the materialized view are directly modified with DELETE and INSERT behind the scenes, that leaves dead rows according to PostgreSQL’s MVCC storage mechanism. This means the view has to be vacuumed following a refresh or there’s a risk the structure will bloat over time. Now, the autovacuum daemon should take care of this for us in the long run, but it’s still an element to consider.
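Checking for and cleaning up that leftover garbage is straightforward, should autovacuum lag behind:

```sql
-- Dead rows left behind by concurrent refreshes accumulate here
SELECT n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'mv_daily_orders';

-- Reclaim them (and refresh statistics) manually if necessary
VACUUM ANALYZE mv_daily_orders;
```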

In the end, CONCURRENTLY is a vastly needed improvement. Once a materialized view is created, there’s a high likelihood it will require periodic updates. If end users can’t treat materialized views the same way as they would a normal table or view, they’ll eventually migrate away from using them at all. That’s the path of least resistance, after all; why use a materialized view when it could be exclusively locked for long periods of time?

Such behavior directly conflicts with the guarantee MVCC normally provides, and which end users have come to expect: writers do not block readers. Since it’s only the data being modified, and not the structure of the view, I personally wonder why CONCURRENTLY isn’t the default behavior. In any case, if you’re using materialized views on a 9.4 database, I strongly encourage using the CONCURRENTLY keyword at all times when performing refreshes.

Here’s hoping materialized views become even more viable in future releases!


Recipe: Amazeballs Chili

March 4th, 2015 | Published in Recipe | No Comments

Every once in a while, I get the itch to cook something. This time around, chili was my poison of choice because this winter won’t end and I’m silently protesting. This is probably the best chili I’ve ever made, so I’m required by law to share it. Seriously, this stuff is amazeballs.


  • 1 lb ground pork
  • 1 lb ground beef
  • 1 28oz can crushed tomatoes
  • 1 28oz can stewed tomatoes
  • 2 15.5oz cans red kidney beans
  • 2 15.5oz cans great northern beans
  • 2 15.5oz cans black beans
  • 1 12oz can tomato paste
  • 1/4 – 1/3 cup chili powder
  • 1.5 tbsp cumin
  • 1 tbsp salt
  • 2 tsp paprika
  • 1 tsp cayenne pepper
  • 1 tsp white pepper
  • 1 tsp coriander

Directions

  1. Mix all dry ingredients (chili powder, cumin, salt, paprika, cayenne, white pepper, coriander) in a bowl.
  2. Brown ground pork and beef in large saucepan. Add half of the spice mixture to the meat as it cooks.
  3. Combine all ingredients in a large pot or crock pot (do not drain or clean beans) and simmer on low for 6-8 hours, stirring occasionally.

Conclusion / Notes

I didn’t add garlic because Jen is allergic. Otherwise, I consider this recipe basically perfect. Unfortunately I only eyeballed the spices as I was adding them, so the above measurements are only approximate.

