For part of today, I’ve been experimenting with the new-ish pg_shard extension contributed by CitusData. I had pretty high hopes for this module and was extremely excited to try it out. After screwing around with it for a while, I can say it has a lot of potential. Yet I can’t reasonably recommend it in its current form. The README file suggests quite a few understandable caveats, but it’s the ones they don’t mention that hurt a lot more.

Here’s what I encountered while experimenting:

  • No support for transactions.
  • No support for the EXPLAIN command to produce query plans.
  • No support for COPY for bulk loading.
  • A bug that causes worker nodes to reject use of CURRENT_DATE in queries.

The first two are probably the worst, and the third is hardly trivial. I’m pretty long-winded, but here’s my view on potential impact.

By itself, lacking transaction support makes pg_shard more of a toy in my opinion. This breaks the A in ACID, and as such, reduces PostgreSQL from a legitimate database, to a fun experiment in assuming bad data never makes it into a sharded table. I would never, in good conscience, deploy such a thing into a production environment.

By not providing EXPLAIN support, it is not possible to see what a query might do on a sharded cluster. This is not only incredibly dangerous, but makes it impossible to troubleshoot or optimize queries. Which shards would the query run on? How much data came from each candidate shard? There’s no way to know. It is possible to load the auto_explain module on each worker node to examine what it did, but there’s no way to check the query plan beforehand.

And what about COPY? The documentation states that INSERT is the only way to get data into a table. Outside of a transaction, multiple inserts are incredibly slow due to round trip time, single-transaction context, fsync delays, and the list goes on. I created a VM and threw a measly 100k individual inserts at a regular, unsharded table, and the whole job took over a minute. Replaying the script in a transaction cut that time down to ten seconds. On the pg_shard copy of the table with two worker nodes, the same inserts required two minutes and twenty seconds. For 100k records. Presumably this could be corrected by utilizing several loader threads in parallel, but I haven’t tested that yet.

The primary reason sharding might be used, is to horizontally scale a very large table. Based on the times I saw, it would take 15 days to load a table with one billion rows. The sample table I used was only four columns and had no indexes to slow the loading process. Yet the COPY statement needed only 300ms for the same amount of data, and could load one billion rows of that table in under an hour. So even if I ignored the lack of transactions and EXPLAIN support, getting our 20-billion rows of data into pg_shard simply wouldn’t be feasible.

I really, really wanted to consider pg_shard on one of the large multi-TB instances I administer. I still do. But for now, I’m going to watch the project and check in on it occasionally and see if they eventually work out these kinks. It’s a great prototype, and having CitusData behind it suggests it’ll eventually become something spectacular.

Of course, there’s always the possibility that an as yet unnamed project somewhere out there already is. If so, please point me to it; pg_shard teased my salivary glands, and I want more.

A Short Examination of pg_shard
Tagged on:                 

5 thoughts on “A Short Examination of pg_shard

  • (Ozgun from Citus Data here)

    Thank you for trying out pg_shard and writing about it. As you noted, pg_shard’s INSERT performance shines when you have multiple loader processes. We also made improvements for multiple loaders in the upcoming pg_shard 1.1 release.

    For example, at 64 shards and 256 concurrent processes, pg_shard 1.0 executed 14,532 inserts per second (ips) in our internal benchmark. In pg_shard 1.1, we had the throughput improve to 56,982 ips.

    We’re happy to hear about your feedback, and we’re actively working on making pg_shard better.

    1. That’s awesome to hear. I’m definitely keeping an eye on this, and will probably do a follow-up.

      I figured parallelism was a way to circumvent the performance problems I was having. A quick test piping it through ‘xargs -P’ didn’t work very well, but I’ll definitely put together something more robust next time—Probably a custom pgbench script.

Comments are closed.