PG Phriday: Growing Pains


Postgres is a great tool for most databases. Larger installations, however, pretty much require horizontal scaling; thanks to the laws of physics, addressing multi-TB tables relies on multiple parallel storage streams. It’s how all immense data stores work, and for a long time, Postgres really had no equivalent that wasn’t a home-grown shard management wrapper. To that end, we’ve been considering Postgres-XL as a way to fill that role. At first, everything was going well. Performance tests showed huge improvements, initial deployments uncovered no outright incompatibilities, and the conversion was underway.

Then we started to use it.

Much to our chagrin, dev and test environments aren’t always indicative of real scenarios and utilization patterns. Once Postgres-XL reached our semi-production environment, everything started to go wrong. Here’s a list of what happened, and why our 43TB (and growing) database can’t use Postgres-XL until it matures a bit more. As a note, all of the examples listed here were observed using the latest 9.5r1 release.

I also want to strongly emphasize that this is not an indictment of Postgres-XL. Postgres greatly needs the capabilities it adds to truly break into the VLDB market. Sometimes, though, projects aren’t as capable as we originally hoped, and subtle issues aren’t always readily apparent until software is field-tested. We’re still not ruling it out.

That said, on to the adventure.

Trigger Support

In short, there isn’t any. This, at least, was something we knew about. In our case, it wasn’t a problem because a VLDB installation is commonly a warehouse of some description. These tend to depend on ETL or other source scripts to supply data that’s already prepared and doesn’t rely on post-insert transformations.

For anyone else, this could be a roadblock. Trigger support isn’t always essential to running an active database, but triggers are common enough that a conversion process will have to contend with replacing them. This missing feature practically relegates Postgres-XL to warehouse use all by itself. Indeed, like us, many may consider triggers unnecessary in a horizontally distributed database, or NoSQL databases would never have garnered so much attention.
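For illustration, here’s the kind of trigger a conversion would have to fold into its loading scripts instead. The table and names here are made up, but the pattern, deriving a column at insert time, is routine on stock Postgres:

-- Hypothetical example: a row-level trigger that fills in a derived
-- column at insert time. On Postgres-XL, this logic would have to
-- live in the ETL scripts instead.
CREATE TABLE sensor_log (
  id           SERIAL,
  reading      BIGINT,
  reading_time TIMESTAMP,
  reading_date DATE
);

CREATE OR REPLACE FUNCTION f_set_reading_date()
RETURNS TRIGGER AS
$$
BEGIN
  NEW.reading_date := NEW.reading_time::DATE;
  RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER t_set_reading_date
BEFORE INSERT ON sensor_log
   FOR EACH ROW
EXECUTE PROCEDURE f_set_reading_date();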

But it is a missing feature a Postgres user would expect. This is the first lesson: Postgres-XL is a variant of Postgres with similar functionality, but it isn’t Postgres.

SAVEPOINT Support

Again, there isn’t any. This is one of those things we only found after scripts started failing. Savepoints allow transactions to retain integrity through several phases if the state and progress are verified. If something unexpected happens, the whole transaction doesn’t have to be abandoned outright. Clever database developers use this as something of a state machine to make multiple attempts, or perform alternate actions in case of failures within a transaction.
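As a contrived sketch of that pattern (the table and values are invented), a sequence like this works on stock Postgres but has no equivalent in Postgres-XL:

-- Savepoints as a lightweight state machine. If the second insert
-- fails, the client rolls back to the savepoint and takes an
-- alternate path instead of losing the whole transaction.
CREATE TABLE batch_log (batch_id INT, status TEXT);

BEGIN;

INSERT INTO batch_log VALUES (1, 'started');
SAVEPOINT load_attempt;

-- Pretend this step blows up on bad data:
INSERT INTO batch_log VALUES (1 / 0, 'loaded');

-- Recover and record the failure rather than starting over:
ROLLBACK TO SAVEPOINT load_attempt;
INSERT INTO batch_log VALUES (1, 'load failed; using fallback');

COMMIT;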

Perhaps due to internal transaction implementation details, Postgres-XL can’t retain this functionality. Distributing transactions and involving a transaction manager, while also maintaining a state machine within each one, is definitely not a trivial problem. That makes this a completely understandable omission, and since the feature is rarely utilized, many won’t even notice it’s missing.

Yet it was something we had to route around. Another thing to keep in mind when converting to Postgres-XL.

Sub-queries and DISTINCT ON Woes

This one caught us totally off-guard. The documentation on SELECT doesn’t seem to mention this, but it is indeed an issue with Postgres-XL. Observe:

CREATE TABLE foo (id INT);
CREATE TABLE bar (id INT, nid INT);

INSERT INTO foo
SELECT a.id FROM generate_series(1,100) a(id);

INSERT INTO bar
SELECT DISTINCT ON (f1.id) f1.id, f2.id
  FROM foo f1
  JOIN foo f2 USING (id)
 ORDER BY f1.id, f2.id DESC;

ERROR:  Postgres-XL does not currently support ORDER BY in subqueries

But is that error accurate? If we remove the INSERT preamble so the query executes on its own, we merely see the expected output of rows. So even if the DISTINCT ON is being internally transformed into a sub-query, that transformation isn’t applied consistently. The error is wrong anyway, because this works just fine:

INSERT INTO bar
SELECT a.id, b.id
  FROM foo a
  JOIN (SELECT id FROM foo ORDER BY id DESC) b USING (id);

Not only is that an ordered sub-query, it’s a JOIN on an ordered sub-query, an arguably more complex scenario. I can only assume this functionality was missing at some point, and when the code was changed to lift the restriction, they missed a spot.
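For completeness, here’s the bare SELECT the failing INSERT was wrapping; as mentioned above, it runs on its own without complaint:

SELECT DISTINCT ON (f1.id) f1.id, f2.id
  FROM foo f1
  JOIN foo f2 USING (id)
 ORDER BY f1.id, f2.id DESC;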

In any case, this really threw a wrench into our functions that acted as ETL components. We took the error at face value, and groaned at trying to rewrite DISTINCT ON without using sub-queries or ORDER BY. Ironically, an even more convoluted query came to our rescue:

INSERT INTO bar
WITH sub AS (
  SELECT DISTINCT ON (f1.id) f1.id, f2.id
    FROM foo f1
    JOIN foo f2 USING (id)
   ORDER BY f1.id, f2.id DESC
)
SELECT * FROM sub;

This works because CTEs materialize their results as a kind of temporary table, which doesn’t trigger the error. Then the INSERT proceeds normally. Huh. Well, that’s a bit odd, but easily circumvented.

Then our luck ran out.

No TEMP TABLE For You!

Now we’re going from missing features and wonky implementations to outright bugs. The next issue that reared its ugly head was related to temporary tables. Well, maybe. We started seeing this in the logs:

ERROR:  could not open relation with OID 0

A quick Google led to this thread, which seemed to indicate a problem with temporary tables in stored procedures called twice in a row within a session. Unfortunately, the test cases provided in the thread do not replicate the problem on our installation. Regardless, there was a good chance it was related, so we removed temp tables anyway. Our solution leveraged UNLOGGED tables instead, because they have stable OIDs and were much less likely to invoke whatever edge case we were triggering. So our functions turned into this:

-- Functions like this caused the OID 0 errors.

CREATE OR REPLACE FUNCTION run_me()
RETURNS VOID AS
$$
BEGIN
  DROP TABLE IF EXISTS t_foo;
  CREATE TEMP TABLE t_foo (id INT);

  INSERT INTO t_foo
  SELECT a.id FROM generate_series(1,100) a(id);
END;
$$
LANGUAGE plpgsql;

-- Using UNLOGGED tables and functions like this worked fine.
-- Of course, now concurrent function calls won't work.

CREATE UNLOGGED TABLE t_foo (id INT);

CREATE OR REPLACE FUNCTION run_me()
RETURNS VOID AS
$$
BEGIN
  TRUNCATE TABLE t_foo;

  INSERT INTO t_foo
  SELECT a.id FROM generate_series(1,100) a(id);
END;
$$
LANGUAGE plpgsql;

And what would you know? Success! It wasn’t ideal, but we could limp along with the workaround until they fixed the bug. Or so we thought.

We Need ACID

Probably the most amusing part of this whole adventure is that the errors we kept encountering came one right after another. Once we worked around one issue, we merely cleared the way for another completely unrelated problem. This time, we saw this error in the logs:

ERROR:  catalog is missing 25 attribute(s) for relid 36564

That’s not encouraging. Back to Google we went, giving us another thread detailing the problem. This time, the issue seems related to autovacuum somehow. It’s also probable that repeatedly truncating UNLOGGED tables wrought havoc on the Postgres system catalog, and Postgres-XL wasn’t properly propagating something. The theory at the end of the thread is unfortunately incorrect. Disabling autovacuum only reduces the bug’s prevalence, as the original reporter noted when I asked how to correct (or remove) the broken tables.
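For the curious, the mismatch the error complains about is visible in the catalog itself. A query along these lines (the OID is just the one from our log message above) compares the column count pg_class expects with what pg_attribute still contains:

-- Compare the column count pg_class expects with what actually
-- remains in pg_attribute for the relation named in the error.
SELECT c.relname, c.relnatts AS expected_columns,
       count(a.attnum) AS columns_in_catalog
  FROM pg_class c
  LEFT JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum > 0
 WHERE c.oid = 36564
 GROUP BY c.relname, c.relnatts;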

Regardless of the underlying cause, this was the final straw. Once those errors showed up, that meant the system catalog was corrupt. The UNLOGGED tables used by our functions were completely invalid, and worse, we couldn’t even drop them to start over. The only way out is a full dump/restore, a rather drastic measure because some ephemeral table became an actual Ghost in the Shell. And the real worry here—that the missing attributes would affect a real table—became the overriding concern.

Because the D in ACID stands for Durability, and this bug illustrates that the current incarnation of Postgres-XL has none. Database metadata is sacrosanct, because without it, all that remains are indecipherable binary blobs taking up disk space. There’s no way I’d dump 40TB of data on an installation when history suggests some of that data could mysteriously go missing.

The End of the Road

And that’s the real misfortune. No matter how much we desperately wanted to deploy Postgres-XL into our warehouse, we literally can’t. Doing so at this point would be irresponsible and dangerous. Some of the problems we encountered are understandable and workarounds exist for most. But the outright deluge of roaches in our sundae, with a rotten cherry on top, is impossible to ignore.

I have every confidence in 2ndQuadrant eventually resolving the worst roadblocks. Before Postgres-XC became Postgres-XL, it practically languished in Development Limbo, accumulating code smell and copious edge cases from managing node interactions. I’ll continue installing every subsequent release in our dev and QA environments until there’s a legitimate reliability breakthrough. I still maintain that Postgres-XL is the way forward for highly distributed horizontal VLDB installations.

Just… not yet.