Foreign Tables are not as Useful as I Hoped

June 11th, 2014 | Published in Database, Tech Talk | 5 Comments


When I heard about foreign tables using the new postgres_fdw foreign data wrapper in PostgreSQL 9.3, I was pretty excited. We hadn’t upgraded to 9.3 so I waited until we did before I did any serious testing. Having done more experimentation with it, I have to say I’m somewhat disappointed. Why? Because of how authentication was implemented.

I’m going to get this out of the way now: The postgres_fdw foreign data wrapper only works with hard-coded plain-text passwords, forever the bane of security-conscious IT teams everywhere. These passwords aren’t even obfuscated or encrypted locally. The only implemented security is that the pg_user_mapping table is limited to superuser access to actually see the raw passwords. Everyone else sees this:

postgres=> SELECT * FROM pg_user_mapping;
ERROR:  permission denied for relation pg_user_mapping

The presumption is that a database superuser can change everyone’s password anyway, so it probably doesn’t matter that it’s hardcoded and visible in this view. And the developers have a point; without the raw password, how can a server-launched client log into the remote database? Perhaps the real problem is that there’s no mechanism for forwarding authentication from database to database.

This is especially problematic when attempting to federate a large database cluster. If I have a dozen nodes that all have the same user credentials, I have to create mappings to every single user, for every single foreign table, on every single independent node, or revert to trust-based authentication.

This can be scripted to a certain extent, but to what end? If a user were to change their own password, this breaks every foreign data wrapper they could previously access. This user now has to give their password to the DBA to broadcast across all the nodes with modifications to the user mappings. In cases where LDAP, Kerberos, GSSAPI, peer, or other token forwarding authentication is in place, this might not even be possible or advised.

Oracle solved this problem by tying DBLINK tables to a specific user during creation time. An access to a certain table authenticates as that user in all cases. This means a DBA can set aside a specific user for foreign table access purposes, and use a password that’s easy to change across the cluster if necessary. Grants take care of who has access to these objects. Of course, since postgres_fdw is read/write, this would cause numerous permissions concerns.

So what are we left with? How can we actually use PostgreSQL foreign tables securely? At this point, I don’t believe it’s possible unless I’m missing something. And I’m extremely confused at how this feature got so far along without any real way to lock it down in the face of malleable passwords. Our systems have dozens of users who are forced by company policy to change their passwords every 90 days, thus none of these users can effectively access any foreign table I’d like to create.

And no, you can’t create a mapping and then grant access to it. In the face of multiple mapping grants, which one would PostgreSQL use? No, if there’s a way to solve this particular little snag, it won’t be that convenient. If anyone has ideas, or would like to go into length at how wrong I am, please do! Otherwise, I’m going to have to use internal users of my own design and materialized views to wrap the foreign tables; extremely large tables will need some other solution.


Tags: , , , ,

PGCon 2014 Unconference: A Community

May 27th, 2014 | Published in Contemplation, Database, News, Tech Talk | 4 Comments


This May, I attended my first international conference: PGCon 2014. Though the schedule spanned from May 20th to May 23rd, I came primarily for the talks. Then there was the Unconference on the 24th. I’d never heard of such a thing, but it was billed as a good way to network and find out what community members want from PostgreSQL. After attending the Unconference, I must admit I’m exceptionally glad it exists; it’s something I believe every strong Open Source project needs.

Why do I say that, having only been to one of them? It’s actually fairly simple. Around 10AM Saturday, everyone piled into the large lecture hall and had a seat. There were significantly fewer attendees, but most of the core committers remained for the festivities. We were promised pizza and PostgreSQL, and that’s all anyone needed. Josh Berkus started the festivities by announcing the rules and polling for ideas. The final schedule was pretty interesting in itself, but I was more enamored by the process and the response it elicited.

I’m no stranger to the community, and the mailing lists are almost overwhelmingly active. But these conversations, nearly all of them, are focused on assistance and hacker background noise. The thing that stood out to me during the Unconference planning was its organic nature. It wasn’t just that we chose the event schedule democratically. It wasn’t the wide range of topics. It wasn’t even the fact core members were there to listen. It was the engagement.

These people were excited and enjoying talking about PostgreSQL in a way I’ve never witnessed, and I’ve spoken at Postgres Open twice so far. I’ve seen several talks, been on both sides of the podium, and no previous experience even comes close. We were all having fun brainstorming about PostgreSQL and its future. For one day, it wasn’t about pre-cooked presentations chosen via committee, but about what “the community” wanted to discuss.

When it came time for the talks themselves, this atmosphere persisted. We agreed and disagreed, we had long and concise arguments for and against ideas, clarified our positions, and generally converged toward a loose consensus. And it was glorious. I know we were recording the sessions, so if you have the time when the videos are available, I urge you to watch just one so you can see the beauty and flow of our conversations.

I feel so strongly about this that I believe PGCon needs to start a day earlier. One unfortunate element about the Unconference is that it happens on a Saturday, when everyone wants to leave and return to their families. Worse, there is a marathon on Sunday, meaning it is difficult or even impossible to secure a hotel room for the Saturday event. People tend to follow the path of least resistance, so if there is a problem getting lodging, they won’t go.

And that’s a shame. Having a core of interested and engaged community members not only improves the reputation of PostgreSQL, but its advocacy as well. If people feel they can contribute without having to code, they’ll be more likely to do so. If those contributions, no matter how small, are acknowledged, their progenitors will stick around. I believe this is the grass-roots effort that makes PostgreSQL the future of the database world, and whoever came up with the Unconference deserves every accolade I can exclaim.

We need more of this. PostgreSQL has one of the most open communities I’ve had the pleasure of participating in, and that kind of momentum can’t really be forced. I hope every future PostgreSQL conference in every country has one of these, so everyone with the motivation can take part in the process.

Finally, find your nearest PostgreSQL User Group, and join the community. You’ll be glad you did.


Tags: , , , ,

Foreign Keys are Not Free

May 14th, 2014 | Published in Database, Tech Talk | 6 Comments


PostgreSQL is a pretty good database, and I enjoy working with it. However, there is an implementation detail that not everyone knows about, which can drastically affect table performance. What is this mysterious feature? I am, of course, referring to foreign keys.

Foreign keys are normally a part of good database design, and for good reason. They inform about entity relationships, and they verify, enforce, and maintain those relationships. Yet all of this comes at a cost that might surprise you. In PostgreSQL, every foreign key is maintained with an invisible system-level trigger added to the source table in the reference. At least one trigger must go here, as operations that modify the source data must be checked that they do not violate the constraint.

This query is an easy way to see how many foreign keys are associated with every table in an entire PostgreSQL database:

SELECT t.oid::regclass::text AS table_name, count(1) AS total
  FROM pg_constraint c
  JOIN pg_class t ON (t.oid = c.confrelid)
 GROUP BY table_name
 ORDER BY total DESC;

With this in mind, consider how much overhead each trigger incurs on the referenced table. We can actually calculate this overhead. Consider this function:

CREATE OR REPLACE FUNCTION fnc_check_fk_overhead(key_count INT)
RETURNS VOID AS
$$
DECLARE
  i INT;
BEGIN
  CREATE TABLE test_fk
  (
    id   BIGINT PRIMARY KEY,
    junk VARCHAR
  );

  INSERT INTO test_fk
  SELECT generate_series(1, 100000), repeat(' ', 20);

  CLUSTER test_fk_pkey ON test_fk;

  FOR i IN 1..key_count LOOP
    EXECUTE 'CREATE TABLE test_fk_ref_' || i || 
            ' (test_fk_id BIGINT REFERENCES test_fk (id))';
  END LOOP;

  FOR i IN 1..100000 LOOP
    UPDATE test_fk SET junk = '                    '
     WHERE id = i;
  END LOOP;

  DROP TABLE test_fk CASCADE;

  FOR i IN 1..key_count LOOP
    EXECUTE 'DROP TABLE test_fk_ref_' || i;
  END LOOP;

END;
$$ LANGUAGE plpgsql VOLATILE;

The function is designed to create a simple two-column table, fill it with 100,000 records, and test how long it takes to update every record. This is purely meant to simulate a high-transaction load caused by multiple clients. I know no sane developer would actually update so many records this way.

The only parameter this function accepts, is the amount of tables it should create that reference this source table. Every referring table is empty, and has only one column for the reference to be valid. After the foreign key tables are created, it performs those 100,000 updates, and we can measure the output with our favorite SQL tool. Here is a quick test with psql:

\timing
SELECT fnc_check_fk_overhead(0);
SELECT fnc_check_fk_overhead(5);
SELECT fnc_check_fk_overhead(10);
SELECT fnc_check_fk_overhead(15);
SELECT fnc_check_fk_overhead(20);

On our system, these timings were collected several times, and averaged 2961ms, 3805ms, 4606ms, 5089ms, and 5785ms after three runs each. As we can see, after merely five foreign keys, performance of our updates drops by 28.5%. By the time we have 20 foreign keys, the updates are 95% slower!

I don’t mention this to make you abandon foreign keys. However, if you are in charge of an extremely active OLTP system, you might consider removing any non-critical FK constraints. If the values are merely informative, or will not cause any integrity concerns, a foreign key is not required. Indeed, excessive foreign keys are actually detrimental to the database in a very tangible way.

I merely ask you keep this in mind when designing or revising schemas for yourself or developers you support.


Tags: , , ,

Trumping the PostgreSQL Query Planner

May 8th, 2014 | Published in Database, Tech Talk | No Comments


With the release of PostgreSQL 8.4, the community gained the ability to use CTE syntax. As such, this is a fairly old feature, yet it’s still misunderstood in a lot of ways. At the same time, the query planner has been advancing incrementally since that time. Most recently, PostgreSQL has gained the ability to perform index-only scans, making it possible to fetch results straight from the index, without confirming rows with the table data.

Unfortunately, this still isn’t enough. There are still quite a few areas where the PostgreSQL query planner is extremely naive, despite the advances we’ve seen recently. For instance, PostgreSQL still can’t do a basic loose index scan natively. It has to be tricked by using CTE syntax.

To demonstrate this further, imagine this relatively common scenario: an order processing system where clients can order products. What happens when we want to find the most recent order for all current customers? Boiled down to its minimum elements, this extremely simplified table will act as our order system.

CREATE TABLE test_order
(
  client_id   INT        NOT NULL,
  order_date  TIMESTAMP  NOT NULL,
  filler      TEXT       NOT NULL
);

Now we need data to test with. We can simulate a relatively old order processing system by taking the current date and subtracting 1,000 days. We can also bootstrap with 10,000 clients, and make the assumption that newer clients will be more active. This allows us to represent clients that have left our services as time goes on. So we start with this test data:

INSERT INTO test_order
SELECT s1.id,
       (CURRENT_DATE - INTERVAL '1000 days')::DATE 
           + generate_series(1, s1.id%1000),
       repeat(' ', 20)
  FROM generate_series(1, 10000) s1 (id);

The generate_series function is very handy for building fake data. We’re still not ready to use that data, however. Since we want to find the most recent order for all customers, we need an index that will combine the client_id and order_date columns in such a way that a single lookup will provide the value we want for any particular client. This index should do nicely:

CREATE INDEX idx_test_order_client_id_order_date
    ON test_order (client_id, order_date DESC);

Finally, we analyze to make sure the PostgreSQL engine has the most recent stats for our table. Just to make everything easily repeatable, we also set the default_statistics_target to a higher value than default as well.

SET default_statistics_target TO 500;
ANALYZE test_order;

Now we’ll start with the most obvious query. Here, we just use the client_id column and look for the max order_date for each:

EXPLAIN ANALYZE
SELECT client_id, max(order_date)
  FROM test_order
 GROUP BY client_id;

The query plan is fairly straight-forward, and will probably include a sequence scan. On the virtual server we’re testing with, the total runtime for us ended up looking like this:

Total runtime: 1117.408 ms

There is some variance, but the end result is just over one second per execution. We ran this query several times to ensure it was properly cached by PostgreSQL. Why didn’t the planner use the index we created? Let’s assume the planner doesn’t know what max does, and treats it like any other function. With that in mind, we can exploit a different type of syntax that should make the index much more usable. So let’s try DISTINCT ON with an explicit ORDER clause that matches the definition of our index:

EXPLAIN ANALYZE SELECT DISTINCT ON (client_id) client_id, order_date FROM test_order ORDER BY client_id, order_date DESC;

Well, this time our test system used an index-only scan, and produced the results somewhat faster. Our new runtime looks like this:

Total runtime: 923.300 ms

That’s almost 20% faster than the sequence scan. Depending on how much bigger the table is than the index, reading the index and producing these results can vary significantly. And while the query time improved, it’s still pretty bad. For systems with tens or hundreds of millions of orders, the performance of this query will continue to degrade along with the row count. We’re also not really using the index effectively.

Reading the index from top to bottom and pulling out the desired results is faster than reading the whole table. But why should we do that? Due to the way we built this index, the root node for each client should always represent the value we’re looking for. So why doesn’t the planner simply perform a shallow index scan along the root nodes? It doesn’t matter what the reason is, because we can force it to do so. This is going to be ugly, but this query will act just as we described:

EXPLAIN ANALYZE
WITH RECURSIVE skip AS
(
  (SELECT client_id, order_date
    FROM test_order
   ORDER BY client_id, order_date DESC
   LIMIT 1)
  UNION ALL
  (SELECT (SELECT min(client_id)
             FROM test_order
            WHERE client_id > skip.client_id
          ) AS client_id,
          (SELECT max(order_date)
             FROM test_order
            WHERE client_id = (
                    SELECT min(client_id)
                      FROM test_order
                     WHERE client_id > skip.client_id
                  )
          ) AS order_date
    FROM skip
   WHERE skip.client_id IS NOT NULL)
)
SELECT *
  FROM skip;

The query plan for this is extremely convoluted, and we’re not even going to try to explain what it’s doing. But the final query execution time is hard to discount:

Total runtime: 181.501 ms

So what happened here? How can the abusive and ugly CTE above outwit the PostgreSQL query planner? We use the same principle as described in the PostgreSQL wiki for loose index scans. We start with the desired maximum order date for a single client_id, then recursively begin adding clients one by one until the index is exhausted. Due to limitations preventing us from using the recursive element in a sub-query, we have to use the SELECT clause to get the next client ID and the associated order date for that client.

This technique works universally for performing sparse index scans, and actually improves as cardinality (the number of unique values) decreases. As unlikely as that sounds, since we are only using the root nodes within the index tree, performance increases when there are less root nodes to check. This is the exact opposite to how indexes are normally used, so we can see why PostgreSQL doesn’t natively integrate this technique. Yet we would like to see it added eventually so query authors can use the first query example we wrote, instead of the excessively unintuitive version that actually produced good performance.

In any case, all PostgreSQL DBAs owe it to themselves and their clusters to learn CTEs. They provide a powerful override for the query planner, and helps solve the edge cases it doesn’t yet handle.


Tags: , , , ,

Announcing walctl for PostgreSQL Transaction Log Management

March 27th, 2014 | Published in Database, Tech Talk | 2 Comments


I’ve managed to convince my employer to open source one of the tools I recently wrote. That tool goes by the name of walctl, and I believe the importance of this kind of tool can not be overstated.

The PostgreSQL Write Ahead Log (WAL) files are key to crash recovery, point in time recovery, and all standby use not derived from streaming replication. WAL files are extremely critical to proper database operation. Yet their archival is treated as an afterthought. Some people use regular cp while others go as far as to use rsync to send data to a replica server.

This isn’t enough. A much safer architecture is to involve three servers. One server produces WAL files. One server acts as a storage and distribution location. One (or more) final server consumes the WAL files as necessary. In instances where streaming replication gets disconnected and the replica is too far behind for the WAL files the master server has available, the archive is a good mechanism for catching up.

Well, wallctl does all of that. It forces you to do all of that. Just set up a couple SSH keys prior to installation, and it needs nothing else. I added a database clone tool as a convenience which needs superuser access to the master database, but otherwise, walctl is very unobtrusive. This toolkit is simple, and will have a few limited advancements in updated versions. It’s meant to be small, fast, and light, following the UNIX philosophy of doing one thing well. In fact, I wrote this tool after examining several other PostgreSQL WAL and replication management systems. Almost all of them require transmitting WAL files directly from the master server to one or more slave systems. But this presumes you only have one replica, or forces the master server to do much more work by contacting several systems in a row. Why not let the slaves do all the hard work?

I highly recommend all DBAs use a WAL management tool of some kind, no matter which. Barman and repmgr are great alternatives that do much more than walctl. But if all you want to do is stash your WAL files in a safe location that multiple replica servers can utilize, this is the easier path.


Tags: , , ,

« Older Posts

Newer Posts »