Finally Done With High Availability

July 29th, 2014 | Published in Database, News, Tech Talk, Writing | 6 Comments


Well, my publisher recently informed me that the book I’ve long been slaving over for almost a year, is finally finished. I must admit that PostgreSQL 9 High Availability Cookbook is somewhat awkward as a title, but that doesn’t detract from the contents. I’d like to discuss primarily why I wrote it.

When Packt first approached me in October of 2013, I was skeptical. I have to admit that I’m not a huge fan of the “cookbook” style they’ve been pushing lately. Yet, the more I thought about it, the more I realized it was something the community needed. I’ve worked almost exclusively with PostgreSQL since at late 2005 with databases big and small. It was always the big ones that presented difficulties.

Back then, disaster recovery nodes were warm standby through continuous recovery at best, and pg_basebackup didn’t exist. Nor did pg_upgrade, actually. Everyone had their own favorite backup script, and major upgrades required dumping the entire database and importing it in the new version. To work with PostgreSQL then required a much deeper understanding than is necessary now. Those days forced me to really understand how PostgreSQL functions, which caveats to acknowledge, and which needed redress.

One of those caveats that still called out to me, was one of adoption. With a lot of the rough edges removed in recent releases of PostgreSQL, came increased usage in small and large businesses alike. I fully expected PostgreSQL to be used in a relatively small customer acquisition firm, for instance, but then I started seeing it in heavy-duty financial platforms. Corporate deployments of PostgreSQL require various levels of high availability, from redundant hardware, all the way to WAL stream management and automated failover systems.

When I started working with OptionsHouse in 2010, their platform handled 8,000 database transactions per second. Over the years, that has increased to around 17k, and I’ve seen spikes over 20k. At these levels, standard storage solutions break down, and even failover systems are disruptive. Any outage must be as short as possible, and be instantly available with little to no dependency on cache warming. Our backup system had to run on the warm standby or risk slowing down our primary database. Little by little, I broke the database cluster into assigned roles to stave off the total destruction I felt was imminent.

I was mostly scared of the size of the installation and its amount of activity. Basic calculations told me the database handled over a billion queries per day, at such a rate that even one minute of downtime could potentially cost us tens of thousands in commissions. But I had no playbook. There was nothing I could use as a guide so that I knew what to look for when things went wrong, or how I could build a stable stack that generally took care of itself. It was overwhelming.

This book, as overly verbose as the title might be, is my contribution to all of the DBAs out there that might have to administer a database that demands high availability. It’s as in-depth as I could get without diverging too much from the cookbook style, and there are plenty of links for those who want to learn beyond the scope of its content. The core however, is there. Anyone with a good understanding of Linux could pick it up and weave a highly available cluster of PostgreSQL systems without worrying, or having to build too many of their own tools.

If I’ve helped even one DBA with this high availability book, I’ll consider my mission accomplished. It’s the culmination of years of experimentation, research, and performance testing. I owe it to the PostgreSQL community—which has helped me out of many jams—to share my experience how I can.

Thanks, everyone!


Tags: , , ,

Friends Don’t Let Friends Use Loops

July 25th, 2014 | Published in Database, Programming, Tech Talk | 14 Comments


Programming is fun. I love programming! Ever since I changed my career from programming to database work, I’ve still occasionally dabbled in my former craft. As such, I believe I can say this with a fair amount of accuracy: programmers don’t understand databases. This isn’t something small, either; there’s a fundamental misunderstanding at play. Unless the coder happens to work primarily with graphics, bulk set-based transformations are not something they’ll generally work with.

For instance, if tasked with inserting ten thousand records into a database table, a programmer might simply open the data source and insert them one by one. Consider this basic (non-normalized) table with a couple basic indexes:

CREATE TABLE sensor_log (
  sensor_log_id    SERIAL PRIMARY KEY,
  location         VARCHAR NOT NULL,
  reading          BIGINT NOT NULL,
  reading_date     TIMESTAMP NOT NULL
);

CREATE INDEX idx_sensor_log_location ON sensor_log (location);
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);

Now suppose we have a file with ten thousand lines of something like this:

38c-1401,293857,2014-07-25 10:18:38-05:00
69a-8921,209574,2014-07-25 10:18:25-05:00
9e5-0942,690134,2014-07-25 10:18:16-05:00

To load this data, our coder chooses Python and whips up an insert script. Let’s even give the programmer the benefit of the doubt, and say they know that prepared queries are faster due to less overhead. I see scripts like this all the time, written in languages from Java to Erlang. This one is no different:

import psycopg2

db_conn = psycopg2.connect(database = 'postgres', user = 'postgres')
cur = db_conn.cursor()

cur.execute(
  """PREPARE log_add AS
     INSERT INTO sensor_log (location, reading, reading_date)
     VALUES ($1, $2, $3);"""
)

file_input = open('/tmp/input.csv', 'r')
for line in file_input:
    cur.execute("EXECUTE log_add(%s, %s, %s)", line.strip().split(','))
file_input.close()

cur.execute("DEALLOCATE log_add");

db_conn.commit()
db_conn.close()

It’s unlikely we have the /tmp/input.csv file itself, but we can generate one. Suppose we have 100 locations each with 100 sensors. We could produce a fake input file with this SQL:

COPY (
    SELECT substring(md5((a.id % 100)::TEXT), 1, 3) || '-' ||
           to_char(a.id % 100, 'FM0999') AS location,
           (a.id * random() * 1000)::INT AS reading,
           now() - (a.id % 60 || 's')::INTERVAL AS reading_date
      FROM generate_series(1, 10000) a (id)
) TO '/tmp/input.csv' WITH CSV;

Whew! That was a lot of work. Now, let’s see what happens when we time the inserts on an empty import table:

real    0m1.162s
user    0m0.168s
sys     0m0.122s

Well, a little over one second isn’t that bad. But suppose we rewrote the python script a bit. Bear with me; I’m going to be silly and use the python script as a simple pass-through. This should simulate a process that applies transformations and outputs another file for direct database import. Here we go:

import psycopg2

file_input = open('/tmp/input.csv', 'r')
processed = open('/tmp/processed.csv', 'w+')

for line in file_input:
    parts = line.strip().split(',')
    processed.write(','.join(parts) + '\n')

file_input.close()
processed.seek(0)

db_conn = psycopg2.connect(database = 'postgres', user = 'postgres')
cur = db_conn.cursor()
cur.copy_from(
    processed, 'sensor_log', ',',
    columns = ('location', 'reading', 'reading_date')
)

processed.close()

db_conn.commit()
db_conn.close()

Now let’s look at the timings involved again:

real    0m0.365s
user    0m0.056s
sys     0m0.004s

That’s about three times faster! Considering how simple this example is, that’s actually pretty drastic. We don’t have many indexes, the table has few columns, and the number of rows is relatively small. The situation gets far worse as all of those things increase.

It’s also not the end of our story. What happens if we enable autocommit, so that each insert gets its own transaction? Some ORMs might do this, or a naive developer might try generating a single script full of insert statements, and not know much about transactions. Let’s see:

real    1m31.794s
user    0m0.233s
sys     0m0.172s

Oh. Ouch. What normally takes around a third of a second can balloon all the way out to a minute and a half. This is one of the reasons I strongly advocate educating developers on proper data import techniques. It’s one thing for a job to be three to five times slower due to inefficient design. It’s quite another to be nearly 250 times slower simply because a programmer believes producing the output file was fast, so logically, inserting it should be similarly speedy. Both scenarios can be avoided by educating anyone involved with data manipulation.

This doesn’t just apply to new hires. Keeping everyone up to date on new techniques is equally important, as are refresher courses. I care about my database, so when possible, I try to over-share as much information as I can manage. I even wrote and presented several talks which I periodically give to our application developers to encourage better database use. Our company Wiki is similarly full of information, which I also present on occasion, if only because reading technical manuals can be quite boring.

If my database is being abused, it’s my job as a DBA to try and alleviate the situation any way I can. Sometimes, that means telling people what they’re doing wrong, and how they can fix it. I certainly didn’t know all of this ten years ago when I was primarily a coder. But I would have appreciated being pointed in the right direction by someone with more experience in the field.

Your database and users deserve it.


Tags: , , , ,

Foreign Tables are not as Useful as I Hoped

June 11th, 2014 | Published in Database, Tech Talk | 5 Comments


When I heard about foreign tables using the new postgres_fdw foreign data wrapper in PostgreSQL 9.3, I was pretty excited. We hadn’t upgraded to 9.3 so I waited until we did before I did any serious testing. Having done more experimentation with it, I have to say I’m somewhat disappointed. Why? Because of how authentication was implemented.

I’m going to get this out of the way now: The postgres_fdw foreign data wrapper only works with hard-coded plain-text passwords, forever the bane of security-conscious IT teams everywhere. These passwords aren’t even obfuscated or encrypted locally. The only implemented security is that the pg_user_mapping table is limited to superuser access to actually see the raw passwords. Everyone else sees this:

postgres=> SELECT * FROM pg_user_mapping;
ERROR:  permission denied for relation pg_user_mapping

The presumption is that a database superuser can change everyone’s password anyway, so it probably doesn’t matter that it’s hardcoded and visible in this view. And the developers have a point; without the raw password, how can a server-launched client log into the remote database? Perhaps the real problem is that there’s no mechanism for forwarding authentication from database to database.

This is especially problematic when attempting to federate a large database cluster. If I have a dozen nodes that all have the same user credentials, I have to create mappings to every single user, for every single foreign table, on every single independent node, or revert to trust-based authentication.

This can be scripted to a certain extent, but to what end? If a user were to change their own password, this breaks every foreign data wrapper they could previously access. This user now has to give their password to the DBA to broadcast across all the nodes with modifications to the user mappings. In cases where LDAP, Kerberos, GSSAPI, peer, or other token forwarding authentication is in place, this might not even be possible or advised.

Oracle solved this problem by tying DBLINK tables to a specific user during creation time. An access to a certain table authenticates as that user in all cases. This means a DBA can set aside a specific user for foreign table access purposes, and use a password that’s easy to change across the cluster if necessary. Grants take care of who has access to these objects. Of course, since postgres_fdw is read/write, this would cause numerous permissions concerns.

So what are we left with? How can we actually use PostgreSQL foreign tables securely? At this point, I don’t believe it’s possible unless I’m missing something. And I’m extremely confused at how this feature got so far along without any real way to lock it down in the face of malleable passwords. Our systems have dozens of users who are forced by company policy to change their passwords every 90 days, thus none of these users can effectively access any foreign table I’d like to create.

And no, you can’t create a mapping and then grant access to it. In the face of multiple mapping grants, which one would PostgreSQL use? No, if there’s a way to solve this particular little snag, it won’t be that convenient. If anyone has ideas, or would like to go into length at how wrong I am, please do! Otherwise, I’m going to have to use internal users of my own design and materialized views to wrap the foreign tables; extremely large tables will need some other solution.


Tags: , , , ,

PGCon 2014 Unconference: A Community

May 27th, 2014 | Published in Contemplation, Database, News, Tech Talk | 4 Comments


This May, I attended my first international conference: PGCon 2014. Though the schedule spanned from May 20th to May 23rd, I came primarily for the talks. Then there was the Unconference on the 24th. I’d never heard of such a thing, but it was billed as a good way to network and find out what community members want from PostgreSQL. After attending the Unconference, I must admit I’m exceptionally glad it exists; it’s something I believe every strong Open Source project needs.

Why do I say that, having only been to one of them? It’s actually fairly simple. Around 10AM Saturday, everyone piled into the large lecture hall and had a seat. There were significantly fewer attendees, but most of the core committers remained for the festivities. We were promised pizza and PostgreSQL, and that’s all anyone needed. Josh Berkus started the festivities by announcing the rules and polling for ideas. The final schedule was pretty interesting in itself, but I was more enamored by the process and the response it elicited.

I’m no stranger to the community, and the mailing lists are almost overwhelmingly active. But these conversations, nearly all of them, are focused on assistance and hacker background noise. The thing that stood out to me during the Unconference planning was its organic nature. It wasn’t just that we chose the event schedule democratically. It wasn’t the wide range of topics. It wasn’t even the fact core members were there to listen. It was the engagement.

These people were excited and enjoying talking about PostgreSQL in a way I’ve never witnessed, and I’ve spoken at Postgres Open twice so far. I’ve seen several talks, been on both sides of the podium, and no previous experience even comes close. We were all having fun brainstorming about PostgreSQL and its future. For one day, it wasn’t about pre-cooked presentations chosen via committee, but about what “the community” wanted to discuss.

When it came time for the talks themselves, this atmosphere persisted. We agreed and disagreed, we had long and concise arguments for and against ideas, clarified our positions, and generally converged toward a loose consensus. And it was glorious. I know we were recording the sessions, so if you have the time when the videos are available, I urge you to watch just one so you can see the beauty and flow of our conversations.

I feel so strongly about this that I believe PGCon needs to start a day earlier. One unfortunate element about the Unconference is that it happens on a Saturday, when everyone wants to leave and return to their families. Worse, there is a marathon on Sunday, meaning it is difficult or even impossible to secure a hotel room for the Saturday event. People tend to follow the path of least resistance, so if there is a problem getting lodging, they won’t go.

And that’s a shame. Having a core of interested and engaged community members not only improves the reputation of PostgreSQL, but its advocacy as well. If people feel they can contribute without having to code, they’ll be more likely to do so. If those contributions, no matter how small, are acknowledged, their progenitors will stick around. I believe this is the grass-roots effort that makes PostgreSQL the future of the database world, and whoever came up with the Unconference deserves every accolade I can exclaim.

We need more of this. PostgreSQL has one of the most open communities I’ve had the pleasure of participating in, and that kind of momentum can’t really be forced. I hope every future PostgreSQL conference in every country has one of these, so everyone with the motivation can take part in the process.

Finally, find your nearest PostgreSQL User Group, and join the community. You’ll be glad you did.


Tags: , , , ,

Foreign Keys are Not Free

May 14th, 2014 | Published in Database, Tech Talk | 6 Comments


PostgreSQL is a pretty good database, and I enjoy working with it. However, there is an implementation detail that not everyone knows about, which can drastically affect table performance. What is this mysterious feature? I am, of course, referring to foreign keys.

Foreign keys are normally a part of good database design, and for good reason. They inform about entity relationships, and they verify, enforce, and maintain those relationships. Yet all of this comes at a cost that might surprise you. In PostgreSQL, every foreign key is maintained with an invisible system-level trigger added to the source table in the reference. At least one trigger must go here, as operations that modify the source data must be checked that they do not violate the constraint.

This query is an easy way to see how many foreign keys are associated with every table in an entire PostgreSQL database:

SELECT t.oid::regclass::text AS table_name, count(1) AS total
  FROM pg_constraint c
  JOIN pg_class t ON (t.oid = c.confrelid)
 GROUP BY table_name
 ORDER BY total DESC;

With this in mind, consider how much overhead each trigger incurs on the referenced table. We can actually calculate this overhead. Consider this function:

CREATE OR REPLACE FUNCTION fnc_check_fk_overhead(key_count INT)
RETURNS VOID AS
$$
DECLARE
  i INT;
BEGIN
  CREATE TABLE test_fk
  (
    id   BIGINT PRIMARY KEY,
    junk VARCHAR
  );

  INSERT INTO test_fk
  SELECT generate_series(1, 100000), repeat(' ', 20);

  CLUSTER test_fk_pkey ON test_fk;

  FOR i IN 1..key_count LOOP
    EXECUTE 'CREATE TABLE test_fk_ref_' || i || 
            ' (test_fk_id BIGINT REFERENCES test_fk (id))';
  END LOOP;

  FOR i IN 1..100000 LOOP
    UPDATE test_fk SET junk = '                    '
     WHERE id = i;
  END LOOP;

  DROP TABLE test_fk CASCADE;

  FOR i IN 1..key_count LOOP
    EXECUTE 'DROP TABLE test_fk_ref_' || i;
  END LOOP;

END;
$$ LANGUAGE plpgsql VOLATILE;

The function is designed to create a simple two-column table, fill it with 100,000 records, and test how long it takes to update every record. This is purely meant to simulate a high-transaction load caused by multiple clients. I know no sane developer would actually update so many records this way.

The only parameter this function accepts, is the amount of tables it should create that reference this source table. Every referring table is empty, and has only one column for the reference to be valid. After the foreign key tables are created, it performs those 100,000 updates, and we can measure the output with our favorite SQL tool. Here is a quick test with psql:

\timing
SELECT fnc_check_fk_overhead(0);
SELECT fnc_check_fk_overhead(5);
SELECT fnc_check_fk_overhead(10);
SELECT fnc_check_fk_overhead(15);
SELECT fnc_check_fk_overhead(20);

On our system, these timings were collected several times, and averaged 2961ms, 3805ms, 4606ms, 5089ms, and 5785ms after three runs each. As we can see, after merely five foreign keys, performance of our updates drops by 28.5%. By the time we have 20 foreign keys, the updates are 95% slower!

I don’t mention this to make you abandon foreign keys. However, if you are in charge of an extremely active OLTP system, you might consider removing any non-critical FK constraints. If the values are merely informative, or will not cause any integrity concerns, a foreign key is not required. Indeed, excessive foreign keys are actually detrimental to the database in a very tangible way.

I merely ask you keep this in mind when designing or revising schemas for yourself or developers you support.


Tags: , , ,

« Older Posts

Newer Posts »