2011 PGOpen is now Closed

Well, that was an interesting couple of days. Unfortunately my birthday came right after and I didn’t feel like writing anything for the duration. Now though? Why not!

In the end, I think my presentation, NVRam for Fun and Profit, went over okay. Not a ton of people showed up, but I did get ambushed with questions once I got off the stage. Why those people didn’t ask while I was in Presentation Mode, I can’t quite understand. But Greg Smith was there, and he played the required heckler, so it wasn’t all bad.

I haven’t gotten any direct feedback yet, but Greg did say that about the only comment he saw was that attendees wished I’d gone into more depth. As to that, I totally agree. I also realize I completely mislabeled my talk. The issue, of course, is that with dozens of talks available, reading all those abstracts is somewhat onerous. I certainly didn’t read them all, and really, what does NVRam have to do with PostgreSQL? The title adequately described my talk, but wasn’t exciting enough to get someone to click through and read the abstract and longer description. Apparently most of the conference planners immediately flagged it for acceptance, so my real problem was marketing. I should have called my talk: “Death at Ten Thousand Transactions per Second.”

The whole point, after all, was that our system takes about 250 million transactions on any given day (averaged over 24 hours that’s nearly three thousand per second, and our load is anything but flat), and that number is only going up as we garner clients for our platform. Knowing how to scale to that kind of IO is a critical need for a growing database community, so my talk went over our problem, methods to alleviate it without buying hardware, existing and bleeding-edge tech that can help solve it, our ultimate choice, and the justification for it. It’s good material, but now I realize I made a more fundamental error in judgment than merely misnaming my talk.

I like the theoretical approach, the accumulation of techniques and information. I always have. Most things are academic to me. What I should have done for my talk, however, is the exact opposite. While I couldn’t provide a demonstration of our database, and pretty graphs only tell part of the story, these people are hands-on bare-metal psychopaths. I should have shown a slide or two of raw iostat output. I should have put up a diagram of our basic failover node architecture. I should have described our availability stack and gone into all the gritty details: our XFS formatting and mount options, our LVM and DRBD support layer, our Pacemaker and Heartbeat controls, our bcfg2 cluster config model.
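
Just to give a flavor of the level of detail I mean, here’s a hypothetical fstab entry for an XFS database volume sitting on top of DRBD. These are not our actual settings, just the sort of thing I should have put on a slide and defended option by option:

    # /etc/fstab -- hypothetical example, not our production config
    # DRBD device carrying the PostgreSQL data directory, formatted as XFS
    /dev/drbd0  /db/pgdata  xfs  noatime,nodiratime,nobarrier,logbufs=8  0  0

Every one of those options is a conversation with that crowd: why write barriers are safe to relax when the storage has a battery or capacitor behind it, why atime updates are pure waste on a database volume, and so on.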

And then I should have explained why I hate Slony, and how it ultimately affects a heavily OLTP system. How we approached partial indexes to streamline our storage and lookup speeds. Our postgresql.conf settings both before and after the NVRAM upgrade, and why those settings make sense for that kind of hardware. Then, and only then, should I have started laying out how PCIe cards help alleviate or even outright eliminate several potential and existing problems. There was some of that in my graphs and quoted RAID performance stats, but it was ultimately shallow because I was trying to cover too much material: our complete transition, which took months of research, analysis, and testing.
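
Purely as illustration rather than our actual schema or values, that part of the talk could have been built around snippets like these. A partial index only covers the rows the hot query path actually touches, which keeps both the index and the random IO small:

    -- Hypothetical example: index only the rows the OLTP path hammers,
    -- instead of every historical row in the table.
    CREATE INDEX idx_orders_open
        ON orders (account_id, created_at)
     WHERE status = 'open';

And the postgresql.conf discussion practically writes itself once there’s flash under the database. These numbers are invented, but the direction of each change is the point:

    # Hypothetical before/after sketch, not our production values
    random_page_cost = 2.0              # down from the spinning-disk default of 4.0
    effective_io_concurrency = 32       # the card services far more parallel reads
    checkpoint_completion_target = 0.9  # spread checkpoint writes across the interval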

But that’s what you learn from experience, right? So I’m not too broken up about it. There’s always a next time. I can even tweak this talk for future iterations and include all of that nifty information. Knowing how to survive an extremely transactional environment is an important bit of knowledge, and I’m not entirely convinced there’s adequate material out there that really describes functional approaches. So much of this is home-brewed that there’s quite a bit of experimentation and fiddling involved.

I can’t come up with a recipe, but I can at least provide a starting point for interested DBAs. FusionIO worked for us precisely because it integrated well with our IO needs, and now we’re migrating to a hybrid system so we can take advantage of more space while still getting the most out of those random IOPS. Maybe next year I’ll be able to submit a talk on a horizontal partitioning strategy I’m examining, because we’re going to need it soon. Regardless of how capable any one node may be, real high availability is won through zoning and shared-nothing data allocation bins. Facebook can bring down sections of its network without affecting the others; we can’t.
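
In the most generic terms, and not as a preview of the strategy I’m actually examining, a shared-nothing layout boils down to pinning each client to a bin and keeping the bins ignorant of one another:

    -- Hypothetical sketch of shared-nothing "bins": every client lives in
    -- exactly one shard, so a whole shard (and its failover pair) can go
    -- dark without touching anyone else's data.
    CREATE TABLE shard_map (
        client_id  BIGINT  PRIMARY KEY,
        shard_no   INTEGER NOT NULL   -- assigned by hash or by capacity planning
    );

The application looks up shard_no once, connects to that node, and never issues a query that crosses bins; that independence is what lets you take a zone down on purpose.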

But we will. We’ll need to. The whole point of scaling is anticipating growth and engineering for robust application performance and uptime. We’ve doubled our client count every year for the past four years. While that’s clearly not a sustainable growth curve, we’ll still outgrow the capabilities of our current architecture in a couple of years at most, and we’d better have an improvement in place by then. The PostgreSQL 9.x branch and my sharding system can give us that. Those are the kinds of exciting scaling challenges and solutions I’d want to see presented at a conference!

Of course, I wouldn’t have to do all that if the PostgreSQL core hackers would just add auto-sharding and multi-master replication . . . lazy devs.