Review: PostgreSQL Server Programing
There comes a time in every DBA’s life, that he needs to add functionality to his database software. To most DBAs, and indeed for most databases, this amounts to writing a few stored procedures or triggers. In extremely advanced cases, the database may provide an API for direct C-language calls. PostgreSQL however, has gone above and beyond this for several years, and have continuously made the process easier with each iteration.
So once again, I’m glad to review a book by three authors in the industry who either work directly on PostgreSQL internals, or use it extensively enough to contribute vastly important functionality. Hannu Krosing, Jim Mlodgenski, and Kirk Roybal collaborated to produce PostgreSQL Server Programming, a necessary and refreshing addition to the PostgreSQL compendium. I don’t know who contributed each individual chapter, but I can make a fairly educated guess that anything PL/Proxy related came from Mr. Krosing, its original designer.
As usual for a book of this type, things start off with relative simplicity. The authors make a very important note I try to convey to staff developers regularly: let the database do its job. The database is there to juggle data, handle set theory, and otherwise reduce traffic to and from the application to a minimum. This saves both network bandwidth and processing time on the front end, which can be combined with caching to service vastly larger infrastructures than otherwise possible.
Beyond this, are the basics. Using stored procedures, taking advantage of triggers, writing functions that can return sets. The gamut of examples runs from data auditing and logging, to integrity control and a certain extent of business logic. One or more of the authors suggests that functions are the proper interface to the database, to reduce overhead, and provide an abstract API that can change without directly altering the application code. It is, after all, the common denominator in any library or tool dealing with the data. While I personally don’t agree with this type of approach, the logical reasoning is sound, and can help simplify and prevent many future headaches.
But then? Then comes the real nitty-gritty. PostgreSQL allows interfacing with the database through several languages, including Python, Perl, and even TCL/TK, and the authors want you to know it. Databases like Oracle have long allowed C-level calls to the database, and this has often included Java in later incarnations. PostgreSQL though, is the only RDBMS that acts almost like its own middle layer. It’s a junction that allows JSON (a Javascript encapsulation) accessed via Python, to be filtered by a TCL trigger, on a table that matched data through an index produced by a Perl function. The authors provide Python and C examples for much of this scenario, including the JSON!
And that’s where this book really shines: examples. There’s Python, C, PLPGSQL, triggers, procedures, compiled code, variants of several difficult techniques, and more. In the C case, things start with a simple “Hello World” type you might see in a beginning programming class, and the author steps through increasingly complex examples. Eventually, the C code is returning sets of sets of data per call, as if simulating several table rows.
In the more concrete, the authors provide copious links to external documentation and Wiki pages for those who want to explore this territory in more depth. Beyond that, they want readers to know about major sources of contributed code and extensions, all to make the database more useful, and perhaps entice the reader join in the fun. Everything from installing, to details necessary for writing extensions is covered, so that is well within the realm of possibility!
I already mentioned that at least one of the authors encourages functional database access instead of direct SQL. Well, there’s more than the obvious reasons for this: PL/Proxy is a procedural language that uses functions to facilitate database sharding for horizontal scalability. Originally designed for Skype, PL/Proxy has been used by many other projects. While it might not apply to everyone, sharding is a very real technique with non-trivial implementation details that have stymied the efforts of many development teams.
I actually would have liked a more extensive chapter or two regarding PL/Proxy. While several examples of functional access are outlined for a chat server, none of these functions are later modified in a way that would obviously leverage PL/Proxy. Further, Krosing doesn’t address how sequences should be distributed so data on all the various servers get non-conflicting surrogate keys. It would have been nice to see an end-to-end implementation.
All in all, anyone serious about their PostgreSQL database should take a look. Getting a server up and running is only half the story; making the database an integral component of an application instead of a mere junk drawer provides more functionality with fewer resources. It’s good to see a book that not only emphasizes this, but conveys the knowledge in order to accomplish such a feat. Hannu, Jim, and Kirk have all done the community a great service. I hope to see revisions of this in the future as PostgreSQL matures.