PG Phriday: A Postgres Perspective on MongoDB

July 15th, 2016 | Published in Database, Tech Talk | 17 Comments


I’ve been almost exclusively a Postgres DBA for a seemingly interminable length of time. While this is good for specializing, nobody wants to be a One-Trick Pony. And aside from the occasional bit of Python to write more advanced tools when Bash isn’t up to the job, it’s All Postgres All The Time. While few things would make me happier, it pays to branch out occasionally.

When NoSQL databases hit the scene a few years ago, I pretty much ignored them wholesale. I read about what they did and how they did it, and while intriguing, none of that seemed particularly better than what Postgres delivered. Cassandra can scale outwards, but it has no real concept of NULL and limits searches to “indexed” columns. Yet sites like Reddit prove how well it can scale when properly leveraged. MongoDB is basically a native JavaScript filesystem, which I ignored because Postgres support for JSON and JSONB effectively deprecated it before it was even on my radar. There are others too, of course: CouchDB, Redis, HBase, and a plethora of alternatives.

There’s only so much time in a day though, and since we use MongoDB for a couple of projects, it only makes sense to investigate how it really works. What can a Postgres user, who eats, sleeps, and breathes Postgres, hope to learn from a NoSQL database? I guess we can start with some simple and basic operations. Can I make a table, fill it with a million rows, update a few, delete a few, add an extra index for future searches, and perform a query or two?

Let’s start with the Postgres schema I use for practically everything:

CREATE TABLE sensor_log (
  id            SERIAL NOT NULL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
\timing
 
INSERT INTO sensor_log (id, location, reading, reading_date)
SELECT s.id, s.id % 1000, s.id % 100,
       CURRENT_DATE - ((s.id * 10) || 's')::INTERVAL
  FROM generate_series(1, 1000000) s(id);
 
TIME: 7236.906 ms
 
CREATE INDEX idx_sensor_log_date
    ON sensor_log (reading_date);
 
TIME: 804.044 ms

Perhaps the most appropriate question to ask at this juncture would be: what is the MongoDB equivalent? Well, as strange as this might sound, there isn’t one. When I described MongoDB as a native JavaScript filesystem, I was being literal. There really are no tables, schemas, or databases as we might think of them. A MongoDB “collection” can be loosely interpreted as a table, in that it stores documents that can be thought of as rows, but that’s where the similarity ends.

A MongoDB collection has no structure at all. Let that sink in a bit. MongoDB collections cannot be declared, and as such, literally anything can be stored in one. Check this out:

use mydb
 
db.myCol.insert( {animal: "cow", says: "moo"} )
db.myCol.insert( {length: 12, seven: [ 8, 9 ]} )
 
db.myCol.find()
 
{ "_id" : ObjectId("5788fc361b81473a2b3d1998"), "animal" : "cow", "says" : "moo" }
{ "_id" : ObjectId("5788fc3a1b81473a2b3d1999"), "length" : 12, "seven" : [ 8, 9 ] }

While it’s in our best interests to keep similar documents restricted to certain collections, there’s no kind of constraint that enforces this. That makes it easy to keep tangentially related objects together, even when they carry dozens or hundreds of non-contiguous attributes, so long as there is some common expected core. Yet we don’t have to. As a consequence, we also can’t declare a predefined structure, required keys, or any other kind of constraint.
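 
For contrast, the closest Postgres analog to this kind of schemaless storage is a JSONB column. Here’s a rough sketch (the table name is mine) that happily accepts the same two mismatched documents:
 
-- A loose Postgres stand-in for a MongoDB collection: one JSONB column.
CREATE TABLE my_col (
  id   SERIAL PRIMARY KEY,
  doc  JSONB NOT NULL
);
 
INSERT INTO my_col (doc) VALUES
  ('{"animal": "cow", "says": "moo"}'),
  ('{"length": 12, "seven": [8, 9]}');
 
The difference is that the JSONB column lives inside a declared table, so we can still bolt on relational columns, constraints, and indexes whenever we like.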

In fact, we can learn quite a lot from that previous code block. First of all, the “mydb” database was never defined. Neither was “myCol”, the collection used to store our bogus documents. Simply storing a document made them exist. It might be better to think of MongoDB databases and collections as root folders and subdirectories in a filesystem. When a document is stored, it triggers the creation of any necessary parent elements. Once instantiated, the database and collection will persist after being emptied unless explicitly dropped.

Another thing that’s hard to miss is the presence of the _id field in our output. All MongoDB documents require one of these, and if we don’t specify one, MongoDB gracefully provides it. Of course, the default MongoDB injects is of limited utility due to its arbitrary nature, but luckily we can override that behavior.

With all of that in mind, let’s create the same table and fill it with sample data:

use mydb
 
var currentDate = new Date();
currentDate.setHours(0,0,0,0)
var batch = new Array()
 
var start = new Date()
for (var i = 1; i <= 1000000; i++) {
  batch[i] = {
        _id: i,
        location: i % 1000,
        reading: i % 100,
        readingDate: new Date(currentDate - i * 10000)
    }
}
 
db.sensorLog.insert(batch)
 
(new Date() - start) / 1000
 
31.791
 
start = new Date()
db.sensorLog.ensureIndex( { readingDate: 1 } )
(new Date() - start) / 1000
 
3.2

We can see that for small data like this, Postgres is a bit more time-efficient for allocating new data. If we examine the filesystem, it’s also apparent this fake sensor data requires 448MB on MongoDB, while Postgres stored everything in 93MB. This is mostly because MongoDB manages storage by preallocating large segments under the assumption the empty space will soon be filled.
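 
As an aside, the Postgres figure is easy to double-check from within psql; something along these lines reports the table’s total footprint, indexes included:
 
-- Total on-disk size of the table, including indexes and TOAST data.
SELECT pg_size_pretty(pg_total_relation_size('sensor_log'));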

In any case, we learn even more from all of the above code. One benefit of MongoDB being a native JavaScript engine is that we can directly manipulate objects using standard JavaScript syntax. It’s also convenient that the insert method accepts arrays of documents, as inserting them one by one was orders of magnitude slower. Unfortunately, the mongo shell doesn’t provide automatic timing output the way psql does, hence all the manual date arithmetic above.

On the other hand, MongoDB allows us to define whole libraries of local functions that would only be valid for the current session, giving us macro potential Postgres users can only imagine. The necessity of wrapping everything in some kind of SQL-valid statement can often be a hindrance.

Of course, a natural implication of using a JavaScript Engine is that object names are case sensitive. This is a given for nearly any language aside from SQL, but some of us will need to note the context change or we can get ourselves in trouble. This is another good reason to always name Postgres database objects in lowercase and use underscores. We don’t want to learn any inconsistent capitalization habits that may adversely affect us in other systems.
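 
As a quick reminder of just how forgiving Postgres is here, all of these statements hit the same table, because unquoted identifiers are folded to lowercase:
 
-- All three refer to sensor_log; unquoted names are case folded.
SELECT count(*) FROM sensor_log;
SELECT count(*) FROM Sensor_Log;
SELECT count(*) FROM SENSOR_LOG;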

Let’s check out a few statements using Postgres and our sensor_log table.

\timing
 
UPDATE sensor_log
   SET reading = reading + 1
 WHERE reading_date >= CURRENT_DATE - INTERVAL '8 days'
   AND reading_date < CURRENT_DATE - INTERVAL '7 days';
 
TIME: 114.671 ms
 
DELETE FROM sensor_log
 WHERE reading_date >= CURRENT_DATE - INTERVAL '9 days'
   AND reading_date < CURRENT_DATE - INTERVAL '8 days';
 
TIME: 16.248 ms
 
SELECT COUNT(*) FROM sensor_log;
 
 COUNT  
--------
 991360
(1 ROW)
 
TIME: 111.820 ms
 
SELECT *
  FROM sensor_log
 WHERE reading_date < CURRENT_DATE - INTERVAL '2 week'
 ORDER BY reading_date ASC
 LIMIT 5
OFFSET 20;
 
   id   | location | reading |    reading_date     
--------+----------+---------+---------------------
 999980 | 980      |      80 | 2016-03-21 06:16:40
 999979 | 979      |      79 | 2016-03-21 06:16:50
 999978 | 978      |      78 | 2016-03-21 06:17:00
 999977 | 977      |      77 | 2016-03-21 06:17:10
 999976 | 976      |      76 | 2016-03-21 06:17:20
(5 ROWS)
 
TIME: 1.034 ms

Let’s examine each MongoDB equivalent, with the timings in milliseconds. This first one is a translation of our UPDATE statement:

var currentDate = new Date()
currentDate.setHours(0,0,0,0)
 
start = new Date()
db.sensorLog.update({
    readingDate: {
        $gte: new Date(currentDate.getTime() - 8*86400000),
        $lt: new Date(currentDate.getTime() - 7*86400000)
    }
  },
  { $inc: { reading: 1 } },
  { multi: true }
)
(new Date() - start)
 
WriteResult({ "nMatched" : 8640, "nUpserted" : 0, "nModified" : 8640 })
77

It’s a bit faster than our Postgres version, but a bit more inconvenient to write. One universal element with MongoDB is the increased complexity of statements. The primary tradeoff for having such convenient access to JavaScript in MongoDB is that everything must be a JavaScript object. Much like Postgres forces us to wrap everything in a SQL-based statement, MongoDB demands that even the simplest query invoke a JavaScript function.

In the case of the update function, the first parameter is the search, and the second is the list of changes we want to make. It’s here where we encounter constructs like $inc to increment the value of a field. There are several of these specialized operators we need for manipulating our data.

The last parameter to update is a series of flags that modify how the update operates. For instance, we enabled multi, which updates all matching rows, overriding the default of updating a single row. That seems fairly nonintuitive to a longtime user of SQL-based systems.

Let’s examine how deletes work:

start = new Date()
db.sensorLog.remove({
    readingDate: {
        $gte: new Date(currentDate.getTime() - 9*86400000),
        $lt: new Date(currentDate.getTime() - 8*86400000)
    }
  }
)
(new Date() - start)
 
WriteResult({ "nRemoved" : 8640 })
145

While the update was a bit faster than Postgres, the delete was much slower. Beyond that, this is very similar to the update we did earlier. Often in MongoDB, the first parameter to a function will be some kind of query in the form of a JSON object. We should also point out that the default operation of the remove function is to act on all matches, the exact opposite of how update works. There is an argument to only remove the first match, but we didn’t use it here.

Let’s keep going and check out how queries themselves work:

start = new Date()
db.sensorLog.count()
(new Date() - start)
 
2
 
start = new Date()
db.sensorLog.find({
    readingDate: {
        $lt: new Date(currentDate.getTime() - 14*86400000)
    }
  }
).sort({readingDate: 1}).skip(20).limit(5)
(new Date() - start)
 
{ "_id" : 999980, "location" : 980, "reading" : 80, "readingDate" : ISODate("2016-03-21T11:16:40Z") }
{ "_id" : 999979, "location" : 979, "reading" : 79, "readingDate" : ISODate("2016-03-21T11:16:50Z") }
{ "_id" : 999978, "location" : 978, "reading" : 78, "readingDate" : ISODate("2016-03-21T11:17:00Z") }
{ "_id" : 999977, "location" : 977, "reading" : 77, "readingDate" : ISODate("2016-03-21T11:17:10Z") }
{ "_id" : 999976, "location" : 976, "reading" : 76, "readingDate" : ISODate("2016-03-21T11:17:20Z") }
 
2

Obtaining counts is refreshingly simple. One of the cool things about functional approaches like this is that operations are easily chainable. We can either use the count function directly, or tack it onto the end of a longer query to return the total matches. We can see such combinations in the second query with the inclusion of skip and limit, where Postgres would use OFFSET and LIMIT.
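 
For reference, getting the total matches for that same filtered query in Postgres just means swapping the column list for an aggregate:
 
-- Count every row matching the same two-week cutoff used earlier.
SELECT count(*)
  FROM sensor_log
 WHERE reading_date < CURRENT_DATE - INTERVAL '2 weeks';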

There are a lot of differences here. Diverse tools are a great resource for varying project requirements. It should be fairly clear that MongoDB is geared toward tight integration into JavaScript projects. The incredibly loose document structure requirements make it a perfect target for storing polymorphic data that consists of highly varying fields or otherwise convoluted structure. Usually we would prefer to abstract such things into a relational schema suitable for a standard database like Postgres. Yet in the case of highly nested object representations, that might not be practical.

Despite this, there is some small resemblance between Postgres and MongoDB. Mongo even provides a mapping to relate similar concepts. While many of those concepts have a suitable equivalent, others do not. We already mentioned that collections can be treated like tables, but since their structure is dictated by their contents, that’s not strictly true.

Other concepts like JOIN didn’t even exist until the latest MongoDB incarnation of 3.2. Even then, that functionality is only available during aggregation. The only other alternative is to embed nested queries within a foreach loop of a base result set. There’s actually a good reason for that, too. MongoDB is a document store, and as such, one could argue that elements we’re trying to look up are essential attributes of the underlying document. This is a world that operates in the land of denormalization, where doing otherwise is an impediment to the platform.

There’s always a bit of a “not quite” caveat when comparing elements between these two engines. I certainly learned a lot while digging around in a default MongoDB instance. While some features were cool and extremely convenient, others frustrated and confounded me. It’s certainly a different world over there in NoSQL land, yet the need for it is undeniable. This and many other reasons contributed to JSON and JSONB becoming integral parts of Postgres. With them, we get the best of both worlds: loosely structured, amorphous documents, plus relational logic to cement relationships and enforce at least some integrity within our datasets.
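 
To illustrate that hybrid approach, here’s a sketch of a document payload bolted onto our relational sensor data; the table and column names here are purely illustrative:
 
-- Relational core plus a JSONB payload for the free-form parts.
CREATE TABLE sensor_log_detail (
  detail_id      SERIAL PRIMARY KEY,
  sensor_log_id  INT NOT NULL REFERENCES sensor_log (id),
  extra_info     JSONB NOT NULL
);
 
-- A GIN index makes containment searches on the document practical.
CREATE INDEX idx_sensor_log_detail_info
    ON sensor_log_detail USING GIN (extra_info);
 
-- Find details flagged with a particular attribute.
SELECT *
  FROM sensor_log_detail
 WHERE extra_info @> '{"status": "error"}';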

It’s a bit of a foreign concept and technically breaks normalization rules, but I’m willing to consider the potential benefits. Rules are, after all, made to be broken.



PG Phriday: All in a Name

July 8th, 2016 | Published in Database, Tech Talk | 3 Comments


Naming objects in a database can sometimes be an exercise in frustration. What kind of guidelines should we follow for schemas and tables? What about columns or variables? Should the same rules apply to indexes, constraints, and sequences? Functions and triggers are much different than all of those elements, yet still exist within the same namespace. Then we have to contend with Postgres reserved words, many of which are probably only obvious to someone well versed in database lingo.

Luckily, a few short rules essentially address most of these questions directly. For objects that defy description or are otherwise elusive, we can still converge on something workable. Let’s start with a short list of basics:

  • Don’t quote.
  • Underscores separate words.
  • Motive counts.
  • Prefixes prevent problems.
  • Specific as possible.

Not only are these rules simple to apply, they make a great mnemonic acronym: DUMPS! Let’s explore each of these in depth and consider why they help make the database a place everyone can share and enjoy.

Don’t Quote

This is rare, but not unheard of in systems facilitated by an ORM or some kind of design tool that directly translates user inputs into object names:

CREATE TABLE "User Forms"
(
  "ID"            SERIAL NOT NULL PRIMARY KEY,
  "Quick Key"     VARCHAR NOT NULL,
  "EntryType"     VARCHAR NOT NULL,
  "Table"         JSON NOT NULL,
  date_created    TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now()
);

Quoting allows us to override both case insensitivity and any reserved words Postgres might have; heck, we can even use spaces. While some developers and applications interpret this as consent, it greatly complicates actually using the objects. Quoted labels that contain mixed capitalization, special characters, or reserved words must be quoted forever, whether by a DBA hand-writing a query, a reporting team trying to mine a series of tables for correlations, or an ORM.
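 
Against the table above, that looks like this in practice; the unquoted spellings simply don’t resolve:
 
-- The quoted names work, but only when quoted exactly as created.
SELECT "ID", "Quick Key" FROM "User Forms";
 
-- Any unquoted attempt is folded to lowercase and fails outright.
SELECT id, quick_key FROM user_forms;
-- ERROR:  relation "user_forms" does not exist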

The Postgres list of reserved words reveals that Postgres is actually more permissive than the standards dictate. The word “value”, for example, might not be accepted in other database platforms, while Postgres will allow using it without quotes. This becomes a problem in mixed environments where various tools transmit data back and forth between database engines. The inconsistent behavior regarding reserved words means potential for unexpected results. There’s also the potential for Postgres itself to reserve currently permitted words in a future version.
 
Generally it’s best to simply avoid any database-esque words for object names and nullify the issue outright. Luckily, following the specificity guideline pretty much solves syntax conflicts automatically, since a sufficiently specific object name is very unlikely to be reserved by Postgres.

Underscores

Unlike most languages, SQL is case insensitive. Let’s take a look at what that means in practice:

CREATE OR REPLACE FUNCTION doSomethingCool(myVar INT)
RETURNS INT AS
$$
BEGIN
  RETURN myVar * myVar;
END;
$$ LANGUAGE plpgsql;
 
CREATE OR REPLACE FUNCTION DoSomethingCool(myVar INT)
RETURNS INT AS
$$
BEGIN
  RETURN myVar;
END;
$$ LANGUAGE plpgsql;
 
SELECT doSomethingCool(5);
 
 dosomethingcool 
-----------------
               5

While Postgres allows function overloading by specifying different arguments, that’s not what happened here. Instead, our carefully crafted CamelCase distinctions are discarded upon evaluation. As a consequence, the function names are identical, and the second definition overwrites the first one.

But it gets even more insidious than that. Check this out:

CREATE OR REPLACE FUNCTION check_me_out(nEst INT)
RETURNS INT AS
$$
DECLARE
  nest BOOLEAN; -- Should we nest the output?
BEGIN
  -- ... other code here.
  RETURN nEst;
END;
$$ LANGUAGE plpgsql;
 
SELECT check_me_out(5);
 
 check_me_out 
--------------

Wait, what? This time, “nEst” is the same as “nest”, opening the potential for unexpected or undefined behavior. While this could arguably be called a Postgres bug, since we shouldn’t be able to declare a variable with the same name as a function parameter, it’s just one example of how case folding can bite us.

Otherwise, the rule is simple: use underscores to split words. We don’t want onegiantunreadablestring, and underscores are the only reliable way to retain readability, because Postgres doesn’t preserve mixed case at all in object names unless they’re quoted. As a consequence, it’s up to us to avoid those situations.

Here’s another illustration to drive the point home:

CREATE TABLE UserFormsAndOtherStuff (stuff INT);
 
\dt UserFormsAndOtherStuff
 
                 List OF relations
 Schema |          Name          | TYPE  |  Owner   
--------+------------------------+-------+----------
 public | userformsandotherstuff | TABLE | postgres

My eyes!

Motive Counts

This is essentially an extension of our specificity rule. This time, we’re incorporating the use case along with any necessary context and intent behind our objects. What does this mean? Take a look at these structures:

CREATE TABLE toll_sensor (...);
CREATE OR REPLACE VIEW v_all_active_sensors AS ...;
 
CREATE INDEX idx_sensor_ignore_inactive
    ON toll_sensor (sensor_id) WHERE is_online;
 
ALTER TABLE toll_sensor_log
  ADD CONSTRAINT chk_sensor_log_reject_invalid CHECK ...;
 
CREATE OR REPLACE FUNCTION f_log_traffic_encounter(...);
CREATE OR REPLACE FUNCTION f_nullify_logged_transponder(...);
 
CREATE TRIGGER t_sensor_log_advisory_audit_a_iud ...;

In a database context, it’s assumed tables will store multiple things. In fact, it’s probably easier to consider the name of the table as a description for a single row it contains. In our above toll_sensor table, each row within the table is a sensor. This suggests table names should be singular. Simultaneously, view names are an instantiation of a query, meaning they’re a result set and therefore inherently plural.

When we think about indexes, why is the index necessary? Is it just a basic index for a column, or is it necessary to facilitate a subset of searches? Views are essentially reports, so why does the view exist, and what is it returning? Constraints define how a table acts in many cases, and we should list those conditions or some summary of them. With functions, we want to know what the function does, and if possible, to what. Triggers can do a lot of things, and since they add quite a bit of overhead, we really want to explain their presence.

Just consider the context, and this should be easy.

Prefixes

Database schemas often become a dumping ground of database objects, sometimes from multiple competing applications. Imagine we just wrote an application to track automated highway tolling. We know that in the future there could be companion applications that use the same data, but might rely on tables of their own.

Here’s how that might look:

CREATE SCHEMA toll_system;
SET search_path TO toll_system;
 
CREATE TABLE toll_sensor (
  sensor_id  SERIAL NOT NULL,
  ...
);
 
ALTER TABLE toll_sensor
  ADD CONSTRAINT pk_toll_sensor PRIMARY KEY (sensor_id);
 
CREATE TABLE toll_transponder (
  transponder_id  SERIAL PRIMARY KEY NOT NULL,
  ...
);
 
CREATE TABLE toll_traffic_log (
  traffic_log_id  SERIAL PRIMARY KEY NOT NULL,
  ...
);
 
CREATE UNIQUE INDEX udx_traffic_log_encounter
    ON toll_traffic_log (...);
 
CREATE VIEW v_transponder_history AS
SELECT ...;
 
CREATE FUNCTION f_log_transponder(...)
RETURNS BOOLEAN AS ...;
 
CREATE TRIGGER t_traffic_log_audit_a_iud
AFTER INSERT OR UPDATE OR DELETE ON toll_traffic_log ...;

Things like tables, sequences, functions, views, indexes, and triggers all exist in the same namespace, and many even share the same context. We can avoid confusion as well as collisions by prefixing. The “rules” for this are straightforward:

  • Label tables with the primary intent. For a tolling system, this would be “toll”, while an inventory system might prefer “inv”.
  • Use “v” for views.
  • Use “f” for functions. (Or “sp” for “stored procedure” if you prefer.)
  • Use “t” for triggers. (Optionally, suffix the trigger firing conditions: “b” for before, “a” for after, and any one of “i”, “u”, and “d” for insert, update, and delete.)
  • Indexes can be “idx” for a regular index, or “udx” for a unique index.
  • Indexes should also be prefixed with the parent table name, sans the table’s prefix.
  • Primary keys should use “pk”.
  • Foreign keys should use “fk”.
  • Boolean columns should be prefixed with “is”, “can”, “has”, etc.

Complicated? Maybe, but only slightly. It’s easy to summarize: abbreviate, shorten, and add. It’s hard to miss that we’re just using the object type as a letter prefix. This way, anyone looking at the schema can tell which things are views, tables, sequences, and so on. Further, examining the table will immediately show which indexes are unique, which constraints are a foreign key or a check, and so on. The point is consistency; this isn’t a bible.

These prefixes give us structure visibility, as well as some peace of mind in avoiding naming collisions in the giant blob of a namespace.

Specificity

Motive explains how and why, while being specific tells us what. Every once in a while, I come across a table like this:

CREATE TABLE storage
(
  id       SERIAL NOT NULL PRIMARY KEY,
  key      VARCHAR NOT NULL,
  type     VARCHAR NOT NULL,
  data     JSON NOT NULL,
  created  TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now()
);

Databases can contain hundreds or thousands of tables, each of which may encapsulate dozens or hundreds of rows. What is “storage”? What are we storing? Data? What kind of data? From where? This kind of table is a nightmare to decipher. There are no clues to track down which app might be using it, and no way to tell how the table is related to other tables in the schema.

A column name such as “data” may seem like an acceptable idea for a single application, but when there are a dozen applications that each have their own “data”, it quickly starts to lose meaning. Sure, the application developer probably knows what all of these extremely generic terms refer to, but does anyone else? In this case, the intent was probably to use “key” as some kind of lookup value, as often seen with key/value pairs in associative arrays or NoSQL-style composite text keys. Likewise, the “data” column is probably the associated JSON object for that lookup key.

But why should anyone have to guess? This is only one example, but there are many like it. Let’s assume this table is used by an application that stores user testimonies. The application captures a lot of varying data from polymorphic forms, so the JSON can’t really be transformed into a reliable set of columns. With those constraints, we might try this instead:

CREATE TABLE user_testimony
(
  testimony_id   SERIAL NOT NULL PRIMARY KEY,
  lookup_key     VARCHAR NOT NULL,
  comment_type   VARCHAR NOT NULL,
  user_feedback  JSON NOT NULL,
  date_created   TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now()
);

Isn’t that better? Now a quick glance reveals everything we need to know about the table and its contents. Tables are a back-end storage element, and are often shared by multiple applications and user groups. Shared resources should be as descriptive as possible to prevent confusion and potential misinterpretation or misuse.

This applies to any database object. Views should distinctly describe their purpose. Columns need to differentiate themselves from columns in other tables to prevent join ambiguity. Functions, sequences, everything should have a clearly distinguishable name.
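 
As a quick sketch of why that matters for joins (these tables are hypothetical), compare the aliases required by generic names with what specific names buy us:
 
-- Generic names force aliases everywhere just to stay unambiguous.
SELECT s.id, s.name, r.id, r.name
  FROM sensor s
  JOIN reading r ON (r.sensor_id = s.id);
 
-- Specific names document themselves and even allow USING syntax.
SELECT sensor_id, sensor_name, reading_id, reading_value
  FROM toll_sensor
  JOIN toll_reading USING (sensor_id);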

Conclusion

This is a lot to consider. The DUMPS acronym helps, but an executive summary might be simply this: consistent and descriptive is best. A consistent naming scheme will prevent a lot of future headaches, and being descriptive forces us to think about the data we’re storing and why. Not only can this help us consider the architecture before setting it in stone, but it conveys meaning simply by existing.

Anyone who joins the company in the future can make sense of data that is at least partially self-documented. Queries can be more consistent and precise, with less ambiguity in every facet from column names to join conditions. Applications can be extended, or new APIs written to leverage old data. Why wade through ancient and probably undocumented code to determine original intent or business logic?

Just remember that access vectors are myriad and ephemeral, but data is forever. No matter how hard it is to think of good names, don’t let it get you down in the DUMPS.



PG Phriday: EXCEPTIONal Performance

July 1st, 2016 | Published in Database, Tech Talk | 6 Comments


Like any programming language, the PL/pgSQL Postgres procedural language has plenty of handy control structures. Among those thankfully modern accoutrements is the humble EXCEPTION block. While it’s not the more prevalent try/catch methodology, we can start an embedded BEGIN block anywhere to achieve the same effect.
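 
In practice, that just means nesting a BEGIN / EXCEPTION / END section wherever we want localized error handling. A trivial sketch:
 
DO $$
BEGIN
  -- Only the inner block is protected; an error here won't abort the outer body.
  BEGIN
    PERFORM 1 / 0;
  EXCEPTION
    WHEN division_by_zero THEN
      RAISE NOTICE 'Caught a division by zero, carrying on.';
  END;
 
  RAISE NOTICE 'Still running after the handled error.';
END;
$$ LANGUAGE plpgsql;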

Knowing this is the case, what are the costs of actually using them? Postgres is fairly streamlined, and as such, can take several shortcuts when there are no exception blocks within a chunk of PL/pgSQL. Thus, it’s not uncommon for members of mailing lists to mutter about ‘performance costs’ and suggest that the worst performance hit is felt by even breathing the word ‘exception’ in the presence of a Postgres instance. The implication is that further leverage will be incremental by comparison.

But is that the case? And if not, what are the actual metrics observed while using exceptions in different contexts? Let’s go beyond nebulous hand waving and take a look, starting with a basic table to hold data:

CREATE TABLE sensor_log (
  id            SERIAL NOT NULL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
CREATE INDEX idx_sensor_log_date
    ON sensor_log (reading_date);

The above table is one we’ve used often before. It helps us illustrate inserting data through a function, an entry vector commonly associated with projects like PL/Proxy. Really this will apply to any application designed to interact through a database-side API of predefined functions.

A very basic example of this is a simple insert function:

CREATE OR REPLACE FUNCTION f_capture_log(
  sensor_loc VARCHAR,
  sensor_val BIGINT
)
RETURNS BOOLEAN AS
$$
BEGIN
  INSERT INTO sensor_log (location, reading, reading_date)
  VALUES (sensor_loc, sensor_val, now());
 
  RETURN TRUE;
END;
$$ LANGUAGE plpgsql;
 
\o /dev/null
 
SELECT f_capture_log('x' || s.id % 1000, s.id % 100)
  FROM generate_series(1, 100000) s(id);
 
TIME: 2567.990 ms

There’s nothing too strange here. All we have is a function that’s wrapping our insert and applying a timestamp when the function is called. We used 100,000 rows to push enough data that our timings remain mostly consistent. By and large, we were successful here; any variance in run times is generally within 5% on our otherwise idle test VM.

Knowing that, we can really start having fun now. We came here to examine exceptions, so let’s add one that does nothing but invert the success of our function call. Keep in mind that this exception will never fire, since we’re relying on a sequence to provide conflict-free primary key IDs.

CREATE OR REPLACE FUNCTION f_capture_log_except(
  sensor_loc VARCHAR,
  sensor_val BIGINT
)
RETURNS BOOLEAN AS
$$
BEGIN
  INSERT INTO sensor_log (location, reading, reading_date)
  VALUES (sensor_loc, sensor_val, now());
  RETURN TRUE;
EXCEPTION
  WHEN OTHERS THEN
    RETURN FALSE;
END;
$$ LANGUAGE plpgsql;
 
\o /dev/null
 
SELECT f_capture_log_except('x' || s.id % 1000, s.id % 100)
  FROM generate_series(1, 100000) s(id);
 
TIME: 3211.702 ms

It looks like we have a 25% performance hit simply by including the exception block. Does that justify the argument that there is a significant upfront cost associated with exceptions, and that once we cross that threshold, everything is smooth sailing? Maybe, or maybe not. Indeed, 25% is rather substantial for an exception handler that literally does nothing and will never actually run. Whatever we find, at least we have some numbers to work with!

So let’s screw up our function a little, and make it work for its dinner.

CREATE OR REPLACE FUNCTION f_capture_log_broken(
  sensor_loc VARCHAR,
  sensor_val BIGINT
)
RETURNS BOOLEAN AS
$$
BEGIN
  INSERT INTO sensor_log (id, location, reading, reading_date)
  VALUES (sensor_val, sensor_loc, sensor_val, now());
  RETURN TRUE;
EXCEPTION
  WHEN OTHERS THEN
    RETURN FALSE;
END;
$$ LANGUAGE plpgsql;
 
\o /dev/null
 
SELECT f_capture_log_broken('x' || s.id % 1000, s.id % 100)
  FROM generate_series(1, 100000) s(id);
 
TIME: 5235.468 ms

Ouch! Our function is now about twice as slow as the original incarnation that didn’t use an exception block. Still, this is a worst case scenario where literally every invocation of the function will cause an exception. If we reduce the amount of overlap and truncate the sensor_log table, run times are affected proportionally. In other words, if only 5% of the calls caused a key violation, we’d only see a further 5% performance cost.

So far, everyone has been correct. The initial investment cost from using exceptions is still the highest we’ve paid, barring ridiculous examples like this one. Does that change if we catch the actual error that’s occurring so there are two conditionals? Let’s see:

CREATE OR REPLACE FUNCTION f_capture_log_extra_broken(
  sensor_loc VARCHAR,
  sensor_val BIGINT
)
RETURNS BOOLEAN AS
$$
BEGIN
  INSERT INTO sensor_log (id, location, reading, reading_date)
  VALUES (sensor_val, sensor_loc, sensor_val, now());
  RETURN TRUE;
EXCEPTION
  WHEN unique_violation THEN
    RETURN FALSE;
  WHEN OTHERS THEN
    RETURN FALSE;
END;
$$ LANGUAGE plpgsql;
 
\o /dev/null
 
SELECT f_capture_log_extra_broken('x' || s.id % 1000, s.id % 100)
  FROM generate_series(1, 100000) s(id);
 
TIME: 5370.879 ms

It turns out that the answer is no. Adding further exception conditions doesn’t introduce any statistically significant overhead. There’s probably a small amount of overhead associated with the underlying jump logic, but that’s an expected element of any language. But this is still boring and pointless; our exceptions still do nothing but tell us the insert failed.

What happens if we actually modify the exception to raise another exception that the caller can process? So long as the exception level we use isn’t fatal, it won’t short-circuit function loops, allowing us to feel the full cost of every uncaught exception.

CREATE OR REPLACE FUNCTION f_capture_log_super_broken(
  sensor_loc VARCHAR,
  sensor_val BIGINT
)
RETURNS BOOLEAN AS
$$
BEGIN
  INSERT INTO sensor_log (id, location, reading, reading_date)
  VALUES (sensor_val, sensor_loc, sensor_val, now());
  RETURN TRUE;
EXCEPTION
  WHEN unique_violation THEN
    RAISE WARNING 'This sucks!';
    RETURN FALSE;
  WHEN OTHERS THEN
    RAISE WARNING 'Ditto!';
    RETURN FALSE;
END;
$$ LANGUAGE plpgsql;
 
\o /dev/null
SET client_min_messages TO error;
 
SELECT f_capture_log_super_broken('x' || s.id % 1000, s.id % 100)
  FROM generate_series(1, 100000) s(id);
 
TIME: 6974.891 ms

Almost three times slower? Now we’re talking! Again, this is a worst case scenario, but it’s also proportional. Since the new function is about 30% slower than simply catching the key violation, we can extrapolate that there’s a 30% cost associated with raising an exception. In that case, if our data had a 10% overlap, we’d have the initial 10% from the exception logic, and another 3% for raising our own exceptions afterwards.

This is in addition to the initial 25% cost associated with even using exceptions in the first place. Again, the upfront price from having an exception block is worse than activating various functionality within it. Even in our highly contrived example, we’d need to call our function on data that had 20% duplicates before we’d start to eclipse the initial exception cost.

How likely is that, really? In most cases where APIs are used like this, there will be an occasional race condition every few hundred thousand or million rows. In that case, the full run time of even our most broken function would be indistinguishable from the best case that had a completely inactive exception block. In the end, that’s what we’ll likely see in real-world scenarios.

At least with Postgres 9.5 on an idle VM, it would appear that exceptions add a 25% performance penalty. Whether this is enough to justify a different approach, or bar them outright, is completely dependent on the expected use case. Indeed, we’re already paying a hefty price by using a function instead of a direct INSERT statement anyway, so it could be a moot argument. A highly sensitive transactional database would probably avoid the overhead, while a more common day-to-day system wouldn’t even notice.

Just keep that number in mind: 25%. It’s the cost of doing business with exceptions, but hardly a deal breaker.



PG Phriday: DIY in the CLI (Part 1)

June 24th, 2016 | Published in News | No Comments


On a higher level, Postgres has a bevy of libraries, interfaces, and clients for accessing a database instance. From language APIs to GUIs like pgAdmin, or SaaS entries like JackDB, every flavor of interaction is covered. And yet, that’s only a small part of the story. For those who dare to tread into the watery depths, there’s also the world of dark incantations that is the command-line.

While most are aware of psql, the Postgres command-line client for accessing databases, there are far more creatures lurking in the black oblivion which deserve more visibility. In the spirit of getting the most from our tools, let’s take a look at these and consider how they might be useful in scenarios that aren’t restricted to boring old DBA considerations. To that end, let’s cover some of the standard client applications that anyone might use.

To begin, there are several commands that cover some standard SQL operations. Things like creating or dropping databases, users, or procedural languages are all possible without ever opening psql and writing SQL by hand. Normally, we’d have to use the equivalent SQL commands:

CREATE DATABASE foo;
DROP DATABASE foo;
 
CREATE USER kitty_cat WITH PASSWORD 'M30w9|_|rr';
DROP USER kitty_cat;
 
CREATE LANGUAGE plpythonu;
DROP LANGUAGE plpythonu;

While all of these can be forced through psql -c, that makes automation, testing, and other tasks somewhat awkward, or forces sysadmins to learn SQL for no reason. So Postgres provides functional wrappers we can use instead. They’re named just about how we might expect, too:

createdb foo
dropdb foo
 
createuser kitty_cat --pwprompt
dropuser kitty_cat
 
createlang plpythonu
droplang plpythonu

As would be expected, all of these commands include the optional parameters from their SQL equivalents. This means a couple of handy things: flags can be listed with -? or --help, and longer descriptions are available via comprehensive man pages. The Postgres online documentation is great, and having all of this information at our fingertips is even better.

Of course, we’re not limited to databases, users, and languages. There are a few commands to streamline basic maintenance as well, and as might be expected, these also have SQL equivalents. Well, mostly. Consider these three scenarios:

  1. We want to VACUUM every database every night following a busy day of transactions. We’ve disabled autovacuum because we have a very active database, and have instead opted for nightly maintenance.
  2. Over the years, our indexes on the user_session and login_state tables have gotten a bit fragmented, and we want to rebuild them.
  3. We have clustered multiple tables, sorting their disk pages along beneficial indexes. Due to MVCC storage, tables that experience updates need periodic re-clustering to maintain this ordering. To do this, we need to run CLUSTER regularly.

Assuming we have two databases, named ‘trading’ and ‘operations’ for example, we could do these things using SQL commands:

\c trading
VACUUM ANALYZE;
CLUSTER;
REINDEX TABLE user_session;
REINDEX TABLE login_state;
 
\c operations
VACUUM ANALYZE;
CLUSTER;

This is all perfectly serviceable. Yet for administrators or end users that don’t really care about SQL syntax, or for operations we want to automate, there is an easier way. We could get the same result using three command-line tools:

vacuumdb --all --analyze
reindexdb --table=user_session --table=login_state trading
clusterdb --all
 
# Or with short options:
 
vacuumdb -az
reindexdb -t user_session -t login_state trading
clusterdb -a

Unlike their SQL equivalents, the command-line tools can combine operations or target multiple objects. We leveraged that to reindex both tables with a single command, and vacuum or cluster all databases in our instance. This is about the time our tools become more than functional wrappers of the SQL commands. Parameters like --all illustrate client-level looping at the very least, and provide a great segue into heavier utilities.

Backing up databases and restoring them, for instance, leverages multiple high and low level Postgres interfaces. Due to this complexity, there is no SQL analog. The Postgres backup and restore utilities have also matured a lot over the years, gaining parallel dumping functionality as recently as 9.3.

Taking this into account, we could clone one of the previously mentioned databases using provided tools in two different ways:

# Just copy a target database by itself:
 
pg_dump --jobs=2 --format=directory --file=backup_dir trading
createdb trading_clone
pg_restore --jobs=2 --dbname=trading_clone backup_dir
 
# Copy the whole freaking instance at the binary level.
 
pg_basebackup -D clone_dir

Just so we’re clear, both approaches have pluses and minuses as well as applicable scenarios beyond the scope of this article. We’re merely illustrating very basic usage. Beyond that, the first method is a variant of the common dump/restore pattern used since time immemorial to perform Postgres upgrades until pg_upgrade hit the scene in 9.0. One of the primary reasons it fell out of favor was due to the growing trend of immense databases.

Even using parallel functionality, dumping the contents of every table in a database 200GB or larger will be extremely slow. Restoring that information is even worse, as we not only have to import all of the same data, but all indexes and constraints must be rebuilt from scratch. Before pg_restore was capable of parallel restores in 8.4, restores were even more frustrating. This, along with the new online replication capabilities, is the origin of pg_basebackup.

By default, pg_basebackup merely utilizes the Postgres data replication stream and reconstructs the host instance in a specified directory. This means we can start Postgres from that directory as a replica, or use pg_upgrade to test newer versions while the old one keeps running. This encourages online failbacks in case of failed upgrades, multiple upgrade test runs, or running applications on new versions to test for edge cases and compatibility concerns.

Principally, it allows us to separate instance management from content management. We still need pg_dump and pg_restore, but they’re no longer primarily backup tools. This is especially true for pg_dump. It has multiple export formats which can theoretically be used to transfer table data into other storage systems. Or maybe we just want to export and inspect raw schema creation commands.

What else is left on the client end of things? For those that want to build Postgres extensions, or get information about the Postgres binaries installed on a server, we have pg_config. Its output is basically a bunch of environment settings that were used to build Postgres itself. This is especially useful with distributed builds that have long lists of arbitrary configure flags. After all, here’s how Ubuntu compiles it:

pg_config --configure
 
'--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-openssl' '--with-libxml' '--with-libxslt' '--with-tclconfig=/usr/lib/x86_64-linux-gnu/tcl8.6' '--with-includes=/usr/include/tcl8.6' 'PYTHON=/usr/bin/python' '--mandir=/usr/share/postgresql/9.4/man' '--docdir=/usr/share/doc/postgresql-doc-9.4' '--sysconfdir=/etc/postgresql-common' '--datarootdir=/usr/share/' '--datadir=/usr/share/postgresql/9.4' '--bindir=/usr/lib/postgresql/9.4/bin' '--libdir=/usr/lib/x86_64-linux-gnu/' '--libexecdir=/usr/lib/postgresql/' '--includedir=/usr/include/postgresql/' '--enable-nls' '--enable-integer-datetimes' '--enable-thread-safety' '--enable-tap-tests' '--enable-debug' '--disable-rpath' '--with-uuid=e2fs' '--with-gnu-ld' '--with-pgport=5432' '--with-system-tzdata=/usr/share/zoneinfo' 'CFLAGS=-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -I/usr/include/mit-krb5 -fPIC -pie -DLINUX_OOM_SCORE_ADJ=0 -fno-omit-frame-pointer' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -L/usr/lib/mit-krb5 -L/usr/lib/x86_64-linux-gnu/mit-krb5' '--with-krb5' '--with-gssapi' '--with-ldap' '--with-selinux' 'CPPFLAGS=-D_FORTIFY_SOURCE=2

Some of that might be nice to know on occasion.

We also have pg_isready, a “new” addition with 9.3. It’s just a quick connection check with an exit status that follows standard exit codes. This is good for basic monitoring, but not much else. On the other hand, we no longer have to run a bogus command through psql and deal with the much more verbose client-oriented output. On the automation front, that’s a major step forward.

And finally, there’s pgbench for benchmarks. Perhaps we’re testing new hardware, or a new Postgres version, or maybe we’re just curious. Having recently escaped existence as a “mere” contrib tool in 9.5, now anyone and their dog can beat up Postgres for fun and profit. It’s great for obtaining data for articles like this, too.

Here’s a quick example:

createdb pgbench
pgbench --initialize --scale=100 pgbench
pgbench --client=4 --jobs=2 --select-only --time=10 pgbench
 
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 4
number of threads: 2
duration: 10 s
number of transactions actually processed: 168447
latency average: 0.237 ms
tps = 16844.169409 (including connections establishing)
tps = 16853.718425 (excluding connections establishing)

Generally read/write tests should be larger than memory to really stress a system, but we’d need a much larger scale for that on most modern systems. At scale 100 we get about 1.5GB over 10M rows of data, so on a small 8GB test VM, we’d need a scale of 600 or higher to even start hitting the disks.

Beyond the basic test framework, we can also execute arbitrary SQL scripts. This makes it possible to stress-test transactional databases with multiple parallel clients, running anything from simple SELECT statements on our own tables, to intricate stored procedures that update fact tables. As such, it’s one of the most valuable tools in the arsenal of both users and DBAs, provided either even know it exists.
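 
A custom script is just a file of SQL statements that pgbench runs repeatedly across its client connections. As a minimal sketch, with a file name of my own choosing and the sensor_log table from earlier articles:
 
-- sensor_hits.sql: execute with something like
--   pgbench --client=4 --time=10 --file=sensor_hits.sql pgbench
SELECT count(*)
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE - INTERVAL '1 day';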

There are a couple more tools I’ve left out, but they’re related to logical and stream replication and are better covered in a more in-depth manner. Beyond that, take a look at your Postgres binary directory; you might be surprised what’s hiding there.

Because distributions like to hide all the cool stuff, we may have to go looking for some of these utilities. Debian and Ubuntu users can find binaries in /usr/lib/postgresql/9.x/bin, while RHEL variants like CentOS should look in /usr/pgsql-9.x/bin.

Explore often, but do so cautiously!



PG Phriday: Let There Be Jank

June 17th, 2016 | Published in Database, Tech Talk | No Comments


One way the Postgres project is subtly misleading is that it becomes easy to forget that not all other projects are nearly as well managed. This becomes more relevant when delving into niches that lack sufficient visibility to expose the more obvious deficiencies. As much as we like Postgres, it’s not quite as popular as it could be. This makes some of the side projects infrequently used, and as a direct consequence, they can often resemble jerky automatons cobbled together out of spit and baling wire.

A good example of this is the hdfs_fdw extension for accessing Hadoop. To be fair, a large portion of its complexity is due to its dependency on Hive and hastily assembled APIs like Thrift and fb303. Unfortunately it also suffers from “Works for Me” syndrome: there’s no autoconf script to discover these requirements, and it doesn’t use automake to properly bootstrap the make environment. As a result, most builds will fail outright or require hand-modifying the Makefile, a task many will simply balk at before abandoning the extension entirely.

So, let’s install the hdfs_fdw extension, going through all the necessary steps, and not just some incomplete shorthand that makes assumptions regarding the build system. To facilitate this, no default install locations will be used at all, because hardcoded defaults are how we got into this mess in the first place.

The hdfs_fdw extension depends on Thrift, so let’s start there. As of this writing, the latest version is 0.9.3. Here’s the full build process we used:

wget http://apache.osuosl.org/thrift/0.9.3/thrift-0.9.3.tar.gz
tar -xzf thrift-0.9.3.tar.gz
cd thrift-0.9.3
./configure --prefix=/opt/thrift
make -j2
sudo make install

So far, this is pretty normal. What isn’t normal is that the Thrift source includes a contrib module named fb303 that we also need. Unlike the Postgres build environment, configuration settings and subsequent Makefile components do not cascade to the contrib modules. This is where the trouble begins.

The first issue is that, unlike Thrift, fb303 defaults to only static linking, and won’t generate a dynamic object file unless explicitly told to do so. Considering these components are packaged together, this restriction is unnecessarily jarring. Why choose between dynamic or static linking for a system library that may be used as either? In our case, we want dynamic shared objects, so we need to configure with --disable-static.

The second roadblock comes from the Thrift team itself, which closed a bug as “Cannot Reproduce” despite the fact that one of their include path references is simply wrong. The user who reported the issue even provided a patch. So we need to fix that, too.

The whole process looks like this:

cd contrib/fb303
./bootstrap.sh --prefix=/opt/thrift --with-thriftpath=/opt/thrift \
    --disable-static
sed -i 's%/include/thrift%/include%' cpp/Makefile.am
make -j2
sudo make install

With Thrift out of the way, it’s time to move on to hdfs_fdw itself. Unfortunately we can’t quite do that yet. The hdfs_fdw extension is distributed with a library it depends on, but that is not accounted for in the Makefile. We need to build and install it separately for some reason. Further, this dependency has hardcoded paths in its own Makefile, so we must modify it or end up with the library in an arbitrary location, or with it unable to find required headers.

So let’s build the libhive library:

git clone https://github.com/EnterpriseDB/hdfs_fdw.git
cd hdfs_fdw/libhive
sed -i 's%THRIFT_HOME=.*%THRIFT_HOME=/opt/thrift/include%' Makefile
sed -i 's%INSTALL_DIR=.*%INSTALL_DIR=/opt/thrift/lib%' Makefile
make -j2
sudo make install

We elected to install libhive in the same location as Thrift because they’ll be used together in our case. This prevents cluttering up our /usr/local/lib directory, as well as allowing us to easily redistribute these prerequisites to our other Postgres systems since we don’t have a nice package.

Finally, we can build the hdfs_fdw extension itself. Or can we?

Unfortunately, we’re still not done setting up. Because we elected to install Thrift as an optional piece of software, and because we use dynamic linking, we need to tell the operating system where to find libraries. Usually this means modifying /etc/ld.so.conf and running ldconfig to re-read library paths. On Ubuntu and other Debian variants, we can actually put these in a subdirectory in a less intrusive fashion.

Here’s how that might look:

echo /opt/thrift/lib | sudo tee /etc/ld.so.conf.d/hive.conf
sudo /sbin/ldconfig

Our particular build environment is an Ubuntu system, which is a Debian variant. As such, build tools like pg_config are actually wrappers around the real utilities, which are hidden away in deep paths to prevent accidental use. This helps facilitate having multiple Postgres versions on the same server, but it also complicates installing extensions, since the wrappers always assume the most recent version. Suppose Postgres 9.6 beta is on our system, but we wanted to install an extension for 9.5?

That means we need to alter our path before building hdfs_fdw itself. Debian variants put everything we need in /usr/lib/postgresql/[version]/bin, but other UNIX systems may use a different location. We’ll need that information to proceed. In addition, since hdfs_fdw doesn’t use a configure script, we can’t tell it where to find the Thrift and fb303 libraries. This isn’t strictly necessary because we modified ldconfig, but it’s always better to be safe.

So, assuming we’re still in the hdfs_fdw/libhive directory, we’d finish up by installing hdfs_fdw itself like this:

export PATH=/usr/lib/postgresql/9.5/bin:$PATH
 
cd ..
sed -i 's%/usr/local/thrift%/opt/thrift%' Makefile
make -j2 USE_PGXS=1
sudo -E make install

We needed the -E flag to preserve our $PATH variable. Otherwise the root user’s path would be used, and then the extension would be installed into the most recent Postgres version, regardless of our wishes.

Since we modified ldconfig with the new library references, we also need to restart Postgres. Otherwise, it won’t have /opt/thrift/lib in its library cache, and as a result, would throw an error when trying to activate hdfs_fdw. Ubuntu systems use pg_ctlcluster for this, while others will use the Postgres pg_ctl tool directly. Let’s make that our final step to “activate” the library before using it.

sudo pg_ctlcluster 9.5 main restart

And finally… finally we’re done. But did it work? Let’s check:

CREATE EXTENSION hdfs_fdw;
 
CREATE SERVER hdfs_server
         FOREIGN DATA WRAPPER hdfs_fdw
         OPTIONS (host 'hive_host');
 
CREATE USER MAPPING FOR postgres
    SERVER hdfs_server;
 
CREATE FOREIGN TABLE hive_sensor_log
(
  id            BIGINT,
  location      VARCHAR(255),
  reading       BIGINT,
  reading_date  TIMESTAMP
 
) SERVER hdfs_server
OPTIONS (dbname 'default', table_name 'sensor_log');
 
SELECT * FROM hive_sensor_log;
 
 id | location | reading |    reading_date     
----+----------+---------+---------------------
  1 | place    |      82 | 2016-06-17 08:15:31
  2 | stuff    |      22 | 2016-06-17 08:18:31

Well then, that was quite an adventure. In the end, we got something that worked, though the amount of hoops we had to jump through was a little disconcerting. It shouldn’t have to be this way.

On a personal note, this was actually the easy part. Hadoop is a truculent beast, and other elements in the stack—of which there are many—just make it more ridiculous. Hive itself is probably one of the most janky things I’ve ever encountered. Postgres has libpq, so why isn’t there an equivalent for Hive? Is it a protocol or not? Why do I need to install a freaking one-off Facebook library to access my Hadoop install with a SQL interface?

Worse, I needed to follow multiple incomplete tutorials online to get Hive working at all. Beyond simply installing it, it must be started with SASL disabled for hdfs_fdw. But doing that means it defaults to Kerberos authentication. If that isn’t set up, commands need to run as the user that launched Hive, since users are mapped from the operating system. To get that to work, I had to modify several more XML files. I feel like I’d need to read at least three books on this subject before even trying to approach this with any amount of confidence.

Then Hive crashed with an OOM error after the query output above. It turns out I could select data to my heart’s content, but following an insert through Hive (which took almost 20 seconds per row), there were ceaseless problems. Pulling data after an insert always caused it to go into an infinite uncaught OOM exception crash loop that required kill -9 to stop. When all of my work for this article was complete, I shut it all down and backed away slowly, lest it rend me to dripping gobbets because I pressed the wrong key in its presence.

Postgres, despite its versatility, just works. It can be used in more advanced architectures and be leveraged for even more power, but it still functions in its base configuration. After today, I sincerely wish more projects followed that philosophy.


