Postgres theory, feature discussion, and advocacy are fun. But even I’ll admit it’s nice to have some practical application every once in a while. This week, we’re going to build an actual database.
But what would be small enough for a proof of concept, yet somewhat interesting? Well, I’m a fan of Hearthstone. It’s a silly card game much like Magic: The Gathering, but has the distinct aura of “eSports!” Regardless, it’s a fun little time waster, and has a few hundred data points we can manipulate.
I wasn’t able to write an article last week due to an unexpected complication regarding tests I was running to verify its contents. So this week, it’s going to be extra special! Also long.
What’s the fastest way to load a Postgres table? If you believe the documentation, the COPY command is the best way to unceremoniously heave data into a table. Fortunately after all of our talk about partitions, our minds are primed and ready to think in chunks.
I’ve been talking about partitions a lot recently, and I’ve painted them in a very positive light. Postgres partitions are a great way to distribute data along a logical grouping and work best when data is addressed in a fairly isloated manner. But what happens if we direct a basic query at a partitioned table in such a way that we ignore the allocation scheme? Well, what happens isn’t pretty. Let’s explore in more detail.
Most Postgres operators and informed users are aware that it uses MVCC for storage. One of the main drawbacks of this versioning mechanism is related to tuple reuse. In order to reuse the space, VACUUM must complete a cycle on the table. Unfortunately this isn’t always possible to “optimize” for larger tables. How so?
If a large table needs to have a calculated column added, or some other bulk query updates a large portion of its content, a large fragment of the table is now empty space.
This week we’ll be covering another method of Postgres partitioning. This is a technique I personally prefer and try to use and advocate at every opportunity. It’s designed to straddle the line between traditional partitioning and standard monolithic table structure by using table inheritance as a convenience factor. The assumption here is that end-user applications either:
Know that partitioning is in use. Only load “current” data and don’t care about partitions.