Postgres is a great tool for most databases. Larger installations, however, pretty much require horizontal scaling; thanks to the laws of physics, addressing multi-TB tables relies on multiple parallel storage streams. It’s how all immense data stores work, and for a long time, Postgres really had no equivalent that wasn’t a home-grown shard management wrapper. To that end, we’ve been considering Postgres-XL as a way to fill that role. At first, everything was going well.
Since about 2011, I’ve maintained that the problem with scaling Postgres to truly immense installations could be solved by a query coordinator. Why? Most sharding systems use an application-level distribution mechanism, which may or may not leverage an inherent hashing algorithm. This means each Postgres instance can be treated independently of all the others, provided the distribution process is known. On a cleverly architected system, the application is algorithm-aware and can address individual shards through a driver proxy or accessor class.
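To make the "algorithm-aware accessor" idea concrete, here is a minimal Python sketch. It assumes a fixed list of shard connection strings and simple hash-modulo placement; the class name `ShardAccessor`, its methods, and the DSNs are all hypothetical and not taken from any real driver or proxy. It only shows how an application could route a distribution key to one independent Postgres instance without a query coordinator.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardAccessor:
    """Hypothetical accessor class: maps a distribution key to one shard.

    Assumption: shards are listed in a fixed order shared by every client,
    and placement is plain hash-modulo. Not any real library's API.
    """
    shard_dsns: tuple

    def shard_index(self, key: str) -> int:
        # Use a stable hash (md5) instead of Python's built-in hash(),
        # which is randomized per process for str values.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shard_dsns)

    def dsn_for(self, key: str) -> str:
        # Every client with the same shard list sends the same key
        # to the same Postgres instance, so shards never need to talk.
        return self.shard_dsns[self.shard_index(key)]


# Usage with made-up connection strings for three shards.
shards = ShardAccessor((
    "host=shard0 dbname=app",
    "host=shard1 dbname=app",
    "host=shard2 dbname=app",
))
print(shards.dsn_for("customer:42"))
```

One caveat of this design: with plain modulo placement, adding or removing a shard remaps most keys, which is why real systems often prefer consistent hashing or a lookup table of key ranges.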
With all of the upheaval in the Postgres world thanks to advancements in extensions, foreign data wrappers, and background workers, it’s getting pretty difficult to keep track of everything! One of these rapidly moving targets is Postgres-XL and its role in helping Postgres scale outward. Large warehouses have a critical need for horizontal scaling, as the very laws of physics make it effectively impossible to run aggregate queries quickly against tables of several billion rows on a single server.
For part of today, I’ve been experimenting with the new-ish pg_shard extension contributed by Citus Data. I had pretty high hopes for this module and was extremely excited to try it out. After screwing around with it for a while, I can say it has a lot of potential, yet I can’t reasonably recommend it in its current form. The README file lists quite a few understandable caveats, but it’s the ones they don’t mention that hurt a lot more.
Well, that was an interesting couple of days. Unfortunately, my birthday came right after, and I didn’t feel like writing anything for a while. Now, though? Why not!
In the end, I think my presentation, NVRam for Fun and Profit, went over okay. Not a ton of people showed up, but I did get ambushed with questions once I got off the stage. Why those people didn’t ask while I was in Presentation Mode, I can’t quite understand.