PG Phriday: Design Pattern Workshop
Recently on the pgsql-performance mailing list, a question popped up regarding Postgres RAM usage. In this instance Pietro wondered why Postgres wasn’t using more RAM, and why his process was taking so long. There were a few insightful replies, and they’re each interesting for reasons that aren’t immediately obvious. Let’s see what is really going on here, and perhaps answer a question while we’re at it.
Pietro presents several postgresql.conf settings, but here are the ones that matter:
shared_buffers = 24GB
effective_cache_size = 72GB
work_mem = 512MB
maintenance_work_mem = 1GB
The configured system has 96GB of total RAM. Keeping that in mind, we see that shared_buffers is set to 24GB, a quarter of the server’s available memory. The only potential issue here is that a checkpoint could be a very painful experience if a significant portion of that memory becomes dirty. Since there are no complaints of high IO waits, let’s just assume that isn’t happening.
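If we wanted to verify that assumption rather than take it on faith, the pg_stat_bgwriter view keeps running totals of checkpoint activity. This is just a sanity check, not anything from Pietro’s post:
-- How often checkpoints fire, and how many dirty buffers they write.
-- A large buffers_checkpoint figure combined with IO complaints would be
-- the warning sign; we're assuming that isn't the case here.
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_backend
  FROM pg_stat_bgwriter;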
Common advice suggests setting effective_cache_size to an amount equivalent to half of system RAM + shared_buffers. A setting of 72GB looks high, but Postgres does not allocate any of that on its own. This parameter merely influences the query planner, since higher values imply an increased probability a disk page is in either Postgres or OS caches. This is an optimization setting, not a memory allocation.
A work_mem value of 512MB is extremely high. In some cases however, that’s perfectly acceptable. The amount specified here will be used for all internal query execution operations that require a memory allocation. This means sorts, hashes, bitmaps, materializations, and so on. Thus a single query can instantiate multiple 512MB segments, one for each such operation. Effectively the user has greatly reduced the likelihood Postgres will utilize disk storage for any query operations.
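One way to check whether a representative query actually fits within work_mem is to watch the sort method reported by EXPLAIN ANALYZE. The table and column below are stand-ins, not anything from Pietro’s workload:
-- work_mem applies per sort or hash node, so a single session can test
-- whether a typical query spills to disk at the current setting.
SET work_mem = '512MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
  FROM some_table        -- hypothetical table
 ORDER BY some_column;   -- hypothetical column

-- Look for "Sort Method: quicksort  Memory: ..." versus
-- "Sort Method: external merge  Disk: ..." in the resulting plan.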
And finally we have maintenance_work_mem. This is so often set to 1GB, it might as well be the default. This is the amount of RAM Postgres uses when doing things like creating indexes, vacuuming, and anything else that might need vast amounts of working space to proceed. We don’t know Pietro’s procedures, but we can make some inferences later. This setting really only matters if some part of his data onboarding includes adding new indexes or triggers an autovacuum.
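For what it’s worth, maintenance_work_mem doesn’t need to live in postgresql.conf at all; it can be raised for a single session right before the heavy lifting. The index name, column, and 2GB figure below are purely illustrative:
-- Give just this session extra maintenance memory for an index build,
-- then fall back to the server default afterward.
SET maintenance_work_mem = '2GB';
CREATE INDEX idx_target_some_col ON target_table (some_col);
RESET maintenance_work_mem;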
Then we see how memory is performing once Postgres is running:
        total   used   free   shared   buff/cache   available
Mem:      94G    28G    46G      17M          19G         64G
Swap:     15G     0B    15G
Total:   109G    28G    61G
That’s not a lot of usage. Of that 28GB, we know Postgres accounts for 24GB simply due to the shared_buffers setting. This means the actual queries are only consuming 4GB at most.
This is particularly relevant because if we skip to the end of the post, we see this output from top:
  PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
 9686 postgres  20   0 24.918g 214236 12628 R 100.0  0.2  2872:38 postgres
 9687 postgres  20   0 24.918g 214212 12600 R 100.0  0.2  2872:27 postgres
 9688 postgres  20   0 25.391g 709936 12708 R 100.0  0.7  2872:40 postgres
 9691 postgres  20   0 24.918g 214516 12900 R 100.0  0.2  2865:23 postgres
 9697 postgres  20   0 24.918g 214284 12676 R 100.0  0.2  2866:05 postgres
...
All 16 of the backends are pegged at 100% CPU, and have been for almost 48 hours each (2872 minutes of CPU time / 60 ≈ 48 hours). That definitely corroborates Pietro’s claim that this workload has been running for 48 hours, and it suggests little to no waiting on disk IO during the entire process.
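If we wanted to confirm that directly rather than inferring it from top, the wait_event columns in pg_stat_activity (available on any reasonably modern Postgres) show what each backend is currently waiting on. An active backend that isn’t waiting reports NULL there:
-- Active backends with no wait_event are burning CPU, not waiting on IO.
SELECT pid, state, wait_event_type, wait_event, query
  FROM pg_stat_activity
 WHERE state = 'active';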
So will more RAM help? Probably not. Pietro says he’s doing “time series expansion and some arithmetics”. This kind of procedure doesn’t generally require vast amounts of RAM, as simple calculations rarely invoke sorting or build large temporary structures in memory. So what’s really going on?
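We don’t know what his expansion actually looks like, but in Postgres that job usually falls to generate_series, which streams rows out one at a time rather than building anything large in memory. A purely hypothetical example:
-- Expand one row per hour across a year without any large allocation.
SELECT ts
  FROM generate_series(
         '2017-01-01'::timestamptz,
         '2017-12-31'::timestamptz,
         INTERVAL '1 hour'
       ) AS ts;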
The second reply gets at the real heart of the matter immediately:
You’d be better off to ask for help in optimizing your queries IMHO.
Why would Scott say that?
Well, there are a lot of ways to do the same thing. In the database world, some of these are highly preferable to others. How exactly is Pietro expanding those time series, for example? He could be doing something like this:
DO $$
DECLARE
a INT;
b INT;
...
new_a INT;
new_b INT;
...
BEGIN
FOR a, b, ... IN SELECT col_a, col_b, ...
LOOP
new_a = some_calculation(a, b, c, ..., n);
new_b = other_calculation(a, b, c, ..., n);
INSERT INTO target_table VALUES (new_a, new_b, ...);
END LOOP;
END;
$$ LANGUAGE plpgsql;
If the necessary logic is complicated enough, that might be the right approach. However, it is extremely expensive to loop through each individual row from a driver query, allocate variables, perform calculations, and issue a single-row insert on every iteration. This kind of process would peg a CPU at 100% while remaining slower than a set-based equivalent by multiple orders of magnitude.
There’s also the possibility Pietro knows this, and the calculations are fairly simple. So maybe he decided the process could be inlined a bit more. Instead of that anonymous block (or something equivalent on the application side), he writes helper functions to do the calculations instead:
CREATE OR REPLACE FUNCTION some_calculation(a INT, b INT, ...)
RETURNS INT AS
$$
DECLARE
retval INT;
BEGIN
-- Some calculations here
RETURN retval;
END;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;
Being a smart guy, Pietro knows immutable function results are cached much better than otherwise, and strict functions won’t even run if any parameter is NULL. With these two declarations, if any invocations repeat the same parameters, or a parameter is NULL, Postgres may not even execute the function at all. Avoiding a jump operation is a huge optimization all by itself, so consider the benefits of skipping the overhead of a function call and its associated logic entirely.
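To see the STRICT half of that in action, here’s a trivial throwaway function that has nothing to do with Pietro’s actual calculations:
-- A minimal STRICT IMMUTABLE function purely for demonstration.
CREATE OR REPLACE FUNCTION add_one(x INT)
RETURNS INT AS
$$
BEGIN
  RAISE NOTICE 'function body executed';
  RETURN x + 1;
END;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;

SELECT add_one(5);     -- NOTICE fires, returns 6
SELECT add_one(NULL);  -- returns NULL; the body never runs at all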
By defining the calculations in this manner, his process would look like this instead:
INSERT INTO target_table
SELECT some_calculation(col_a, col_b, ...),
other_calculation(col_d, col_e, ...),
...
FROM ...
WHERE ...;
Or maybe Pietro knows the value of data staging. Caching is great, but sufficiently complicated business logic can be difficult to transform into immutable functions. In addition, some “calculations” may rely on the results of another query. This is where temporary tables, unlogged tables, or CTEs come in.
Consider the fact that databases natively operate in terms of Set Theory. They are highly optimized, and specifically engineered to bulk-process rows in ways user-generated applications simply can’t match in all but the edgiest of cases. So the final iteration of the data onboarding might resemble this:
CREATE TEMP TABLE expand_series AS
SELECT ...;
CREATE TEMP TABLE relate_series AS
SELECT ...;
CREATE TEMP TABLE basic_tally AS
SELECT ...
FROM expand_series
JOIN relate_series ON (...);
INSERT INTO target_table
SELECT ...
FROM basic_tally
JOIN ...
...
WHERE ...;
Or maybe Pietro likes CTEs more, so he uses those instead of multiple temporary table steps. Postgres will fully materialize everything anyway. The primary benefit to temporary tables is that repeated execution of multiple steps won’t require re-running all previous elements. If this were some kind of one-time report, a CTE would arguably make more sense since it operates atomically.
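A CTE version of the same pipeline might look roughly like this; it mirrors the temporary-table sketch above and is every bit as hypothetical:
WITH expand_series AS (
    SELECT ...
),
relate_series AS (
    SELECT ...
),
basic_tally AS (
    SELECT ...
      FROM expand_series
      JOIN relate_series ON (...)
)
INSERT INTO target_table
SELECT ...
  FROM basic_tally
  JOIN ...
 WHERE ...;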
Regardless, this kind of iterative transformation allows us to fully embrace Set Theory. Don’t operate on one row, but all of them. If we can organize calculations into distinct categories or steps, then we’re actually requesting the database to transform the data as a whole for each operation. For this entire data set, add some derived date column. For this entire data set, produce a standard deviation variant along this window. For this entire data set, derive these additional columns from these formulae.
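As a concrete, if entirely invented, illustration of that whole-data-set mindset, a derived date column and a rolling standard deviation can both be produced in a single set-based pass. The table and columns here are stand-ins:
-- Hypothetical: derive a date bucket and a 24-row rolling standard
-- deviation for every row at once, with no looping anywhere.
SELECT reading_ts::date AS reading_date,
       reading_value,
       stddev_samp(reading_value) OVER (
           ORDER BY reading_ts
           ROWS BETWEEN 23 PRECEDING AND CURRENT ROW
       ) AS rolling_stddev
  FROM sensor_readings;   -- stand-in table name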
That’s how databases think, and Postgres is no different in this regard. By looping through each individual result, we’re actively disrupting that process and literally making it hundreds or thousands of times slower. We could have all the memory in the world, and even find some way to fully utilize it, and it wouldn’t influence that universal truth.
Now, we don’t know what Pietro is actually doing, just that he’s convinced more RAM might help. The evidence he provided suggests that isn’t the case. Having sixteen processes running at full blast for two days either means he has a massive amount of data, or there’s some unaddressed inefficiency in his overall approach.
I suspect the latter only because it’s so common. People don’t usually think like databases, and really, who can blame them? What kind of nutcase would willingly do this for a living?