PostgreSQL CPU, Disk, and I/O Tuning Guide
Level: intermediate · ~15 min read · Intent: informational
Audience: backend developers, database engineers, technical teams
Prerequisites
- basic familiarity with PostgreSQL
Key takeaways
- Most PostgreSQL CPU, disk, and I/O problems are first diagnosis problems. You need to identify whether the system is CPU-bound, read-I/O-bound, write-I/O-bound, or simply executing too much work before changing settings.
- The biggest wins usually come from better query shape, healthier maintenance, and storage-aware settings such as shared_buffers, effective_cache_size, checkpoint behavior, and I/O concurrency rather than from random tuning changes.
FAQ
- What is the first thing to check when PostgreSQL is slow?
- First classify the bottleneck. Check whether PostgreSQL is spending time on CPU work, waiting on disk I/O, stalling on writes and checkpoints, or being overwhelmed by too many queries or connections.
- Should I increase shared_buffers to fix disk I/O problems?
- Sometimes, but not blindly. PostgreSQL documents 25% of RAM as a reasonable starting point on a dedicated server, and very large shared_buffers settings are not always better because PostgreSQL also relies on the OS page cache.
- What PostgreSQL setting helps with read I/O concurrency?
- effective_io_concurrency is one of the main settings for that. PostgreSQL documents that it matters most on higher-latency storage where queries otherwise stall on I/O.
- Can WAL and checkpoints cause latency spikes?
- Yes. PostgreSQL's checkpoint and WAL settings affect write smoothing, fsync pressure, and transaction latency, especially on busy write-heavy systems.
PostgreSQL performance tuning gets much easier once you stop treating CPU, disk, and I/O as one big generic “database slowness” problem.
They are not the same.
A PostgreSQL system can feel slow because:
- the CPU is busy planning or executing too much work
- reads are waiting on storage
- writes are spiking during checkpoints or WAL flushes
- temp files are being created because sorts or hashes spill
- connection count is too high
- or the application is simply asking PostgreSQL to do far more work than it should
That is why good CPU, disk, and I/O tuning starts with one question:
What kind of work is the database actually struggling with?
This guide focuses on how to answer that question and what to tune next.
1. Start by Classifying the Bottleneck
Before changing settings, separate the problem into one of these buckets:
CPU-bound
Signs include:
- high CPU usage
- queries spending time on execution rather than waiting
- many joins, sorts, aggregations, or function-heavy plans
- too many queries or too much concurrent work
Read-I/O-bound
Signs include:
- high storage read latency
- low buffer hit effectiveness on hot paths
- scans or index reads waiting on disk
- large working sets that do not fit cache well
Write-I/O-bound
Signs include:
- checkpoint spikes
- WAL flush pressure
- fsync-heavy behavior
- heavy update/delete churn
- temp files and dirty-page writeback delays
Mixed or app-driven
Signs include:
- lots of small queries
- N+1 patterns
- too many client connections
- long transactions
- lock waits
- poor batching
- inefficient pagination
A lot of “I/O tuning” problems are really query-design problems wearing storage-shaped clothing.
2. Use the Right Monitoring Views First
PostgreSQL’s own monitoring chapter is very clear that database monitoring should be paired with OS tools like:
- top
- iostat
- vmstat
- ps
That matters because PostgreSQL views can tell you a lot, but they do not replace system-level visibility.
PostgreSQL views to prioritize
pg_stat_statements
Use this to find:
- highest total execution time
- highest average execution time
- most frequently called statements
This helps distinguish:
- one expensive query from
- thousands of moderate ones
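As a concrete starting point, a query like this surfaces the heaviest statements. A sketch, assuming the pg_stat_statements extension is installed and PostgreSQL 13 or later (where the timing columns are named total_exec_time and mean_exec_time):

```sql
-- Top 10 statements by cumulative execution time
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       left(query, 80)                    AS query_preview
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Ordering by mean_exec_time instead surfaces individually slow queries; ordering by calls surfaces the chatty ones.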
pg_stat_activity
Use this to see:
- what is running right now
- long-running queries
- waits
- blocked sessions
- idle-in-transaction sessions
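A sketch of the kind of query that answers those questions, assuming PostgreSQL 9.6 or later (for pg_blocking_pids and the wait_event columns):

```sql
-- Non-idle sessions running longer than 30 seconds, with wait state and blockers
SELECT pid,
       now() - query_start   AS runtime,
       state,
       wait_event_type,
       wait_event,
       pg_blocking_pids(pid) AS blocked_by,
       left(query, 80)       AS query_preview
FROM pg_stat_activity
WHERE state <> 'idle'
  AND query_start < now() - interval '30 seconds'
ORDER BY runtime DESC;
```

Filtering on state = 'idle in transaction' instead finds sessions holding transactions open without doing work.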
pg_stat_io
This is one of the most useful newer monitoring views for I/O. PostgreSQL documents it as cluster-wide I/O statistics broken down by backend type, object, and context.
It helps you see activity across:
- tables and indexes (relation I/O)
- temp files
- and, in the newest releases, WAL
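For example (pg_stat_io is available from PostgreSQL 16; the timing columns are populated only when track_io_timing is on):

```sql
-- Which backend types and contexts are generating the most I/O
SELECT backend_type, object, context,
       reads, read_time, writes, write_time, hits, evictions
FROM pg_stat_io
WHERE reads > 0 OR writes > 0
ORDER BY reads + writes DESC
LIMIT 15;
```

A high evictions count in context 'normal' is one sign that the working set is churning through shared buffers faster than it can be cached.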
OS tools still matter
Use:
- iostat for storage latency and utilization
- vmstat for memory pressure and paging context
- top or htop for CPU saturation and process-level visibility
If you skip OS-level observation, you can easily misdiagnose the bottleneck.
3. Turn On track_io_timing When You Need Real I/O Timing
PostgreSQL documents track_io_timing as off by default because it repeatedly queries the operating system clock and may add overhead on some platforms.
But when you are troubleshooting real I/O issues, it is extremely useful.
With it enabled, I/O timing becomes visible in places like:
- pg_stat_io
- pg_stat_database
- EXPLAIN (BUFFERS)
- pg_stat_statements
Why this matters
Without timing, you may know that a query read many blocks. With timing, you can see whether the reads were actually slow.
That is the difference between:
- “this query touched disk” and
- “this query is bottlenecked by slow disk”
Practical rule
Use track_io_timing intentionally:
- benchmark its overhead on your platform if needed
- then enable it when you need accurate I/O visibility
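For example, enabling it for a single session avoids cluster-wide overhead while you investigate (the table name here is hypothetical):

```sql
-- Enable I/O timing for this session only
SET track_io_timing = on;

-- With timing on, the plan output gains "I/O Timings" lines showing
-- how long reads and writes actually took
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
```

Cluster-wide, ALTER SYSTEM SET track_io_timing = on followed by pg_reload_conf() enables it everywhere, at the documented clock-reading cost.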
4. Tune CPU Problems by Reducing Work First
CPU tuning is often less about CPU settings and more about reducing wasted computation.
Common CPU-heavy patterns
- joining too many rows
- sorting large result sets
- large hash aggregates
- frequent expression-heavy filters
- too many parallel workers for the real workload
- too many queries overall
- badly cached application access patterns
- overactive JIT on workloads that do not benefit enough
First fixes to consider
- reduce row counts earlier
- add better indexes
- return fewer columns
- eliminate N+1 queries
- replace deep offset pagination
- move expensive repeated logic out of query hot paths
- inspect expensive query plans with EXPLAIN ANALYZE
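To make the pagination point concrete: deep OFFSET pagination generates and discards every skipped row, while keyset pagination seeks directly through an index. A sketch, assuming a hypothetical events table with an indexed id column:

```sql
-- Offset pagination: produces and throws away 100,000 rows first
SELECT * FROM events ORDER BY id LIMIT 50 OFFSET 100000;

-- Keyset pagination: the client passes the last id it saw,
-- and the index satisfies the predicate directly
SELECT * FROM events WHERE id > 100050 ORDER BY id LIMIT 50;
```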
CPU is not only “bad query” CPU
It can also be:
- planning overhead on too many complex queries
- parallel worker overhead
- application churn creating too many short transactions
- background maintenance competing with foreground load
So always ask:
- is the CPU doing useful work or
- is it doing avoidable work?
5. Use shared_buffers Sensibly
shared_buffers is one of the most important PostgreSQL memory settings because it directly influences how much data PostgreSQL can keep in its own shared buffer cache.
PostgreSQL’s docs say that on a dedicated database server with 1 GB or more of RAM, a reasonable starting value is about 25% of system memory, and that it is unlikely more than 40% of RAM will work better than a smaller setting because PostgreSQL also relies on the operating system cache.
What this means practically
- too low: PostgreSQL may reread hot data more often than needed
- too high: you can starve the OS page cache and increase memory pressure elsewhere
Good rule
Treat shared_buffers as a starting-point tuning setting, not a badge of aggressiveness.
Larger is not automatically better.
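As an illustration, on a hypothetical dedicated server with 32 GB of RAM the documented starting point would look like this (shared_buffers requires a restart to change):

```sql
-- ~25% of 32 GB; takes effect only after a server restart
ALTER SYSTEM SET shared_buffers = '8GB';
SHOW shared_buffers;  -- still reports the old value until the restart
```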
6. Set effective_cache_size Honestly
This setting does not allocate memory. It tells the planner how much cache is likely available to a single query.
PostgreSQL documents that a higher effective_cache_size makes index scans look more attractive, while a lower value makes sequential scans look more attractive.
Why it matters
If this value is too low, PostgreSQL may underestimate how likely data is to be cached and choose less efficient plans.
If it is too high, PostgreSQL may become too optimistic about index access.
Practical rule
Estimate:
- PostgreSQL shared buffers plus
- the portion of OS cache likely available for PostgreSQL files minus
- realistic concurrency sharing
Then set effective_cache_size to something believable, not aspirational.
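Continuing the hypothetical 32 GB server: with 8 GB in shared_buffers and perhaps 16 GB of OS cache realistically available to PostgreSQL files, a believable value might be:

```sql
-- A planner estimate, not an allocation; no restart required
ALTER SYSTEM SET effective_cache_size = '24GB';
SELECT pg_reload_conf();
```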
7. Understand seq_page_cost and random_page_cost
These planner cost settings help PostgreSQL estimate storage behavior.
PostgreSQL documents:
- seq_page_cost defaulting to 1.0
- random_page_cost defaulting to 4.0
The docs also explain that lowering random_page_cost relative to seq_page_cost makes PostgreSQL more likely to choose index scans, while raising it makes index scans look more expensive.
Why this matters
These are not “speed” knobs directly. They are planner model knobs.
Good use of these settings
Change them only when your storage reality differs meaningfully from the planner’s default assumptions.
Examples:
- highly cached systems
- fast SSD/NVMe-backed storage
- storage where random access is much less painful than the traditional default assumes
- storage where random access is truly worse and cache hit behavior is poor
Important warning
PostgreSQL’s docs explicitly say there is no precise formula for these values and that changing them based on only a few experiments is risky.
That is exactly right.
Treat them as workload-wide averages, not single-query hacks.
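For example, a common adjustment on SSD/NVMe or heavily cached storage is to bring random_page_cost near seq_page_cost. The value here is illustrative, not a recommendation:

```sql
-- Default is 4.0; values near 1.0 tell the planner random reads are cheap
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
```

Verify the effect by re-running EXPLAIN on the queries you care about before and after.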
8. Use effective_io_concurrency and maintenance_io_concurrency Properly
PostgreSQL’s docs say effective_io_concurrency matters most on higher-latency storage where queries otherwise experience noticeable I/O stalls, and that unnecessarily high values can increase I/O latency for all queries.
This setting also controls prefetch distance on systems that support prefetch advice.
When it helps most
- larger scans
- higher-latency storage
- storage systems with enough IOPS and parallel I/O capability
- tablespaces where read concurrency can benefit from more aggressive prefetching
maintenance_io_concurrency
This is similar but used for maintenance operations. It matters for:
- vacuum
- index builds
- and other maintenance work that can benefit from concurrent I/O
Practical rule
These settings are much more worth testing on:
- SSD/NVMe
- cloud block storage
- larger analytic scans
- maintenance-heavy environments
than on tiny single-disk systems where the bottleneck is elsewhere.
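An illustrative sketch (the numbers are placeholders to test against your own storage, not recommendations):

```sql
-- Allow more concurrent read requests during prefetch-friendly scans
ALTER SYSTEM SET effective_io_concurrency = 200;
-- Same idea for VACUUM and other maintenance work
ALTER SYSTEM SET maintenance_io_concurrency = 100;
SELECT pg_reload_conf();
```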
9. Watch Checkpoints and WAL for Write Latency Spikes
A very common PostgreSQL write-latency problem is badly smoothed checkpoint behavior.
PostgreSQL’s WAL and checkpoint docs make two things clear:
- checkpoint settings affect write distribution over time
- WAL growth and checkpoint pressure can create latency spikes if the system is constantly forced into heavier writeback behavior
Key settings to understand
checkpoint_timeout
Controls the maximum time between automatic checkpoints.
max_wal_size
Soft limit for WAL growth between automatic checkpoints.
checkpoint_completion_target
This is especially important: it controls the fraction of the interval between checkpoints over which checkpoint writes are spread, so higher values smooth the write load instead of concentrating it in bursts.
Practical tuning idea
If checkpoints are too frequent or too bursty:
- checkpoint-triggered write pressure increases
- fsync bursts may become more visible
- latency can get spiky during heavy write periods
In many write-heavy systems, smoother checkpoint behavior matters a lot more than people realize.
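An illustrative configuration for a write-heavy system (the sizes depend entirely on your WAL volume):

```sql
ALTER SYSTEM SET checkpoint_timeout = '15min';        -- default is 5min
ALTER SYSTEM SET max_wal_size = '4GB';                -- default is 1GB
ALTER SYSTEM SET checkpoint_completion_target = 0.9;  -- spread writes over the interval
SELECT pg_reload_conf();
```

Raising max_wal_size trades disk space for fewer WAL-triggered checkpoints.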
10. Enable log_checkpoints on Systems You Care About
This is one of the easiest practical observability wins.
If you are tuning write performance, checkpoint visibility matters. Logs can show:
- how often checkpoints happen
- how much data is written
- whether checkpoints are requested too often because WAL fills too quickly
- and whether write timing looks unhealthy
This helps you distinguish:
- normal checkpoint behavior from
- a workload that is constantly forcing stressful checkpoints
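Enabling it is a one-liner (and it is already on by default since PostgreSQL 15):

```sql
ALTER SYSTEM SET log_checkpoints = on;
SELECT pg_reload_conf();
```

The resulting log lines report whether each checkpoint was time- or WAL-triggered, how many buffers were written, and how long the write and sync phases took.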
11. Understand Background Writer and Writeback Controls
PostgreSQL provides a few settings that influence how dirty data is written in the background rather than being left for foreground backends or checkpoints to flush more abruptly.
Important ones include:
- bgwriter_delay
- bgwriter_lru_maxpages
- bgwriter_lru_multiplier
- bgwriter_flush_after
- backend_flush_after
Why this matters
PostgreSQL’s docs note that these settings can reduce transaction latency by limiting how much dirty data accumulates in the kernel page cache before fsync, but they also note that performance can degrade in some cases, especially when the workload sits in the gray zone between shared buffers and the OS page cache.
Practical lesson
These are powerful settings, but they are not “always increase them” settings.
They are most useful when:
- you see writeback-related stalls
- checkpoint smoothing matters
- backend processes are doing too much of their own write work
- latency spikes correlate with flush behavior
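One quick way to see whether backends are doing their own write work, assuming the PostgreSQL 16-or-earlier layout of pg_stat_bgwriter (in PostgreSQL 17 several of these counters moved to pg_stat_checkpointer and pg_stat_io):

```sql
-- buffers_clean: pages written by the background writer
-- buffers_backend: pages written directly by backends
-- (persistently high buffers_backend suggests these settings deserve attention)
SELECT buffers_clean, maxwritten_clean, buffers_backend, buffers_alloc
FROM pg_stat_bgwriter;
```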
12. Size work_mem Carefully for Sort and Hash Work
work_mem is often one of the biggest hidden I/O levers.
PostgreSQL documents that it is applied per internal sort or hash operation, which means total memory use can be many times the configured value when several operations or sessions run concurrently.
Why this matters
If work_mem is too low:
- sorts spill to disk
- hashes spill to disk
- temp file I/O grows
- CPU and I/O both get worse
If it is too high:
- concurrent queries can consume far more memory than expected
- the system can become unstable under load
Good workflow
Do not tune work_mem in the abstract.
Use:
- EXPLAIN (ANALYZE, BUFFERS)
- temp file logs
- query patterns
- concurrency expectations
to decide whether a specific workload is spilling too often.
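Two practical levers for that workflow: make spills visible in the logs, and raise work_mem only where a specific workload needs it (the values are illustrative):

```sql
-- Log every temp file larger than 10 MB so spills show up in the server log
ALTER SYSTEM SET log_temp_files = '10MB';
SELECT pg_reload_conf();

-- Raise work_mem for one session running a known sort-heavy report,
-- instead of raising it globally for every connection
SET work_mem = '256MB';
```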
13. Distinguish Read I/O From WAL I/O
This is another place people blur different problems together.
Data-file reads and writes
These involve:
- tables
- indexes
- temp files
- heap and index access
WAL I/O
This involves:
- WAL buffer writes
- WAL flushes
- checkpoint-related write patterns
- synchronous commit behavior
These are related, but not identical.
A system can have:
- healthy table-read performance
- but painful WAL flush latency
or:
- fine WAL
- but poor data-file read performance due to storage latency or bad plans
PostgreSQL’s newer pg_stat_io visibility helps separate these more clearly.
14. Tune Planner and Storage Together, Not Separately
Planner cost settings like random_page_cost, seq_page_cost, and effective_cache_size are storage-modeling settings.
That means they should be tuned with real knowledge of:
- cache behavior
- storage latency
- query mix
- and concurrency
The mistake is changing planner costs without checking whether the actual problem is:
- a bad query
- a bad index
- a stale statistic
- or a write-path problem
These settings matter, but they are not the first fix for everything.
15. Use Tablespace-Level I/O Settings Where Appropriate
PostgreSQL’s docs note that tablespaces can override settings like:
- seq_page_cost
- random_page_cost
- effective_io_concurrency
- maintenance_io_concurrency
This is useful when different tablespaces sit on different storage classes.
Example use cases
- hot transactional tables on faster storage
- archive/history tables on slower storage
- maintenance-heavy objects on storage with better parallel I/O behavior
This is more advanced, but for larger systems it can be much cleaner than pretending all storage behaves the same.
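A sketch with hypothetical tablespace names:

```sql
-- Fast NVMe tablespace: cheap random reads, aggressive prefetching
ALTER TABLESPACE fast_nvme
  SET (random_page_cost = 1.1, effective_io_concurrency = 200);

-- Slow archive tablespace: keep the conservative defaults explicit
ALTER TABLESPACE slow_archive
  SET (random_page_cost = 4.0, effective_io_concurrency = 2);
```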
16. Do Not Ignore Query Shape While Tuning Hardware-Like Settings
This is the biggest practical warning in the whole topic.
Many CPU/disk/I/O tuning problems are really caused by:
- scanning too many rows
- poor indexes
- deep offset pagination
- N+1 query patterns
- wide rows
- bad join design
- unhealthy churn-heavy tables
If one API endpoint is issuing 220 queries, changing effective_io_concurrency is not the main fix.
If one report is sorting millions of rows it did not need to sort, increasing shared_buffers is not the real solution.
Always pair infrastructure-shaped tuning with query-shaped diagnosis.
A Good PostgreSQL CPU, Disk, and I/O Tuning Workflow
When a system feels slow, use this order:
Step 1
Confirm whether the system is CPU-bound, read-I/O-bound, write-I/O-bound, or mixed.
Step 2
Use:
- pg_stat_statements
- pg_stat_activity
- pg_stat_io
- EXPLAIN (ANALYZE, BUFFERS)
- OS tools like iostat and vmstat
Step 3
Fix waste first:
- better indexes
- smaller result sets
- fewer queries
- better pagination
- shorter transactions
Step 4
Then tune memory and planner modeling:
- shared_buffers
- effective_cache_size
- work_mem
- random_page_cost
- seq_page_cost
Step 5
Then tune I/O and write smoothing:
- effective_io_concurrency
- maintenance_io_concurrency
- checkpoint settings
- background writer / flush-after behavior
That order prevents a lot of aimless tuning.
Conclusion
PostgreSQL CPU, disk, and I/O tuning is not about finding one magic setting.
It is about understanding:
- what kind of work the system is doing
- where time is actually being spent
- and whether the real problem is compute, read latency, write smoothing, or application waste
The biggest wins usually come from:
- better monitoring
- less wasted work
- healthier cache and planner assumptions
- smarter I/O concurrency settings
- and smoother checkpoint behavior
That is what turns tuning from guesswork into engineering.