How to Reduce PostgreSQL Table Bloat
Level: intermediate · ~14 min read · Intent: informational
Audience: backend developers, database engineers, technical teams
Prerequisites
- basic familiarity with PostgreSQL
Key takeaways
- The best way to reduce PostgreSQL table bloat is usually prevention: healthy autovacuum, shorter transactions, fewer unnecessary updates, narrower rows, and table/index design that avoids excessive churn.
- Standard VACUUM is usually the right routine tool because it reclaims space for reuse without blocking normal reads and writes, while VACUUM FULL and rewrite-style fixes are heavier tools for cases where reclaiming disk space immediately is truly necessary.
FAQ
- What causes PostgreSQL table bloat?
- Table bloat usually comes from MVCC row versioning combined with frequent updates or deletes, long transactions that delay cleanup, and maintenance that cannot keep up with the workload.
- Does VACUUM reduce PostgreSQL table size?
- Standard VACUUM usually makes dead space reusable inside the table but does not usually shrink the table file at the operating-system level. VACUUM FULL rewrites the table and can return space to the OS, but it is much more disruptive.
- Should I run VACUUM FULL regularly?
- Usually no. PostgreSQL's own guidance is to use regular VACUUM often enough that VACUUM FULL is rarely needed, because VACUUM FULL is slower and requires an ACCESS EXCLUSIVE lock.
- Can indexes bloat too?
- Yes. Index bloat is a separate issue from table bloat, and REINDEX is the usual tool when the index itself has become bloated.
PostgreSQL table bloat is one of those problems that usually builds slowly and then suddenly becomes impossible to ignore.
At first it looks like:
- a table growing faster than expected
- updates becoming more expensive
- scans taking longer
- autovacuum running more often
- or disk usage rising even though row counts are not rising proportionally
Then eventually the question becomes: why is this table so much bigger than it should be?
The short answer is MVCC.
PostgreSQL keeps old row versions around long enough to preserve concurrency guarantees. That is part of why PostgreSQL behaves so well under concurrent reads and writes. But it also means updated and deleted rows do not disappear instantly. If cleanup cannot keep up, dead space accumulates and tables become bloated.
This guide explains:
- what table bloat actually is
- why it happens
- how to reduce it without overreacting
- and when heavier tools like VACUUM FULL, CLUSTER, table rewrites, or REINDEX are justified
What PostgreSQL Table Bloat Actually Is
Table bloat is excess space in a table occupied by dead or no-longer-useful row versions that have not been reclaimed for reuse.
PostgreSQL uses MVCC, so an UPDATE does not overwrite a row in place in the simple sense many developers imagine. Instead, it creates a new row version, and the old version remains until it is no longer visible to any transaction and can be cleaned up by vacuuming.
That means frequent:
- UPDATE
- DELETE
- and other churn-heavy operations
create dead tuples.
If VACUUM can keep up, PostgreSQL can reuse that space efficiently.
If it cannot, the table grows larger than the useful live data really requires.
Standard VACUUM vs Real Shrinking
This is the most important conceptual distinction.
Standard VACUUM
Standard VACUUM removes dead row versions and makes the space available for future reuse inside the same table. It usually does not return the table file’s space to the operating system.
VACUUM FULL
VACUUM FULL rewrites the table into a new compact file and can return space to the operating system, but it is much slower and requires an ACCESS EXCLUSIVE lock on the table while it runs.
That is why most bloat reduction work should start with:
- preventing bloat
- improving steady-state reuse
- and making routine vacuuming effective
not jumping straight to VACUUM FULL.
The Main Causes of PostgreSQL Table Bloat
1. Update-heavy tables
Every update creates a new row version. If a table is updated constantly, dead tuples accumulate quickly.
Common examples:
- job queues
- sessions
- order status tables
- event processors
- records with “last_seen_at” style updates
- frequently rewritten JSONB documents
2. Large delete bursts
Mass deletes can leave behind a huge amount of dead space. If the workload pattern is:
- delete lots of rows
- then not refill the table quickly in the same areas
bloat can become very visible.
3. Long-running transactions
Even if autovacuum wants to clean up dead tuples, it may be unable to reclaim space as effectively while old transactions are still holding visibility horizons open.
This is one of the most common reasons a table stays bloated longer than expected.
4. Autovacuum not keeping up
Autovacuum is usually the right default mechanism, but some tables generate churn faster than the default settings are prepared for.
5. Wide rows
If rows are large, every extra dead tuple wastes more space. Wide rows make bloat more expensive.
6. Frequent updates to indexed columns
Updating indexed columns can create more index churn in addition to heap churn, which means the bloat problem may affect both tables and indexes.
Step 1: Confirm Whether You Have Table Bloat, Index Bloat, or Both
A common mistake is treating every large table as a table-bloat problem.
Sometimes the table is large because:
- it really contains a lot of live data
- or the index is the thing that is bloated, not the heap
This distinction matters because:
- table bloat points you toward vacuuming, churn reduction, HOT-friendlier design, and rewrite choices
- index bloat points you toward REINDEX or index-strategy changes
Good first checks
Look at:
- table size over time
- index size over time
- live versus dead tuple estimates
- churn rates
- autovacuum frequency
- whether row count growth explains the storage growth
The goal is to avoid using a table rewrite when the real issue was an oversized index.
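One way to run these checks is against the cumulative statistics views. A minimal sketch (table names will vary, and n_dead_tup is an estimate):

```sql
-- Rank tables by estimated dead tuples, with heap and index sizes
-- shown separately, to see whether the heap or the indexes are
-- carrying the excess space.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       pg_size_pretty(pg_table_size(relid))   AS heap_size,
       pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

For more precise bloat numbers, extensions such as pgstattuple can inspect actual page contents, but the statistics views are usually enough to spot the problem table.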
Step 2: Let Autovacuum Work, Then Tune It Where Needed
PostgreSQL’s own guidance is clear that for many installations, autovacuum is sufficient, and it is generally unwise to disable it unless the workload is extremely predictable.
That matters because many bloat problems are not caused by autovacuum being conceptually wrong. They are caused by:
- the workload being hotter than the defaults expect
- one or two specific tables needing more aggressive settings
- or long transactions preventing cleanup from being effective
Practical rule
Do not start with:
- “disable autovacuum and vacuum manually”
Start with:
- “is autovacuum keeping up on the hot tables?”
What to review
On churn-heavy tables, look at:
- how often autovacuum runs
- whether dead tuples stay elevated
- whether vacuum work is lagging behind writes
- whether analyze frequency also makes sense
Good pattern
Table-specific autovacuum settings are often better than global overreaction.
That lets you tune:
- hot queue tables
- hot status tables
- hot session tables
without making the whole database more aggressive than it needs to be.
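Per-table settings look like this; the table name and the exact thresholds are illustrative, not recommendations:

```sql
-- Vacuum this hot table after roughly 1% of its rows change instead
-- of the default 20%, and analyze it more often as well. Applies only
-- to this table; global settings stay untouched.
ALTER TABLE sessions SET (
  autovacuum_vacuum_scale_factor  = 0.01,
  autovacuum_vacuum_threshold     = 1000,
  autovacuum_analyze_scale_factor = 0.02
);
```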
Step 3: Reduce Unnecessary Updates
One of the best ways to reduce bloat is to create fewer dead tuples in the first place.
That usually means reviewing application behavior.
Common backend mistakes
- updating updated_at too aggressively
- rewriting a whole JSONB blob for a tiny field change
- polling loops that rewrite “heartbeat” fields constantly
- idempotent updates that still write even when nothing changed
- status flapping tables with excessive churn
Better patterns
- avoid no-op updates
- avoid rewriting unchanged columns
- isolate frequently changing fields from large wide rows
- move append-only history into separate tables when the domain fits
- rethink whether some mutable state should actually be modeled differently
If one table is updated every few seconds for every active user, bloat is not surprising. The design needs to acknowledge that.
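One concrete pattern is guarding updates so they only write when something actually changed. A sketch against a hypothetical sessions table:

```sql
-- Skip the write entirely when the row already has the target values,
-- so no new row version (and therefore no dead tuple) is created.
UPDATE sessions
SET status = 'active',
    last_seen_at = now()
WHERE id = 42
  AND (status IS DISTINCT FROM 'active'
       OR last_seen_at < now() - interval '60 seconds');
```

The guard belongs in the WHERE clause because PostgreSQL still writes a new row version even when the values in SET equal the current ones.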
Step 4: Keep Hot Tables Narrow
Bloat hurts more when rows are large.
That is why wide hot tables are such a bad combination.
Common sources of wide rows
- large text columns
- oversized JSONB documents
- bytea/blob fields
- many sparse optional columns
- denormalized payloads mixed directly into transactional tables
Better approach
Keep the hot table lean and move bulky or infrequently read data into companion tables when appropriate.
Instead of one giant table like:
create table sessions (
  id bigint generated always as identity primary key,
  user_id bigint not null,
  status text not null,
  last_seen_at timestamptz not null,
  metadata jsonb,
  audit_payload jsonb,
  full_client_context jsonb
);
consider whether the hottest row path should really include all of that.
A leaner row:
- wastes less space per dead tuple
- fits cache better
- updates more cleanly
- and reduces the cost of bloat if churn still exists
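Continuing the example above, one possible split keeps the churned columns in the hot table and moves the bulky payloads into a companion table (names are illustrative):

```sql
create table sessions (
  id bigint generated always as identity primary key,
  user_id bigint not null,
  status text not null,
  last_seen_at timestamptz not null
);

-- Bulky, rarely updated payloads live in a 1:1 companion table,
-- so heartbeat-style updates churn only the narrow rows above.
create table session_details (
  session_id bigint primary key references sessions (id),
  metadata jsonb,
  audit_payload jsonb,
  full_client_context jsonb
);
```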
Step 5: Encourage HOT-Friendly Update Patterns
Heap-only tuple (HOT) updates help PostgreSQL avoid extra index churn when an update does not need to touch indexed columns.
This matters for bloat because updates that force broad index maintenance are generally more expensive than updates that stay HOT-friendly.
Better HOT conditions usually mean
- fewer indexes on frequently updated columns
- separating frequently changing attributes from heavily indexed lookup fields
- avoiding unnecessary updates to indexed fields
Practical design lesson
If one table has:
- many indexes
- and high update churn
then every update may become more expensive and contribute more broadly to storage and maintenance pressure.
That does not mean “never index.” It means:
- do not overindex hot mutable tables
- and do not put constantly changing columns into indexes unless the workload really needs it
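You can check how often updates on a table actually stay HOT from pg_stat_user_tables:

```sql
-- A low hot_pct on a heavily updated table suggests updates are
-- touching indexed columns and forcing extra index maintenance.
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC
LIMIT 10;
```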
Step 6: Use Fillfactor Intentionally on High-Churn Tables
For some update-heavy tables, fillfactor can help by leaving free space on pages for future row versions.
That can improve update behavior by letting new row versions land on the same page as the old ones, which also helps updates stay HOT in the right workloads.
When it makes sense
Consider fillfactor when:
- the table is updated frequently
- updated row versions are small enough to fit back onto the same page
- churn is high enough that page reuse matters
Important caution
Fillfactor is not a universal bloat cure. It is a workload-specific tuning lever.
Use it when you understand:
- why updates are churning
- and how page-level space reservation might help that exact table
Do not use it as a blanket setting for everything.
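Setting fillfactor is a one-line change; the value 70 here is illustrative, not a recommendation:

```sql
-- Leave ~30% of each newly filled heap page free for future row
-- versions. Existing pages are unaffected until the table is
-- rewritten (for example by VACUUM FULL or CLUSTER).
ALTER TABLE sessions SET (fillfactor = 70);
```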
Step 7: Use TRUNCATE Instead of Periodic Full-Table DELETE Where Appropriate
PostgreSQL’s docs explicitly note that if a table’s entire contents are deleted periodically, TRUNCATE may be a better choice than DELETE followed by VACUUM, because TRUNCATE removes the contents immediately without requiring later vacuum work to reclaim the space.
Good use cases
- staging tables
- scratch tables
- import buffers
- periodically reset work tables
- short-lived queue-like tables with full-table resets
Important caution
TRUNCATE has different semantics from DELETE, including stronger effects on visibility and locking behavior.
So only use it when the application behavior actually fits.
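For a periodically reset staging table, the pattern is simply (hypothetical table name):

```sql
-- Removes all rows immediately and reclaims the space at once,
-- creating no dead tuples and requiring no later vacuum pass.
TRUNCATE TABLE import_buffer RESTART IDENTITY;
```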
Step 8: Know When VACUUM FULL Is Actually Justified
VACUUM FULL should not be your routine plan.
PostgreSQL’s own docs say administrators should generally strive to use standard VACUUM and avoid VACUUM FULL, because standard vacuum can run alongside production operations while VACUUM FULL takes an ACCESS EXCLUSIVE lock and is much slower.
When VACUUM FULL may make sense
- there was a massive delete or rewrite event
- you genuinely need disk space returned to the OS soon
- the table is badly bloated and will not simply refill soon afterward
- a maintenance window exists and the lock is acceptable
When it usually does not make sense
- as a weekly routine
- as a substitute for fixing autovacuum lag
- when the table will just grow back immediately
- when the real problem is index bloat, not heap bloat
Practical rule
Use VACUUM FULL as a repair tool, not a default habit.
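When a maintenance window exists and the lock is acceptable, the invocation itself is straightforward (table name illustrative):

```sql
-- Rewrites the whole table into a new, compact file and returns the
-- freed space to the OS. Takes an ACCESS EXCLUSIVE lock and needs
-- temporary disk space roughly equal to the table's current size.
VACUUM (FULL, VERBOSE, ANALYZE) order_archive;
```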
Step 9: Use Table-Rewrite Options Carefully
The PostgreSQL docs note that when you truly need to reclaim excess disk space after heavy update/delete activity, alternatives like VACUUM FULL, CLUSTER, or table-rewriting variants of ALTER TABLE can do that, but they require ACCESS EXCLUSIVE locks and temporarily use extra disk space approximately equal to the size of the table.
That makes these tools powerful but operationally expensive.
Good use cases
- one-off cleanup after unusual churn
- disk-space recovery during planned maintenance
- rewrites combined with physical reorganization goals
Bad use cases
- routine maintenance for normal churn-heavy tables
- fixing what should have been solved by steady autovacuum and better table design
Step 10: Reindex When the Index Is the Bloated Part
Table bloat and index bloat are not the same problem.
The PostgreSQL docs describe REINDEX as rebuilding indexes and explicitly list index bloat as one scenario where it is useful, especially when a B-tree index contains many empty or nearly-empty pages. They also note that REINDEX CONCURRENTLY can rebuild without blocking concurrent inserts, updates, or deletes, though with caveats.
When REINDEX makes sense
- the index has grown disproportionately
- the heap is not the main problem
- index scans have become less efficient
- a heavy churn pattern created lots of near-empty index pages
- you need to rebuild after corruption or fillfactor changes
Practical rule
If the table is fine but the indexes are swollen, REINDEX is a more precise fix than rewriting the table.
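A sketch, with illustrative index and table names:

```sql
-- Rebuild one bloated index without blocking concurrent inserts,
-- updates, or deletes; needs extra disk space for the new copy
-- while it runs.
REINDEX INDEX CONCURRENTLY sessions_user_id_idx;

-- Or rebuild every index on the table in one pass:
REINDEX TABLE CONCURRENTLY sessions;
```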
Step 11: Partition Hot Historical Tables When the Workload Justifies It
Partitioning is not a universal anti-bloat strategy, but it can help with tables where:
- data is naturally time-based
- retention rules exist
- old partitions become mostly immutable
- churn is concentrated in the newest slice
Why this can help
It becomes easier to:
- vacuum smaller active partitions more effectively
- isolate hot data from cold data
- archive or drop old partitions
- and avoid making one giant table absorb every lifecycle pattern
Good candidates
- event tables
- logs
- time-series metrics
- large job history tables
- append-heavy systems with age-based lifecycle rules
Do not partition solely because “the table is big.” Partition when the workload pattern makes it operationally useful.
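A minimal declarative-partitioning sketch for a time-based events table (names and ranges are illustrative):

```sql
create table events (
  created_at timestamptz not null,
  payload jsonb
) partition by range (created_at);

-- One partition per month. Old partitions can later be detached or
-- dropped instead of mass-deleted, which avoids delete bloat entirely.
create table events_2024_01 partition of events
  for values from ('2024-01-01') to ('2024-02-01');
```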
Step 12: Watch for Long Transactions and Idle-in-Transaction Sessions
Even good autovacuum settings cannot fully help if old transactions keep visibility horizons open.
Common causes
- application code opening transactions too early
- network calls inside transactions
- background workers forgetting to commit
- tools leaving sessions idle in transaction
- long-running manual sessions during debugging
Why this matters
Dead tuples cannot be removed while a long-lived transaction still holds a snapshot that might see them, so cleanup stalls in practice even when vacuum runs on schedule.
What to do
Monitor:
- long transaction age
- idle-in-transaction sessions
- blocking chains
- backend behavior during incidents
Often the fix is not “more vacuum.” It is:
- “stop keeping transactions open for so long.”
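A quick check against pg_stat_activity (the five-minute threshold is arbitrary):

```sql
-- Find long-open transactions, including sessions sitting
-- idle in transaction, ordered oldest first.
SELECT pid,
       state,
       now() - xact_start AS xact_age,
       left(query, 60)    AS last_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
```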
Step 13: Measure Before and After
Do not treat bloat work as guesswork.
Before making a fix, capture:
- table size
- index size
- live/dead tuple estimates
- churn rate
- vacuum history
- query latency on affected paths
- disk pressure if that is the main concern
Afterward, measure again.
This matters because some fixes:
- reduce disk size
- but do not improve latency much
while others:
- improve steady-state performance
- without dramatically shrinking the file right away
Those are both useful, but they are not the same outcome.
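The size side of before/after measurement can be as simple as (hypothetical table name):

```sql
-- Capture heap, index, and total size separately, since a fix may
-- shrink one without shrinking the other.
SELECT pg_size_pretty(pg_table_size('sessions'))          AS heap,
       pg_size_pretty(pg_indexes_size('sessions'))        AS indexes,
       pg_size_pretty(pg_total_relation_size('sessions')) AS total;
```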
A Practical Bloat Reduction Workflow
If you want a clear order of operations, use this:
Step 1
Confirm whether the problem is:
- table bloat
- index bloat
- or both
Step 2
Check autovacuum and analyze health on the affected table
Step 3
Check for long transactions and idle-in-transaction sessions
Step 4
Review the workload:
- frequent updates?
- mass deletes?
- wide rows?
- JSONB churn?
- too many indexes?
Step 5
Reduce unnecessary update churn and keep hot rows lean
Step 6
Tune table-specific autovacuum behavior if needed
Step 7
Use REINDEX if the indexes are the real issue
Step 8
Use VACUUM FULL, CLUSTER, or a rewrite only when you truly need immediate disk reclamation and can tolerate the operational cost
This workflow avoids overreacting while still giving you a path to real cleanup.
Conclusion
Reducing PostgreSQL table bloat is mostly about preventing needless dead-space growth and helping normal maintenance keep up.
That means:
- letting autovacuum work
- tuning hot tables where needed
- shortening transactions
- reducing unnecessary updates
- keeping high-churn rows lean
- avoiding bad delete patterns
- and separating table bloat from index bloat before choosing a fix
The best long-term goal is not to keep every table at its minimum possible size.
It is to keep each table in a healthy steady state where space is reused efficiently and heavy rewrite operations stay rare.
That is usually the real win.