PostgreSQL Maintenance Checklist for Production

Updated Apr 3, 2026

Level: intermediate · ~15 min read · Intent: informational

Audience: backend developers, database engineers, technical teams

Prerequisites

  • basic familiarity with PostgreSQL

Key takeaways

  • Production PostgreSQL stays healthy when maintenance is treated as an ongoing operating practice, not a rescue task after performance degrades.
  • The highest-value maintenance work usually focuses on vacuuming, statistics, backups, restore testing, monitoring, logs, replication health, and operational safety around schema changes.

FAQ

What is the most important PostgreSQL maintenance task in production?
For most systems, the most important ongoing task is keeping autovacuum and statistics healthy, because that directly affects table health, bloat control, planner quality, and long-term performance.
How often should PostgreSQL backups be tested?
Regularly. Backup jobs alone are not enough. A production team should routinely verify backups and perform real test restores so recovery is proven, not assumed.
Do I need to run VACUUM manually in production?
Sometimes, but not always. Many systems rely mostly on autovacuum. Manual VACUUM is usually most useful when investigating hot tables, unusual churn, lagging cleanup, or one-off maintenance situations.
What should be checked every day in PostgreSQL production?
Every day, you should check backup status, replication health if relevant, disk growth, lock or slow-query signals, connection pressure, and whether autovacuum appears to be keeping up on the hottest tables.

PostgreSQL production maintenance is not one job.

It is a set of repeated habits that keep the database:

  • fast enough
  • recoverable enough
  • observable enough
  • and safe enough

until the day something goes wrong.

That is the real point of maintenance.

A healthy PostgreSQL system usually does not stay healthy by accident. It stays healthy because a team keeps up with:

  • vacuuming and statistics
  • disk growth
  • backup quality
  • restore testing
  • logs and slow-query visibility
  • replication health
  • schema safety
  • and release hygiene

This checklist is designed to be practical.

Not every database needs every task at the same frequency. But almost every serious PostgreSQL production system needs most of them in some form.

What Production PostgreSQL Maintenance Is Really About

A useful production checklist should help you prevent five broad classes of failure:

  1. Performance drift
    Queries get slower because of bloat, stale statistics, storage pressure, or bad workload growth.

  2. Recoverability drift
    Backup jobs still run, but restores are untested or incomplete.

  3. Operational drift
    Config, roles, schemas, and replication setups become harder to reason about over time.

  4. Capacity drift
    Disk, connections, WAL, and table growth creep forward until they become an incident.

  5. Upgrade and security drift
    The cluster falls behind on routine patching, role hygiene, and maintenance-safe operational practices.

If your maintenance checklist fights those five drifts, it is probably a good one.

Daily PostgreSQL Production Checklist

These are the things most serious systems should check every day.

1. Confirm backups succeeded

Do not settle for:

  • “the backup job ran”

Check:

  • did it finish successfully
  • did it produce the expected files or artifacts
  • did WAL archiving continue cleanly if used
  • did backup size look roughly normal
  • did verification steps pass where used

For physical backups, pg_basebackup is the standard whole-cluster tool, and pg_verifybackup can validate a backup against the manifest it was written with. But even PostgreSQL’s own docs stress that verification does not replace real test restores. (postgresql.org)
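A minimal sketch of that workflow, assuming a plain-format backup directory, PostgreSQL 13 or newer for pg_verifybackup, and illustrative paths:

```shell
# Take a plain-format physical base backup, streaming WAL alongside it
pg_basebackup -D /backups/base_$(date +%F) -X stream --checkpoint=fast --progress

# Validate the backup files against the manifest pg_basebackup wrote
pg_verifybackup /backups/base_$(date +%F)
```

Even when both commands succeed, treat the result as “probably restorable” until a real restore proves it.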

Daily question

Can we still recover if the primary disappears today?

2. Check replication health

If the system uses standbys, check:

  • replication lag
  • sender and receiver state
  • slot health if replication slots are used
  • WAL retention behavior
  • whether the standby is actually still a failover candidate

A replica that quietly stopped replaying hours ago is not high availability. It is false confidence.
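One way to pull those signals, assuming PostgreSQL 10+ for the lag columns (wal_status requires 13+):

```sql
-- On the primary: per-standby state and lag
SELECT application_name, state, sync_state,
       write_lag, flush_lag, replay_lag,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- Slots that are inactive but still pinning WAL are a disk-full risk
SELECT slot_name, active, wal_status
FROM pg_replication_slots;
```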

3. Check disk growth and free space

Watch:

  • total disk usage
  • WAL growth
  • log growth
  • temp file growth
  • sudden table/index growth
  • archive storage growth

Disk full problems are some of the most avoidable PostgreSQL incidents.
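A quick size check from inside the database, assuming PostgreSQL 10+ for pg_ls_waldir:

```sql
-- Per-database size
SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;

-- Current size of the WAL directory
SELECT pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();
```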

Daily question

What is growing faster than expected?

4. Check slow-query and lock-wait signals

Every day, look for:

  • new slow query patterns
  • elevated query duration
  • lock wait spikes
  • blocked sessions
  • temp file spikes
  • checkpoint pressure if write latency is sensitive

This is where logs, pg_stat_statements, pg_stat_activity, and dashboard trends work together.
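For the lock-wait part, a sketch using pg_blocking_pids (PostgreSQL 9.6+):

```sql
-- Sessions that are currently blocked, and which PIDs are blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       now() - query_start AS waiting_for,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```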

5. Check connection pressure

Look for:

  • active versus idle sessions
  • pool saturation
  • growing connection count after deploys
  • idle-in-transaction sessions
  • worker fleets scaling out faster than the database can absorb

A connection problem often appears before it becomes a query problem.
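A quick view of connection pressure from pg_stat_activity (the backend_type filter requires PostgreSQL 10+):

```sql
-- Session counts by state
SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY count(*) DESC;

-- Idle-in-transaction sessions, oldest first
SELECT pid, now() - xact_start AS xact_age, left(query, 60) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;
```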

6. Check autovacuum is keeping up on hot tables

You do not need to review every table every day. But you should know whether the hottest write tables are:

  • accumulating dead tuples too fast
  • missing analyze refreshes
  • being vacuumed too slowly
  • or showing signs of bloat pressure

PostgreSQL’s vacuuming docs make clear that routine maintenance is needed to recover or reuse dead-row space, update planner statistics, keep the visibility map current, and prevent wraparound problems. (postgresql.org)
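A sketch for spotting tables where cleanup may be lagging:

```sql
-- Tables with the most dead tuples, plus when autovacuum last touched them
SELECT relname, n_live_tup, n_dead_tup,
       round(n_dead_tup * 100.0 / greatest(n_live_tup, 1), 1) AS dead_pct,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
```

A high dead_pct with a stale last_autovacuum on a hot table is a cue to investigate, not an automatic problem.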

Weekly PostgreSQL Production Checklist

These are the checks and actions that usually do not need daily attention, but should not be ignored for long.

7. Review top expensive queries

Use:

  • pg_stat_statements
  • logs
  • slow-query dashboards
  • app performance dashboards

Look for:

  • queries with highest total execution time
  • queries with highest mean execution time
  • high-call noisy queries
  • newly introduced statements after releases
  • rising buffer or I/O heavy patterns

The goal is to catch:

  • query regressions
  • ORM mistakes
  • workload changes
  • or rising customer/tenant pressure

before they become emergencies.
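With the pg_stat_statements extension installed, a starting query might look like this (column names assume PostgreSQL 13+, where total_time became total_exec_time):

```sql
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       left(query, 80) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 15;
```

Sorting by mean_exec_time instead surfaces the slow-per-call queries that total time can hide.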

8. Review table and index growth

At least weekly, inspect:

  • largest tables
  • fastest-growing tables
  • largest indexes
  • indexes growing faster than expected
  • tenant-specific outliers if the app is multi-tenant

This helps catch:

  • hidden bloat
  • archive-table problems
  • unbounded event storage
  • accidental duplication
  • and retention failures
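A size snapshot to compare week over week (record the output somewhere so growth rates become visible):

```sql
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total,
       pg_size_pretty(pg_relation_size(relid)) AS heap,
       pg_size_pretty(pg_indexes_size(relid)) AS indexes
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
```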

9. Review autovacuum tuning for outlier tables

Many systems can rely mostly on defaults. But production usually has a few tables that behave differently.

Common examples:

  • jobs
  • sessions
  • event queues
  • activity feeds
  • append-heavy logs
  • churn-heavy workflow tables

Review whether those tables need:

  • more aggressive vacuum thresholds
  • more analyze frequency
  • different fillfactor
  • or more thoughtful table design
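Per-table overrides look like this; the table name and values are illustrative, not recommendations:

```sql
-- Vacuum and analyze a churn-heavy table far more aggressively than the default
ALTER TABLE jobs SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);

-- See which tables already carry storage-parameter overrides
SELECT relname, reloptions
FROM pg_class
WHERE relkind = 'r' AND reloptions IS NOT NULL;
```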

10. Review temp file activity

If temp file logging is enabled, review:

  • which queries spilled
  • how large the spills were
  • whether the same query pattern keeps spilling
  • whether the issue is memory sizing or bad query shape

This is one of the clearest ways to find:

  • expensive sorts
  • hash spills
  • analytical endpoints that are drifting
  • or reports that need redesign
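Cumulative spill activity is visible per database, and log_temp_files controls whether individual spills get logged:

```sql
-- Temp file totals since the last stats reset
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_bytes
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY temp_bytes DESC;

-- Log every temp file of 10 MB or more (threshold is illustrative)
ALTER SYSTEM SET log_temp_files = '10MB';
SELECT pg_reload_conf();
```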

11. Review lock and blocking patterns

A weekly look at blocking trends helps catch:

  • bad migrations
  • long transactions
  • background job collisions
  • queue contention
  • admin tasks that are too disruptive

Lock pain is easier to fix when it is recognized as a pattern instead of a rare mystery.

12. Review logs for noisy warnings and recurring operational smells

Look for recurring patterns such as:

  • checkpoint warnings
  • replication warnings
  • log-rotation issues
  • lock waits
  • temp file churn
  • authentication failures
  • autovacuum lag clues
  • canceled statements
  • timeout patterns

Production logs are one of the earliest places trouble becomes visible.

Monthly PostgreSQL Production Checklist

These are the deeper maintenance tasks that should happen on a regular rhythm even if nothing looks broken.

13. Perform a real restore test

This is one of the most important monthly tasks.

Do not just verify that backups exist. Actually restore one.

Test:

  • whole-cluster restore
  • one database restore if that matters
  • point-in-time recovery if that matters
  • role and permission correctness afterward
  • application connectivity afterward
  • expected data presence

PostgreSQL’s own backup verification tooling explicitly warns that verification cannot replace real test restores. (postgresql.org)

Monthly question

If we had to recover this month, would we already know the steps and timing?

14. Review role and privilege hygiene

At least monthly, review:

  • superusers
  • roles with CREATEROLE, CREATEDB, REPLICATION, or BYPASSRLS
  • application roles that own objects
  • stale service credentials
  • PUBLIC grants you do not want
  • schema create privileges
  • whether search_path is still sane for runtime roles

This is one of the easiest production-safety areas to neglect.
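A starting audit query for powerful role attributes:

```sql
SELECT rolname, rolsuper, rolcreaterole, rolcreatedb,
       rolreplication, rolbypassrls
FROM pg_roles
WHERE rolsuper OR rolcreaterole OR rolcreatedb
   OR rolreplication OR rolbypassrls
ORDER BY rolname;
```

Every row in that result should have a reason to exist that someone can state out loud.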

15. Review security basics

Check:

  • whether TLS expectations are still being met
  • whether pg_hba.conf errs on the tight side rather than the loose side
  • whether auth methods are current enough for your environment
  • whether backup storage and logs are still access-controlled properly
  • whether role separation between runtime, migration, and admin duties is intact

16. Review replication failover readiness

If you claim high availability, validate it periodically.

Check:

  • which standby would be promoted
  • whether routing would move correctly
  • whether the standby is caught up enough
  • whether the team knows the runbook
  • whether the old-primary recovery path is still understood

A standby that exists is not the same as a standby that is ready.

17. Review maintenance memory settings on busy systems

PostgreSQL’s resource docs note that maintenance_work_mem can improve operations like VACUUM, CREATE INDEX, restore work, and ALTER TABLE ADD FOREIGN KEY, but also warn that autovacuum can multiply memory usage across workers, which is why autovacuum_work_mem often deserves separate control. (postgresql.org)

This does not mean you change the settings every month. It means you review whether:

  • maintenance operations are slower than they should be
  • autovacuum workers are memory-starved
  • restore jobs are taking longer than expected
  • or the current values no longer match the workload
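Checking and adjusting those settings might look like this; the values are illustrative and depend entirely on available RAM and worker counts:

```sql
SHOW maintenance_work_mem;
SHOW autovacuum_work_mem;  -- -1 means "fall back to maintenance_work_mem"

-- Example adjustment; remember autovacuum_max_workers multiplies the second value
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET autovacuum_work_mem = '256MB';
SELECT pg_reload_conf();
```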

Quarterly PostgreSQL Production Checklist

These are the bigger operational health tasks.

18. Run a failover drill if HA matters

If the system claims:

  • high availability
  • replica promotion readiness
  • low recovery time

then you should test failover deliberately.

Practice:

  • promotion
  • app reconnection
  • routing changes
  • monitoring changes
  • old-primary fencing
  • old-primary rewind or rebuild process

Quarterly is not a hard law. But failover should be practiced often enough that it is not a first-time activity during a real outage.

19. Review schema drift and dangerous tables

Look for:

  • giant wide tables
  • fast-changing JSONB blobs
  • tables with weak primary access paths
  • missing tenant/account scoping in multi-tenant systems
  • child tables with missing foreign-key indexes
  • unbounded append tables without retention plans

Quarterly schema review is one of the best ways to prevent next-quarter incidents.

20. Review index quality

Ask:

  • which indexes are heavily used
  • which are redundant
  • which are bloated
  • which were added reactively and no longer matter
  • which hot queries still lack the right index
  • whether some broad indexes should become partial or compound indexes

This is also a good time to evaluate whether REINDEX is warranted for bloat on specific large indexes.
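One way to surface drop candidates, assuming usage stats have accumulated long enough to be representative:

```sql
-- Never-scanned, non-unique, non-primary indexes, largest first
SELECT s.schemaname, s.relname AS table_name, s.indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
  AND NOT i.indisprimary
ORDER BY pg_relation_size(s.indexrelid) DESC;
```

Check replicas too before dropping anything; an index unused on the primary may still serve read traffic elsewhere.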

21. Review data-retention and archival behavior

For large systems, this is critical.

Check:

  • event/archive tables
  • deleted-but-never-purged rows
  • stale historical partitions
  • backup retention
  • WAL archive retention
  • whether legal/compliance retention still matches reality
  • whether old data is on the right storage tier or in the right system at all

A lot of PostgreSQL pain is really retention pain.

22. Review extension usage and operational dependencies

Quarterly, review:

  • installed extensions
  • version compatibility concerns
  • extension-specific maintenance needs
  • whether every installed extension is still necessary
  • whether extension upgrades need planning alongside PostgreSQL upgrades

This matters because production complexity often grows quietly through dependencies.

On Every Release or Deployment

Some maintenance work is event-driven, not calendar-driven.

23. Review migration safety before deploying schema changes

Before any significant schema change, ask:

  • will this rewrite a big table
  • will this block writes
  • do we need phased rollout
  • do we need backfill in batches
  • can it be added NOT VALID first
  • how will rollback or forward-fix work
  • how will it behave with replicas and failover

Production maintenance is not only what you do between releases. It is also how safely you change the system.
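The NOT VALID pattern mentioned above, sketched with hypothetical tables:

```sql
-- Step 1: add the constraint without scanning existing rows (brief lock only)
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

-- Step 2: validate later under a weaker lock, during normal traffic
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;

-- Indexes on large tables: build without blocking writes
-- (CREATE INDEX CONCURRENTLY cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
```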

24. Re-check query regressions after deploys

After application releases, watch for:

  • new query patterns
  • new slow statements
  • increased call volume
  • new temp files
  • new lock waits
  • new connection pressure
  • higher latency on hot endpoints

A lot of PostgreSQL incidents are really post-release regressions.

On Every Incident

Incidents are also maintenance opportunities.

25. Capture the operational lesson before it fades

After every production issue, ask:

  • what metric would have caught this earlier
  • what log setting would have made diagnosis faster
  • what runbook step was missing
  • what query/index/schema issue made the incident worse
  • what checklist item should be added or tightened

This is how a production checklist gets better over time.

A Practical PostgreSQL Maintenance Schedule

Here is a compact version of the checklist by cadence.

Daily

  • confirm backups completed
  • check replication health
  • check disk/WAL/log growth
  • review slow queries and lock-wait signals
  • check connection pressure
  • confirm autovacuum is keeping up on hot tables

Weekly

  • review top expensive queries
  • review table/index growth trends
  • inspect temp file activity
  • inspect blocking patterns
  • review noisy warnings in logs
  • revisit outlier autovacuum tables

Monthly

  • perform a real restore test
  • review roles and privileges
  • review TLS/auth/backup-access basics
  • review HA failover readiness
  • revisit maintenance memory settings if operations are slow

Quarterly

  • run a failover drill
  • review schema drift and dangerous tables
  • review index quality and bloat candidates
  • review archival and retention behavior
  • review extension and versioning dependencies

What Not to Do

A good checklist is also about avoiding bad habits.

Do not assume autovacuum always means healthy vacuuming

Autovacuum running is not the same as autovacuum keeping up.

Do not assume backups are good because the job is green

A backup is not trustworthy until restore is proven.

Do not review only server CPU and call it monitoring

PostgreSQL’s own monitoring docs explicitly recommend pairing database stats with OS tools like ps, top, iostat, and vmstat. (postgresql.org)

Do not let logs stay unstructured if the system is large enough to need real analysis

PostgreSQL supports stderr, csvlog, and jsonlog, and structured formats are often much easier to analyze centrally. (postgresql.org)

Do not wait for quarterly reviews to notice basic disk or replication trouble

Some signals are too urgent for a slow cadence.

Do not let production maintenance depend on one person’s memory

Runbooks and repeatable checks matter.

Final Production Checklist Principle

The best PostgreSQL maintenance checklist is not the longest one.

It is the one that:

  • gets done consistently
  • catches drift early
  • proves recoverability
  • and keeps the system understandable under pressure

That usually means focusing first on:

  • vacuuming and statistics
  • backups and restore testing
  • monitoring and logs
  • replication and failover readiness
  • schema safety
  • and role/security hygiene

If those six areas are healthy, most PostgreSQL systems stay in much better shape.
