How to Scale PostgreSQL for Millions of Rows

Updated Apr 3, 2026

Tags: postgresql · database · sql · scaling · performance · partitioning

Level: intermediate · ~15 min read · Intent: informational

Audience: backend developers, database engineers, technical teams

Prerequisites

  • basic familiarity with PostgreSQL

Key takeaways

  • PostgreSQL can handle millions of rows well, but row count alone is not the real scaling problem. The real issues are query shape, index design, row width, write churn, maintenance health, and application access patterns.
  • The best way to scale PostgreSQL is usually not to jump straight to sharding. Most systems get much further by fixing query patterns, choosing better indexes, keeping hot tables lean, tuning autovacuum on churn-heavy tables, and using partitioning only when the workload truly benefits.

FAQ

Can PostgreSQL handle millions of rows?
Yes. PostgreSQL can handle millions of rows very well, but success depends on good schema design, indexes, query patterns, maintenance, and application behavior.
When should I partition a PostgreSQL table?
Partitioning usually starts making sense when a table becomes very large, queries mostly target one slice of the data, or operational tasks like retention and bulk deletes become painful on one giant table.
What is the biggest reason PostgreSQL slows down at scale?
Usually it is not raw row count alone. It is more often a combination of inefficient queries, weak indexes, deep offset pagination, wide rows, long transactions, stale statistics, or application-side N+1 patterns.
Do I need to shard PostgreSQL for millions of rows?
Usually no. Many systems can go very far with one PostgreSQL cluster if the schema, indexes, maintenance, and query patterns are designed well.

PostgreSQL does not become slow just because a table reaches a million rows.

Or ten million. Or much more.

That is one of the most important mindset shifts for developers who are starting to worry about scale.

The real question is not:

  • “Can PostgreSQL handle millions of rows?”

It is:

  • “Can my schema, queries, indexes, and application behavior handle millions of rows efficiently?”

That difference matters because many teams reach for the wrong solution too early. They think they need:

  • sharding
  • a database switch
  • aggressive denormalization
  • or heavy infrastructure changes

when the real fixes are usually much more practical:

  • better indexes
  • smaller result sets
  • better pagination
  • better schema design
  • healthier vacuuming
  • and fewer wasteful query patterns

This guide focuses on those practical fixes.

1. Understand What “Scaling” Actually Means

Scaling PostgreSQL is not one problem. It is a collection of problems that appear as data and traffic grow.

Common examples include:

  • reads getting slower because scans touch too many rows
  • writes getting slower because hot tables churn heavily
  • sorts and joins becoming more expensive
  • indexes growing large enough to change cache behavior
  • autovacuum struggling on update-heavy tables
  • APIs relying on offset pagination long after tables have grown
  • application code issuing far too many queries
  • retention and archival tasks becoming operationally painful

So when you say:

  • “We need PostgreSQL to scale,”

you really need to ask:

  • scale for what?
  • reads?
  • writes?
  • time-series growth?
  • tenant growth?
  • event ingestion?
  • reporting?
  • mixed transactional traffic?

That clarification changes almost every decision after it.

2. Row Count Is Not the Main Bottleneck

A table with millions of rows can be perfectly healthy if:

  • the important queries are selective
  • the right indexes exist
  • the hot working set fits memory reasonably well
  • maintenance keeps up
  • and the app is not forcing PostgreSQL to do unnecessary work

A much smaller table can still be painful if:

  • every request scans too much of it
  • the result shape is too wide
  • the query pattern is chatty
  • or write churn keeps generating dead tuples faster than maintenance can clean them up

Better question

Instead of asking:

  • “How many rows is too many?”

ask:

  • “How many rows does this specific query have to touch?”
  • “How many rows does this API return?”
  • “How much of this table is hot?”
  • “How much of this table changes constantly?”

That is usually where the scaling truth actually lives.
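One concrete way to answer those questions is EXPLAIN (ANALYZE, BUFFERS), which shows how many rows each plan node actually touched. A minimal sketch, assuming a hypothetical orders table with an account_id column:

```sql
-- shows actual rows touched and buffers read per plan node
explain (analyze, buffers)
select id, status
from orders
where account_id = 42
order by created_at desc
limit 20;
```

If a node reports touching far more rows than the query returns, that gap, not the table's total row count, is the scaling problem.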

3. Design Tables Around Real Query Patterns

A PostgreSQL table that scales well usually starts with one simple habit: designing around real query patterns instead of only theoretical data models.

For each major table, you should know:

  • the most common filter columns
  • the most common order-by columns
  • the most common join path
  • whether the workload is read-heavy or write-heavy
  • and whether the hot path needs only a few columns or many

Example

If the common query is:

select id, status, created_at
from orders
where account_id = $1
order by created_at desc
limit 20;

that implies:

  • account_id matters for filtering
  • created_at matters for sorting
  • a multicolumn index may matter
  • and the endpoint probably does not need select *

This sounds basic, but it is one of the biggest reasons some systems scale cleanly and others do not.

4. Keep Hot Tables Narrow

Wide rows become more expensive as tables grow.

They affect:

  • cache efficiency
  • heap access cost
  • update churn
  • table bloat impact
  • and how much useful data fits in memory

Common sources of wide rows

  • large text columns in hot tables
  • oversized JSONB payloads
  • blob-like binary fields
  • too many rarely used columns
  • denormalized fields that made one query easier but made every row heavier

Better pattern

Keep high-frequency transactional tables lean.

Move bulky or infrequently read data into:

  • companion tables
  • archive tables
  • or clearly separate document-style tables when appropriate

At scale, one lean hot table is often worth far more than one “convenient” everything-table.
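As a sketch of that split, assuming a hypothetical orders table whose bulky payloads only a few endpoints ever read:

```sql
-- lean hot table: only what the transactional path reads constantly
create table orders (
    id         bigint generated always as identity primary key,
    account_id integer not null,
    status     text not null,
    created_at timestamptz not null default now()
);

-- companion table: bulky, rarely read data kept off the hot path
create table order_documents (
    order_id    bigint primary key references orders (id),
    invoice_pdf bytea,
    raw_payload jsonb
);
```

The hot table stays narrow, so more of it fits in cache, while the heavy columns are fetched only when an endpoint genuinely needs them.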

5. Use the Right Data Types

Bad type choices compound as rows grow.

Good defaults for large systems often look like:

  • integer for ordinary whole numbers
  • bigint only when the range really requires it
  • text for most strings
  • date for date-only values
  • timestamptz for real timestamps
  • boolean for true/false state
  • jsonb only where flexible structure is genuinely needed

Common mistakes that hurt at scale

  • using numeric for ordinary counts
  • using bigint everywhere without reason
  • storing flags as strings
  • burying relational values inside JSONB
  • using timestamps where dates would be clearer

At millions of rows, schema sloppiness becomes physical cost.
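Put together, those defaults might look like this hypothetical page_views table:

```sql
create table page_views (
    id        bigint generated always as identity primary key,  -- range genuinely needs bigint here
    user_id   integer not null,                    -- ordinary whole number
    viewed_on date not null,                       -- date-only value
    viewed_at timestamptz not null default now(),  -- real timestamp
    is_bot    boolean not null default false,      -- true/false state
    referrer  text,                                -- most strings
    extra     jsonb                                -- only where flexible structure is needed
);
```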

6. Build Better Indexes, Not Just More Indexes

Indexes are one of the biggest scaling tools PostgreSQL gives you. They are also one of the biggest sources of write overhead if used carelessly.

The right question is not:

  • “Should we add more indexes?”

It is:

  • “Which repeated query shapes need a better access path?”

Start with B-tree for most scalar queries

B-tree is the default for a reason. It is the best general-purpose index for:

  • equality
  • ranges
  • many ordered lookups
  • and common joins

Use multicolumn indexes deliberately

For a query like:

select *
from events
where account_id = $1
order by created_at desc
limit 50;

a better index is often:

create index idx_events_account_created
on events (account_id, created_at desc);

not two unrelated single-column indexes.

Use GIN for the right kinds of data

For:

  • jsonb
  • arrays
  • full-text search

GIN is often the right choice.
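For example, a GIN index on a hypothetical events.payload jsonb column supports containment queries directly:

```sql
create index idx_events_payload on events using gin (payload);

-- the @> containment operator can now use the index
select id
from events
where payload @> '{"type": "signup"}';
```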

Use BRIN on huge append-heavy tables

For very large time-based tables where values correlate with physical row order, BRIN can be extremely useful because it stays much smaller than B-tree and still helps prune large scans effectively.
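A sketch, again assuming a hypothetical append-only events table where created_at correlates with insertion order:

```sql
-- BRIN stores one summary per block range, so the index stays tiny
create index idx_events_created_brin on events using brin (created_at);

-- range predicates like this can skip whole block ranges
select count(*)
from events
where created_at >= '2026-03-01' and created_at < '2026-04-01';
```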

Important warning

Every index has a write cost.

On hot tables, overindexing can become part of the scaling problem.

7. Fix Query Shape Before Reaching for Infrastructure

Many scaling issues are query-shape issues first.

That includes:

  • scanning too many rows
  • sorting too many rows
  • returning too many columns
  • joining too broadly
  • forcing PostgreSQL to do deep offset pagination
  • or hiding key filters inside expressions

Better habits

  • avoid select * in production paths
  • reduce row counts earlier
  • filter before joining where possible
  • return only the columns the API actually uses
  • avoid making PostgreSQL sort giant result sets that will mostly be thrown away

A lot of “PostgreSQL scaling” is really “stop asking PostgreSQL to do unnecessary work.”

8. Replace Deep OFFSET Pagination With Keyset Pagination

One of the most common large-table scaling mistakes is keeping offset pagination long after the table stopped being small.

Weak pattern

select id, created_at, title
from posts
order by created_at desc
offset 50000
limit 20;

This gets increasingly expensive because PostgreSQL still has to produce all of the skipped rows, in order, before it can discard them and return the page.

Better pattern

select id, created_at, title
from posts
where created_at < $1
order by created_at desc
limit 20;

This is keyset pagination.

Why it helps

It lets PostgreSQL continue from a known position instead of scanning deeper and deeper into the table just to discard rows.

For feeds, event streams, dashboards, and large APIs, this is one of the highest-value query changes you can make.
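One refinement worth knowing (an addition here, not something the basic pattern requires): if created_at is not unique, paging on it alone can skip or repeat rows at page boundaries. A common fix is a row-value comparison with a unique tie-breaker column:

```sql
-- $1 = created_at of the last row seen, $2 = its id
select id, created_at, title
from posts
where (created_at, id) < ($1, $2)
order by created_at desc, id desc
limit 20;
```

An index on (created_at desc, id desc) lets this continue from the exact position of the previous page.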

9. Watch for N+1 Query Patterns

Sometimes PostgreSQL looks overwhelmed not because any single query is terrible, but because the application is emitting far too many queries.

This is extremely common in ORM-heavy systems.

Typical pattern

  • fetch 100 parent rows
  • fetch each child row set separately
  • fetch counts separately
  • fetch metadata separately

The database then sees:

  • one request
  • and hundreds of statements

Better approach

  • preload related data where appropriate
  • batch lookups
  • use joins when the result shape fits
  • track query count per request in development and staging
  • inspect the SQL the ORM is actually generating

A system with millions of rows often feels fine until the application multiplies the work per request.
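On the SQL side, the batching fix often amounts to turning per-parent lookups into one statement with = any(...). A sketch, assuming hypothetical posts and comments tables:

```sql
-- instead of 100 separate "where post_id = ?" queries, one batched lookup
select post_id, id, body, created_at
from comments
where post_id = any($1)   -- $1 is the array of the 100 post ids
order by post_id, created_at;
```

The application then groups the rows by post_id in memory: one round trip instead of one hundred.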

10. Keep Autovacuum Healthy

At larger row counts, maintenance stops being background trivia. It becomes part of performance.

PostgreSQL’s own vacuuming docs say regular vacuuming is necessary to:

  • recover or reuse space from updated and deleted rows
  • update planner statistics
  • update the visibility map for index-only scans
  • and protect against wraparound risks

That is why scaling a large PostgreSQL system without healthy vacuuming is not realistic.

What to watch

  • dead tuple accumulation
  • autovacuum frequency on hot tables
  • tables with heavy update/delete churn
  • long transactions that delay cleanup
  • analyze frequency after large changes

Important point

The goal is not:

  • “turn autovacuum off and manage everything manually”

The goal is:

  • “make sure autovacuum can keep up, and tune the tables that are much hotter than average”
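Two starting points, both sketches rather than recommendations: pg_stat_user_tables shows where dead tuples are piling up, and per-table storage parameters make autovacuum more aggressive on just the hot tables (the thresholds below are illustrative):

```sql
-- find the tables accumulating dead tuples fastest
select relname, n_live_tup, n_dead_tup, last_autovacuum
from pg_stat_user_tables
order by n_dead_tup desc
limit 10;

-- trigger autovacuum earlier on one unusually hot table
alter table events set (
    autovacuum_vacuum_scale_factor  = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);
```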

11. Keep Statistics Fresh

As tables grow, bad row estimates become more painful.

The planner depends on statistics to choose between:

  • sequential scans
  • index scans
  • hash joins
  • nested loops
  • sorts
  • and more

If statistics are stale, PostgreSQL can make poor plan choices that were not visible when the table was small.

Good practice

  • make sure autoanalyze is running normally
  • run ANALYZE after major data distribution changes where needed
  • inspect estimated vs actual rows in EXPLAIN ANALYZE
  • treat large estimate mismatches as a real signal
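In practice that can be as simple as:

```sql
-- refresh statistics after a bulk load or major distribution change
analyze orders;

-- then compare "rows=" estimates against actual counts per plan node
explain analyze
select id, status
from orders
where status = 'pending';
```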

Scaling to millions of rows without good statistics is like driving with a distorted map.

12. Use Partitioning Only When the Workload Justifies It

Partitioning is powerful. It is not the first answer to every large-table problem.

The PostgreSQL docs say partitioning can help when:

  • heavily accessed rows are concentrated in one or a few partitions
  • queries or updates access a large percentage of a single partition
  • or bulk loads/deletes benefit from adding or removing partitions

The docs also note a rough rule of thumb: partitioning tends to become worth considering when a table grows larger than the server's physical memory, though the exact point depends on the application.

Good candidates for partitioning

  • logs
  • event tables
  • time-series data
  • large append-heavy history tables
  • retention-driven datasets
  • large tenant- or region-segmented workloads

Weak reasons to partition

  • “the table is getting big”
  • “it feels like good architecture”
  • “we might need it someday”

Partitioning adds complexity:

  • schema management
  • index management
  • migration planning
  • operational routines

Use it when the workload makes the benefits real.
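For the good candidates above, declarative range partitioning is the usual shape. A sketch with a hypothetical events table partitioned by month (the monthly granularity is an assumption; pick whatever matches your retention policy):

```sql
create table events (
    id         bigint generated always as identity,
    created_at timestamptz not null,
    payload    jsonb,
    primary key (id, created_at)  -- unique constraints must include the partition key
) partition by range (created_at);

create table events_2026_04 partition of events
    for values from ('2026-04-01') to ('2026-05-01');

-- retention becomes a fast metadata operation instead of a giant delete
-- (assumes an older partition named events_2025_04 already exists)
drop table events_2025_04;
```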

13. Use Replicas for Read Distribution When It Actually Helps

Once read traffic grows, one common scaling move is to introduce read replicas.

This makes the most sense when:

  • reads dominate writes
  • read latency matters a lot
  • some workloads can tolerate replica lag
  • analytical or user-facing reads can be separated from primary write pressure

Good use cases

  • reporting dashboards
  • search-heavy read APIs
  • historical exploration
  • user-facing read traffic that does not require strict write-after-read guarantees

Important warning

A replica is not a universal solution.

It does not fix:

  • bad query shape
  • missing indexes
  • deep pagination
  • N+1 query patterns
  • or hot write-table design

Scale the workload first. Then distribute it where appropriate.

14. Tune Connection Behavior

A growing application often scales out its app servers faster than it scales its database behavior.

That causes:

  • too many open sessions
  • bursty connection creation
  • pool saturation
  • and database coordination overhead

Better habits

  • use connection pooling
  • understand framework defaults
  • cap connection growth
  • keep transactions short so connections return quickly
  • monitor pool wait times, not just query times

A connection problem can feel like a database performance problem even when query execution is not the true bottleneck.
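A quick way to see whether sessions are doing work or just sitting open is pg_stat_activity:

```sql
-- how many sessions are active vs idle (or idle in transaction)
select state, count(*)
from pg_stat_activity
group by state
order by count(*) desc;
```

A large "idle in transaction" count is often the connection-side smell: transactions held open, connections not returning to the pool.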

15. Split Operational and Analytical Workloads When Needed

As datasets grow, one schema and one set of queries often cannot serve every use case equally well.

That is normal.

Transactional tables are optimized for:

  • correctness
  • fast writes
  • predictable API reads
  • constraints and relationships

Analytical or reporting workloads often want:

  • larger scans
  • different aggregation paths
  • summary tables
  • materialized views
  • denormalized projections
  • or separate pipelines

Do not force one hot transactional table to be perfect for:

  • app writes
  • user APIs
  • internal reports
  • admin search
  • and deep analytics

That is how large systems become awkward.
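Materialized views are one of the lighter-weight ways to split those workloads. A sketch, assuming a hypothetical orders table feeding a daily report:

```sql
-- precompute the reporting shape instead of re-aggregating the hot table
create materialized view daily_order_totals as
select account_id,
       date_trunc('day', created_at) as day,
       count(*) as order_count
from orders
group by account_id, date_trunc('day', created_at);

-- a unique index is required for non-blocking refreshes
create unique index on daily_order_totals (account_id, day);

-- refresh without locking out readers
refresh materialized view concurrently daily_order_totals;
```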

16. Measure the Right Things

If you want PostgreSQL to scale well, measure beyond raw CPU.

Useful things to monitor include:

  • top expensive queries
  • top frequent queries
  • query count per request
  • lock waits
  • autovacuum behavior
  • table and index size growth
  • I/O pressure
  • connection pool saturation
  • replication lag if replicas exist
  • temp file creation for sorts/hashes
  • transaction duration

At millions of rows, vague monitoring becomes expensive quickly.
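For the first two items, the pg_stat_statements extension is the standard tool (column names shown are for PostgreSQL 13+; older versions use total_time instead of total_exec_time):

```sql
-- requires the pg_stat_statements extension and shared_preload_libraries
select query, calls, mean_exec_time, total_exec_time
from pg_stat_statements
order by total_exec_time desc
limit 10;
```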

17. Do Not Jump to Sharding Too Early

A lot of teams use “millions of rows” as shorthand for:

  • “we probably need sharding”

Usually they do not.

Many systems can go very far on one PostgreSQL cluster with:

  • better indexes
  • better query design
  • better pagination
  • healthier vacuuming
  • better schema design
  • smarter read distribution
  • and more disciplined application behavior

Sharding is real. It is also one of the most expensive complexity choices you can make.

You want to earn your way into it, not panic your way into it.

A Practical Scaling Checklist

If your PostgreSQL tables are moving into the millions of rows, use this checklist:

  1. Identify the real hot queries
  2. Run EXPLAIN ANALYZE on them
  3. Check whether the right indexes exist
  4. Replace deep offset pagination
  5. Reduce wide-row and select * patterns
  6. Check autovacuum and analyze health
  7. Watch dead tuples and write churn
  8. Review connection pool behavior
  9. Decide whether read replicas help the actual workload
  10. Consider partitioning only if the table is truly large enough and the access pattern fits

That checklist usually gets you much further than dramatic architecture changes.

Conclusion

Scaling PostgreSQL for millions of rows is mostly about staying disciplined.

That means:

  • designing around real query patterns
  • choosing better data types
  • keeping hot tables lean
  • building the right indexes
  • reducing scanned rows
  • fixing pagination
  • preventing write churn from overwhelming maintenance
  • and introducing partitioning or replicas only when the workload clearly justifies them

The biggest mistake is treating raw row count like the enemy.

Most of the time, the real enemy is inefficient work.

PostgreSQL can handle millions of rows well. Your job is to make sure the application and schema let it.
