How to Scale PostgreSQL for Millions of Rows
Level: intermediate · ~15 min read · Intent: informational
Audience: backend developers, database engineers, technical teams
Prerequisites
- basic familiarity with PostgreSQL
Key takeaways
- PostgreSQL can handle millions of rows well, but row count alone is not the real scaling problem. The real issues are query shape, index design, row width, write churn, maintenance health, and application access patterns.
- The best way to scale PostgreSQL is usually not to jump straight to sharding. Most systems get much further by fixing query patterns, choosing better indexes, keeping hot tables lean, tuning autovacuum on churn-heavy tables, and using partitioning only when the workload truly benefits.
FAQ
- Can PostgreSQL handle millions of rows?
- Yes. PostgreSQL can handle millions of rows very well, but success depends on good schema design, indexes, query patterns, maintenance, and application behavior.
- When should I partition a PostgreSQL table?
- Partitioning usually starts making sense when a table becomes very large, queries mostly target one slice of the data, or operational tasks like retention and bulk deletes become painful on one giant table.
- What is the biggest reason PostgreSQL slows down at scale?
- Usually it is not raw row count alone. It is more often a combination of inefficient queries, weak indexes, deep offset pagination, wide rows, long transactions, stale statistics, or application-side N+1 patterns.
- Do I need to shard PostgreSQL for millions of rows?
- Usually no. Many systems can go very far with one PostgreSQL cluster if the schema, indexes, maintenance, and query patterns are designed well.
PostgreSQL does not become slow just because a table reaches a million rows.
Or ten million. Or much more.
That is one of the most important mindset shifts for developers who are starting to worry about scale.
The real question is not:
- “Can PostgreSQL handle millions of rows?”
It is:
- “Can my schema, queries, indexes, and application behavior handle millions of rows efficiently?”
That difference matters because many teams reach for the wrong solution too early. They think they need:
- sharding
- a database switch
- aggressive denormalization
- or heavy infrastructure changes
when the real fixes are usually much more practical:
- better indexes
- smaller result sets
- better pagination
- better schema design
- healthier vacuuming
- and fewer wasteful query patterns
This guide focuses on those practical fixes.
1. Understand What “Scaling” Actually Means
Scaling PostgreSQL is not one problem. It is a collection of problems that appear as data and traffic grow.
Common examples include:
- reads getting slower because scans touch too many rows
- writes getting slower because hot tables churn heavily
- sorts and joins becoming more expensive
- indexes growing large enough to change cache behavior
- autovacuum struggling on update-heavy tables
- APIs that keep using offset pagination long after tables grow
- application code issuing far too many queries
- retention and archival tasks becoming operationally painful
So when you say:
- “We need PostgreSQL to scale,”
you really need to ask:
- scale for what?
- reads?
- writes?
- time-series growth?
- tenant growth?
- event ingestion?
- reporting?
- mixed transactional traffic?
That clarification changes almost every decision after it.
2. Row Count Is Not the Main Bottleneck
A table with millions of rows can be perfectly healthy if:
- the important queries are selective
- the right indexes exist
- the hot working set fits memory reasonably well
- maintenance keeps up
- and the app is not forcing PostgreSQL to do unnecessary work
A much smaller table can still be painful if:
- every request scans too much of it
- the result shape is too wide
- the query pattern is chatty
- or write churn keeps generating dead tuples faster than maintenance can clean them up
Better question
Instead of asking:
- “How many rows is too many?”
ask:
- “How many rows does this specific query have to touch?”
- “How many rows does this API return?”
- “How much of this table is hot?”
- “How much of this table changes constantly?”
That is usually where the scaling truth actually lives.
3. Design Tables Around Real Query Patterns
A PostgreSQL table that scales well usually starts with one simple habit: designing around real query patterns instead of only theoretical data models.
For each major table, you should know:
- the most common filter columns
- the most common order-by columns
- the most common join path
- whether the workload is read-heavy or write-heavy
- and whether the hot path needs only a few columns or many
Example
If the common query is:
select id, status, created_at
from orders
where account_id = $1
order by created_at desc
limit 20;
that implies:
- account_id matters for filtering
- created_at matters for sorting
- a multicolumn index may matter
- and the endpoint probably does not need select *
This sounds basic, but it is one of the biggest reasons some systems scale cleanly and others do not.
4. Keep Hot Tables Narrow
Wide rows become more expensive as tables grow.
They affect:
- cache efficiency
- heap access cost
- update churn
- table bloat impact
- and how much useful data fits in memory
Common sources of wide rows
- large text columns in hot tables
- oversized JSONB payloads
- blob-like binary fields
- too many rarely used columns
- denormalized fields that made one query easier but made every row heavier
Better pattern
Keep high-frequency transactional tables lean.
Move bulky or infrequently read data into:
- companion tables
- archive tables
- or clearly separate document-style tables when appropriate
At scale, one lean hot table is often worth far more than one “convenient” everything-table.
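The split above can be sketched in schema form. This is a minimal, hypothetical example (the orders and order_details tables and their columns are illustrative, not from the original text): the hot transactional table stays narrow, and the bulky payload moves one join away.

```sql
-- Hypothetical sketch: keep the hot table lean, move bulky data out.
create table orders (
    id         bigint generated always as identity primary key,
    account_id integer not null,
    status     text not null,
    created_at timestamptz not null default now()
);

-- Wide, infrequently read data lives in a companion table instead of
-- widening every row on the hot path.
create table order_details (
    order_id bigint primary key references orders (id),
    payload  jsonb
);
```

Most requests touch only the lean table; the join cost is paid only by the few paths that actually need the payload.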
5. Use the Right Data Types
Bad type choices compound as rows grow.
Good defaults for large systems often look like:
- integer for ordinary whole numbers
- bigint only when the range really requires it
- text for most strings
- date for date-only values
- timestamptz for real timestamps
- boolean for true/false state
- jsonb only where flexible structure is genuinely needed
Common mistakes that hurt at scale
- using numeric for ordinary counts
- using bigint everywhere without reason
- storing flags as strings
- burying relational values inside JSONB
- using timestamps where dates would be clearer
- burying relational values inside JSONB
- using timestamps where dates would be clearer
At millions of rows, schema sloppiness becomes physical cost.
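The defaults above can be sketched as one table definition. This is an illustrative example (the signups table and its columns are hypothetical), showing each recommended type in its natural role:

```sql
-- Illustrative sketch of the type defaults above.
create table signups (
    id          integer generated always as identity primary key,
    email       text not null,                          -- text for strings
    is_active   boolean not null default true,          -- boolean for flags
    signup_date date not null,                          -- date when no time-of-day is needed
    created_at  timestamptz not null default now(),     -- timestamptz for real timestamps
    preferences jsonb                                   -- jsonb only for genuinely flexible structure
);
```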
6. Build Better Indexes, Not Just More Indexes
Indexes are one of the biggest scaling tools PostgreSQL gives you. They are also one of the biggest sources of write overhead if used carelessly.
The right question is not:
- “Should we add more indexes?”
It is:
- “Which repeated query shapes need a better access path?”
Start with B-tree for most scalar queries
B-tree is the default for a reason. It is the best general-purpose index for:
- equality
- ranges
- many ordered lookups
- and common joins
Use multicolumn indexes deliberately
For a query like:
select *
from events
where account_id = $1
order by created_at desc
limit 50;
a better index is often:
create index idx_events_account_created
on events (account_id, created_at desc);
not two unrelated single-column indexes.
Use GIN for the right kinds of data
For:
- jsonb
- arrays
- full-text search
GIN is often the right choice.
Use BRIN on huge append-heavy tables
For very large time-based tables where values correlate with physical row order, BRIN can be extremely useful because it stays much smaller than B-tree and still helps prune large scans effectively.
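A minimal sketch of that case, assuming an append-heavy events table where created_at correlates with insertion order (the table and index names are illustrative):

```sql
-- BRIN stores min/max summaries per block range instead of one entry
-- per row, so it stays tiny even on very large tables.
create index idx_events_created_brin
    on events using brin (created_at);
```

Because BRIN only prunes block ranges, it helps most when rows with similar created_at values are physically clustered, which is typical for append-only tables.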
Important warning
Every index has a write cost.
On hot tables, overindexing can become part of the scaling problem.
7. Fix Query Shape Before Reaching for Infrastructure
Many scaling issues are query-shape issues first.
That includes:
- scanning too many rows
- sorting too many rows
- returning too many columns
- joining too broadly
- forcing PostgreSQL to do deep offset pagination
- or hiding key filters inside expressions
Better habits
- avoid select * in production paths
- reduce row counts earlier
- filter before joining where possible
- return only the columns the API actually uses
- avoid making PostgreSQL sort giant result sets that will mostly be thrown away
A lot of “PostgreSQL scaling” is really “stop asking PostgreSQL to do unnecessary work.”
8. Replace Deep OFFSET Pagination With Keyset Pagination
One of the most common large-table scaling mistakes is keeping offset pagination long after the table stopped being small.
Weak pattern
select id, created_at, title
from posts
order by created_at desc
offset 50000
limit 20;
This gets increasingly expensive because PostgreSQL still has to read and discard every skipped row before returning the page.
Better pattern
select id, created_at, title
from posts
where created_at < $1
order by created_at desc
limit 20;
This is keyset pagination.
Why it helps
It lets PostgreSQL continue from a known position instead of scanning deeper and deeper into the table just to discard rows.
For feeds, event streams, dashboards, and large APIs, this is one of the highest-value query changes you can make.
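One refinement worth knowing: if created_at is not unique, paging on it alone can skip or repeat rows at page boundaries. A common fix is a composite cursor using a row comparison, sketched here against the same hypothetical posts table ($1 and $2 are the last row's created_at and id from the previous page):

```sql
-- Composite keyset cursor: ties on created_at are broken by id,
-- so no row is ever skipped or duplicated across pages.
select id, created_at, title
from posts
where (created_at, id) < ($1, $2)
order by created_at desc, id desc
limit 20;
```

An index on (created_at desc, id desc) lets PostgreSQL serve each page from a single ordered index scan.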
9. Watch for N+1 Query Patterns
Sometimes PostgreSQL looks overwhelmed not because any single query is terrible, but because the application is emitting far too many queries.
This is extremely common in ORM-heavy systems.
Typical pattern
- fetch 100 parent rows
- fetch each child row set separately
- fetch counts separately
- fetch metadata separately
The database then sees:
- one request
- and hundreds of statements
Better approach
- preload related data where appropriate
- batch lookups
- use joins when the result shape fits
- track query count per request in development and staging
- inspect the SQL the ORM is actually generating
A system with millions of rows often feels fine until the application multiplies the work per request.
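The batching fix can be sketched as one query. Assuming hypothetical posts and comments tables, the hundred per-parent queries collapse into a single statement where $1 is an array of the parent ids fetched first:

```sql
-- One batched lookup instead of one query per parent row.
-- The application then groups results by post_id in memory.
select id, post_id, author_id, created_at, body
from comments
where post_id = any($1)
order by post_id, created_at;
```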
10. Keep Autovacuum Healthy
At larger row counts, maintenance stops being background trivia. It becomes part of performance.
PostgreSQL’s own vacuuming docs say regular vacuuming is necessary to:
- recover or reuse space from updated and deleted rows
- update planner statistics
- update the visibility map for index-only scans
- and protect against wraparound risks
That is why scaling a large PostgreSQL system without healthy vacuuming is not realistic.
What to watch
- dead tuple accumulation
- autovacuum frequency on hot tables
- tables with heavy update/delete churn
- long transactions that delay cleanup
- analyze frequency after large changes
Important point
The goal is not:
- “turn autovacuum off and manage everything manually”
The goal is:
- “make sure autovacuum can keep up, and tune the tables that are much hotter than average”
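Both sides of that goal can be checked with standard tools. The first query reads the built-in pg_stat_user_tables view; the ALTER TABLE shows one common per-table tuning knob (the events table name is illustrative, and 0.02 is an example value, not a recommendation for every workload):

```sql
-- Find tables accumulating dead tuples and see when autovacuum last ran.
select relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
from pg_stat_user_tables
order by n_dead_tup desc
limit 10;

-- Make autovacuum trigger earlier on one hot, churn-heavy table
-- (default scale factor is 0.2, i.e. 20% of the table).
alter table events
    set (autovacuum_vacuum_scale_factor = 0.02);
```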
11. Keep Statistics Fresh
As tables grow, bad row estimates become more painful.
The planner depends on statistics to choose between:
- sequential scans
- index scans
- hash joins
- nested loops
- sorts
- and more
If statistics are stale, PostgreSQL can make poor plan choices that were not visible when the table was small.
Good practice
- make sure analyze is happening normally
- run ANALYZE after major data distribution changes where needed
- inspect estimated vs actual rows in EXPLAIN ANALYZE
- treat large estimate mismatches as a real signal
Scaling to millions of rows without good statistics is like driving with a distorted map.
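In practice the check looks like this, using a hypothetical orders table: refresh statistics after a large data change, then compare the planner's row estimate against reality in the plan output.

```sql
-- Refresh statistics for one table after a bulk load or mass update.
analyze orders;

-- In the output, compare "rows=" in the estimate with the actual
-- row count; a large mismatch means the planner is working blind.
explain analyze
select id, status
from orders
where account_id = 42;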
12. Use Partitioning Only When the Workload Justifies It
Partitioning is powerful. It is not the first answer to every large-table problem.
The PostgreSQL docs say partitioning can help when:
- heavily accessed rows are concentrated in one or a few partitions
- queries or updates access a large percentage of a single partition
- or bulk loads/deletes benefit from adding or removing partitions
The docs also note a rough rule of thumb: partitioning tends to become worth considering when a table is so large it exceeds physical memory, though the exact point depends on the application.
Good candidates for partitioning
- logs
- event tables
- time-series data
- large append-heavy history tables
- retention-driven datasets
- large tenant- or region-segmented workloads
Weak reasons to partition
- “the table is getting big”
- “it feels like good architecture”
- “we might need it someday”
Partitioning adds complexity:
- schema management
- index management
- migration planning
- operational routines
Use it when the workload makes the benefits real.
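For the good candidates above, declarative range partitioning is the usual shape. This is a minimal sketch with a hypothetical events table partitioned by month:

```sql
-- Time-series table partitioned by month on created_at.
create table events (
    account_id integer not null,
    created_at timestamptz not null,
    payload    jsonb
) partition by range (created_at);

create table events_2024_01 partition of events
    for values from ('2024-01-01') to ('2024-02-01');

-- Retention becomes an O(1) metadata operation: drop the whole
-- partition instead of bulk-deleting millions of rows.
drop table events_2024_01;
```

Note that each new time range needs its own partition, which is exactly the kind of ongoing schema management the section above warns about.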
13. Use Replicas for Read Distribution When It Actually Helps
Once read traffic grows, one common scaling move is to introduce read replicas.
This makes the most sense when:
- reads dominate writes
- read latency matters a lot
- some workloads can tolerate replica lag
- analytical or user-facing reads can be separated from primary write pressure
Good use cases
- reporting dashboards
- search-heavy read APIs
- historical exploration
- user-facing read traffic that does not require strict write-after-read guarantees
Important warning
A replica is not a universal solution.
It does not fix:
- bad query shape
- missing indexes
- deep pagination
- N+1 query patterns
- or hot write-table design
Scale the workload first. Then distribute it where appropriate.
14. Tune Connection Behavior
A growing application often scales out its app servers faster than it scales its database behavior.
That causes:
- too many open sessions
- bursty connection creation
- pool saturation
- and database coordination overhead
Better habits
- use connection pooling
- understand framework defaults
- cap connection growth
- keep transactions short so connections return quickly
- monitor pool wait times, not just query times
A connection problem can feel like a database performance problem even when query execution is not the true bottleneck.
15. Split Operational and Analytical Workloads When Needed
As datasets grow, one schema and one set of queries often cannot serve every use case equally well.
That is normal.
Transactional tables are optimized for:
- correctness
- fast writes
- predictable API reads
- constraints and relationships
Analytical or reporting workloads often want:
- larger scans
- different aggregation paths
- summary tables
- materialized views
- denormalized projections
- or separate pipelines
Do not force one hot transactional table to be perfect for:
- app writes
- user APIs
- internal reports
- admin search
- and deep analytics
That is how large systems become awkward.
16. Measure the Right Things
If you want PostgreSQL to scale well, measure beyond raw CPU.
Useful things to monitor include:
- top expensive queries
- top frequent queries
- query count per request
- lock waits
- autovacuum behavior
- table and index size growth
- I/O pressure
- connection pool saturation
- replication lag if replicas exist
- temp file creation for sorts/hashes
- transaction duration
At millions of rows, vague monitoring becomes expensive quickly.
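The "top expensive queries" item is usually answered with the pg_stat_statements extension, which must be enabled in shared_preload_libraries and created in the database. A typical query against it (column names shown are the PostgreSQL 13+ names; older versions use total_time and mean_time):

```sql
-- Top statements by cumulative execution time.
select query, calls, total_exec_time, mean_exec_time, rows
from pg_stat_statements
order by total_exec_time desc
limit 10;
```

Sorting by calls instead surfaces the chatty, N+1-style queries that are individually cheap but collectively expensive.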
17. Do Not Jump to Sharding Too Early
A lot of teams use “millions of rows” as shorthand for:
- “we probably need sharding”
Usually they do not.
Many systems can go very far on one PostgreSQL cluster with:
- better indexes
- better query design
- better pagination
- healthier vacuuming
- better schema design
- smarter read distribution
- and more disciplined application behavior
Sharding is real. It is also one of the most expensive complexity choices you can make.
You want to earn your way into it, not panic your way into it.
A Practical Scaling Checklist
If your PostgreSQL tables are moving into the millions of rows, use this checklist:
- Identify the real hot queries
- Run EXPLAIN ANALYZE on them
- Check whether the right indexes exist
- Replace deep offset pagination
- Reduce wide-row and select * patterns
- Check autovacuum and analyze health
- Watch dead tuples and write churn
- Review connection pool behavior
- Decide whether read replicas help the actual workload
- Consider partitioning only if the table is truly large enough and the access pattern fits
That checklist usually gets you much further than dramatic architecture changes.
Conclusion
Scaling PostgreSQL for millions of rows is mostly about staying disciplined.
That means:
- designing around real query patterns
- choosing better data types
- keeping hot tables lean
- building the right indexes
- reducing scanned rows
- fixing pagination
- preventing write churn from overwhelming maintenance
- and introducing partitioning or replicas only when the workload clearly justifies them
The biggest mistake is treating raw row count like the enemy.
Most of the time, the real enemy is inefficient work.
PostgreSQL can handle millions of rows well. Your job is to make sure the application and schema let it.