How to Scale PostgreSQL for Millions of Rows
Level: intermediate · ~15 min read · Intent: informational
Audience: backend developers, database engineers, technical teams
Prerequisites
- basic familiarity with PostgreSQL
Key takeaways
- PostgreSQL can handle millions of rows well, but row count alone is not the real scaling problem. The real issues are query shape, index design, row width, write churn, maintenance health, and application access patterns.
- The best way to scale PostgreSQL is usually not to jump straight to sharding. Most systems get much further by fixing query patterns, choosing better indexes, keeping hot tables lean, tuning autovacuum on churn-heavy tables, and using partitioning only when the workload truly benefits.
FAQ
- Can PostgreSQL handle millions of rows?
- Yes. PostgreSQL can handle millions of rows very well, but success depends on good schema design, indexes, query patterns, maintenance, and application behavior.
- When should I partition a PostgreSQL table?
- Partitioning usually starts making sense when a table becomes very large, queries mostly target one slice of the data, or operational tasks like retention and bulk deletes become painful on one giant table.
- What is the biggest reason PostgreSQL slows down at scale?
- Usually it is not raw row count alone. It is more often a combination of inefficient queries, weak indexes, deep offset pagination, wide rows, long transactions, stale statistics, or application-side N+1 patterns.
- Do I need to shard PostgreSQL for millions of rows?
- Usually no. Many systems can go very far with one PostgreSQL cluster if the schema, indexes, maintenance, and query patterns are designed well.
PostgreSQL does not become slow just because a table reaches a million rows.
Or ten million. Or much more.
That is one of the most important mindset shifts for developers who are starting to worry about scale.
The real question is not:
- “Can PostgreSQL handle millions of rows?”
It is:
- “Can my schema, queries, indexes, and application behavior handle millions of rows efficiently?”
That difference matters because many teams reach for the wrong solution too early. They think they need:
- sharding
- a database switch
- aggressive denormalization
- or heavy infrastructure changes
when the real fixes are usually much more practical:
- better indexes
- smaller result sets
- better pagination
- better schema design
- healthier vacuuming
- and fewer wasteful query patterns
This guide focuses on those practical fixes.
1. Understand What “Scaling” Actually Means
Scaling PostgreSQL is not one problem. It is a collection of problems that appear as data and traffic grow.
Common examples include:
- reads getting slower because scans touch too many rows
- writes getting slower because hot tables churn heavily
- sorts and joins becoming more expensive
- indexes growing large enough to change cache behavior
- autovacuum struggling on update-heavy tables
- APIs that keep using offset pagination long after tables grow
- application code issuing far too many queries
- retention and archival tasks becoming operationally painful
So when you say:
- “We need PostgreSQL to scale,”
you really need to ask:
- scale for what?
- reads?
- writes?
- time-series growth?
- tenant growth?
- event ingestion?
- reporting?
- mixed transactional traffic?
That clarification changes almost every decision after it.
2. Row Count Is Not the Main Bottleneck
A table with millions of rows can be perfectly healthy if:
- the important queries are selective
- the right indexes exist
- the hot working set fits memory reasonably well
- maintenance keeps up
- and the app is not forcing PostgreSQL to do unnecessary work
A much smaller table can still be painful if:
- every request scans too much of it
- the result shape is too wide
- the query pattern is chatty
- or write churn keeps generating dead tuples faster than maintenance can clean them up
Better question
Instead of asking:
- “How many rows is too many?”
ask:
- “How many rows does this specific query have to touch?”
- “How many rows does this API return?”
- “How much of this table is hot?”
- “How much of this table changes constantly?”
That is usually where the scaling truth actually lives.
3. Design Tables Around Real Query Patterns
A PostgreSQL table that scales well usually starts with one simple habit: designing around real query patterns instead of only theoretical data models.
For each major table, you should know:
- the most common filter columns
- the most common order-by columns
- the most common join path
- whether the workload is read-heavy or write-heavy
- and whether the hot path needs only a few columns or many
Example
If the common query is:
select id, status, created_at
from orders
where account_id = $1
order by created_at desc
limit 20;
that implies:
- account_id matters for filtering
- created_at matters for sorting
- a multicolumn index may matter
- and the endpoint probably does not need select *
This sounds basic, but it is one of the biggest reasons some systems scale cleanly and others do not.
4. Keep Hot Tables Narrow
Wide rows become more expensive as tables grow.
They affect:
- cache efficiency
- heap access cost
- update churn
- table bloat impact
- and how much useful data fits in memory
Common sources of wide rows
- large text columns in hot tables
- oversized JSONB payloads
- blob-like binary fields
- too many rarely used columns
- denormalized fields that made one query easier but made every row heavier
Better pattern
Keep high-frequency transactional tables lean.
Move bulky or infrequently read data into:
- companion tables
- archive tables
- or clearly separate document-style tables when appropriate
At scale, one lean hot table is often worth far more than one “convenient” everything-table.
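The split above can be sketched in schema form. This is a minimal, hypothetical example (the orders and order_details tables and their columns are illustrative, not from the original text): the hot transactional table stays narrow, and the bulky payload moves one join away.

```sql
-- Hypothetical sketch: keep the hot table lean, move bulky data out.
create table orders (
    id         bigint generated always as identity primary key,
    account_id integer not null,
    status     text not null,
    created_at timestamptz not null default now()
);

-- Wide, infrequently read data lives in a companion table instead of
-- widening every row on the hot path.
create table order_details (
    order_id bigint primary key references orders (id),
    payload  jsonb
);
```

Most requests touch only the lean table; the join cost is paid only by the few paths that actually need the payload.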
5. Use the Right Data Types
Bad type choices compound as rows grow.
Good defaults for large systems often look like:
- integer for ordinary whole numbers
- bigint only when the range really requires it
- text for most strings
- date for date-only values
- timestamptz for real timestamps
- boolean for true/false state
- jsonb only where flexible structure is genuinely needed
Common mistakes that hurt at scale
- using numeric for ordinary counts
- using bigint everywhere without reason
- storing flags as strings
- burying relational values inside JSONB
- using timestamps where dates would be clearer
- burying relational values inside JSONB
- using timestamps where dates would be clearer
At millions of rows, schema sloppiness becomes physical cost.
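The defaults above can be sketched as one table definition. This is an illustrative example (the signups table and its columns are hypothetical), showing each recommended type in its natural role:

```sql
-- Illustrative sketch of the type defaults above.
create table signups (
    id          integer generated always as identity primary key,
    email       text not null,                          -- text for strings
    is_active   boolean not null default true,          -- boolean for flags
    signup_date date not null,                          -- date when no time-of-day is needed
    created_at  timestamptz not null default now(),     -- timestamptz for real timestamps
    preferences jsonb                                   -- jsonb only for genuinely flexible structure
);
```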
6. Build Better Indexes, Not Just More Indexes
Indexes are one of the biggest scaling tools PostgreSQL gives you. They are also one of the biggest sources of write overhead if used carelessly.
The right question is not:
- “Should we add more indexes?”
It is:
- “Which repeated query shapes need a better access path?”
Start with B-tree for most scalar queries
B-tree is the default for a reason. It is the best general-purpose index for:
- equality
- ranges
- many ordered lookups
- and common joins
Use multicolumn indexes deliberately
For a query like:
select *
from events
where account_id = $1
order by created_at desc
limit 50;
a better index is often:
create index idx_events_account_created
on events (account_id, created_at desc);
not two unrelated single-column indexes.
Use GIN for the right kinds of data
For:
- jsonb
- arrays
- full-text search
GIN is often the right choice.
Use BRIN on huge append-heavy tables
For very large time-based tables where values correlate with physical row order, BRIN can be extremely useful because it stays much smaller than B-tree and still helps prune large scans effectively.
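A minimal sketch of that case, assuming an append-heavy events table where created_at correlates with insertion order (the table and index names are illustrative):

```sql
-- BRIN stores min/max summaries per block range instead of one entry
-- per row, so it stays tiny even on very large tables.
create index idx_events_created_brin
    on events using brin (created_at);
```

Because BRIN only prunes block ranges, it helps most when rows with similar created_at values are physically clustered, which is typical for append-only tables.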
Important warning
Every index has a write cost.
On hot tables, overindexing can become part of the scaling problem.
7. Fix Query Shape Before Reaching for Infrastructure
Many scaling issues are query-shape issues first.
That includes:
- scanning too many rows
- sorting too many rows
- returning too many columns
- joining too broadly
- forcing PostgreSQL to do deep offset pagination
- or hiding key filters inside expressions
Better habits
- avoid select * in production paths
- reduce row counts earlier
- filter before joining where possible
- return only the columns the API actually uses
- avoid making PostgreSQL sort giant result sets that will mostly be thrown away
A lot of “PostgreSQL scaling” is really “stop asking PostgreSQL to do unnecessary work.”
8. Replace Deep OFFSET Pagination With Keyset Pagination
One of the most common large-table scaling mistakes is keeping offset pagination long after the table stopped being small.
Weak pattern
select id, created_at, title
from posts
order by created_at desc
offset 50000
limit 20;
This gets increasingly expensive because PostgreSQL still has to read and discard every skipped row before returning the page.
Better pattern
select id, created_at, title
from posts
where created_at < $1
order by created_at desc
limit 20;
This is keyset pagination.
Why it helps
It lets PostgreSQL continue from a known position instead of scanning deeper and deeper into the table just to discard rows.
For feeds, event streams, dashboards, and large APIs, this is one of the highest-value query changes you can make.
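One refinement worth knowing: if created_at is not unique, paging on it alone can skip or repeat rows at page boundaries. A common fix is a composite cursor using a row comparison, sketched here against the same hypothetical posts table ($1 and $2 are the last row's created_at and id from the previous page):

```sql
-- Composite keyset cursor: ties on created_at are broken by id,
-- so no row is ever skipped or duplicated across pages.
select id, created_at, title
from posts
where (created_at, id) < ($1, $2)
order by created_at desc, id desc
limit 20;
```

An index on (created_at desc, id desc) lets PostgreSQL serve each page from a single ordered index scan.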
9. Watch for N+1 Query Patterns
Sometimes PostgreSQL looks overwhelmed not because any single query is terrible, but because the application is emitting far too many queries.
This is extremely common in ORM-heavy systems.
Typical pattern
- fetch 100 parent rows
- fetch each child row set separately
- fetch counts separately
- fetch metadata separately
The database then sees:
- one request
- and hundreds of statements
Better approach
- preload related data where appropriate
- batch lookups
- use joins when the result shape fits
- track query count per request in development and staging
- inspect the SQL the ORM is actually generating
A system with millions of rows often feels fine until the application multiplies the work per request.
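The batching fix can be sketched as one query. Assuming hypothetical posts and comments tables, the hundred per-parent queries collapse into a single statement where $1 is an array of the parent ids fetched first:

```sql
-- One batched lookup instead of one query per parent row.
-- The application then groups results by post_id in memory.
select id, post_id, author_id, created_at, body
from comments
where post_id = any($1)
order by post_id, created_at;
```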
10. Keep Autovacuum Healthy
At larger row counts, maintenance stops being background trivia. It becomes part of performance.
PostgreSQL’s own vacuuming docs say regular vacuuming is necessary to:
- recover or reuse space from updated and deleted rows
- update planner statistics
- update the visibility map for index-only scans
- and protect against wraparound risks
That is why scaling a large PostgreSQL system without healthy vacuuming is not realistic.
What to watch
- dead tuple accumulation
- autovacuum frequency on hot tables
- tables with heavy update/delete churn
- long transactions that delay cleanup
- analyze frequency after large changes
Important point
The goal is not:
- “turn autovacuum off and manage everything manually”
The goal is:
- “make sure autovacuum can keep up, and tune the tables that are much hotter than average”
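Both sides of that goal can be checked with standard tools. The first query reads the built-in pg_stat_user_tables view; the ALTER TABLE shows one common per-table tuning knob (the events table name is illustrative, and 0.02 is an example value, not a recommendation for every workload):

```sql
-- Find tables accumulating dead tuples and see when autovacuum last ran.
select relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
from pg_stat_user_tables
order by n_dead_tup desc
limit 10;

-- Make autovacuum trigger earlier on one hot, churn-heavy table
-- (default scale factor is 0.2, i.e. 20% of the table).
alter table events
    set (autovacuum_vacuum_scale_factor = 0.02);
```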
11. Keep Statistics Fresh
As tables grow, bad row estimates become more painful.
The planner depends on statistics to choose between:
- sequential scans
- index scans
- hash joins
- nested loops
- sorts
- and more
If statistics are stale, PostgreSQL can make poor plan choices that were not visible when the table was small.
Good practice
- make sure analyze is happening normally
- run ANALYZE after major data distribution changes where needed
- inspect estimated vs actual rows in EXPLAIN ANALYZE
- treat large estimate mismatches as a real signal
Scaling to millions of rows without good statistics is like driving with a distorted map.
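In practice the check looks like this, using a hypothetical orders table: refresh statistics after a large data change, then compare the planner's row estimate against reality in the plan output.

```sql
-- Refresh statistics for one table after a bulk load or mass update.
analyze orders;

-- In the output, compare "rows=" in the estimate with the actual
-- row count; a large mismatch means the planner is working blind.
explain analyze
select id, status
from orders
where account_id = 42;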
12. Use Partitioning Only When the Workload Justifies It
Partitioning is powerful. It is not the first answer to every large-table problem.
The PostgreSQL docs say partitioning can help when:
- heavily accessed rows are concentrated in one or a few partitions
- queries or updates access a large percentage of a single partition
- or bulk loads/deletes benefit from adding or removing partitions
The docs also note a rough rule of thumb: partitioning tends to become worth considering when a table is so large it exceeds physical memory, though the exact point depends on the application.
Good candidates for partitioning
- logs
- event tables
- time-series data
- large append-heavy history tables
- retention-driven datasets
- large tenant- or region-segmented workloads
Weak reasons to partition
- “the table is getting big”
- “it feels like good architecture”
- “we might need it someday”
Partitioning adds complexity:
- schema management
- index management
- migration planning
- operational routines
Use it when the workload makes the benefits real.
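For the good candidates above, declarative range partitioning is the usual shape. This is a minimal sketch with a hypothetical events table partitioned by month:

```sql
-- Time-series table partitioned by month on created_at.
create table events (
    account_id integer not null,
    created_at timestamptz not null,
    payload    jsonb
) partition by range (created_at);

create table events_2024_01 partition of events
    for values from ('2024-01-01') to ('2024-02-01');

-- Retention becomes an O(1) metadata operation: drop the whole
-- partition instead of bulk-deleting millions of rows.
drop table events_2024_01;
```

Note that each new time range needs its own partition, which is exactly the kind of ongoing schema management the section above warns about.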
13. Use Replicas for Read Distribution When It Actually Helps
Once read traffic grows, one common scaling move is to introduce read replicas.
This makes the most sense when:
- reads dominate writes
- read latency matters a lot
- some workloads can tolerate replica lag
- analytical or user-facing reads can be separated from primary write pressure
Good use cases
- reporting dashboards
- search-heavy read APIs
- historical exploration
- user-facing read traffic that does not require strict write-after-read guarantees
Important warning
A replica is not a universal solution.
It does not fix:
- bad query shape
- missing indexes
- deep pagination
- N+1 query patterns
- or hot write-table design
Scale the workload first. Then distribute it where appropriate.
14. Tune Connection Behavior
A growing application often scales out its app servers faster than it scales its database behavior.
That causes:
- too many open sessions
- bursty connection creation
- pool saturation
- and database coordination overhead
Better habits
- use connection pooling
- understand framework defaults
- cap connection growth
- keep transactions short so connections return quickly
- monitor pool wait times, not just query times
A connection problem can feel like a database performance problem even when query execution is not the true bottleneck.
15. Split Operational and Analytical Workloads When Needed
As datasets grow, one schema and one set of queries often cannot serve every use case equally well.
That is normal.
Transactional tables are optimized for:
- correctness
- fast writes
- predictable API reads
- constraints and relationships
Analytical or reporting workloads often want:
- larger scans
- different aggregation paths
- summary tables
- materialized views
- denormalized projections
- or separate pipelines
Do not force one hot transactional table to be perfect for:
- app writes
- user APIs
- internal reports
- admin search
- and deep analytics
That is how large systems become awkward.
16. Measure the Right Things
If you want PostgreSQL to scale well, measure beyond raw CPU.
Useful things to monitor include:
- top expensive queries
- top frequent queries
- query count per request
- lock waits
- autovacuum behavior
- table and index size growth
- I/O pressure
- connection pool saturation
- replication lag if replicas exist
- temp file creation for sorts/hashes
- transaction duration
At millions of rows, vague monitoring becomes expensive quickly.
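The "top expensive queries" item is usually answered with the pg_stat_statements extension, which must be enabled in shared_preload_libraries and created in the database. A typical query against it (column names shown are the PostgreSQL 13+ names; older versions use total_time and mean_time):

```sql
-- Top statements by cumulative execution time.
select query, calls, total_exec_time, mean_exec_time, rows
from pg_stat_statements
order by total_exec_time desc
limit 10;
```

Sorting by calls instead surfaces the chatty, N+1-style queries that are individually cheap but collectively expensive.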
17. Do Not Jump to Sharding Too Early
A lot of teams use “millions of rows” as shorthand for:
- “we probably need sharding”
Usually they do not.
Many systems can go very far on one PostgreSQL cluster with:
- better indexes
- better query design
- better pagination
- healthier vacuuming
- better schema design
- smarter read distribution
- and more disciplined application behavior
Sharding is real. It is also one of the most expensive complexity choices you can make.
You want to earn your way into it, not panic your way into it.
A Practical Scaling Checklist
If your PostgreSQL tables are moving into the millions of rows, use this checklist:
- Identify the real hot queries
- Run EXPLAIN ANALYZE on them
- Check whether the right indexes exist
- Replace deep offset pagination
- Reduce wide-row and select * patterns
- Check autovacuum and analyze health
- Watch dead tuples and write churn
- Review connection pool behavior
- Decide whether read replicas help the actual workload
- Consider partitioning only if the table is truly large enough and the access pattern fits
That checklist usually gets you much further than dramatic architecture changes.
Conclusion
Scaling PostgreSQL for millions of rows is mostly about staying disciplined.
That means:
- designing around real query patterns
- choosing better data types
- keeping hot tables lean
- building the right indexes
- reducing scanned rows
- fixing pagination
- preventing write churn from overwhelming maintenance
- and introducing partitioning or replicas only when the workload clearly justifies them
The biggest mistake is treating raw row count like the enemy.
Most of the time, the real enemy is inefficient work.
PostgreSQL can handle millions of rows well. Your job is to make sure the application and schema let it.