PostgreSQL Sort and GROUP BY Performance Tuning
Level: intermediate · ~12 min read
Audience: backend developers, database engineers, technical teams
Prerequisites
- basic familiarity with PostgreSQL
Key takeaways
- PostgreSQL sort and GROUP BY performance usually improves most when queries process fewer rows, use indexes that match the access pattern, and avoid unnecessary disk spills.
- work_mem matters for sorts and hash aggregations, but it should be tuned carefully because it is not a global per-server limit and can multiply across operations and concurrent sessions.
FAQ
- Why are ORDER BY queries slow in PostgreSQL?
- ORDER BY queries are often slow because PostgreSQL has to sort too many rows, cannot use an index that already matches the requested order, or spills sort work to disk due to memory limits.
- Does increasing work_mem always make GROUP BY faster?
- Not always. More work_mem can reduce spills for sorts and hash aggregation, but query shape, row counts, indexes, and concurrency often matter just as much or more.
Sorting and grouping are some of the most common reasons PostgreSQL queries become expensive.
That is because ORDER BY and GROUP BY often force PostgreSQL to do work that goes beyond simple row lookup.
Instead of just finding rows, PostgreSQL may need to:
- sort a large working set
- group many rows together
- build hashes in memory
- spill intermediate data to disk
- or scan far more rows than the final result actually returns
This is why a query that looks simple at the SQL level can become slow once the table is large or the working set grows.
The good news is that PostgreSQL sort and grouping performance is often very fixable. The biggest gains usually come from:
- reducing how many rows reach the sort or group step
- using indexes that match the access pattern
- understanding when PostgreSQL sorts versus hashes
- and tuning memory only after the workload shape is understood
This guide explains how PostgreSQL handles ORDER BY and GROUP BY, why these queries slow down, and how to tune them in practical ways.
The Most Important Sort and Grouping Rule
Before going deeper, remember this:
The fastest PostgreSQL sort or grouping step is usually the one that receives fewer rows in the first place.
That matters because many teams focus immediately on:
- raising work_mem
- forcing indexes
- or blaming PostgreSQL for sorting
But often the real problem is earlier in the query:
- too many rows were scanned
- too many rows were joined
- filters were applied too late
- or the query shape asked PostgreSQL to sort or group a much larger dataset than necessary
So good tuning usually starts with reducing the working set before tuning the sort itself.
1. Why ORDER BY and GROUP BY Become Expensive
A plain indexed lookup can often stop after finding a few matching rows.
A sort or grouping step is different.
For ORDER BY, PostgreSQL may need to:
- collect matching rows
- compare them
- arrange them into the requested order
- and sometimes materialize intermediate results before returning the final output
For GROUP BY, PostgreSQL may need to:
- read many rows
- organize them by grouping keys
- calculate aggregates
- and possibly sort or hash the input before finalizing results
That means performance depends heavily on:
- how many rows flow into the step
- whether ordering already exists
- whether a supporting index can be used
- and whether the operation stays in memory or spills to disk
This is why sorting and grouping are so sensitive to scale.
2. PostgreSQL Does Not Always Need an Explicit Sort
One of the most useful tuning ideas is that PostgreSQL can sometimes avoid a separate sort step entirely.
If an index already provides rows in the needed order, PostgreSQL may be able to read them in that order directly.
For example, this query:
```sql
SELECT id, created_at, status
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 20;
```
can often be helped by an index like:
```sql
CREATE INDEX idx_orders_customer_created_at
ON orders (customer_id, created_at DESC);
```
That index matches:
- the filter on customer_id
- and the requested order on created_at DESC
When that alignment exists, PostgreSQL may avoid doing a large separate sort.
That is one of the most powerful sort optimizations available.
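You can confirm the sort was avoided with EXPLAIN. The plan shapes below are illustrative (exact node names, costs, and scan direction depend on your data and PostgreSQL version), but the key signal is the absence of a Sort node:

```sql
EXPLAIN
SELECT id, created_at, status
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 20;

-- A plan that avoids an explicit sort typically looks like:
--   Limit
--     ->  Index Scan using idx_orders_customer_created_at on orders
--           Index Cond: (customer_id = 42)
-- If a "Sort" node appears under the Limit, the index order was not used.
```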
3. Why ORDER BY Gets Slow
ORDER BY becomes slow mainly when PostgreSQL has to sort too many rows.
That often happens because:
- there is no matching index for the filter-plus-order pattern
- the query returns a broad working set before sorting
- the sort spills to disk
- or the query joins large tables and sorts the expanded result later
A common pattern looks like this:
```sql
SELECT id, created_at, total
FROM orders
WHERE status = 'open'
ORDER BY created_at DESC;
```
If there is no index that helps both:
- find the relevant open rows
- and provide the right ordering
PostgreSQL may scan a large set and sort it afterward.
That becomes expensive fast on large tables.
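One way to cover both the filter and the ordering for this specific query is a partial index. This is a sketch, not a universal fix: whether it pays off depends on how selective status = 'open' actually is and how hot this query path is.

```sql
-- Index only the rows the hot query touches,
-- stored in the order the query requests.
CREATE INDEX idx_orders_open_created_at
ON orders (created_at DESC)
WHERE status = 'open';
```

With this in place, PostgreSQL can often walk the index directly for the open-orders listing instead of scanning and sorting.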
4. LIMIT Makes Sort Design Even More Important
ORDER BY queries with LIMIT are especially important to optimize because they are common in:
- dashboards
- feeds
- recent activity lists
- newest-first tables
- search result pages
- admin interfaces
If PostgreSQL can use an index that already gives rows in the right order, it may be able to stop early after finding enough rows.
That is much cheaper than:
- gathering a large working set
- sorting it
- and then returning just the first 20 rows
This is why index alignment matters so much for hot list endpoints.
A good index can turn:
- sort everything, then return a few rows
into:
- read the next few rows directly in order
That is a huge win.
5. GROUP BY Has Different Cost Patterns
GROUP BY is not just about ordering.
It is about aggregation.
That means PostgreSQL may need to:
- split rows into groups
- maintain running aggregate state
- and produce one output row per group
Examples:
- total sales per customer
- count of orders per day
- average response time per service
- sum of quantity per product
The performance question becomes:
- how many input rows must be read?
- how many groups are produced?
- and what strategy PostgreSQL uses to build them?
That is why GROUP BY tuning often overlaps with:
- filtering
- join reduction
- aggregate strategy
- memory usage
- and sometimes index design
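As a concrete sketch of that overlap, here is a per-customer total that narrows the input before grouping. The 30-day window is illustrative; the point is that the filter, not the aggregate, decides how many rows the GROUP BY step must process.

```sql
SELECT customer_id, SUM(total) AS total_sales
FROM orders
WHERE created_at >= now() - interval '30 days'  -- reduce input rows first
GROUP BY customer_id;
```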
6. PostgreSQL May Use Sort-Based or Hash-Based Aggregation
For grouping queries, PostgreSQL commonly chooses between:
- sort-based grouping
- hash-based grouping
Sort-based grouping
Here PostgreSQL sorts rows by the grouping key, then walks through the ordered set and aggregates each group.
This can work well when:
- sorting is affordable
- ordered input already exists
- or the planner thinks the sort path is efficient enough
Hash-based grouping
Here PostgreSQL builds a hash structure in memory keyed by the grouping columns.
This can work well when:
- the grouping keys fit memory reasonably
- the number of groups is manageable
- and a hash path is cheaper than sorting
Either one can be correct. The planner chooses based on estimated cost.
This is why some GROUP BY queries improve with more memory, while others improve more from row reduction or indexing.
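EXPLAIN makes the chosen strategy visible: hash-based grouping shows up as a HashAggregate node, sort-based grouping as a GroupAggregate node fed by sorted input. The plan shapes below are illustrative:

```sql
EXPLAIN
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id;

-- Hash-based plan (shape will vary):
--   HashAggregate
--     Group Key: customer_id
--     ->  Seq Scan on orders
--
-- Sort-based plan:
--   GroupAggregate
--     Group Key: customer_id
--     ->  Sort (or an index scan that already provides customer_id order)
```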
7. work_mem Matters, But It Is Not the Whole Story
work_mem is one of the most discussed PostgreSQL settings for sorts and aggregations.
It matters because it affects how much memory a sort or hash operation can use before spilling to disk.
That can influence:
- sorts for ORDER BY
- sort-based grouping
- hash aggregation
- hash joins
- DISTINCT operations
But work_mem is often misunderstood.
It is not:
- one global chunk for the whole server
It is more like:
- a per-operation working limit
That means:
- one query can use multiple such operations
- several sessions can do that at once
- and total memory pressure can grow much faster than expected
So while low work_mem can definitely hurt sorts and grouping, blindly raising it is risky.
It should be done with real query evidence and concurrency in mind.
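Because of that multiplication effect, a safer pattern than raising work_mem server-wide is scoping the change to a single transaction for a known-heavy query. The value and the query below are illustrative:

```sql
BEGIN;
SET LOCAL work_mem = '64MB';  -- applies only inside this transaction, then reverts
SELECT customer_id, SUM(total)
FROM orders
GROUP BY customer_id;
COMMIT;
```

This limits the blast radius: concurrent sessions keep the server default while the one heavy query gets room to avoid a spill.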
8. Sort Spills to Disk Are a Big Warning Sign
One of the clearest signs of a sorting problem is when PostgreSQL spills sort work to disk.
Disk-based sorting is much slower than keeping the operation in memory.
This often happens when:
- too many rows reach the sort
- the rows are wide
- work_mem is too low for the workload
- or the query shape causes an unnecessarily large intermediate result
If a frequently used query spills to disk, that is a strong signal to investigate:
- whether the working set can be reduced
- whether the right index can remove or narrow the sort
- whether selected columns are wider than necessary
- and only then whether memory settings should change
A spill is often a symptom, not the entire root cause.
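EXPLAIN ANALYZE reports spills directly in the Sort node. An in-memory sort shows a quicksort with a memory figure; a spilled sort shows an external merge with a disk figure. The numbers below are illustrative:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, created_at, total
FROM orders
ORDER BY created_at DESC;

-- In-memory:  Sort Method: quicksort  Memory: 4812kB
-- Spilled:    Sort Method: external merge  Disk: 102400kB
```

Seeing "external merge Disk" on a frequent query is the cue to work through the checklist above before touching memory settings.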
9. Reduce Rows Before Sorting or Grouping
This is one of the best practical tuning strategies.
If a query sorts 2 million rows but only 5,000 actually matter after a filter or join change, the best improvement is often reducing the input, not optimizing the sort itself.
Helpful questions:
- can filters be pushed earlier?
- can joins happen after narrowing rows?
- are we selecting unnecessary rows before aggregating?
- is the query grouping far more data than the final answer needs?
- are there predicates that can be made more selective?
This is especially important in reporting queries where a large join is performed first and only then sorted or grouped.
PostgreSQL can only be as fast as the working set you hand it.
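A before/after sketch of the same report, with illustrative table and column names: the first version joins full order history and then groups; the second restricts the orders side to the window the report actually needs before the join and aggregate run.

```sql
-- Before: join full history, then group
SELECT c.region, SUM(o.total) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.region;

-- After: narrow the orders first, so far fewer rows reach the aggregate
SELECT c.region, SUM(o.total) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.created_at >= date_trunc('month', now())
GROUP BY c.region;
```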
10. Use Indexes That Match the Sort Path
Indexes help ORDER BY most when they match:
- the filter pattern
- and the requested ordering
For example, this query:
```sql
SELECT id, created_at
FROM posts
WHERE published = true
ORDER BY created_at DESC
LIMIT 50;
```
may benefit from an index like:
```sql
CREATE INDEX idx_posts_published_created_at
ON posts (published, created_at DESC);
```
This is much better than separate disconnected indexes if the hot query always combines:
- published = true
- recent-first ordering
The key idea is that PostgreSQL should be able to walk the index in the desired logical order instead of sorting a bigger result later.
11. Column Order in Composite Indexes Matters
For sort tuning, composite index order is very important.
If a query filters by tenant_id and sorts by created_at DESC,
then an index like:
(tenant_id, created_at DESC)
is usually much more useful than:
(created_at DESC, tenant_id)
because the filter should lead the access path.
This is why sort tuning is closely connected to understanding actual query shape. The same columns in the wrong order may not help nearly as much.
12. Indexes Do Not Always Eliminate Sorting
Even when an index exists, PostgreSQL may still sort.
That can happen when:
- the filter does not align well enough with the index
- the join order changes the result shape
- the planner estimates that another path is cheaper
- the requested ordering does not fully match the accessible index order
- or the query requires operations that destroy simple ordered access
So the right question is not:
- is there an index?
It is:
- does this index let PostgreSQL retrieve the required rows in the required order efficiently enough to beat an explicit sort?
That is a more practical way to think about it.
13. Grouping Often Benefits More From Better Filtering Than From Bigger Memory
A common assumption is:
GROUP BY is slow, so increase work_mem.
Sometimes that helps. Often it is not the main answer.
A grouping query may really be slow because:
- it scans too many rows
- joins too much data before grouping
- aggregates rows from old history that the endpoint does not need
- or groups on an expression that is expensive and poorly supported
In many cases, a better improvement comes from:
- narrowing the date range
- pre-filtering the source
- reducing join width
- aggregating in stages
- or moving heavy reporting logic to a more suitable path
Memory matters, but query shape usually matters first.
14. GROUP BY on Expressions Can Be Expensive
Grouping by a derived expression can add cost.
Example:
```sql
SELECT DATE(created_at), COUNT(*)
FROM events
GROUP BY DATE(created_at);
```
This may work, but it can still become expensive on very large tables because PostgreSQL must:
- compute the expression
- group by that computed value
- and potentially scan large numbers of rows first
The main issue is usually not the syntax alone. It is whether the workload has been narrowed enough and whether the grouping path matches real access patterns.
If the system runs this kind of query constantly, teams may consider:
- pre-aggregation
- materialized reporting paths
- or schema/index strategies tailored to that workload
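Before reaching for those heavier options, a lighter sketch is to keep the expression but narrow the scanned range first. This version uses date_trunc for the daily bucket (equivalent bucketing to DATE() here) and an illustrative 30-day window:

```sql
SELECT date_trunc('day', created_at) AS day, COUNT(*)
FROM events
WHERE created_at >= now() - interval '30 days'  -- narrow before grouping
GROUP BY 1
ORDER BY 1;
```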
15. Hash Aggregation Can Spill Too
Sort spills are widely known, but hash aggregation can spill to disk too when available memory is insufficient for the number of groups or the shape of the aggregate workload.
This means a GROUP BY query can struggle even if it is not visibly doing a giant sort step.
When grouping performance is poor, it helps to inspect:
- how many rows are read
- how many groups are produced
- whether hashing or sorting was chosen
- and whether memory pressure caused a spill
This is another reason EXPLAIN ANALYZE is so valuable.
It shows where the cost is really landing.
16. Wide Rows Make Sorting More Expensive
Sorting wide rows is more expensive than sorting narrow rows.
Why?
Because PostgreSQL has to carry more data through the sort step.
That means queries can often improve by:
- selecting only needed columns
- postponing large text or JSON fields until later
- avoiding SELECT * in sorted hot paths
- and keeping intermediate row width smaller
For high-frequency endpoints, this can make a real difference.
A sort of:
id, created_at, status
is much cheaper than sorting:
id, created_at, status, huge_json_payload, long_text_body, metadata_blob
even if the row count is identical.
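One common pattern for keeping the sorted rows narrow is to sort and page on a small projection first, then join back for the wide columns only on the final page. Table and column names below are illustrative:

```sql
SELECT e.id, e.created_at, e.status, e.huge_json_payload
FROM (
    -- sort and limit on narrow rows only
    SELECT id, created_at
    FROM events
    ORDER BY created_at DESC
    LIMIT 20
) AS page
JOIN events e ON e.id = page.id
ORDER BY page.created_at DESC;
```

The expensive ordered step now carries only id and created_at; the large payload is fetched for just 20 rows.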
17. Joins Before Sorts Can Multiply Cost
A common expensive pattern is:
- join several tables
- create a wide intermediate result
- then sort or group the expanded output
This can be much slower than:
- narrowing rows first
- sorting or grouping earlier where safe
- or restructuring the query so the heavy join does not happen before the largest reduction step
This is especially true when:
- the join increases row count
- the result rows are wide
- or the query only needs a small final page or summary
In sort and grouping performance, query order matters a lot.
18. DISTINCT Can Behave Like Sorting or Grouping
DISTINCT often shows up in the same family of performance problems because PostgreSQL may need to:
- sort rows
- hash rows
- and remove duplicates
That means a slow DISTINCT query may be suffering from many of the same issues as a slow GROUP BY query:
- too many input rows
- lack of selectivity
- wide tuples
- poor memory behavior
- or missing supporting indexes
So if DISTINCT is slow, inspect it with the same mindset you would use for a grouping query.
19. EXPLAIN ANALYZE Is the Best Debugging Tool Here
For sort and grouping tuning, EXPLAIN ANALYZE is essential.
It helps answer:
- did PostgreSQL do an explicit sort?
- did it use hash aggregation or sort-based grouping?
- how many rows reached the expensive step?
- were estimated rows far from actual rows?
- did the operation spill to disk?
- was the expensive part really the sort, or something earlier?
Without that visibility, teams often optimize the wrong part of the query.
A query that “feels like a sort problem” may really be:
- a scan problem
- a join explosion problem
- a stale-statistics problem
- or a wide-row problem
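A practical starting point for all of these questions is one instrumented run (ANALYZE executes the query, so be careful with writes):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT status, COUNT(*)
FROM orders
GROUP BY status;

-- On each node, compare "rows=" (the estimate) with "actual ... rows=":
-- a large gap often points to stale statistics; run ANALYZE on the table
-- and re-check the plan before optimizing anything else.
```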
20. Materialized Summaries Are Sometimes the Right Answer
Not every repeated GROUP BY workload should be answered live from base transactional tables.
If the application repeatedly needs:
- dashboard totals
- daily rollups
- per-tenant counts
- top products
- repeated time-bucket summaries
then precomputed paths may be justified.
Examples:
- summary tables
- materialized views
- projection tables
- scheduled rollups
This is especially true when:
- the grouping workload is large
- exact real-time freshness is not critical
- and the same expensive aggregate is requested again and again
This is not a replacement for basic tuning. It is an architectural optimization for repeated heavy aggregate reads.
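A minimal materialized-view sketch for a repeated daily rollup, with illustrative names; how fresh the numbers are depends entirely on the refresh schedule you choose:

```sql
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT date_trunc('day', created_at) AS day,
       COUNT(*) AS orders,
       SUM(total) AS revenue
FROM orders
GROUP BY 1;

-- Refresh on a schedule (e.g. from cron). REFRESH ... CONCURRENTLY is also
-- possible, but it requires a unique index on the view.
REFRESH MATERIALIZED VIEW daily_order_totals;
```

Dashboards then read the small precomputed table instead of re-aggregating order history on every request.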
Common PostgreSQL Sort and Grouping Mistakes
Sorting far more rows than needed
The main problem is often the input size, not the sort algorithm itself.
Using SELECT * on sorted hot paths
Wide rows make sorts and aggregations more expensive than necessary.
Missing composite indexes for filter-plus-order queries
An index exists, but it does not match the real access path.
Assuming work_mem is the only answer
Memory helps, but query shape and row reduction usually matter first.
Joining large tables before reducing rows
This can create huge intermediate results that are expensive to sort or group.
Ignoring spills
Disk spills are a major performance signal and should not be dismissed.
Conclusion
PostgreSQL sort and GROUP BY performance tuning works best when you stop thinking only about the final sort or aggregate step.
The biggest wins usually come from:
- reducing how many rows reach that step
- using indexes that match the filter and order pattern
- keeping result rows narrower
- understanding whether PostgreSQL is sorting or hashing
- and only then tuning memory where real spills justify it
That is why the best tuning mindset is simple:
- reduce the working set
- align indexes with the query shape
- inspect the real execution plan
- and treat memory as one tool, not the whole solution
When you do that, PostgreSQL sort and grouping workloads become much easier to control, even as the data grows.