PostgreSQL Covering Indexes and Index-Only Scans
Level: intermediate · ~14 min read · Intent: informational
Audience: backend developers, database engineers, technical teams
Prerequisites
- basic familiarity with PostgreSQL
Key takeaways
- A PostgreSQL covering index usually means a normal index whose key columns handle filtering or ordering while INCLUDE columns carry extra payload so some queries can avoid reading the heap.
- Index-only scans are powerful on read-heavy, mostly stable tables, but they do not happen just because an index contains the right columns. PostgreSQL also needs enough pages marked all-visible in the visibility map.
FAQ
- What is a covering index in PostgreSQL?
- A covering index is usually an index that includes all the columns a query needs, often by using INCLUDE for non-key payload columns, so PostgreSQL may be able to satisfy the query directly from the index.
- What is an index-only scan in PostgreSQL?
- An index-only scan is a plan where PostgreSQL reads data from the index without needing to fetch most or all matching rows from the table heap, as long as the index supports it and visibility checks allow it.
- Do covering indexes always guarantee index-only scans?
- No. PostgreSQL also needs visibility information that proves rows are visible to the query. On frequently updated tables, heap checks may still be required often enough that index-only scans are much less effective.
- Does GIN support index-only scans in PostgreSQL?
- No. PostgreSQL documents that GIN cannot support index-only scans because each index entry usually stores only part of the original data value.
PostgreSQL covering indexes and index-only scans are closely related, but they are not the same thing.
That distinction matters.
A lot of developers hear:
- “make it a covering index” and assume that means
- “PostgreSQL will stop touching the table”
Sometimes it will. Sometimes it will not.
The reason is that index-only scans depend on more than just column presence.
For PostgreSQL to avoid heap reads effectively, three things have to line up:
- the index type must support index-only scans
- the query must need only columns stored in the index
- and PostgreSQL must be able to confirm row visibility cheaply enough using the visibility map
That is why covering indexes can be great for:
- hot read paths
- dashboards
- top-N lists
- recent activity feeds
- lookup-heavy APIs
but much less impressive on:
- highly volatile tables
- constantly updated rows
- or indexes stuffed with too many payload columns
This guide explains how it all works and when it is worth doing.
What Is a Covering Index in PostgreSQL?
In practical PostgreSQL language, a covering index is an index that contains all the data a query needs.
That usually means:
- key columns used for searching or ordering
- plus extra columns stored with
INCLUDEso the query can return those values without visiting the heap
Example:
create index idx_orders_account_created_include
on orders (account_id, created_at desc)
include (status, total_cents);
This index is shaped so a query like:
select created_at, status, total_cents
from orders
where account_id = $1
order by created_at desc
limit 20;
may be satisfied directly from the index.
That is what people usually mean by a covering index.
What Is an Index-Only Scan?
An index-only scan is a PostgreSQL execution plan where the database can get the needed data from the index itself without having to fetch every matching row from the table heap.
That is important because heap access is often the expensive part.
If PostgreSQL can avoid heap reads, it may reduce:
- random I/O
- buffer churn
- heap page reads
- and the cost of repeatedly visiting large tables for small result sets
Why this is powerful
A normal index scan typically works like this:
- use the index to find matching row pointers
- visit the table heap to fetch the actual row data
An index-only scan aims more for:
- use the index to find matches
- return the needed data from the index itself
- avoid or greatly reduce heap visits
That can be a big improvement on read-heavy tables.
The Three Conditions for Index-Only Scans
PostgreSQL’s docs make this clear: index-only scans are possible only when a few conditions line up.
1. The index type must support index-only scans
PostgreSQL states that:
- B-tree always supports index-only scans
- GiST and SP-GiST support them for some operator classes
- GIN does not support them :contentReference[oaicite:1]{index=1}
This is a major reason why B-tree remains the most important index type for covering-index patterns.
2. The query must reference only columns stored in the index
If the query asks for a column that is not present in the index, PostgreSQL cannot satisfy the result directly from the index.
Example that can qualify:
select created_at, status, total_cents
from orders
where account_id = $1;
if the index contains:
account_idcreated_atstatustotal_cents
Example that cannot qualify:
select created_at, status, total_cents, shipping_address
from orders
where account_id = $1;
if shipping_address is not in the index.
3. The visibility map must help PostgreSQL trust the row visibility cheaply
This is the part many people miss.
Even if the index contains every needed column, PostgreSQL still has to verify that the row is visible to the query’s MVCC snapshot.
To make index-only scans efficient, PostgreSQL uses the visibility map. If the relevant heap page is marked all-visible, PostgreSQL can often skip the heap lookup. If it is not, PostgreSQL still has to visit the heap page to check visibility. :contentReference[oaicite:2]{index=2}
That means a covering index does not automatically guarantee that the heap will be avoided.
Why the Visibility Map Matters So Much
The visibility map is what turns the idea of index-only scans into a practical performance feature.
On a mostly static table:
- many pages become all-visible
- heap checks are often avoidable
- index-only scans become much more valuable
On a heavily updated table:
- visibility information is disrupted more often
- many pages are not all-visible
- PostgreSQL may still need heap checks frequently
- the benefits of index-only scans shrink
Practical lesson
Index-only scans are usually strongest on:
- append-heavy tables
- read-mostly tables
- historical tables
- content tables
- reporting tables
- recent-item queries over mostly stable data
They are often less impressive on:
- job queues
- session tables
- highly churned status tables
- anything rewritten constantly
Key Columns vs INCLUDE Columns
This distinction is essential.
Key columns
These are the columns that define the searchable and ordered structure of the index.
Example:
(account_id, created_at desc)
These are used for:
- filtering
- ordering
- navigation through the B-tree
INCLUDE columns
These are extra payload columns stored in the index to help more queries be covered.
Example:
include (status, total_cents)
These are not part of the search key. They do not affect:
- ordering
- uniqueness semantics
- B-tree navigation
They are there to help the query get more data without touching the heap.
PostgreSQL’s CREATE INDEX docs explain that included columns are stored in leaf tuples only and are not part of the upper-level tree entries used for navigation. :contentReference[oaicite:3]{index=3}
That is exactly why they are useful for covering behavior without changing the key logic.
Why INCLUDE Is Better Than Just Appending More Key Columns
A common beginner mistake is to turn every needed output column into an indexed key column.
Example of a worse idea:
create index idx_orders_account_created_status_total
on orders (account_id, created_at desc, status, total_cents);
Often better:
create index idx_orders_account_created_include
on orders (account_id, created_at desc)
include (status, total_cents);
Why the second is better
Because:
- only the true search/sort columns stay in the key
statusandtotal_centsare payload- the key remains more focused
- you avoid pretending those columns are part of the search order when they are not
This is usually the cleaner design when the query needs those columns only for output.
Example: A Great Covering-Index Use Case
Imagine a tenant-scoped API endpoint:
select created_at, status, total_cents
from orders
where account_id = $1
order by created_at desc
limit 20;
This is a near-perfect candidate for a covering B-tree index:
create index idx_orders_account_created_include
on orders (account_id, created_at desc)
include (status, total_cents);
Why it works well:
account_idis the filtercreated_at descmatches the sortlimit 20means PostgreSQL only needs the first few rowsstatusandtotal_centsare payload columns needed only in the result
On a fairly stable table, this can be very effective.
Example: A Bad Covering-Index Use Case
Now imagine a high-churn sessions table:
select user_id, last_seen_at, metadata
from sessions
where account_id = $1
and is_active = true;
and last_seen_at is updated constantly.
Even if you build a covering index, index-only scans may not be nearly as useful because heap visibility checks will often still be needed.
That does not necessarily mean the index is wrong. It means the index-only benefit may be much weaker than people expect.
B-Tree Is Usually the Best Index Type for Covering Queries
For most teams, the safest practical rule is:
If you are designing for covering behavior and index-only scans, B-tree is usually the first place to start.
Why?
Because PostgreSQL documents that:
- B-tree always supports index-only scans
- only B-tree can return rows in index order for
ORDER BY - B-tree is the general-purpose default index type :contentReference[oaicite:4]{index=4}
That makes it the natural fit for many:
- top-N queries
- account-scoped recent-item queries
- filtered dashboard queries
- lookup-heavy APIs
- and read-mostly lists
What About GiST and SP-GiST?
PostgreSQL documents that:
- GiST and SP-GiST can support index-only scans for some operator classes
- and both GiST and SP-GiST support
INCLUDEcolumns currently :contentReference[oaicite:5]{index=5}
This means covering-style design is not purely a B-tree concept.
But for most application developers, the practical world of covering indexes is still mostly about B-tree because:
- it is the common case
- it supports ordered retrieval
- and its behavior is easier to reason about for ordinary filters and sorts
What About GIN?
GIN is important precisely because it is a great counterexample.
PostgreSQL explicitly states that GIN cannot support index-only scans because each index entry usually contains only part of the original data value. :contentReference[oaicite:6]{index=6}
That means if you are indexing:
jsonb- arrays
- full-text search
GIN may still be the correct index type for the search problem, but not for index-only scan behavior.
This is one reason why covering-index thinking should not be applied blindly to every index type.
Covering Indexes and ORDER BY
One of the best PostgreSQL use cases for covering indexes is:
- filter
- order
- limit
- return a few extra columns
Example:
select created_at, status, total_cents
from orders
where account_id = $1
order by created_at desc
limit 20;
A strong index:
create index idx_orders_account_created_include
on orders (account_id, created_at desc)
include (status, total_cents);
This is powerful because:
- the B-tree can navigate by
account_id - return rows already ordered by
created_at desc - avoid a separate sort
- and potentially avoid heap reads for the output columns
That is one of the cleanest real-world index-only patterns in PostgreSQL.
Covering Indexes Are Not Free
This is where overenthusiastic indexing often goes wrong.
Every included column:
- makes the index larger
- duplicates table data inside the index
- increases write cost
- may reduce cache efficiency if overused
- and can even exceed index tuple size limits if you are careless
PostgreSQL’s docs warn that non-key payload columns duplicate data from the table and bloat the size of the index, potentially slowing searches. They also note that wide payload columns should be considered carefully, and if an index tuple exceeds the maximum allowed size, insertion fails. :contentReference[oaicite:7]{index=7}
Practical rule
Use INCLUDE for:
- small, frequently needed output columns
- on hot read paths
- where the query pattern is repeated often
Do not use INCLUDE for:
- giant text columns
- arbitrary “maybe useful later” payload
- every selected column in the whole table
- indexes that are already borderline too wide
When Covering Indexes Pay Off Most
They tend to pay off most when:
1. The query is read-heavy
If the path is hit constantly, reducing heap reads matters more.
2. The result set is small
Top-N and recent-item queries are especially strong candidates.
3. The table is mostly stable
This improves the odds that index-only scans can actually avoid heap visits.
4. The index key already makes sense
A covering index is best when the underlying search/sort key is already a good fit.
5. The extra columns are small
Narrow payload columns are much safer than wide ones.
When They Are Less Worth It
Covering indexes are often less impressive when:
- the table is updated constantly
- visibility map coverage is poor
- the payload columns are wide
- the query pattern is rare
- the result shape changes frequently
- the underlying index key is weak already
- the system is write-heavy and every extra index byte matters
In those cases, a normal index may be better than a bigger covering one.
Expression Indexes and Partial Indexes Complicate the Story
PostgreSQL’s docs note that expression indexes and partial indexes complicate the simple “all needed columns are in the index” rule. :contentReference[oaicite:8]{index=8}
That matters because you can still create very effective designs like:
create index idx_users_lower_email_include
on users (lower(email))
include (id, created_at);
or:
create index idx_jobs_queue_created_pending
on jobs (queue_name, created_at)
include (payload)
where processed_at is null;
These can be excellent designs when the workload is predictable.
But they also mean you should think carefully about:
- what the query filters on
- what it returns
- and whether the index shape truly matches the hot path
A Good Design Workflow
If you are considering a covering index, ask these questions in order:
- What exact query is slow or repeated?
- What are the true search and sort columns?
- Which extra columns are needed only for output?
- Are those payload columns small enough to include safely?
- Is the table stable enough that index-only scans are likely to help?
- How often does this query actually run?
- Will the extra index size and write cost be worth it?
That workflow prevents most bad covering-index decisions.
Example Patterns That Often Deserve Covering Indexes
Recent orders per account
create index idx_orders_account_created_include
on orders (account_id, created_at desc)
include (status, total_cents);
Latest events per user
create index idx_events_user_occurred_include
on events (user_id, occurred_at desc)
include (event_type, source);
Pending jobs queue
create index idx_jobs_queue_created_pending
on jobs (queue_name, created_at)
include (attempt_count)
where processed_at is null;
These are good because:
- the key is clear
- the query shape repeats
- the payload is small
- and the result set is usually limited
Common Mistakes
1. Thinking “covering index” means “guaranteed heap-free”
It does not.
2. Including too many columns
This often bloats the index more than it helps.
3. Including wide columns casually
This can make the index slow and fragile.
4. Using INCLUDE instead of fixing a bad key design
A covering index is not a substitute for the right filter/sort key.
5. Expecting index-only scans on volatile tables to be amazing
They often are not.
6. Assuming GIN can do this too
It cannot do index-only scans.
FAQ
What is a covering index in PostgreSQL?
A covering index is usually an index that includes all the columns a query needs, often by using INCLUDE for non-key payload columns, so PostgreSQL may be able to satisfy the query directly from the index.
What is an index-only scan in PostgreSQL?
An index-only scan is a plan where PostgreSQL reads data from the index without needing to fetch most or all matching rows from the table heap, as long as the index supports it and visibility checks allow it.
Do covering indexes always guarantee index-only scans?
No. PostgreSQL also needs visibility information that proves rows are visible to the query. On frequently updated tables, heap checks may still be required often enough that index-only scans are much less effective.
Does GIN support index-only scans in PostgreSQL?
No. PostgreSQL documents that GIN cannot support index-only scans because each index entry usually stores only part of the original data value.
Conclusion
PostgreSQL covering indexes and index-only scans are best understood as a performance opportunity, not a guarantee.
A good covering index:
- uses the right search and sort key
- includes only the extra columns the hot query really needs
- stays reasonably lean
- and is applied to a workload where heap avoidance is realistically possible
That is why they are so effective on:
- read-heavy
- limited-result
- mostly stable query paths
The biggest lesson is simple:
A covering index is about storing enough useful data in the index.
An index-only scan is about whether PostgreSQL can really take advantage of that storage without going back to the heap.
Those are related ideas, but they are not the same thing.