SQL Subqueries Explained
Level: intermediate · ~18 min read · Intent: informational
Audience: backend developers, data analysts, data engineers, technical teams, software engineers, students
Prerequisites
- basic familiarity with databases
- basic understanding of SELECT, WHERE, JOIN, and aggregate functions
Key takeaways
- A SQL subquery is simply a query inside another query, but the real skill is understanding what the inner query should return and how the outer query uses that result.
- Subqueries are powerful for filtering, comparison, existence checks, and staged logic, but they should be chosen deliberately because some problems are clearer or faster with joins, CTEs, or window functions.
FAQ
- What is a subquery in SQL?
- A subquery is a query nested inside another SQL query. It can return a single value, a list of values, or a result set that the outer query uses for filtering, comparison, or further querying.
- What is the difference between a correlated subquery and a normal subquery?
- A normal subquery can run independently of the outer query, while a correlated subquery depends on values from the current row of the outer query.
- When should I use a subquery instead of a JOIN?
- Use a subquery when it expresses the logic more clearly, especially for value comparisons, EXISTS checks, or staged filtering. Use a JOIN when you need columns from related tables or when a join-based approach is clearer and easier to optimize.
- Are subqueries bad for performance?
- Not automatically. Some subqueries perform very well, but correlated subqueries or poorly structured nested logic can become expensive. The right choice depends on the database engine, the data shape, the indexes, and the execution plan.
SQL subqueries are one of the most useful SQL features because they let you answer a question by first answering a smaller question inside it.
That sounds simple, and it is.
But it is also where many SQL learners get stuck.
They can write:
SELECTWHEREJOINGROUP BY
But once they see a query inside another query, it starts to feel more advanced than it really is.
The good news is that a subquery is not a separate kind of SQL universe. It is just:
- one query producing something
- that another query uses
That is all.
This guide explains SQL subqueries clearly, including:
- what a subquery is
- the main subquery types
- scalar subqueries
INandEXISTS- correlated subqueries
- subqueries in
SELECT - subqueries in
FROM - when subqueries are a great fit
- and when a join, CTE, or window function may be the better choice
Why subqueries matter
A lot of SQL problems are really two-step problems.
For example:
- find products priced above the average product price
- return customers who placed at least one order
- show employees who earn more than their department average
- return the latest order date per customer
- list users whose email appears in a suspicious source list
- find products that were never ordered
Each of those has a natural smaller question inside it:
- what is the average price?
- which customers have orders?
- what is the average salary in this department?
- what is the latest order date?
- which emails are in the suspicious list?
- which products appear in order items?
That inner question is often a subquery.
This is why subqueries matter so much. They let you break logic into nested steps that often match how people naturally think.
The most important rule
Before anything else, remember this:
To understand a subquery, first ask what the inner query returns.
That is the most important rule in this topic.
A lot of confusion disappears once you ask:
- does the inner query return one value?
- a list of values?
- or a table-like result set?
Because that determines how the outer query can use it.
For example:
- one value can be used with
=,>,< - a list of values can be used with
IN - an existence check can be used with
EXISTS - a table-like result can be used in
FROM
If you get the return shape of the inner query clear first, the rest becomes much easier.
What a subquery is
A subquery is a query inside another query.
That is the basic definition.
Example:
SELECT *
FROM products
WHERE price > (
SELECT AVG(price)
FROM products
);
The inner query is:
SELECT AVG(price)
FROM products
The outer query uses that result to filter rows.
That is a subquery.
It is sometimes also called:
- nested query
- inner query
But “subquery” is the most common term.
A simple mental model
A very useful mental model is:
Outer query
The main question.
Inner query
A smaller helper question.
For example:
Main question:
- which products cost more than average?
Helper question:
- what is the average price?
That is exactly how subqueries often work in practice.
The main kinds of subqueries
Subqueries are usually easiest to understand when grouped by what they return.
The most common practical types are:
1. Scalar subquery
Returns one value.
2. Multi-row subquery
Returns a list of values.
3. Correlated subquery
Depends on the current row of the outer query.
4. Derived-table subquery
Returns a table-like result used in FROM.
These are the main patterns you will see again and again.
Scalar subquery explained
A scalar subquery returns one value.
That makes it useful in comparisons like:
=><>=<=
Example:
SELECT
product_id,
product_name,
price
FROM products
WHERE price > (
SELECT AVG(price)
FROM products
);
The inner query returns:
- one value
- the average price
The outer query then compares each product price to that one value.
This is one of the easiest and most common subquery patterns.
Why scalar subqueries are useful
Scalar subqueries are useful when the outer query needs:
- one threshold
- one benchmark
- one summary value
- one lookup value
Examples:
- above average salary
- below minimum passing score
- equal to latest date
- more expensive than average product
- greater than total from another calculation
These are all natural one-value comparison problems.
Multi-row subquery explained
A multi-row subquery returns multiple values.
That means the outer query often uses it with:
INNOT INANYALL
The most common one by far is IN.
Example:
SELECT
customer_id,
customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
);
The inner query returns:
- many customer IDs from orders
The outer query then keeps only customers whose customer_id appears in that set.
This is a very common and useful SQL pattern.
Why IN works so naturally with subqueries
IN is useful because the logic reads almost like English.
Example:
- return customers whose ID is in the set of customer IDs from orders
That is very intuitive.
It is often used for:
- filtering by related IDs
- matching against a result set
- narrowing rows based on another table
- and quick membership logic
That is why IN subqueries are common in both learning and real production SQL.
Example: products that were ordered
SELECT
product_id,
product_name
FROM products
WHERE product_id IN (
SELECT product_id
FROM order_items
);
This returns products whose IDs appear in the order_items table.
The inner query returns:
- the set of ordered product IDs
The outer query uses that set for filtering.
This is another very practical subquery pattern.
Subqueries with NOT IN
You can also use:
NOT IN
Example:
SELECT
product_id,
product_name
FROM products
WHERE product_id NOT IN (
SELECT product_id
FROM order_items
);
This looks like:
- return products not present in order items
Conceptually, that is fine.
But there is an important caveat with NULL values, which we will cover later, because NOT IN can behave unexpectedly if the subquery returns NULL.
That is one reason many people prefer NOT EXISTS for these patterns.
EXISTS explained
EXISTS checks whether the subquery returns at least one row.
That is different from comparing a value to a returned set. It is purely an existence check.
Example:
SELECT
c.customer_id,
c.customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
This means:
- keep customers for whom at least one matching order exists
This is one of the most important SQL patterns.
Why EXISTS is so useful
EXISTS is useful when the real question is:
- does a related row exist?
Not:
- what exact values should I compare?
Examples:
- customers with at least one order
- users with at least one login
- products with at least one sale
- employees with subordinates
- accounts with unpaid invoices
These are all existence questions.
That is why EXISTS is often one of the cleanest ways to express them.
EXISTS versus IN
A very common SQL question is:
- should I use
INorEXISTS?
The practical answer is:
- both can be correct
- but they express slightly different logic
IN
Often feels natural when comparing one value to a returned set.
EXISTS
Often feels natural when asking whether at least one related row exists.
Example with IN:
WHERE customer_id IN (
SELECT customer_id
FROM orders
)
Example with EXISTS:
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
)
Both may solve similar problems.
But EXISTS is often clearer for relationship existence logic.
Correlated subquery explained
A correlated subquery is a subquery that depends on values from the outer query.
That means the inner query cannot be understood independently in the same way as a normal subquery. It is tied to the current outer row.
Example:
SELECT
e.employee_id,
e.employee_name,
e.department,
e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e.department
);
The inner query depends on:
e.department
which comes from the outer query.
So for each employee row, the inner query calculates:
- average salary in that employee’s department
That is a correlated subquery.
Why correlated subqueries are powerful
Correlated subqueries are powerful because they let you compare each row against:
- its own group
- its own related context
- or its own dynamic benchmark
Examples:
- employee versus department average
- order versus customer’s average order size
- product versus category average price
- row versus previous maximum within a group
They are very expressive and often match the business logic well.
Why correlated subqueries can be harder
Correlated subqueries are often harder because:
- they are more conceptually nested
- the inner query runs with reference to outer-row values
- performance can become more sensitive on larger datasets
- and the same logic might sometimes be expressible with joins or window functions more clearly
That does not make them bad. It just means they require more deliberate thinking.
A good habit is:
- use them when they make the logic clearer
- but still inspect performance and readability
Example: customers whose total order value is above average customer total
You can solve this in several ways, but a staged subquery approach can look natural.
SELECT
customer_id,
total_spend
FROM (
SELECT
customer_id,
SUM(total_amount) AS total_spend
FROM orders
GROUP BY customer_id
) customer_totals
WHERE total_spend > (
SELECT AVG(total_spend)
FROM (
SELECT
customer_id,
SUM(total_amount) AS total_spend
FROM orders
GROUP BY customer_id
) avg_source
);
This is more complex than some alternatives, but it shows a key point:
- subqueries can be used to create intermediate result sets
- and then query those intermediate results further
That is very useful in reporting logic.
Subquery in the FROM clause
A subquery can also appear in the FROM clause.
When this happens, it acts like a temporary table or derived table.
Example:
SELECT
customer_id,
total_spend
FROM (
SELECT
customer_id,
SUM(total_amount) AS total_spend
FROM orders
GROUP BY customer_id
) customer_totals;
The inner query creates:
- a customer totals result set
The outer query selects from it as though it were a table.
This is a very useful SQL technique.
Why subqueries in FROM are useful
They are useful when:
- you want to build an intermediate result
- give it a name
- then query it further
This helps with tasks like:
- filtering aggregated results
- joining summarized data
- restructuring row grain before the next step
- simplifying more complex logic
A lot of SQL reporting becomes easier when you think in stages like this.
Subquery in the SELECT list
A subquery can also appear in the SELECT list itself.
Example:
SELECT
c.customer_id,
c.customer_name,
(
SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.customer_id
) AS order_count
FROM customers c;
This returns:
- customer data
- plus a derived order count per customer
This is another form of correlated subquery, because the inner query depends on:
c.customer_id
This pattern can be useful, but like other correlated forms, it should be used thoughtfully.
When a SELECT-list subquery is useful
It is useful when you want:
- one main row
- plus a derived related value per row
Examples:
- customer plus order count
- product plus total sales
- employee plus number of direct reports
- ticket plus latest comment time
This can be very readable for small or moderate use cases.
But on large workloads, you should still consider whether:
- a join plus grouping
- or a window function would be clearer or more efficient
Subqueries versus joins
This is one of the biggest practical SQL design questions.
Sometimes a subquery is clearer. Sometimes a join is clearer.
There is no universal rule that one is always better.
A good practical difference is:
Subquery
Useful when the inner query feels like:
- a helper result
- a membership set
- an existence check
- a benchmark value
- or a staged intermediate result
Join
Useful when you need:
- columns from related tables
- direct row combination
- or a flatter relationship-based query structure
The best choice is usually the one that makes the logic easiest to understand and maintain while still performing well.
Example: subquery versus join for ordered products
Subquery style:
SELECT
product_id,
product_name
FROM products
WHERE product_id IN (
SELECT product_id
FROM order_items
);
Join style:
SELECT DISTINCT
p.product_id,
p.product_name
FROM products p
INNER JOIN order_items oi
ON p.product_id = oi.product_id;
Both may answer the same logical question.
The subquery version says:
- keep products whose ID appears in order_items
The join version says:
- join matching ordered products
Sometimes the subquery feels more direct for the business question. That is why subqueries remain so useful.
Subqueries versus CTEs
A CTE is often just a clearer, named way to express staged subquery logic.
Example using a derived table:
SELECT
customer_id,
total_spend
FROM (
SELECT
customer_id,
SUM(total_amount) AS total_spend
FROM orders
GROUP BY customer_id
) customer_totals;
Equivalent idea with a CTE:
WITH customer_totals AS (
SELECT
customer_id,
SUM(total_amount) AS total_spend
FROM orders
GROUP BY customer_id
)
SELECT
customer_id,
total_spend
FROM customer_totals;
Both are valid.
The CTE version is often easier to read in larger queries. That is why subqueries and CTEs are closely related in practical SQL design.
Subqueries versus window functions
Some problems that used to be solved with subqueries are often cleaner with window functions.
Example: latest order per customer.
Subquery solutions exist.
But often ROW_NUMBER() or MAX() OVER (...) style patterns are clearer.
That does not make subqueries obsolete. It just means modern SQL gives you several tools, and the best one depends on:
- readability
- row shape
- performance
- and the business question
A strong SQL user knows multiple ways to solve a problem.
Common subquery patterns
Here are the most common patterns you will see.
1. Scalar comparison
WHERE price > (
SELECT AVG(price)
FROM products
)
2. Membership test
WHERE customer_id IN (
SELECT customer_id
FROM orders
)
3. Existence check
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
)
4. Derived table in FROM
FROM (
SELECT customer_id, SUM(total_amount) AS total_spend
FROM orders
GROUP BY customer_id
) customer_totals
5. Correlated calculation
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e.department
)
If you understand these five patterns, you understand a large share of real SQL subquery use.
Common mistake: not knowing what the subquery returns
This is the biggest one.
If the outer query expects:
- one value
but the subquery returns:
- many rows
the query may fail or behave differently than expected.
That is why the first question is always:
- what shape does the subquery return?
One value? One column with many rows? A whole table-like result?
That determines the correct outer usage.
Common mistake: using = with a multi-row subquery
This is a classic beginner issue.
Wrong pattern if multiple rows can return:
WHERE customer_id = (
SELECT customer_id
FROM orders
)
If the subquery returns many customer IDs, that is not appropriate.
The correct pattern may be:
WHERE customer_id IN (
SELECT customer_id
FROM orders
)
This is why matching the comparison operator to the subquery shape is so important.
Common mistake: NOT IN with NULL values
This is one of the most important practical warnings.
Example:
SELECT *
FROM products
WHERE product_id NOT IN (
SELECT product_id
FROM order_items
);
If the subquery can return NULL values, NOT IN may behave in surprising ways.
That is why many teams prefer:
WHERE NOT EXISTS (...)
for anti-join style logic such as:
- rows with no related match
This is a very important real-world SQL caution.
Common mistake: overusing correlated subqueries where clearer options exist
Correlated subqueries can be elegant. But sometimes they make a query:
- harder to read
- harder to optimize
- or harder to explain
If the same logic is clearer with:
- a join
- a CTE
- or a window function
that may be the better option.
This is not about banning correlated subqueries. It is about using the clearest tool for the problem.
Performance thoughts on subqueries
Subqueries are not automatically bad for performance.
Some perform extremely well. Others do not.
The real performance depends on:
- the database engine
- the indexes
- whether the subquery is correlated
- row counts
- join paths
- and the overall execution plan
A strong rule is:
- do not judge performance by syntax alone
- inspect the plan and measure the real query
That is especially important for correlated subqueries and very large result sets.
Practical examples
Example 1: products above average price
SELECT
product_id,
product_name,
price
FROM products
WHERE price > (
SELECT AVG(price)
FROM products
);
Example 2: customers with orders
SELECT
customer_id,
customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
);
Example 3: customers with at least one order using EXISTS
SELECT
c.customer_id,
c.customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
Example 4: employees above department average
SELECT
e.employee_id,
e.employee_name,
e.department,
e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e.department
);
Example 5: summarized data from a derived table
SELECT
customer_id,
total_spend
FROM (
SELECT
customer_id,
SUM(total_amount) AS total_spend
FROM orders
GROUP BY customer_id
) customer_totals
WHERE total_spend > 1000;
These examples cover the main styles you will use most often.
How to decide whether a subquery is a good fit
A practical decision framework looks like this:
Use a subquery when:
- the inner query feels like a natural helper question
- you need one benchmark value
- you need a membership set
- you need an existence check
- or you want a clean intermediate result
Consider a join when:
- you need columns from related tables directly
- the row relationship is central to the logic
- or the join makes the result shape clearer
Consider a CTE when:
- the logic has multiple stages
- naming intermediate results improves readability
- or the query is getting long
Consider window functions when:
- the problem is about row ranking, row comparison, or per-group latest logic
This is a very useful way to think practically about SQL design.
FAQ
What is a subquery in SQL?
A subquery is a query nested inside another SQL query. It can return a single value, a list of values, or a result set that the outer query uses for filtering, comparison, or further querying.
What is the difference between a correlated subquery and a normal subquery?
A normal subquery can run independently of the outer query, while a correlated subquery depends on values from the current row of the outer query.
When should I use a subquery instead of a JOIN?
Use a subquery when it expresses the logic more clearly, especially for value comparisons, EXISTS checks, or staged filtering. Use a JOIN when you need columns from related tables or when a join-based approach is clearer and easier to optimize.
Are subqueries bad for performance?
Not automatically. Some subqueries perform very well, but correlated subqueries or poorly structured nested logic can become expensive. The right choice depends on the database engine, the data shape, the indexes, and the execution plan.
Final thoughts
SQL subqueries are powerful because they let you build a larger query from smaller logical questions.
That is why they feel natural once you stop thinking of them as “advanced syntax” and start thinking of them as:
- helper queries inside bigger queries
The most important ideas to remember are:
- first understand what the inner query returns
- scalar subqueries return one value
- multi-row subqueries often work with
IN EXISTSis great for relationship existence checks- correlated subqueries depend on the outer row
- subqueries in
FROMact like temporary derived tables - and joins, CTEs, and window functions are sometimes better alternatives depending on the problem
If you keep those ideas clear, subqueries stop feeling confusing and start becoming one of the most natural and useful parts of writing real SQL.