SQL DISTINCT vs GROUP BY
Level: intermediate · ~17 min read · Intent: informational
Audience: backend developers, data analysts, data engineers, technical teams, students, software engineers
Prerequisites
- basic familiarity with databases
- basic understanding of SELECT, WHERE, and aggregate functions
Key takeaways
- DISTINCT and GROUP BY can sometimes produce similar-looking results, but they are designed for different purposes. DISTINCT removes duplicate result rows, while GROUP BY creates groups so you can summarize or aggregate data.
- The best choice depends on query intent. Use DISTINCT when you only want unique values, and use GROUP BY when you want one result row per group, especially when aggregate functions like COUNT, SUM, or AVG are involved.
FAQ
- What is the difference between DISTINCT and GROUP BY in SQL?
- DISTINCT removes duplicate rows from a result set, while GROUP BY groups rows together so you can summarize them, often with aggregate functions like COUNT, SUM, or AVG.
- Can DISTINCT and GROUP BY return the same result?
- Yes, in some simple cases they can return the same rows, especially when you are only trying to get unique combinations of selected columns. But the intent and use cases are still different.
- Is DISTINCT faster than GROUP BY?
- Not always. Performance depends on the database engine, the query plan, indexes, and the size of the data. You should choose based on intent first and then measure if performance matters.
- Should I use GROUP BY without aggregate functions?
- You can, and it may return results similar to DISTINCT in some cases, but it is usually clearer to use DISTINCT when your real goal is only to remove duplicates.
DISTINCT and GROUP BY are two of the most commonly confused SQL features because in some queries they appear to do the same thing.
For example, both of these can seem to return unique values:
SELECT DISTINCT country
FROM users;
and
SELECT country
FROM users
GROUP BY country;
At first glance, they look interchangeable.
That is why many developers and analysts eventually ask:
- what is the real difference?
- when should I use one instead of the other?
- is one better for performance?
- and are there cases where using the wrong one causes incorrect results?
Those are exactly the right questions.
The short answer is:
DISTINCTremoves duplicate rows from the resultGROUP BYcreates groups of rows, usually so you can aggregate or summarize them
That sounds simple, but the deeper difference matters a lot in real SQL work.
This guide explains SQL DISTINCT vs GROUP BY clearly, including:
- what each one does
- when they return similar results
- when they do not
- when aggregate functions change everything
- how to think about query intent
- and the common mistakes that cause confusion
Why this topic matters
A lot of SQL problems come from using something that happens to work instead of using something that matches the actual intent of the query.
That is especially true with DISTINCT and GROUP BY.
For example:
- someone wants unique values and uses
GROUP BYeven though no grouping logic is really needed - someone wants counts by category and reaches for
DISTINCTwhen aggregation is required - someone uses
DISTINCTas a bandage over a bad join - someone writes a grouped query without understanding why non-aggregated columns are restricted
- someone thinks the two are always interchangeable and gets confused when aggregates are added
The problem is not only syntax. It is query meaning.
If you understand the conceptual difference properly, your SQL becomes:
- clearer
- easier to maintain
- less error-prone
- and often easier to optimize
The most important rule
Before going deeper, remember this:
Use DISTINCT when you want unique result rows. Use GROUP BY when you want one result row per group, especially when you need summaries or aggregates for each group.
That is the most practical decision rule.
If your real question is:
- which unique countries exist?
- which unique email values appear?
- which unique customer and city pairs exist?
then DISTINCT is usually the clearest fit.
If your real question is:
- how many users are in each country?
- what is total revenue per customer?
- what is the average salary by department?
then GROUP BY is the right tool.
That difference in intent matters more than small syntax similarities.
What DISTINCT does
DISTINCT removes duplicate rows from the result set.
That means it looks at the selected columns and keeps only unique combinations.
Example:
SELECT DISTINCT country
FROM users;
If the users table contains:
| user_id | country |
|---|---|
| 1 | South Africa |
| 2 | South Africa |
| 3 | Namibia |
| 4 | Botswana |
| 5 | Namibia |
then the result becomes:
| country |
|---|
| South Africa |
| Namibia |
| Botswana |
DISTINCT does not summarize anything.
It does not count anything.
It simply removes duplicate result rows.
That is its main job.
What GROUP BY does
GROUP BY creates groups of rows based on one or more columns.
Once the groups exist, SQL can perform aggregate calculations on each group.
Example:
SELECT
country,
COUNT(*) AS user_count
FROM users
GROUP BY country;
With the same input table, the result becomes:
| country | user_count |
|---|---|
| South Africa | 2 |
| Namibia | 2 |
| Botswana | 1 |
This is very different from DISTINCT.
Here, SQL is not just removing duplicates. It is:
- grouping rows by
country - then counting how many rows belong to each country
That is why GROUP BY is tied so closely to aggregates like:
COUNTSUMAVGMINMAX
Why DISTINCT and GROUP BY sometimes look the same
This is the source of most confusion.
Suppose you run:
SELECT DISTINCT country
FROM users;
and:
SELECT country
FROM users
GROUP BY country;
In many databases, these return the same rows.
That makes it look like the features are interchangeable.
But they are only similar in that narrow case because:
- you selected one column
- you did not aggregate anything
- and both commands ended up producing unique country values
The important difference is still there:
DISTINCT says:
Remove duplicate result rows.
GROUP BY says:
Group rows by this column.
Those are not the same intent, even if the output happens to match.
DISTINCT is about result uniqueness
A very useful way to think about DISTINCT is this:
DISTINCT acts on the output row shape.
It looks at the selected columns and removes duplicate result rows based on those selected values.
Example:
SELECT DISTINCT country, city
FROM users;
This returns unique country-city combinations.
It does not care how many users belong to each combination. It only cares whether the output row values repeat.
That is why DISTINCT is often the clearest tool when you want:
- a unique list
- a deduplicated set of output values
- or one row per unique selected combination
GROUP BY is about grouped row sets
A useful way to think about GROUP BY is this:
GROUP BY creates buckets of rows before aggregates are calculated.
For example:
SELECT
country,
COUNT(*) AS user_count
FROM users
GROUP BY country;
Each country becomes its own group.
Then COUNT(*) runs inside each group.
That is why GROUP BY is essential when your query involves:
- per-customer totals
- per-country counts
- per-month revenue
- average salary by department
- product sales by category
This grouped calculation behavior is the real purpose of GROUP BY.
DISTINCT with multiple columns
DISTINCT works on the full selected row combination, not on each column independently.
Example:
SELECT DISTINCT country, city
FROM users;
Suppose the data is:
| user_id | country | city |
|---|---|---|
| 1 | South Africa | Cape Town |
| 2 | South Africa | Cape Town |
| 3 | South Africa | Durban |
| 4 | Namibia | Windhoek |
The result becomes:
| country | city |
|---|---|
| South Africa | Cape Town |
| South Africa | Durban |
| Namibia | Windhoek |
This is because DISTINCT removes duplicate rows based on the combination of:
countrycity
It does not separately deduplicate each column on its own.
That is a very important detail.
GROUP BY with multiple columns
GROUP BY can also group by combinations.
Example:
SELECT
country,
city,
COUNT(*) AS user_count
FROM users
GROUP BY country, city;
This groups rows by each unique country-city pair, then counts how many rows are in each pair.
So again, the surface result might look similar to DISTINCT in some cases, but the purpose is different:
DISTINCTreturns unique combinationsGROUP BYcreates groups so you can calculate something about each combination
The clearest practical difference
A very simple way to separate them is this:
DISTINCT answers:
Which unique result rows exist?
GROUP BY answers:
What does each group of rows look like when summarized?
That difference becomes obvious the moment you add aggregate functions.
DISTINCT does not replace GROUP BY for aggregates
If you want counts, totals, averages, minimums, or maximums per category, GROUP BY is the correct tool.
Example:
SELECT
department_id,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;
This is grouped reporting.
Trying to think of this as a DISTINCT problem would be incorrect because the real goal is not:
- remove duplicates
The real goal is:
- create one row per department
- compute a summary for each department
That is exactly what GROUP BY is for.
DISTINCT can work with aggregates, but differently
You can use DISTINCT inside certain aggregate functions.
Example:
SELECT COUNT(DISTINCT customer_id)
FROM orders;
This answers:
- how many unique customers placed orders?
That is not the same as GROUP BY. It is a different use of distinctness.
Another example:
SELECT SUM(DISTINCT amount)
FROM payments;
This means:
- sum only distinct amount values
That can be useful in some cases, though it is less common.
The important point is:
DISTINCTcan appear at the SELECT-row level- or inside aggregate functions
- but it is still conceptually about distinct values, not group summaries
GROUP BY usually becomes necessary when you need one row per entity with calculations
Imagine this question:
- how many orders does each customer have?
Correct query:
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
This is clearly a grouping problem.
Now imagine this question:
- which customer IDs appear at least once in orders?
Possible query:
SELECT DISTINCT customer_id
FROM orders;
This is clearly a distinct-result problem.
That difference is the heart of the topic.
GROUP BY without aggregates
You can write GROUP BY without aggregate functions.
Example:
SELECT country
FROM users
GROUP BY country;
This often returns the same result as:
SELECT DISTINCT country
FROM users;
But even though this is valid in many systems, it is usually less expressive if your goal is only to remove duplicates.
In most cases:
- use
DISTINCTwhen you mean “unique values” - use
GROUP BYwhen you mean “group and summarize”
That makes the query easier for humans to understand.
When DISTINCT is usually the better choice
Use DISTINCT when the main intent is one of these:
- get unique countries
- get unique user emails
- get unique product-category pairs
- remove duplicate result rows from a final output
- create a deduplicated list for export or display
- count distinct values inside an aggregate
Examples:
Unique countries
SELECT DISTINCT country
FROM users;
Unique email list
SELECT DISTINCT email
FROM customers;
Count unique customers who ordered
SELECT COUNT(DISTINCT customer_id)
FROM orders;
These are all clearly distinctness tasks, not grouping tasks.
When GROUP BY is usually the better choice
Use GROUP BY when the main intent is one of these:
- count rows per category
- total revenue per customer
- average score by class
- minimum price by category
- maximum order value per month
- grouped summaries for dashboards and reports
Examples:
Orders per customer
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
Revenue by month
SELECT
order_month,
SUM(total_amount) AS total_revenue
FROM monthly_orders
GROUP BY order_month;
Average salary by department
SELECT
department_id,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;
These are all summary-by-group problems.
DISTINCT is not a fix for a bad JOIN
This is one of the most important practical warnings.
Many developers use DISTINCT to hide duplication caused by a bad join.
Example pattern:
SELECT DISTINCT
c.customer_id,
c.customer_name
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id;
This may work if the real intention is:
- customers who have at least one order
But it can also be a sign that the join is multiplying rows and the developer is using DISTINCT to cover the symptom.
That is dangerous because:
- it can hide logic problems
- it can hide incorrect row multiplication
- it can add extra work
- and it may return the right result for the wrong reason
So if you find yourself adding DISTINCT after a join, ask:
- do I truly need unique output rows?
- or am I hiding a join shape problem?
That question matters a lot in real SQL debugging.
A better pattern than DISTINCT after a join
Sometimes a better query is EXISTS.
Instead of:
SELECT DISTINCT
c.customer_id,
c.customer_name
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id;
you can often write:
SELECT
c.customer_id,
c.customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
This is often clearer when your intention is:
- customers that have at least one related row
That is not directly a DISTINCT vs GROUP BY issue, but it is a very important real-world DISTINCT lesson.
DISTINCT versus GROUP BY for readability
In practice, readability matters a lot.
If the goal is:
- give me unique values
then DISTINCT usually reads better.
If the goal is:
- summarize per group
then GROUP BY usually reads better.
That means these two queries, even if they return the same rows, do not communicate the same intent:
SELECT DISTINCT country
FROM users;
versus
SELECT country
FROM users
GROUP BY country;
The first clearly says:
- unique countries
The second implies:
- countries as groups
That is why using the right one makes your SQL easier to understand later.
Performance: DISTINCT vs GROUP BY
This is one of the most common questions, but it needs a careful answer.
The honest answer is:
Neither is automatically faster in every situation.
Performance depends on:
- the database engine
- indexes
- query plan
- row counts
- selected columns
- joins
- sorts
- memory usage
- and how the optimizer handles the query
In some cases, the engine may produce very similar plans for:
SELECT DISTINCT column_name- and
SELECT column_name GROUP BY column_name
In other cases, the plans may differ.
That is why performance should not be the first reason to choose one over the other.
A much better rule is:
- choose the one that matches intent
- then measure if performance matters
That keeps the query semantically clear first.
DISTINCT and GROUP BY with ORDER BY
Both can be followed by ORDER BY.
DISTINCT example
SELECT DISTINCT country
FROM users
ORDER BY country;
GROUP BY example
SELECT country
FROM users
GROUP BY country
ORDER BY country;
Again, these may return the same result in simple cases.
But once aggregates appear, the difference becomes obvious:
SELECT
country,
COUNT(*) AS user_count
FROM users
GROUP BY country
ORDER BY user_count DESC;
That is clearly a grouped summary query, not a distinct-list query.
DISTINCT with COUNT versus GROUP BY with COUNT
These are often confused, but they answer different questions.
Count distinct customers overall
SELECT COUNT(DISTINCT customer_id)
FROM orders;
This answers:
- how many unique customers placed an order?
Count orders per customer
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
This answers:
- how many orders did each customer place?
These are completely different results.
That is one of the easiest ways to see the conceptual difference between distinctness and grouping.
GROUP BY requires discipline with selected columns
One of the big reasons GROUP BY feels different from DISTINCT is that grouped queries follow stricter rules about selected columns.
For example, this is usually fine:
SELECT DISTINCT country, city
FROM users;
But in grouped queries, selected columns normally must either:
- be included in
GROUP BY - or be aggregated
Example:
SELECT
country,
COUNT(*) AS user_count
FROM users
GROUP BY country;
This works.
But this usually does not:
SELECT
country,
city,
COUNT(*) AS user_count
FROM users
GROUP BY country;
Why?
Because city is:
- not grouped
- not aggregated
SQL does not know which city value should represent the country group.
This is one of the reasons GROUP BY is not just “another way to do DISTINCT.”
It follows a grouping model, not a deduplication model.
Common mistakes with DISTINCT
There are a few distinct-related mistakes that show up often.
1. Using DISTINCT to hide bad joins
This is the biggest one. It can produce results that look correct while masking incorrect row multiplication.
2. Assuming DISTINCT applies to each column separately
It does not. It applies to the combination of selected columns.
3. Adding DISTINCT automatically to “clean” results
That can add unnecessary work and make intent less clear.
4. Using DISTINCT when GROUP BY with aggregates was really needed
This usually happens when someone wants summary rows, not just unique rows.
Common mistakes with GROUP BY
There are also a few classic GROUP BY mistakes.
1. Using GROUP BY when DISTINCT is clearer
This is not always wrong, but it can make the query intent less obvious.
2. Forgetting that grouped queries need aggregates or grouped columns
This leads to invalid queries or confusing logic.
3. Grouping by too many columns
This can accidentally break a summary into overly detailed results.
4. Using GROUP BY when the real problem is duplicate input data
Sometimes people group because the underlying data is messy, not because the business question is grouped reporting.
That should be diagnosed carefully.
A practical rule for developers and analysts
If you want one simple rule to use in everyday work, use this:
Use DISTINCT when:
- you want unique output rows
- you are deduplicating selected values
- you are counting distinct values
Use GROUP BY when:
- you want one output row per group
- you need per-group summaries
- you are using aggregate functions like COUNT, SUM, AVG, MIN, or MAX
That rule covers most real-world usage.
Practical examples side by side
Example 1: unique countries
SELECT DISTINCT country
FROM users;
Best fit:
DISTINCT
Why:
- you just want unique values
Example 2: user count by country
SELECT
country,
COUNT(*) AS user_count
FROM users
GROUP BY country;
Best fit:
GROUP BY
Why:
- you want one row per country with a summary
Example 3: unique customer-city pairs
SELECT DISTINCT customer_id, city
FROM customers;
Best fit:
DISTINCT
Why:
- you want unique result combinations
Example 4: revenue by customer and city
SELECT
customer_id,
city,
SUM(total_amount) AS total_revenue
FROM orders
GROUP BY customer_id, city;
Best fit:
GROUP BY
Why:
- you want summarized values per grouped combination
Example 5: how many unique customers ordered?
SELECT COUNT(DISTINCT customer_id)
FROM orders;
Best fit:
DISTINCTinside aggregate
Why:
- one overall number of unique customers
Example 6: how many orders per customer?
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
Best fit:
GROUP BY
Why:
- one row per customer with order counts
FAQ
What is the difference between DISTINCT and GROUP BY in SQL?
DISTINCT removes duplicate rows from a result set, while GROUP BY groups rows together so you can summarize them, often with aggregate functions like COUNT, SUM, or AVG.
Can DISTINCT and GROUP BY return the same result?
Yes, in some simple cases they can return the same rows, especially when you are only trying to get unique combinations of selected columns. But the intent and use cases are still different.
Is DISTINCT faster than GROUP BY?
Not always. Performance depends on the database engine, the query plan, indexes, and the size of the data. You should choose based on intent first and then measure if performance matters.
Should I use GROUP BY without aggregate functions?
You can, and it may return results similar to DISTINCT in some cases, but it is usually clearer to use DISTINCT when your real goal is only to remove duplicates.
Final thoughts
DISTINCT and GROUP BY are related, but they are not the same thing.
They may sometimes produce the same-looking output, but the reason they exist is different:
DISTINCTis for unique result rowsGROUP BYis for grouped row sets and summaries
That distinction becomes especially important when:
- aggregate functions are involved
- joins create duplication
- reporting logic gets more complex
- or query readability matters
If you want one final practical rule, use this:
- if your main goal is uniqueness, use
DISTINCT - if your main goal is summarization by category, use
GROUP BY
That choice usually leads to clearer SQL, more accurate intent, and fewer mistakes in real queries.