SQL UNION vs UNION ALL
Level: intermediate · ~17 min read · Intent: informational
Audience: backend developers, data analysts, data engineers, technical teams, software engineers, students
Prerequisites
- basic familiarity with databases
- basic understanding of SELECT statements
Key takeaways
- UNION combines result sets and removes duplicate rows, while UNION ALL combines result sets and keeps every row, including duplicates.
- UNION ALL is usually faster and should be the default when you do not explicitly need duplicate removal, because UNION adds extra work to deduplicate the combined result.
FAQ
- What is the difference between UNION and UNION ALL in SQL?
- UNION removes duplicate rows from the combined result, while UNION ALL keeps all rows, including duplicates.
- Is UNION ALL faster than UNION?
- Often yes. UNION ALL is usually faster because it does not need to do the extra work required to identify and remove duplicate rows.
- When should I use UNION instead of UNION ALL?
- Use UNION when duplicate rows across the combined result should appear only once. Use UNION ALL when every row should be preserved or when you know duplicates are not a problem.
- Do UNION and UNION ALL require the same columns?
- Yes. The SELECT statements being combined must return the same number of columns in the same logical order, and the data types should be compatible.
SQL UNION and UNION ALL are two of the most useful set operators because they let you combine the results of multiple queries into one result set.
That matters because real SQL work often involves questions like:
- combine current customers and archived customers
- merge active products and draft products into one report
- pull users from multiple sources into one export
- stack monthly summary queries together
- combine two filtered result sets into one output
- or build one dashboard table from multiple query branches
At first glance, UNION and UNION ALL look almost identical.
That is why people confuse them.
The syntax is very similar. The output often looks similar. But the difference is important because it affects:
- whether duplicates stay or disappear
- whether row counts change unexpectedly
- and whether the query does extra work that may hurt performance
This guide explains SQL UNION vs UNION ALL clearly, including:
- what each one does
- how duplicates are handled
- why performance differs
- what column rules must match
- how
ORDER BYworks with combined results - and how to choose the right one in real SQL work
Why UNION and UNION ALL matter
A lot of SQL queries read from one table. But many real problems require combining multiple result sets.
For example:
- one query finds paid orders
- another query finds refunded orders
- and the final report needs both in one combined output
Or:
- one table stores current employees
- another stores contractors
- and you want one people list
Or:
- one branch of the query handles online sales
- another handles retail sales
- and the report needs both in one result
This is exactly the kind of situation where UNION or UNION ALL becomes useful.
They are not join tools. They are result-stacking tools.
That distinction is very important.
The most important rule
Before anything else, remember this:
Use UNION ALL when you want to keep every row. Use UNION only when you explicitly want duplicate rows removed.
That is the single most important rule in this topic.
A lot of people use UNION by default because it sounds cleaner.
But that is often the wrong habit.
Why?
Because UNION does extra work to remove duplicates.
If you do not actually need that behavior, then UNION ALL is often the better and faster choice.
So the right question is not:
- which one sounds nicer?
The right question is:
- should duplicate rows be removed here or not?
That is the practical decision.
What UNION does
UNION combines the results of two or more SELECT queries and removes duplicate rows from the final combined result.
Example:
SELECT city
FROM customers
UNION
SELECT city
FROM suppliers;
This returns a single list of cities from both queries, but if the same city appears in both result sets, it appears only once in the final output.
That is the key behavior of UNION:
- combine
- then deduplicate
What UNION ALL does
UNION ALL also combines the results of two or more SELECT queries, but it keeps all rows exactly as they come in.
Example:
SELECT city
FROM customers
UNION ALL
SELECT city
FROM suppliers;
Now if the same city appears in both queries, it appears multiple times in the final result.
That is the key behavior of UNION ALL:
- combine
- keep everything
The simplest difference
A very simple summary is:
UNION
Combines results and removes duplicates.
UNION ALL
Combines results and keeps duplicates.
That is the whole core difference.
Everything else in this topic comes from that one distinction.
Why duplicate handling matters so much
A lot of SQL users think duplicates are always bad.
That is not true.
Sometimes duplicates are:
- expected
- meaningful
- and correct
For example:
- if two different systems both produced the same row values, maybe you want to keep both
- if two sales happened with identical displayed values, those are still two sales
- if two customers share the same city, that repetition is meaningful data
So the key question is not:
- do duplicates exist?
It is:
- are identical rows across these two result sets supposed to appear once or multiple times in the final result?
That is the decision UNION versus UNION ALL controls.
A basic example with names
Suppose query one returns:
| name |
|---|
| Alice |
| Bob |
And query two returns:
| name |
|---|
| Bob |
| Cara |
With UNION
Result:
| name |
|---|
| Alice |
| Bob |
| Cara |
Bob appears once.
With UNION ALL
Result:
| name |
|---|
| Alice |
| Bob |
| Bob |
| Cara |
Bob appears twice.
That is the easiest possible example of the difference.
Why UNION can change row counts in surprising ways
Because UNION removes duplicate rows, the final number of rows may be lower than the sum of the two input queries.
For example:
- first query returns 100 rows
- second query returns 100 rows
- final UNION result might return 170 rows instead of 200 if 30 rows were duplicates
This is important because it means:
- row count changes are part of UNION logic
- not necessarily a bug
So if you use UNION and your row count drops, the first question should be:
- were duplicates removed?
That is usually the explanation.
Why UNION ALL is often faster
UNION ALL is usually faster because it does not need to remove duplicates.
It simply stacks the results together.
UNION, on the other hand, must do extra work to identify duplicate rows across the combined result.
That usually means more processing, especially on:
- large result sets
- wide rows
- or queries already doing expensive work
This is why UNION ALL is often the better default when duplicate removal is not required.
A good practical rule is:
- do not pay for deduplication unless you actually need deduplication
That is one of the most important performance lessons here.
Why UNION is not always “better quality”
Some people assume:
- UNION removes duplicates
- therefore it must be the safer or cleaner choice
Not necessarily.
If duplicate rows are meaningful, then removing them is actually wrong.
For example, suppose one query returns:
- one order row
and another query returns:
- another different order row
If the selected columns happen to match exactly, UNION may collapse them into one visible row even though the business meaning says they were two separate events.
That is why you should not use UNION automatically.
It can change the meaning of the data.
UNION and UNION ALL are set operators, not join operators
This is one of the most important conceptual distinctions.
UNION and UNION ALL:
- stack rows vertically
JOIN:
- combines columns horizontally based on relationships
That means UNION is for:
- combining similar result sets
JOIN is for:
- relating tables by keys
If you confuse these jobs, queries become harder to reason about.
A good question is:
- do I need to add more columns to each row using a relationship? Use a join.
Or:
- do I need to append more rows from another compatible query? Use UNION or UNION ALL.
That distinction matters a lot.
Column compatibility rules
For UNION and UNION ALL to work, the SELECT queries must be compatible.
That usually means:
1. Same number of columns
Each SELECT must return the same count of columns.
2. Same column order
The first column from query one aligns with the first column from query two, and so on.
3. Compatible data types
The aligned columns should be type-compatible or convertible.
Example:
SELECT customer_id, customer_name
FROM customers
UNION ALL
SELECT supplier_id, supplier_name
FROM suppliers;
This works because both queries return:
- an ID-like column
- a name-like column
The column names do not have to be identical. The positions are what matter.
Why column order matters
This is easy to overlook.
Consider:
SELECT customer_id, customer_name
FROM customers
UNION ALL
SELECT supplier_name, supplier_id
FROM suppliers;
This is logically wrong if the intention was:
- ID then name
Because now the second query returns:
- name in position 1
- ID in position 2
The union aligns by column position, not by semantic label.
That is why result shape discipline matters. Always make sure the columns line up correctly.
Output column names usually come from the first query
Another practical detail is that the final result usually takes its output column names from the first SELECT.
Example:
SELECT customer_id AS entity_id, customer_name AS entity_name
FROM customers
UNION ALL
SELECT supplier_id, supplier_name
FROM suppliers;
The combined result will usually use:
entity_identity_name
from the first query branch.
That is why the first SELECT is often where people define the final presentation names.
ORDER BY with UNION and UNION ALL
If you want to sort the combined result, the ORDER BY usually applies to the entire final unioned output, not to each query separately.
Example:
SELECT customer_name AS name
FROM customers
UNION ALL
SELECT supplier_name AS name
FROM suppliers
ORDER BY name;
This sorts the final combined result by name.
That is the standard pattern.
If you put ordering ideas inside individual branches casually, that usually does not mean the final combined result will stay ordered the way you expect.
So a good rule is:
- if the final merged result needs sorting, apply ORDER BY once at the end
Can you use ORDER BY in each branch?
Some databases allow branch-level ordering only when combined with subqueries or specific constructs, but for ordinary UNION-style result building, what matters most is:
- the final ORDER BY on the full result
That is the safest mental model.
So in practice, think:
- combine first
- sort last
That is the cleanest way to reason about unioned results.
UNION can hide data issues if used carelessly
Because UNION removes duplicates, it can sometimes hide problems you should have noticed.
For example:
- accidental overlap between sources
- duplicate feed rows
- repeated ingestion
- faulty joins producing identical output rows
- or business records that should have been examined instead of silently collapsed
That is why UNION should be used intentionally.
It is not just a harmless convenience feature.
If you remove duplicates, you are making a semantic decision about the data.
That deserves deliberate thought.
UNION ALL is often better for auditability
If you are combining sources and want to see exactly what each source contributed, UNION ALL is often better because it preserves:
- every row
- every duplication
- every repeated appearance
This can be useful in:
- audits
- reconciliation
- data validation
- integration troubleshooting
- and pipeline debugging
You can always deduplicate later if needed. But once a UNION collapses rows, that detail is gone from the result.
This is one reason many engineers prefer to start with UNION ALL during debugging and only move to UNION when the business rule clearly requires deduplication.
Practical use case: active and archived data
A very common use case is combining current and archived tables.
Example:
SELECT order_id, customer_id, total_amount, created_at
FROM current_orders
UNION ALL
SELECT order_id, customer_id, total_amount, created_at
FROM archived_orders;
This is often a good UNION ALL use case because:
- current and archived rows are both real rows
- duplicates are usually not expected if the archival process is clean
- and if duplicates do appear, you may actually want to know
Using plain UNION here could mask problems in the archival process.
That is a good example of why the choice matters.
Practical use case: unique category list across sources
Now consider this:
SELECT category
FROM products
UNION
SELECT category
FROM services;
Here, UNION may be the correct choice because the real question is:
- what unique categories exist across both sources?
If Consulting appears in both queries, the final answer may indeed need it only once.
This is a classic good fit for UNION.
UNION ALL for event-style data
If you are combining:
- events from system A
- and events from system B
then UNION ALL is often more appropriate because each event row is meaningful, even if some displayed fields match.
Example:
SELECT event_time, event_type, user_id
FROM app_events
UNION ALL
SELECT event_time, event_type, user_id
FROM web_events;
These are usually separate event streams. If identical-looking rows occur, they may still represent two different real events.
That makes UNION ALL the safer and more semantically correct choice in many event scenarios.
UNION is often better for distinct lookup-style outputs
UNION is often more appropriate when the output itself is meant to behave like a unique set.
Examples:
- unique city list
- unique email domains
- unique tags
- unique status names
- unique product categories
- unique region names across two sources
These are set-style outputs where duplicate removal is part of the goal.
That is why the word “union” fits so naturally in set-like reporting problems.
Common mistake: using UNION when UNION ALL is the real intent
This is probably the most common mistake.
People often write UNION simply because they know it, not because they need duplicate removal.
That causes two problems:
- extra work
- possible silent row loss
A better default habit is:
- start by asking whether duplicates should remain
- if yes or maybe yes, use
UNION ALL - only use
UNIONwhen duplicate removal is explicitly correct
This is the single most practical usage habit for this topic.
Common mistake: assuming duplicates are compared by only one column
UNION removes duplicate rows, not duplicate values in just one column.
That means all selected columns matter together.
Example:
Query result one:
| name | city |
|---|---|
| Alice | Cape Town |
Query result two:
| name | city |
|---|---|
| Alice | Durban |
These are not duplicates as full rows.
So UNION keeps both.
This is important because duplicate removal applies to the whole row shape of the combined SELECT result.
Common mistake: mismatched column meaning
A UNION query may run successfully even when the columns are semantically mismatched.
Example:
SELECT customer_id, country
FROM customers
UNION ALL
SELECT supplier_name, supplier_id
FROM suppliers;
This might be syntactically allowed if types are compatible enough, but it is logically nonsense.
That is why column alignment is not only about type compatibility. It is also about meaning compatibility.
Strong UNION queries combine:
- similar columns
- in the same positions
- with a shared logical purpose
UNION ALL in ETL and data pipelines
In ETL and data engineering, UNION ALL is extremely common because data pipelines often need to:
- stack source partitions
- combine daily loads
- merge feeds from multiple systems
- or append snapshots before later deduplication or validation
This is another reason UNION ALL often appears more frequently in performance-sensitive or engineering-heavy SQL work.
It preserves the raw truth of the inputs, which is often what pipelines need first.
If you need deduplication later, do it deliberately
Sometimes the best design is:
- first use
UNION ALL - then explicitly deduplicate in a later step if needed
For example, you may want to:
- combine all source rows
- inspect duplicates
- label source system
- apply rules
- then decide what counts as a true duplicate
That is often better than using UNION too early and silently collapsing rows before you fully understand the overlap.
This is especially valuable in data integration work.
Performance summary: why UNION ALL usually wins
If both result sets are already logically compatible and you do not need duplicate removal, UNION ALL is usually better because it:
- does less work
- preserves row truth
- avoids deduplication overhead
- and is often easier to reason about in data pipelines
UNION is still the right choice when uniqueness is the actual business requirement.
But it should be used because you want deduplication, not by habit.
That is the most practical performance lesson here.
Practical examples
Example 1: unique city list
SELECT city
FROM customers
UNION
SELECT city
FROM suppliers;
Use this when:
- you want one deduplicated list of cities
Example 2: combine all events
SELECT event_time, event_type, user_id
FROM app_events
UNION ALL
SELECT event_time, event_type, user_id
FROM web_events;
Use this when:
- every event row matters
- and duplicates should not be removed automatically
Example 3: combine active and archived users
SELECT user_id, email, created_at
FROM active_users
UNION ALL
SELECT user_id, email, created_at
FROM archived_users;
Use this when:
- you want the full combined history
- and deduplication is not the immediate goal
Example 4: unique product categories across tables
SELECT category AS category_name
FROM products
UNION
SELECT category_name
FROM services;
Use this when:
- the output should be one distinct category list
Example 5: performance-conscious stacking of monthly results
SELECT 'January' AS month_name, total_revenue
FROM january_summary
UNION ALL
SELECT 'February' AS month_name, total_revenue
FROM february_summary
UNION ALL
SELECT 'March' AS month_name, total_revenue
FROM march_summary;
Use this when:
- each monthly summary row is meaningful
- and no deduplication is needed
FAQ
What is the difference between UNION and UNION ALL in SQL?
UNION removes duplicate rows from the combined result, while UNION ALL keeps all rows, including duplicates.
Is UNION ALL faster than UNION?
Often yes. UNION ALL is usually faster because it does not need to do the extra work required to identify and remove duplicate rows.
When should I use UNION instead of UNION ALL?
Use UNION when duplicate rows across the combined result should appear only once. Use UNION ALL when every row should be preserved or when you know duplicates are not a problem.
Do UNION and UNION ALL require the same columns?
Yes. The SELECT statements being combined must return the same number of columns in the same logical order, and the data types should be compatible.
Final thoughts
SQL UNION and UNION ALL are simple once you focus on the one difference that matters most:
UNIONremoves duplicate rowsUNION ALLkeeps everything
That sounds small, but it affects:
- correctness
- row counts
- performance
- and the meaning of the final result
The strongest practical habit is:
- use
UNION ALLby default when duplicate removal is not explicitly needed - use
UNIONonly when one deduplicated combined result is the actual business goal
If you keep that distinction clear, you will avoid a lot of accidental performance cost and a lot of silent result-shaping mistakes in real SQL work.