UNION ALL

Learning Focus

Use this lesson to understand UNION ALL with practical syntax and examples.

Concept Overview

UNION ALL is a set operator that stacks rows vertically by combining the results of multiple SELECT statements.

UNION ALL keeps duplicates (fast).
UNION removes duplicates (slower because it must deduplicate).

This is different from joins:

Joins combine related tables horizontally (more columns per row).
Unions combine compatible result sets vertically (more rows).

Why is it important?

Reporting: combine data from multiple partitions/tables (monthly tables, regional tables)
Log analysis: append multiple sources while keeping all entries
Performance: when you do not need deduplication, UNION ALL avoids sorting/hashing

Where does it fit?

UNION ALL is part of SQL set operators (along with UNION, and in PostgreSQL also INTERSECT and EXCEPT). You typically use it with SELECT queries and sometimes inside views.

Syntax & Rules

Core Syntax

SELECT col1, col2
FROM t1
UNION ALL
SELECT col1, col2
FROM t2;

Available Options / Parameters

Rule	What it means	Notes
Same column count	each SELECT must return same number of columns	otherwise error
Type compatibility	corresponding columns must be castable to a common type	explicit casts often help
Column names	output column names come from the first SELECT	aliases in later SELECTs do not rename outputs
ORDER BY placement	ORDER BY applies to the full union	only once, at the end
LIMIT placement	LIMIT applies to the full union	use parentheses/CTEs if you need per-branch limits

Key Rules and Considerations

Use UNION ALL by default unless you explicitly need deduplication.
If you need to tag which branch a row came from, add a literal column (for example, '2024' AS source_year).
Always define ordering at the end if you need stable output.

Step-by-Step Examples

Example 1: Combine Two Tables (Duplicates Kept) (Beginner)

CREATE TABLE customers_usa (
  customer_id bigint PRIMARY KEY,
  full_name   text NOT NULL
);

CREATE TABLE customers_canada (
  customer_id bigint PRIMARY KEY,
  full_name   text NOT NULL
);

INSERT INTO customers_usa (customer_id, full_name) VALUES
  (1, 'Alice'),
  (2, 'Bob');

INSERT INTO customers_canada (customer_id, full_name) VALUES
  (3, 'Claire'),
  (4, 'Bob');

SELECT full_name
FROM customers_usa
UNION ALL
SELECT full_name
FROM customers_canada
ORDER BY full_name;

Expected output:

 full_name
----------
 Alice
 Bob
 Bob
 Claire
(4 rows)

Explanation:

Bob appears twice because duplicates are not removed.

Example 2: Add a Source Column (Intermediate)

SELECT full_name, 'USA' AS country
FROM customers_usa
UNION ALL
SELECT full_name, 'Canada' AS country
FROM customers_canada
ORDER BY full_name, country;

Expected output:

 full_name | country
----------+---------
 Alice     | USA
 Bob       | Canada
 Bob       | USA
 Claire    | Canada
(4 rows)

Explanation:

Adding a literal column is the simplest way to label provenance.

Example 3: UNION vs UNION ALL (Deduplication) (Intermediate)

SELECT full_name
FROM customers_usa
UNION
SELECT full_name
FROM customers_canada
ORDER BY full_name;

Expected output:

 full_name
----------
 Alice
 Bob
 Claire
(3 rows)

Explanation:

UNION deduplicates, so Bob appears once.

Example 4: ORDER BY and LIMIT Apply to the Whole Union (Advanced)

SELECT full_name
FROM customers_usa
UNION ALL
SELECT full_name
FROM customers_canada
ORDER BY full_name
LIMIT 2;

Expected output:

 full_name
----------
 Alice
 Bob
(2 rows)

Explanation:

The ORDER BY and LIMIT apply to the combined output.

Example 5: Type Alignment with CAST (Advanced)

CREATE TABLE events_2025 (
  event_id bigint PRIMARY KEY,
  occurred_on date NOT NULL
);

CREATE TABLE events_2026 (
  event_id bigint PRIMARY KEY,
  occurred_at timestamptz NOT NULL
);

INSERT INTO events_2025 (event_id, occurred_on) VALUES
  (1, '2025-12-31');

INSERT INTO events_2026 (event_id, occurred_at) VALUES
  (2, '2026-01-01 00:00:00+00');

-- Align both branches to timestamptz
SELECT event_id, occurred_on::timestamptz AS occurred_at
FROM events_2025
UNION ALL
SELECT event_id, occurred_at
FROM events_2026
ORDER BY event_id;

Expected output:

 event_id | occurred_at
----------+------------------------
 1        | 2025-12-31 00:00:00+00
 2        | 2026-01-01 00:00:00+00
(2 rows)

Explanation:

Explicit casting makes type compatibility clear and prevents surprises.

Practical Use Cases

1) Merge partitioned tables for reporting

SELECT * FROM sales_2025
UNION ALL
SELECT * FROM sales_2026;

2) Combine logs from multiple systems

SELECT occurred_at, message, 'api' AS source
FROM api_logs
UNION ALL
SELECT occurred_at, message, 'worker' AS source
FROM worker_logs;

3) Build a multi-region customer view

SELECT customer_id, full_name, 'US' AS region FROM customers_us
UNION ALL
SELECT customer_id, full_name, 'EU' AS region FROM customers_eu;

4) Append staging data for validation

SELECT * FROM canonical_orders
UNION ALL
SELECT * FROM staging_orders;

5) Compare UNION ALL and UNION costs

-- Use UNION only when dedup is required.
SELECT email FROM mailing_list_a
UNION
SELECT email FROM mailing_list_b;

Common Mistakes & Troubleshooting

1) Mismatched column counts

Wrong SQL:

SELECT customer_id, full_name
FROM customers_usa
UNION ALL
SELECT full_name
FROM customers_canada;

Typical error:

ERROR:  each UNION query must have the same number of columns

Fix:

SELECT customer_id, full_name
FROM customers_usa
UNION ALL
SELECT customer_id, full_name
FROM customers_canada;

2) Type mismatch across branches

Wrong SQL:

SELECT occurred_on
FROM events_2025
UNION ALL
SELECT occurred_at
FROM events_2026;

Bad outcome:

PostgreSQL may error or pick an unexpected common type.

Fix:

SELECT occurred_on::timestamptz AS occurred_at FROM events_2025
UNION ALL
SELECT occurred_at FROM events_2026;

3) Putting ORDER BY in the middle

Wrong SQL:

SELECT full_name FROM customers_usa ORDER BY full_name
UNION ALL
SELECT full_name FROM customers_canada;

Typical error:

ERROR:  syntax error at or near "UNION"

Fix:

SELECT full_name FROM customers_usa
UNION ALL
SELECT full_name FROM customers_canada
ORDER BY full_name;

4) Expecting UNION ALL to remove duplicates

Wrong expectation:

assuming duplicates are removed automatically.

Fix:

Use UNION when you need deduplication.

Debugging checklist:

Verify every SELECT has the same number of columns.
Check corresponding column types and add explicit casts.
Ensure ORDER BY appears only once at the end.
Decide explicitly between UNION ALL (keep duplicates) and UNION (dedupe).
If performance is slow, check whether UNION is forcing a large dedupe operation.

Best Practices

✅ Use UNION ALL when duplicates are acceptable. ❌ Avoid UNION by default; dedupe has a cost.
✅ Add provenance columns ('source' AS origin) when merging data. ❌ Avoid ambiguous combined outputs with no source context.
✅ Align types explicitly with casts. ❌ Avoid relying on implicit coercions across branches.
✅ Apply ORDER BY and LIMIT once at the end. ❌ Avoid trying to order individual branches without parentheses/CTEs.
✅ Consider a view for repeated union logic. ❌ Avoid copy-pasting large unions across many code paths.

Hands-On Practice

Use this setup for the exercises:

CREATE TABLE employees_2023 (employee_id integer, full_name text);
CREATE TABLE employees_2024 (employee_id integer, full_name text);

INSERT INTO employees_2023 VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO employees_2024 VALUES (3, 'Charlie'), (2, 'Bob');

Exercise 1 (Easy): Combine all names

Task: Combine employee names from both years using UNION ALL.

-- Your SQL here

Solution:

SELECT full_name FROM employees_2023
UNION ALL
SELECT full_name FROM employees_2024;

Exercise 2 (Medium): Add a source year column

Task: Return full_name and year for both tables.

-- Your SQL here

Solution:

SELECT full_name, '2023' AS year
FROM employees_2023
UNION ALL
SELECT full_name, '2024' AS year
FROM employees_2024
ORDER BY full_name, year;

Exercise 3 (Advanced): Deduplicate with UNION

Task: Return distinct names across both years.

-- Your SQL here

Solution:

SELECT full_name FROM employees_2023
UNION
SELECT full_name FROM employees_2024
ORDER BY full_name;

Connection to Other Concepts

Concept	Why it matters
`UNION`	same shape as UNION ALL but deduplicates
`JOIN`	combines columns horizontally (different from union)
`ORDER BY`	applies to the full union result set
Data types	unions require compatible types across branches
Views	often used to encapsulate repeated union logic

Visual Learning Diagram

flowchart TD
  A[SELECT 1] --> B[UNION ALL]
  C[SELECT 2] --> B
  B --> D[Rows Stacked]
  B --> E[Duplicates Kept]
  B --> F[ORDER BY at End]
  B --> G[Type Alignment]
  E --> H[Fast Path]
  B --> I[UNION (Dedup)]

  classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
  classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
  class A,B,C,D,E,F,G,H,I allNodes
  class B highlight

Common Pitfalls

Pitfall	Consequence	Prevention
Different column counts	query error	ensure each branch returns same number of columns
Incompatible types	errors or unexpected casts	cast explicitly to a common type
ORDER BY not at the end	syntax error	place ORDER BY after the final SELECT
Using UNION when you don't need dedupe	slower queries	use UNION ALL by default
Losing provenance	hard to debug results	add a source column

Quick Reference

SELECT a FROM t1 UNION ALL SELECT a FROM t2;
SELECT a FROM t1 UNION SELECT a FROM t2; -- dedupe
SELECT a, 'src1' AS src FROM t1 UNION ALL SELECT a, 'src2' FROM t2;
SELECT a::text FROM t1 UNION ALL SELECT b::text FROM t2; -- align types
SELECT * FROM (SELECT ... UNION ALL SELECT ...) u ORDER BY ... LIMIT ...;

What's Next

Previous: SELF JOIN - Review the previous lesson to reinforce context.
Module Overview - Return to this module index and choose another related lesson.

Concept Overview​

Why is it important?​

Where does it fit?​

Syntax & Rules​

Core Syntax​

Available Options / Parameters​

Key Rules and Considerations​

Step-by-Step Examples​

Example 1: Combine Two Tables (Duplicates Kept) (Beginner)​

Example 2: Add a Source Column (Intermediate)​

Example 3: UNION vs UNION ALL (Deduplication) (Intermediate)​

Example 4: ORDER BY and LIMIT Apply to the Whole Union (Advanced)​

Example 5: Type Alignment with CAST (Advanced)​

Practical Use Cases​

1) Merge partitioned tables for reporting​

2) Combine logs from multiple systems​

3) Build a multi-region customer view​

4) Append staging data for validation​

5) Compare UNION ALL and UNION costs​

Common Mistakes & Troubleshooting​

1) Mismatched column counts​

2) Type mismatch across branches​

3) Putting ORDER BY in the middle​

4) Expecting UNION ALL to remove duplicates​

Best Practices​

Hands-On Practice​

Exercise 1 (Easy): Combine all names​

Exercise 2 (Medium): Add a source year column​

Exercise 3 (Advanced): Deduplicate with UNION​

Connection to Other Concepts​

Visual Learning Diagram​

Common Pitfalls​

Quick Reference​

What's Next​

Concept Overview

Why is it important?

Where does it fit?

Syntax & Rules

Core Syntax

Available Options / Parameters

Key Rules and Considerations

Step-by-Step Examples

Example 1: Combine Two Tables (Duplicates Kept) (Beginner)

Example 2: Add a Source Column (Intermediate)

Example 3: UNION vs UNION ALL (Deduplication) (Intermediate)

Example 4: ORDER BY and LIMIT Apply to the Whole Union (Advanced)

Example 5: Type Alignment with CAST (Advanced)

Practical Use Cases

1) Merge partitioned tables for reporting

2) Combine logs from multiple systems

3) Build a multi-region customer view

4) Append staging data for validation

5) Compare UNION ALL and UNION costs

Common Mistakes & Troubleshooting

1) Mismatched column counts

2) Type mismatch across branches

3) Putting ORDER BY in the middle

4) Expecting UNION ALL to remove duplicates

Best Practices

Hands-On Practice

Exercise 1 (Easy): Combine all names

Exercise 2 (Medium): Add a source year column

Exercise 3 (Advanced): Deduplicate with UNION

Connection to Other Concepts

Visual Learning Diagram

Common Pitfalls

Quick Reference

What's Next