Skip to main content

UNION ALL

Learning Focus

Use this lesson to understand UNION ALL with practical syntax and examples.

Concept Overview

UNION ALL is a set operator that stacks rows vertically by combining the results of multiple SELECT statements.

  • UNION ALL keeps duplicates (fast).
  • UNION removes duplicates (slower because it must deduplicate).

This is different from joins:

  • Joins combine related tables horizontally (more columns per row).
  • Unions combine compatible result sets vertically (more rows).

Why is it important?

  • Reporting: combine data from multiple partitions/tables (monthly tables, regional tables)
  • Log analysis: append multiple sources while keeping all entries
  • Performance: when you do not need deduplication, UNION ALL avoids sorting/hashing

Where does it fit?

UNION ALL is part of SQL set operators (along with UNION, and in PostgreSQL also INTERSECT and EXCEPT). You typically use it with SELECT queries and sometimes inside views.


Syntax & Rules

Core Syntax

SELECT col1, col2
FROM t1
UNION ALL
SELECT col1, col2
FROM t2;

Available Options / Parameters

RuleWhat it meansNotes
Same column counteach SELECT must return same number of columnsotherwise error
Type compatibilitycorresponding columns must be castable to a common typeexplicit casts often help
Column namesoutput column names come from the first SELECTaliases in later SELECTs do not rename outputs
ORDER BY placementORDER BY applies to the full uniononly once, at the end
LIMIT placementLIMIT applies to the full unionuse parentheses/CTEs if you need per-branch limits

Key Rules and Considerations

  • Use UNION ALL by default unless you explicitly need deduplication.
  • If you need to tag which branch a row came from, add a literal column (for example, '2024' AS source_year).
  • Always define ordering at the end if you need stable output.

Step-by-Step Examples

Example 1: Combine Two Tables (Duplicates Kept) (Beginner)

CREATE TABLE customers_usa (
customer_id bigint PRIMARY KEY,
full_name text NOT NULL
);

CREATE TABLE customers_canada (
customer_id bigint PRIMARY KEY,
full_name text NOT NULL
);

INSERT INTO customers_usa (customer_id, full_name) VALUES
(1, 'Alice'),
(2, 'Bob');

INSERT INTO customers_canada (customer_id, full_name) VALUES
(3, 'Claire'),
(4, 'Bob');

SELECT full_name
FROM customers_usa
UNION ALL
SELECT full_name
FROM customers_canada
ORDER BY full_name;

Expected output:

 full_name
----------
Alice
Bob
Bob
Claire
(4 rows)

Explanation:

  • Bob appears twice because duplicates are not removed.

Example 2: Add a Source Column (Intermediate)

SELECT full_name, 'USA' AS country
FROM customers_usa
UNION ALL
SELECT full_name, 'Canada' AS country
FROM customers_canada
ORDER BY full_name, country;

Expected output:

 full_name | country
----------+---------
Alice | USA
Bob | Canada
Bob | USA
Claire | Canada
(4 rows)

Explanation:

  • Adding a literal column is the simplest way to label provenance.

Example 3: UNION vs UNION ALL (Deduplication) (Intermediate)

SELECT full_name
FROM customers_usa
UNION
SELECT full_name
FROM customers_canada
ORDER BY full_name;

Expected output:

 full_name
----------
Alice
Bob
Claire
(3 rows)

Explanation:

  • UNION deduplicates, so Bob appears once.

Example 4: ORDER BY and LIMIT Apply to the Whole Union (Advanced)

SELECT full_name
FROM customers_usa
UNION ALL
SELECT full_name
FROM customers_canada
ORDER BY full_name
LIMIT 2;

Expected output:

 full_name
----------
Alice
Bob
(2 rows)

Explanation:

  • The ORDER BY and LIMIT apply to the combined output.

Example 5: Type Alignment with CAST (Advanced)

CREATE TABLE events_2025 (
event_id bigint PRIMARY KEY,
occurred_on date NOT NULL
);

CREATE TABLE events_2026 (
event_id bigint PRIMARY KEY,
occurred_at timestamptz NOT NULL
);

INSERT INTO events_2025 (event_id, occurred_on) VALUES
(1, '2025-12-31');

INSERT INTO events_2026 (event_id, occurred_at) VALUES
(2, '2026-01-01 00:00:00+00');

-- Align both branches to timestamptz
SELECT event_id, occurred_on::timestamptz AS occurred_at
FROM events_2025
UNION ALL
SELECT event_id, occurred_at
FROM events_2026
ORDER BY event_id;

Expected output:

 event_id | occurred_at
----------+------------------------
1 | 2025-12-31 00:00:00+00
2 | 2026-01-01 00:00:00+00
(2 rows)

Explanation:

  • Explicit casting makes type compatibility clear and prevents surprises.

Practical Use Cases

1) Merge partitioned tables for reporting

SELECT * FROM sales_2025
UNION ALL
SELECT * FROM sales_2026;

2) Combine logs from multiple systems

SELECT occurred_at, message, 'api' AS source
FROM api_logs
UNION ALL
SELECT occurred_at, message, 'worker' AS source
FROM worker_logs;

3) Build a multi-region customer view

SELECT customer_id, full_name, 'US' AS region FROM customers_us
UNION ALL
SELECT customer_id, full_name, 'EU' AS region FROM customers_eu;

4) Append staging data for validation

SELECT * FROM canonical_orders
UNION ALL
SELECT * FROM staging_orders;

5) Compare UNION ALL and UNION costs

-- Use UNION only when dedup is required.
SELECT email FROM mailing_list_a
UNION
SELECT email FROM mailing_list_b;

Common Mistakes & Troubleshooting

1) Mismatched column counts

Wrong SQL:

SELECT customer_id, full_name
FROM customers_usa
UNION ALL
SELECT full_name
FROM customers_canada;

Typical error:

ERROR:  each UNION query must have the same number of columns

Fix:

SELECT customer_id, full_name
FROM customers_usa
UNION ALL
SELECT customer_id, full_name
FROM customers_canada;

2) Type mismatch across branches

Wrong SQL:

SELECT occurred_on
FROM events_2025
UNION ALL
SELECT occurred_at
FROM events_2026;

Bad outcome:

  • PostgreSQL may error or pick an unexpected common type.

Fix:

SELECT occurred_on::timestamptz AS occurred_at FROM events_2025
UNION ALL
SELECT occurred_at FROM events_2026;

3) Putting ORDER BY in the middle

Wrong SQL:

SELECT full_name FROM customers_usa ORDER BY full_name
UNION ALL
SELECT full_name FROM customers_canada;

Typical error:

ERROR:  syntax error at or near "UNION"

Fix:

SELECT full_name FROM customers_usa
UNION ALL
SELECT full_name FROM customers_canada
ORDER BY full_name;

4) Expecting UNION ALL to remove duplicates

Wrong expectation:

  • assuming duplicates are removed automatically.

Fix:

  • Use UNION when you need deduplication.

Debugging checklist:

  1. Verify every SELECT has the same number of columns.
  2. Check corresponding column types and add explicit casts.
  3. Ensure ORDER BY appears only once at the end.
  4. Decide explicitly between UNION ALL (keep duplicates) and UNION (dedupe).
  5. If performance is slow, check whether UNION is forcing a large dedupe operation.

Best Practices

  • ✅ Use UNION ALL when duplicates are acceptable. ❌ Avoid UNION by default; dedupe has a cost.
  • ✅ Add provenance columns ('source' AS origin) when merging data. ❌ Avoid ambiguous combined outputs with no source context.
  • ✅ Align types explicitly with casts. ❌ Avoid relying on implicit coercions across branches.
  • ✅ Apply ORDER BY and LIMIT once at the end. ❌ Avoid trying to order individual branches without parentheses/CTEs.
  • ✅ Consider a view for repeated union logic. ❌ Avoid copy-pasting large unions across many code paths.

Hands-On Practice

Use this setup for the exercises:

CREATE TABLE employees_2023 (employee_id integer, full_name text);
CREATE TABLE employees_2024 (employee_id integer, full_name text);

INSERT INTO employees_2023 VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO employees_2024 VALUES (3, 'Charlie'), (2, 'Bob');

Exercise 1 (Easy): Combine all names

Task: Combine employee names from both years using UNION ALL.

-- Your SQL here

Solution:

SELECT full_name FROM employees_2023
UNION ALL
SELECT full_name FROM employees_2024;

Exercise 2 (Medium): Add a source year column

Task: Return full_name and year for both tables.

-- Your SQL here

Solution:

SELECT full_name, '2023' AS year
FROM employees_2023
UNION ALL
SELECT full_name, '2024' AS year
FROM employees_2024
ORDER BY full_name, year;

Exercise 3 (Advanced): Deduplicate with UNION

Task: Return distinct names across both years.

-- Your SQL here

Solution:

SELECT full_name FROM employees_2023
UNION
SELECT full_name FROM employees_2024
ORDER BY full_name;

Connection to Other Concepts

ConceptWhy it matters
UNIONsame shape as UNION ALL but deduplicates
JOINcombines columns horizontally (different from union)
ORDER BYapplies to the full union result set
Data typesunions require compatible types across branches
Viewsoften used to encapsulate repeated union logic

Visual Learning Diagram

flowchart TD
A[SELECT 1] --> B[UNION ALL]
C[SELECT 2] --> B
B --> D[Rows Stacked]
B --> E[Duplicates Kept]
B --> F[ORDER BY at End]
B --> G[Type Alignment]
E --> H[Fast Path]
B --> I[UNION (Dedup)]

classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F,G,H,I allNodes
class B highlight

Common Pitfalls

PitfallConsequencePrevention
Different column countsquery errorensure each branch returns same number of columns
Incompatible typeserrors or unexpected castscast explicitly to a common type
ORDER BY not at the endsyntax errorplace ORDER BY after the final SELECT
Using UNION when you don't need dedupeslower queriesuse UNION ALL by default
Losing provenancehard to debug resultsadd a source column

Quick Reference

SELECT a FROM t1 UNION ALL SELECT a FROM t2;
SELECT a FROM t1 UNION SELECT a FROM t2; -- dedupe
SELECT a, 'src1' AS src FROM t1 UNION ALL SELECT a, 'src2' FROM t2;
SELECT a::text FROM t1 UNION ALL SELECT b::text FROM t2; -- align types
SELECT * FROM (SELECT ... UNION ALL SELECT ...) u ORDER BY ... LIMIT ...;

What's Next