Skip to main content

Common Table Expressions (CTEs)

Learning Focus

Use this lesson to understand Common Table Expressions (CTEs) with practical syntax and examples.

Concept Overview

A Common Table Expression (CTE) is a named, reusable subquery defined with WITH that exists for the duration of a single SQL statement.

CTEs help you write SQL as a sequence of readable steps:

  • extract a subset (paid orders)
  • transform it (monthly revenue)
  • filter and present it (top months)

Why is it important?

  • Readability: breaks complex queries into named, reviewable chunks
  • Correctness: encourages you to validate each step and keep logic explicit
  • Reusability within a statement: reference the same intermediate set multiple times
  • Planner control (PostgreSQL): MATERIALIZED / NOT MATERIALIZED can influence inlining behavior

Where does it fit?

CTEs are part of advanced query structuring. They appear in analytics queries, ETL jobs, data audits, and migrations. Recursive CTEs (WITH RECURSIVE) build on this idea for hierarchical traversal.


Syntax & Rules

Core Syntax

WITH cte_name AS (
SELECT ...
)
SELECT ...
FROM cte_name;

Multiple CTEs:

WITH a AS (
SELECT ...
),
b AS (
SELECT ... FROM a
)
SELECT ...
FROM b;

Available Options / Parameters

FeatureSyntaxWhy you use it
CTE definitionWITH name AS (...)name an intermediate result
Multiple CTEsWITH a AS (...), b AS (...)multi-step transformations
Recursive CTEWITH RECURSIVE ...hierarchies and iterative expansion
Materialization hintWITH MATERIALIZED name AS (...)force evaluation once, then reuse
Inline hintWITH NOT MATERIALIZED name AS (...)encourage planner to inline

Key Rules and Considerations

  • A CTE is scoped to a single statement.
  • CTEs are not persisted objects (unlike tables or views).
  • In modern PostgreSQL, a CTE may be inlined by the planner when safe.
  • Use MATERIALIZED when you intentionally want to compute the CTE once and reuse it (or to create a planning “fence”).
  • Use NOT MATERIALIZED when the CTE is simple and you want the planner to optimize across it.

Step-by-Step Examples

Example 1: Break a Query into Steps (Beginner)

CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
status text NOT NULL,
amount numeric(12,2) NOT NULL CHECK (amount >= 0),
created_at date NOT NULL
);

INSERT INTO orders (customer_id, status, amount, created_at) VALUES
(10, 'paid', 25.00, '2026-01-01'),
(10, 'paid', 10.00, '2026-01-02'),
(11, 'paid', 50.00, '2026-01-03'),
(12, 'failed', 20.00, '2026-01-03'),
(13, 'paid', 40.00, '2026-02-01');

WITH paid_orders AS (
SELECT customer_id, amount, created_at
FROM orders
WHERE status = 'paid'
)
SELECT customer_id, SUM(amount) AS revenue
FROM paid_orders
GROUP BY customer_id
ORDER BY customer_id;

Expected output:

 customer_id | revenue
-------------+---------
10 | 35.00
11 | 50.00
13 | 40.00
(3 rows)

Explanation:

  • The CTE isolates “paid” logic so the main query reads like a report.

Example 2: Multiple CTEs as a Pipeline (Intermediate)

WITH paid_orders AS (
SELECT customer_id, amount, created_at
FROM orders
WHERE status = 'paid'
),
monthly AS (
SELECT
date_trunc('month', created_at::timestamptz) AS month,
SUM(amount) AS revenue
FROM paid_orders
GROUP BY 1
)
SELECT month, revenue
FROM monthly
ORDER BY month;

Expected output:

 month                  | revenue
------------------------+---------
2026-01-01 00:00:00+00 | 85.00
2026-02-01 00:00:00+00 | 40.00
(2 rows)

Explanation:

  • First step filters to paid orders.
  • Second step aggregates by month.

Example 3: Reuse the Same CTE More Than Once (Advanced)

If you need to reference the same intermediate set multiple times, a CTE keeps it defined in one place.

WITH paid_orders AS (
SELECT customer_id, amount
FROM orders
WHERE status = 'paid'
)
SELECT
(SELECT COUNT(*) FROM paid_orders) AS paid_order_count,
(SELECT SUM(amount) FROM paid_orders) AS paid_revenue;

Expected output:

 paid_order_count | paid_revenue
------------------+-------------
4 | 125.00
(1 row)

Explanation:

  • This pattern can also be written with aggregates in one SELECT, but it demonstrates multi-reference reuse.

Example 4: CTE + DML with RETURNING (Advanced)

CTEs can stage a target set for UPDATE/DELETE and return affected rows.

CREATE TABLE jobs (
job_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
status text NOT NULL,
run_after timestamptz,
locked_at timestamptz
);

INSERT INTO jobs (status, run_after, locked_at) VALUES
('queued', now() - interval '5 minutes', NULL),
('queued', now() + interval '5 minutes', NULL),
('queued', now() - interval '1 minute', NULL);

WITH ready AS (
SELECT job_id
FROM jobs
WHERE status = 'queued'
AND (run_after IS NULL OR run_after <= now())
AND locked_at IS NULL
ORDER BY job_id
LIMIT 2
)
UPDATE jobs j
SET locked_at = now()
FROM ready r
WHERE j.job_id = r.job_id
RETURNING j.job_id, j.status, j.locked_at;

Expected output (shape):

 job_id | status | locked_at
--------+--------+------------------------
1 | queued | <timestamp>
3 | queued | <timestamp>
(2 rows)

Explanation:

  • The CTE selects a batch.
  • The UPDATE ... FROM applies the change to that batch.
  • RETURNING verifies which rows were updated.

Practical Use Cases

1) Multi-step reporting queries

WITH base AS (
SELECT * FROM orders WHERE created_at >= current_date - 30
)
SELECT customer_id, COUNT(*)
FROM base
GROUP BY customer_id;

2) Debugging intermediate states

WITH step AS (
SELECT * FROM events WHERE payload @> '{"type":"purchase"}'
)
SELECT COUNT(*) FROM step;

3) Batch updates/deletes

WITH batch AS (
SELECT id FROM audit_events ORDER BY id LIMIT 10000
)
DELETE FROM audit_events e
USING batch b
WHERE e.id = b.id;

4) Dedup pipelines

WITH ranked AS (
SELECT id, row_number() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
FROM users
)
DELETE FROM users u
USING ranked r
WHERE u.id = r.id AND r.rn > 1;

5) Control planner behavior (rare, but useful)

WITH MATERIALIZED expensive AS (
SELECT ...
)
SELECT ... FROM expensive;

Common Mistakes & Troubleshooting

1) Thinking a CTE persists

Wrong mental model:

  • expecting WITH x AS (...) to create a reusable object.

Bad outcome:

  • Query fails if you reference x in a later statement.

Fix:

  • Use a view (CREATE VIEW ...) if you need persistence.

2) Using CTEs everywhere without measuring

Wrong approach:

  • rewriting every subquery into a CTE.

Bad outcome:

  • Harder to reason about performance; may block some optimizations depending on version and hints.

Fix:

  • Use CTEs when they improve clarity or reuse, then validate performance with EXPLAIN.

3) Repeating logic in multiple CTEs

Wrong approach:

  • copying the same filter into multiple steps.

Bad outcome:

  • drift and bugs when one copy changes.

Fix:

  • Put the logic in one CTE and reference it.

4) Misusing MATERIALIZED/NOT MATERIALIZED

Bad outcome:

  • Forcing materialization can increase memory and I/O.
  • Forcing inlining can duplicate work if referenced multiple times.

Fix:

  • Use hints only when you have evidence (plan inspection) that they help.

Debugging checklist:

  1. Run each CTE body as a standalone SELECT to validate intermediate results.
  2. Add LIMIT to check shapes and edge cases.
  3. Use EXPLAIN (ANALYZE, BUFFERS) for performance-critical queries.
  4. If the same CTE is referenced multiple times, consider MATERIALIZED.
  5. If a simple CTE prevents optimization, consider NOT MATERIALIZED or inline it.

Best Practices

  • ✅ Name CTEs after what they represent (paid_orders, monthly_revenue). ❌ Avoid generic names (tmp, t1) in complex queries.
  • ✅ Keep each CTE focused on one transformation. ❌ Avoid 10-layer CTE chains without clear purpose.
  • ✅ Prefer FILTER for conditional aggregates inside CTE pipelines. ❌ Avoid duplicating almost-identical CTEs for each condition.
  • ✅ Use RETURNING with CTE-driven DML to verify changes. ❌ Avoid bulk updates without visibility into affected rows.
  • ✅ Measure performance when it matters. ❌ Avoid using MATERIALIZED/NOT MATERIALIZED as a default style rule.

Hands-On Practice

Use this setup for the exercises:

CREATE TABLE practice_orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
status text NOT NULL,
amount numeric(12,2) NOT NULL,
created_at date NOT NULL
);

INSERT INTO practice_orders (customer_id, status, amount, created_at) VALUES
(1, 'paid', 10.00, current_date),
(1, 'paid', 20.00, current_date),
(2, 'failed', 5.00, current_date);

Exercise 1 (Easy): One CTE

Task: Create a CTE of paid orders and return revenue per customer.

-- Your SQL here

Solution:

WITH paid AS (
SELECT customer_id, amount
FROM practice_orders
WHERE status = 'paid'
)
SELECT customer_id, SUM(amount) AS revenue
FROM paid
GROUP BY customer_id
ORDER BY customer_id;

Exercise 2 (Medium): Two-step pipeline

Task: Use two CTEs: one for paid orders, one that aggregates per customer.

-- Your SQL here

Solution:

WITH paid AS (
SELECT customer_id, amount
FROM practice_orders
WHERE status = 'paid'
),
per_customer AS (
SELECT customer_id, SUM(amount) AS revenue
FROM paid
GROUP BY customer_id
)
SELECT *
FROM per_customer
ORDER BY revenue DESC;

Exercise 3 (Advanced): Reuse a CTE twice

Task: Compute both the count and total amount of paid orders by referencing the same CTE twice.

-- Your SQL here

Solution:

WITH paid AS (
SELECT amount
FROM practice_orders
WHERE status = 'paid'
)
SELECT
(SELECT COUNT(*) FROM paid) AS paid_count,
(SELECT SUM(amount) FROM paid) AS paid_total;

Connection to Other Concepts

ConceptWhy it matters
Recursive CTEsextend WITH to iterative/hierarchical queries
Temporary tablesalternative for multi-statement staging
Window functionsoften used inside CTE pipelines for ranking/dedup
Transactionswrap multi-step DML for safety
EXPLAINvalidate performance of complex CTE queries

Visual Learning Diagram

flowchart LR
A[Base Tables] --> B[CTE Step 1]
B --> C[CTE Step 2]
C --> D[Final SELECT]
B --> E[MATERIALIZED?]
E --> F[Reuse]

classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F allNodes
class B highlight

Common Pitfalls

PitfallConsequencePrevention
Treating CTEs as persisted objectsmissing table errorsremember scope is one statement
Over-stacking CTE layershard-to-maintain querieskeep each step meaningful
Copy/paste logic across stepsdrift and bugsreuse one CTE instead
Incorrect hint usageworse performanceadd hints only with evidence
Not verifying intermediate resultssilent errorsrun CTE bodies standalone while debugging

Quick Reference

WITH cte AS (SELECT ...) SELECT * FROM cte;
WITH a AS (...), b AS (...) SELECT * FROM b;
WITH MATERIALIZED x AS (SELECT ...) SELECT ... FROM x;
WITH NOT MATERIALIZED x AS (SELECT ...) SELECT ... FROM x;
WITH x AS (SELECT ...) UPDATE t SET ... FROM x WHERE ... RETURNING ...;

What's Next