Common Table Expressions (CTEs)

Learning Focus

Use this lesson to understand Common Table Expressions (CTEs) with practical syntax and examples.

Concept Overview

A Common Table Expression (CTE) is a named, reusable subquery defined with WITH that exists for the duration of a single SQL statement.

CTEs help you write SQL as a sequence of readable steps:

extract a subset (paid orders)
transform it (monthly revenue)
filter and present it (top months)

Why is it important?

Readability: breaks complex queries into named, reviewable chunks
Correctness: encourages you to validate each step and keep logic explicit
Reusability within a statement: reference the same intermediate set multiple times
Planner control (PostgreSQL): MATERIALIZED / NOT MATERIALIZED can influence inlining behavior

Where does it fit?

CTEs are part of advanced query structuring. They appear in analytics queries, ETL jobs, data audits, and migrations. Recursive CTEs (WITH RECURSIVE) build on this idea for hierarchical traversal.

Syntax & Rules

Core Syntax

WITH cte_name AS (
  SELECT ...
)
SELECT ...
FROM cte_name;

Multiple CTEs:

WITH a AS (
  SELECT ...
),
b AS (
  SELECT ... FROM a
)
SELECT ...
FROM b;

Available Options / Parameters

Feature	Syntax	Why you use it
CTE definition	`WITH name AS (...)`	name an intermediate result
Multiple CTEs	`WITH a AS (...), b AS (...)`	multi-step transformations
Recursive CTE	`WITH RECURSIVE ...`	hierarchies and iterative expansion
Materialization hint	`WITH MATERIALIZED name AS (...)`	force evaluation once, then reuse
Inline hint	`WITH NOT MATERIALIZED name AS (...)`	encourage planner to inline

Key Rules and Considerations

A CTE is scoped to a single statement.
CTEs are not persisted objects (unlike tables or views).
In modern PostgreSQL, a CTE may be inlined by the planner when safe.
Use MATERIALIZED when you intentionally want to compute the CTE once and reuse it (or to create a planning “fence”).
Use NOT MATERIALIZED when the CTE is simple and you want the planner to optimize across it.

Step-by-Step Examples

Example 1: Break a Query into Steps (Beginner)

CREATE TABLE orders (
  order_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id bigint NOT NULL,
  status      text NOT NULL,
  amount      numeric(12,2) NOT NULL CHECK (amount >= 0),
  created_at  date NOT NULL
);

INSERT INTO orders (customer_id, status, amount, created_at) VALUES
  (10, 'paid',  25.00, '2026-01-01'),
  (10, 'paid',  10.00, '2026-01-02'),
  (11, 'paid',  50.00, '2026-01-03'),
  (12, 'failed', 20.00, '2026-01-03'),
  (13, 'paid',  40.00, '2026-02-01');

WITH paid_orders AS (
  SELECT customer_id, amount, created_at
  FROM orders
  WHERE status = 'paid'
)
SELECT customer_id, SUM(amount) AS revenue
FROM paid_orders
GROUP BY customer_id
ORDER BY customer_id;

Expected output:

 customer_id | revenue
-------------+---------
 10          | 35.00
 11          | 50.00
 13          | 40.00
(3 rows)

Explanation:

The CTE isolates “paid” logic so the main query reads like a report.

Example 2: Multiple CTEs as a Pipeline (Intermediate)

WITH paid_orders AS (
  SELECT customer_id, amount, created_at
  FROM orders
  WHERE status = 'paid'
),
monthly AS (
  SELECT
    date_trunc('month', created_at::timestamptz) AS month,
    SUM(amount) AS revenue
  FROM paid_orders
  GROUP BY 1
)
SELECT month, revenue
FROM monthly
ORDER BY month;

Expected output:

 month                  | revenue
------------------------+---------
 2026-01-01 00:00:00+00 | 85.00
 2026-02-01 00:00:00+00 | 40.00
(2 rows)

Explanation:

First step filters to paid orders.
Second step aggregates by month.

Example 3: Reuse the Same CTE More Than Once (Advanced)

If you need to reference the same intermediate set multiple times, a CTE keeps it defined in one place.

WITH paid_orders AS (
  SELECT customer_id, amount
  FROM orders
  WHERE status = 'paid'
)
SELECT
  (SELECT COUNT(*) FROM paid_orders) AS paid_order_count,
  (SELECT SUM(amount) FROM paid_orders) AS paid_revenue;

Expected output:

 paid_order_count | paid_revenue
------------------+-------------
 4                | 125.00
(1 row)

Explanation:

This pattern can also be written with aggregates in one SELECT, but it demonstrates multi-reference reuse.

Example 4: CTE + DML with RETURNING (Advanced)

CTEs can stage a target set for UPDATE/DELETE and return affected rows.

CREATE TABLE jobs (
  job_id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  status     text NOT NULL,
  run_after  timestamptz,
  locked_at  timestamptz
);

INSERT INTO jobs (status, run_after, locked_at) VALUES
  ('queued', now() - interval '5 minutes', NULL),
  ('queued', now() + interval '5 minutes', NULL),
  ('queued', now() - interval '1 minute', NULL);

WITH ready AS (
  SELECT job_id
  FROM jobs
  WHERE status = 'queued'
    AND (run_after IS NULL OR run_after <= now())
    AND locked_at IS NULL
  ORDER BY job_id
  LIMIT 2
)
UPDATE jobs j
SET locked_at = now()
FROM ready r
WHERE j.job_id = r.job_id
RETURNING j.job_id, j.status, j.locked_at;

Expected output (shape):

 job_id | status | locked_at
--------+--------+------------------------
 1      | queued | <timestamp>
 3      | queued | <timestamp>
(2 rows)

Explanation:

The CTE selects a batch.
The UPDATE ... FROM applies the change to that batch.
RETURNING verifies which rows were updated.

Practical Use Cases

1) Multi-step reporting queries

WITH base AS (
  SELECT * FROM orders WHERE created_at >= current_date - 30
)
SELECT customer_id, COUNT(*)
FROM base
GROUP BY customer_id;

2) Debugging intermediate states

WITH step AS (
  SELECT * FROM events WHERE payload @> '{"type":"purchase"}'
)
SELECT COUNT(*) FROM step;

3) Batch updates/deletes

WITH batch AS (
  SELECT id FROM audit_events ORDER BY id LIMIT 10000
)
DELETE FROM audit_events e
USING batch b
WHERE e.id = b.id;

4) Dedup pipelines

WITH ranked AS (
  SELECT id, row_number() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
  FROM users
)
DELETE FROM users u
USING ranked r
WHERE u.id = r.id AND r.rn > 1;

5) Control planner behavior (rare, but useful)

WITH MATERIALIZED expensive AS (
  SELECT ...
)
SELECT ... FROM expensive;

Common Mistakes & Troubleshooting

1) Thinking a CTE persists

Wrong mental model:

expecting WITH x AS (...) to create a reusable object.

Bad outcome:

Query fails if you reference x in a later statement.

Fix:

Use a view (CREATE VIEW ...) if you need persistence.

2) Using CTEs everywhere without measuring

Wrong approach:

rewriting every subquery into a CTE.

Bad outcome:

Harder to reason about performance; may block some optimizations depending on version and hints.

Fix:

Use CTEs when they improve clarity or reuse, then validate performance with EXPLAIN.

3) Repeating logic in multiple CTEs

Wrong approach:

copying the same filter into multiple steps.

Bad outcome:

drift and bugs when one copy changes.

Fix:

Put the logic in one CTE and reference it.

4) Misusing MATERIALIZED/NOT MATERIALIZED

Bad outcome:

Forcing materialization can increase memory and I/O.
Forcing inlining can duplicate work if referenced multiple times.

Fix:

Use hints only when you have evidence (plan inspection) that they help.

Debugging checklist:

Run each CTE body as a standalone SELECT to validate intermediate results.
Add LIMIT to check shapes and edge cases.
Use EXPLAIN (ANALYZE, BUFFERS) for performance-critical queries.
If the same CTE is referenced multiple times, consider MATERIALIZED.
If a simple CTE prevents optimization, consider NOT MATERIALIZED or inline it.

Best Practices

✅ Name CTEs after what they represent (paid_orders, monthly_revenue). ❌ Avoid generic names (tmp, t1) in complex queries.
✅ Keep each CTE focused on one transformation. ❌ Avoid 10-layer CTE chains without clear purpose.
✅ Prefer FILTER for conditional aggregates inside CTE pipelines. ❌ Avoid duplicating almost-identical CTEs for each condition.
✅ Use RETURNING with CTE-driven DML to verify changes. ❌ Avoid bulk updates without visibility into affected rows.
✅ Measure performance when it matters. ❌ Avoid using MATERIALIZED/NOT MATERIALIZED as a default style rule.

Hands-On Practice

Use this setup for the exercises:

CREATE TABLE practice_orders (
  order_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id bigint NOT NULL,
  status      text NOT NULL,
  amount      numeric(12,2) NOT NULL,
  created_at  date NOT NULL
);

INSERT INTO practice_orders (customer_id, status, amount, created_at) VALUES
  (1, 'paid', 10.00, current_date),
  (1, 'paid', 20.00, current_date),
  (2, 'failed', 5.00, current_date);

Exercise 1 (Easy): One CTE

Task: Create a CTE of paid orders and return revenue per customer.

-- Your SQL here

Solution:

WITH paid AS (
  SELECT customer_id, amount
  FROM practice_orders
  WHERE status = 'paid'
)
SELECT customer_id, SUM(amount) AS revenue
FROM paid
GROUP BY customer_id
ORDER BY customer_id;

Exercise 2 (Medium): Two-step pipeline

Task: Use two CTEs: one for paid orders, one that aggregates per customer.

-- Your SQL here

Solution:

WITH paid AS (
  SELECT customer_id, amount
  FROM practice_orders
  WHERE status = 'paid'
),
per_customer AS (
  SELECT customer_id, SUM(amount) AS revenue
  FROM paid
  GROUP BY customer_id
)
SELECT *
FROM per_customer
ORDER BY revenue DESC;

Exercise 3 (Advanced): Reuse a CTE twice

Task: Compute both the count and total amount of paid orders by referencing the same CTE twice.

-- Your SQL here

Solution:

WITH paid AS (
  SELECT amount
  FROM practice_orders
  WHERE status = 'paid'
)
SELECT
  (SELECT COUNT(*) FROM paid) AS paid_count,
  (SELECT SUM(amount) FROM paid) AS paid_total;

Connection to Other Concepts

Concept	Why it matters
Recursive CTEs	extend WITH to iterative/hierarchical queries
Temporary tables	alternative for multi-statement staging
Window functions	often used inside CTE pipelines for ranking/dedup
Transactions	wrap multi-step DML for safety
`EXPLAIN`	validate performance of complex CTE queries

Visual Learning Diagram

flowchart LR
  A[Base Tables] --> B[CTE Step 1]
  B --> C[CTE Step 2]
  C --> D[Final SELECT]
  B --> E[MATERIALIZED?]
  E --> F[Reuse]

  classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
  classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
  class A,B,C,D,E,F allNodes
  class B highlight

Common Pitfalls

Pitfall	Consequence	Prevention
Treating CTEs as persisted objects	missing table errors	remember scope is one statement
Over-stacking CTE layers	hard-to-maintain queries	keep each step meaningful
Copy/paste logic across steps	drift and bugs	reuse one CTE instead
Incorrect hint usage	worse performance	add hints only with evidence
Not verifying intermediate results	silent errors	run CTE bodies standalone while debugging

Quick Reference

WITH cte AS (SELECT ...) SELECT * FROM cte;
WITH a AS (...), b AS (...) SELECT * FROM b;
WITH MATERIALIZED x AS (SELECT ...) SELECT ... FROM x;
WITH NOT MATERIALIZED x AS (SELECT ...) SELECT ... FROM x;
WITH x AS (SELECT ...) UPDATE t SET ... FROM x WHERE ... RETURNING ...;

What's Next

Next: Recursive CTE Patterns - Continue to the next concept with incremental complexity.
Module Overview - Return to this module index and choose another related lesson.

Concept Overview​

Why is it important?​

Where does it fit?​

Syntax & Rules​

Core Syntax​

Available Options / Parameters​

Key Rules and Considerations​

Step-by-Step Examples​

Example 1: Break a Query into Steps (Beginner)​

Example 2: Multiple CTEs as a Pipeline (Intermediate)​

Example 3: Reuse the Same CTE More Than Once (Advanced)​

Example 4: CTE + DML with RETURNING (Advanced)​

Practical Use Cases​

1) Multi-step reporting queries​

2) Debugging intermediate states​

3) Batch updates/deletes​

4) Dedup pipelines​

5) Control planner behavior (rare, but useful)​

Common Mistakes & Troubleshooting​

1) Thinking a CTE persists​

2) Using CTEs everywhere without measuring​

3) Repeating logic in multiple CTEs​

4) Misusing MATERIALIZED/NOT MATERIALIZED​

Best Practices​

Hands-On Practice​

Exercise 1 (Easy): One CTE​

Exercise 2 (Medium): Two-step pipeline​

Exercise 3 (Advanced): Reuse a CTE twice​

Connection to Other Concepts​

Visual Learning Diagram​

Common Pitfalls​

Quick Reference​

What's Next​

Concept Overview

Why is it important?

Where does it fit?

Syntax & Rules

Core Syntax

Available Options / Parameters

Key Rules and Considerations

Step-by-Step Examples

Example 1: Break a Query into Steps (Beginner)

Example 2: Multiple CTEs as a Pipeline (Intermediate)

Example 3: Reuse the Same CTE More Than Once (Advanced)

Example 4: CTE + DML with RETURNING (Advanced)

Practical Use Cases

1) Multi-step reporting queries

2) Debugging intermediate states

3) Batch updates/deletes

4) Dedup pipelines

5) Control planner behavior (rare, but useful)

Common Mistakes & Troubleshooting

1) Thinking a CTE persists

2) Using CTEs everywhere without measuring

3) Repeating logic in multiple CTEs

4) Misusing MATERIALIZED/NOT MATERIALIZED

Best Practices

Hands-On Practice

Exercise 1 (Easy): One CTE

Exercise 2 (Medium): Two-step pipeline

Exercise 3 (Advanced): Reuse a CTE twice

Connection to Other Concepts

Visual Learning Diagram

Common Pitfalls

Quick Reference

What's Next