Window Functions
Use this lesson to apply PostgreSQL window functions (OVER) to compute analytics while keeping row-level detail.
Concept Overview
Window functions compute a value for each row using a window (a related set of rows).
Unlike GROUP BY, window functions do not collapse rows; you keep the original row granularity.
Why is it important?
- Analytics without losing detail: totals, ranks, and deltas per row
- Top-N per group: common reporting pattern without awkward self-joins
- Deduplication: pick “latest per key” reliably with
row_number() - Time-series insights: running totals, moving averages, period-over-period changes
Where does it fit?
Window functions are an advanced SELECT feature often combined with:
- CTE pipelines (
WITH) for readable multi-step analytics - JOINs for dimensional enrichment
- Indexes (especially on
(partition_key, order_key)) for better ordering performance
Syntax & Rules
Core Syntax
SELECT
customer_id,
amount,
SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM orders;
General form:
<window_function>() OVER (
[PARTITION BY expr [, ...]]
[ORDER BY expr [ASC|DESC] [NULLS FIRST|LAST] [, ...]]
[frame_clause]
)
Available Options / Parameters
| Component | Example | Meaning / Notes |
|---|---|---|
PARTITION BY | PARTITION BY customer_id | splits rows into independent groups (windows) |
ORDER BY | ORDER BY created_at | defines row order within a partition; required for many “running” metrics |
| Named window | WINDOW w AS (...) | reuse the same window spec across multiple functions |
| Frame type | ROWS / RANGE / GROUPS | chooses how the “current row’s frame” is defined |
| Frame bounds | UNBOUNDED PRECEDING, CURRENT ROW, n PRECEDING/FOLLOWING | limits which rows are included for that row’s calculation |
Key Rules and Considerations
- If order matters (running total, moving average), include
ORDER BYin the window. - If you omit a frame clause and include
ORDER BY, PostgreSQL uses a default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) which can surprise you with ties. Use explicitROWSfor per-row running calculations. - You cannot filter on a window function in
WHERE. Use a subquery/CTE and filter in an outer query. GROUP BYand window functions can coexist: aggregate first, then window over aggregated results.
Common window functions include:
row_number(),rank(),dense_rank()lag(),lead()sum(),avg(),min(),max()over a windowfirst_value(),last_value(),nth_value()ntile(n),percent_rank(),cume_dist()
Step-by-Step Examples
Use this setup for the examples:
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
status text NOT NULL,
created_at timestamptz NOT NULL,
amount numeric(12,2) NOT NULL CHECK (amount >= 0)
);
INSERT INTO orders (customer_id, status, created_at, amount) VALUES
(1, 'paid', '2026-01-02 09:00+00', 10.00),
(1, 'paid', '2026-01-05 12:00+00', 25.00),
(1, 'failed', '2026-01-10 08:00+00', 5.00),
(2, 'paid', '2026-01-03 10:00+00', 50.00),
(2, 'paid', '2026-01-07 10:00+00', 20.00),
(3, 'paid', '2026-01-04 11:00+00', 15.00);
Example 1: Total Per Customer Without Collapsing Rows (Beginner)
SELECT
customer_id,
created_at,
amount,
SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM orders
WHERE status = 'paid'
ORDER BY customer_id, created_at;
Expected output:
customer_id | created_at | amount | total_per_customer
-------------+------------------------+--------+-------------------
1 | 2026-01-02 09:00:00+00 | 10.00 | 35.00
1 | 2026-01-05 12:00:00+00 | 25.00 | 35.00
2 | 2026-01-03 10:00:00+00 | 50.00 | 70.00
2 | 2026-01-07 10:00:00+00 | 20.00 | 70.00
3 | 2026-01-04 11:00:00+00 | 15.00 | 15.00
(5 rows)
Explanation:
PARTITION BY customer_idcreates one window per customer.- The sum is repeated on each row because window functions preserve row detail.
Example 2: Running Total With an Explicit Frame (Intermediate)
SELECT
customer_id,
created_at,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM orders
WHERE status = 'paid'
ORDER BY customer_id, created_at;
Expected output:
customer_id | created_at | amount | running_total
-------------+------------------------+--------+--------------
1 | 2026-01-02 09:00:00+00 | 10.00 | 10.00
1 | 2026-01-05 12:00:00+00 | 25.00 | 35.00
2 | 2026-01-03 10:00:00+00 | 50.00 | 50.00
2 | 2026-01-07 10:00:00+00 | 20.00 | 70.00
3 | 2026-01-04 11:00:00+00 | 15.00 | 15.00
(5 rows)
Explanation:
ORDER BY created_atdefines a sequence inside each customer.- The
ROWS ...frame makes this a true per-row running sum (not a tie-sensitiveRANGEdefault).
Example 3: Top-N Per Group With row_number() (Intermediate)
WITH ranked AS (
SELECT
order_id,
customer_id,
created_at,
amount,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY amount DESC, created_at DESC
) AS rn
FROM orders o
WHERE status = 'paid'
)
SELECT order_id, customer_id, amount, rn
FROM ranked
WHERE rn <= 1
ORDER BY customer_id, rn;
Expected output:
order_id | customer_id | amount | rn
----------+-------------+--------+----
2 | 1 | 25.00 | 1
4 | 2 | 50.00 | 1
6 | 3 | 15.00 | 1
(3 rows)
Explanation:
row_number()assigns a unique sequence per customer.- Use a deterministic tie-breaker (here
created_at DESC) so the “top row” is stable. - PostgreSQL has no
QUALIFY, so you filterrnin an outer SELECT.
Example 4: Differences With lag() (Advanced)
SELECT
customer_id,
created_at,
amount,
LAG(amount) OVER (
PARTITION BY customer_id
ORDER BY created_at
) AS prev_amount,
amount - LAG(amount) OVER (
PARTITION BY customer_id
ORDER BY created_at
) AS delta
FROM orders
WHERE status = 'paid'
ORDER BY customer_id, created_at;
Expected output:
customer_id | created_at | amount | prev_amount | delta
-------------+------------------------+--------+-------------+-------
1 | 2026-01-02 09:00:00+00 | 10.00 | |
1 | 2026-01-05 12:00:00+00 | 25.00 | 10.00 | 15.00
2 | 2026-01-03 10:00:00+00 | 50.00 | |
2 | 2026-01-07 10:00:00+00 | 20.00 | 50.00 | -30.00
3 | 2026-01-04 11:00:00+00 | 15.00 | |
(5 rows)
Explanation:
lag()looks “back” within each partition.- The first row in each partition has no previous value, so
prev_amountisNULL.
Practical Use Cases
1) Latest Record Per Key (Dedup)
Context: Keep the most recent profile update per user.
WITH ranked AS (
SELECT
user_id,
updated_at,
payload,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) AS rn
FROM user_profiles
)
SELECT user_id, updated_at, payload
FROM ranked
WHERE rn = 1;
2) Percentiles / Buckets For Reporting
Context: Split customers into spend quartiles.
WITH spend AS (
SELECT customer_id, SUM(amount) AS total_spend
FROM orders
WHERE status = 'paid'
GROUP BY customer_id
)
SELECT
customer_id,
total_spend,
NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
FROM spend
ORDER BY total_spend DESC;
3) Moving Average
Context: Smooth daily revenue with a 7-day trailing average.
WITH daily AS (
SELECT
date_trunc('day', created_at)::date AS day,
SUM(amount) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY 1
)
SELECT
day,
revenue,
AVG(revenue) OVER (
ORDER BY day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS revenue_7d_avg
FROM daily
ORDER BY day;
Common Mistakes & Troubleshooting
Mistake 1: Filtering On a Window Function In WHERE
Wrong:
SELECT *
FROM orders
WHERE row_number() OVER (PARTITION BY customer_id ORDER BY created_at) = 1;
What happens: PostgreSQL errors because window functions are not allowed in WHERE.
Fix (wrap in a subquery/CTE):
WITH ranked AS (
SELECT
*,
row_number() OVER (PARTITION BY customer_id ORDER BY created_at) AS rn
FROM orders
)
SELECT *
FROM ranked
WHERE rn = 1;
Mistake 2: Implicit Default Frame With ORDER BY
Wrong (tie-sensitive when values repeat):
SELECT
created_at,
amount,
SUM(amount) OVER (ORDER BY created_at) AS running_total
FROM orders;
What happens: the default frame can behave unexpectedly with ties; results may include “peer” rows.
Fix (use explicit ROWS frame):
SELECT
created_at,
amount,
SUM(amount) OVER (ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM orders;
Mistake 3: Non-Deterministic Ordering
Wrong:
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC)
What happens: if two rows have the same amount, the “winner” can change between runs.
Fix (add a stable tie-breaker):
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC, order_id DESC)
Debugging tips:
- Start by
SELECT ... ORDER BY partition_key, order_keywithout window functions to confirm the input ordering. - Add the window function columns one at a time to validate each computation.
- For surprising frames, print
row_number()and a few key columns to understand boundaries. - When performance is slow, check indexes on
(partition_key, order_key)and consider pre-aggregating.
Best Practices
- ✅ Use
ROWS BETWEEN ...for running totals and moving averages; ❌ rely on implicit default frames. - ✅ Add deterministic tie-breakers in
ORDER BY; ❌ assume equal values will pick a stable row. - ✅ Use
WINDOW w AS (...)when repeating the same window spec; ❌ copy/paste longOVER(...)blocks everywhere. - ✅ Pre-aggregate then window over the smaller result when possible; ❌ window over a huge raw table when you only need daily totals.
- ✅ Keep partitions small and meaningful; ❌ partition by high-cardinality keys unless needed.
Hands-On Practice
Use this setup for the exercises:
DROP TABLE IF EXISTS practice_payments;
CREATE TABLE practice_payments (
payment_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
paid_at timestamptz NOT NULL,
amount numeric(12,2) NOT NULL CHECK (amount >= 0)
);
INSERT INTO practice_payments (customer_id, paid_at, amount) VALUES
(1, '2026-02-01 10:00+00', 10.00),
(1, '2026-02-03 10:00+00', 15.00),
(2, '2026-02-02 10:00+00', 20.00),
(2, '2026-02-05 10:00+00', 5.00);
Exercise 1 (Easy): Total per customer
Task: Show each payment with a total_per_customer column.
-- Your SQL here
Solution:
SELECT
customer_id,
paid_at,
amount,
SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM practice_payments
ORDER BY customer_id, paid_at;
Exercise 2 (Medium): Running total per customer
Task: Add a running_total ordered by paid_at within each customer.
-- Your SQL here
Solution:
SELECT
customer_id,
paid_at,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY paid_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM practice_payments
ORDER BY customer_id, paid_at;
Exercise 3 (Advanced): Top payment per customer
Task: Return only the largest payment row per customer.
-- Your SQL here
Solution:
WITH ranked AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC, paid_at DESC) AS rn
FROM practice_payments
)
SELECT payment_id, customer_id, paid_at, amount
FROM ranked
WHERE rn = 1
ORDER BY customer_id;
Connection to Other Concepts
| Concept | Why it matters |
|---|---|
| Common Table Expressions (CTEs) | structure window logic into readable steps |
ORDER BY and sorting | windows depend on deterministic ordering |
| Indexes | improve performance for partition/order keys |
| GROUP BY aggregates | aggregate first, then window for advanced analytics |
| Temporary tables | stage intermediate aggregates before windowing |
Visual Learning Diagram
flowchart LR
A[Input Rows] --> B[PARTITION BY]
B --> C[ORDER BY]
C --> D[Frame]
D --> E[Window Function]
E --> F[Output Rows\n(detail preserved)]
classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F allNodes
class E highlight
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
Using GROUP BY when you need detail rows | loses row-level information | use a window function instead |
Missing tie-breakers in ORDER BY | unstable top-N/dedup results | add a deterministic column (id/timestamp) |
| Relying on implicit frames | surprising running totals with ties | specify ROWS BETWEEN ... |
| Windowing over raw fact tables unnecessarily | slow queries | pre-aggregate (daily/customer) first |
Trying to use window results in WHERE | errors | wrap in a subquery/CTE |
Quick Reference
SUM(x) OVER (PARTITION BY k)
SUM(x) OVER (PARTITION BY k ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ROW_NUMBER() OVER (PARTITION BY k ORDER BY metric DESC, id DESC)
LAG(x, 1) OVER (PARTITION BY k ORDER BY ts)
NTILE(4) OVER (ORDER BY metric DESC)
What's Next
- Previous: Temporary Tables - Stage data when you need multi-statement reuse.
- Next: JSONB Querying and Indexing - Query semi-structured data efficiently in PostgreSQL.
- Module Overview - Return to the Advanced SQL Features index.