Skip to main content

Window Functions

Learning Focus

Use this lesson to apply PostgreSQL window functions (OVER) to compute analytics while keeping row-level detail.

Concept Overview

Window functions compute a value for each row using a window (a related set of rows).

Unlike GROUP BY, window functions do not collapse rows; you keep the original row granularity.

Why is it important?

  • Analytics without losing detail: totals, ranks, and deltas per row
  • Top-N per group: common reporting pattern without awkward self-joins
  • Deduplication: pick “latest per key” reliably with row_number()
  • Time-series insights: running totals, moving averages, period-over-period changes

Where does it fit?

Window functions are an advanced SELECT feature often combined with:

  • CTE pipelines (WITH) for readable multi-step analytics
  • JOINs for dimensional enrichment
  • Indexes (especially on (partition_key, order_key)) for better ordering performance

Syntax & Rules

Core Syntax

SELECT
customer_id,
amount,
SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM orders;

General form:

<window_function>() OVER (
[PARTITION BY expr [, ...]]
[ORDER BY expr [ASC|DESC] [NULLS FIRST|LAST] [, ...]]
[frame_clause]
)

Available Options / Parameters

ComponentExampleMeaning / Notes
PARTITION BYPARTITION BY customer_idsplits rows into independent groups (windows)
ORDER BYORDER BY created_atdefines row order within a partition; required for many “running” metrics
Named windowWINDOW w AS (...)reuse the same window spec across multiple functions
Frame typeROWS / RANGE / GROUPSchooses how the “current row’s frame” is defined
Frame boundsUNBOUNDED PRECEDING, CURRENT ROW, n PRECEDING/FOLLOWINGlimits which rows are included for that row’s calculation

Key Rules and Considerations

  • If order matters (running total, moving average), include ORDER BY in the window.
  • If you omit a frame clause and include ORDER BY, PostgreSQL uses a default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) which can surprise you with ties. Use explicit ROWS for per-row running calculations.
  • You cannot filter on a window function in WHERE. Use a subquery/CTE and filter in an outer query.
  • GROUP BY and window functions can coexist: aggregate first, then window over aggregated results.

Common window functions include:

  • row_number(), rank(), dense_rank()
  • lag(), lead()
  • sum(), avg(), min(), max() over a window
  • first_value(), last_value(), nth_value()
  • ntile(n), percent_rank(), cume_dist()

Step-by-Step Examples

Use this setup for the examples:

DROP TABLE IF EXISTS orders;

CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
status text NOT NULL,
created_at timestamptz NOT NULL,
amount numeric(12,2) NOT NULL CHECK (amount >= 0)
);

INSERT INTO orders (customer_id, status, created_at, amount) VALUES
(1, 'paid', '2026-01-02 09:00+00', 10.00),
(1, 'paid', '2026-01-05 12:00+00', 25.00),
(1, 'failed', '2026-01-10 08:00+00', 5.00),
(2, 'paid', '2026-01-03 10:00+00', 50.00),
(2, 'paid', '2026-01-07 10:00+00', 20.00),
(3, 'paid', '2026-01-04 11:00+00', 15.00);

Example 1: Total Per Customer Without Collapsing Rows (Beginner)

SELECT
customer_id,
created_at,
amount,
SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM orders
WHERE status = 'paid'
ORDER BY customer_id, created_at;

Expected output:

 customer_id |       created_at       | amount | total_per_customer
-------------+------------------------+--------+-------------------
1 | 2026-01-02 09:00:00+00 | 10.00 | 35.00
1 | 2026-01-05 12:00:00+00 | 25.00 | 35.00
2 | 2026-01-03 10:00:00+00 | 50.00 | 70.00
2 | 2026-01-07 10:00:00+00 | 20.00 | 70.00
3 | 2026-01-04 11:00:00+00 | 15.00 | 15.00
(5 rows)

Explanation:

  • PARTITION BY customer_id creates one window per customer.
  • The sum is repeated on each row because window functions preserve row detail.

Example 2: Running Total With an Explicit Frame (Intermediate)

SELECT
customer_id,
created_at,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM orders
WHERE status = 'paid'
ORDER BY customer_id, created_at;

Expected output:

 customer_id |       created_at       | amount | running_total
-------------+------------------------+--------+--------------
1 | 2026-01-02 09:00:00+00 | 10.00 | 10.00
1 | 2026-01-05 12:00:00+00 | 25.00 | 35.00
2 | 2026-01-03 10:00:00+00 | 50.00 | 50.00
2 | 2026-01-07 10:00:00+00 | 20.00 | 70.00
3 | 2026-01-04 11:00:00+00 | 15.00 | 15.00
(5 rows)

Explanation:

  • ORDER BY created_at defines a sequence inside each customer.
  • The ROWS ... frame makes this a true per-row running sum (not a tie-sensitive RANGE default).

Example 3: Top-N Per Group With row_number() (Intermediate)

WITH ranked AS (
SELECT
order_id,
customer_id,
created_at,
amount,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY amount DESC, created_at DESC
) AS rn
FROM orders o
WHERE status = 'paid'
)
SELECT order_id, customer_id, amount, rn
FROM ranked
WHERE rn <= 1
ORDER BY customer_id, rn;

Expected output:

 order_id | customer_id | amount | rn
----------+-------------+--------+----
2 | 1 | 25.00 | 1
4 | 2 | 50.00 | 1
6 | 3 | 15.00 | 1
(3 rows)

Explanation:

  • row_number() assigns a unique sequence per customer.
  • Use a deterministic tie-breaker (here created_at DESC) so the “top row” is stable.
  • PostgreSQL has no QUALIFY, so you filter rn in an outer SELECT.

Example 4: Differences With lag() (Advanced)

SELECT
customer_id,
created_at,
amount,
LAG(amount) OVER (
PARTITION BY customer_id
ORDER BY created_at
) AS prev_amount,
amount - LAG(amount) OVER (
PARTITION BY customer_id
ORDER BY created_at
) AS delta
FROM orders
WHERE status = 'paid'
ORDER BY customer_id, created_at;

Expected output:

 customer_id |       created_at       | amount | prev_amount | delta
-------------+------------------------+--------+-------------+-------
1 | 2026-01-02 09:00:00+00 | 10.00 | |
1 | 2026-01-05 12:00:00+00 | 25.00 | 10.00 | 15.00
2 | 2026-01-03 10:00:00+00 | 50.00 | |
2 | 2026-01-07 10:00:00+00 | 20.00 | 50.00 | -30.00
3 | 2026-01-04 11:00:00+00 | 15.00 | |
(5 rows)

Explanation:

  • lag() looks “back” within each partition.
  • The first row in each partition has no previous value, so prev_amount is NULL.

Practical Use Cases

1) Latest Record Per Key (Dedup)

Context: Keep the most recent profile update per user.

WITH ranked AS (
SELECT
user_id,
updated_at,
payload,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) AS rn
FROM user_profiles
)
SELECT user_id, updated_at, payload
FROM ranked
WHERE rn = 1;

2) Percentiles / Buckets For Reporting

Context: Split customers into spend quartiles.

WITH spend AS (
SELECT customer_id, SUM(amount) AS total_spend
FROM orders
WHERE status = 'paid'
GROUP BY customer_id
)
SELECT
customer_id,
total_spend,
NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
FROM spend
ORDER BY total_spend DESC;

3) Moving Average

Context: Smooth daily revenue with a 7-day trailing average.

WITH daily AS (
SELECT
date_trunc('day', created_at)::date AS day,
SUM(amount) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY 1
)
SELECT
day,
revenue,
AVG(revenue) OVER (
ORDER BY day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS revenue_7d_avg
FROM daily
ORDER BY day;

Common Mistakes & Troubleshooting

Mistake 1: Filtering On a Window Function In WHERE

Wrong:

SELECT *
FROM orders
WHERE row_number() OVER (PARTITION BY customer_id ORDER BY created_at) = 1;

What happens: PostgreSQL errors because window functions are not allowed in WHERE.

Fix (wrap in a subquery/CTE):

WITH ranked AS (
SELECT
*,
row_number() OVER (PARTITION BY customer_id ORDER BY created_at) AS rn
FROM orders
)
SELECT *
FROM ranked
WHERE rn = 1;

Mistake 2: Implicit Default Frame With ORDER BY

Wrong (tie-sensitive when values repeat):

SELECT
created_at,
amount,
SUM(amount) OVER (ORDER BY created_at) AS running_total
FROM orders;

What happens: the default frame can behave unexpectedly with ties; results may include “peer” rows.

Fix (use explicit ROWS frame):

SELECT
created_at,
amount,
SUM(amount) OVER (ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM orders;

Mistake 3: Non-Deterministic Ordering

Wrong:

ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC)

What happens: if two rows have the same amount, the “winner” can change between runs.

Fix (add a stable tie-breaker):

ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC, order_id DESC)

Debugging tips:

  1. Start by SELECT ... ORDER BY partition_key, order_key without window functions to confirm the input ordering.
  2. Add the window function columns one at a time to validate each computation.
  3. For surprising frames, print row_number() and a few key columns to understand boundaries.
  4. When performance is slow, check indexes on (partition_key, order_key) and consider pre-aggregating.

Best Practices

  • ✅ Use ROWS BETWEEN ... for running totals and moving averages; ❌ rely on implicit default frames.
  • ✅ Add deterministic tie-breakers in ORDER BY; ❌ assume equal values will pick a stable row.
  • ✅ Use WINDOW w AS (...) when repeating the same window spec; ❌ copy/paste long OVER(...) blocks everywhere.
  • ✅ Pre-aggregate then window over the smaller result when possible; ❌ window over a huge raw table when you only need daily totals.
  • ✅ Keep partitions small and meaningful; ❌ partition by high-cardinality keys unless needed.

Hands-On Practice

Use this setup for the exercises:

DROP TABLE IF EXISTS practice_payments;

CREATE TABLE practice_payments (
payment_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
paid_at timestamptz NOT NULL,
amount numeric(12,2) NOT NULL CHECK (amount >= 0)
);

INSERT INTO practice_payments (customer_id, paid_at, amount) VALUES
(1, '2026-02-01 10:00+00', 10.00),
(1, '2026-02-03 10:00+00', 15.00),
(2, '2026-02-02 10:00+00', 20.00),
(2, '2026-02-05 10:00+00', 5.00);

Exercise 1 (Easy): Total per customer

Task: Show each payment with a total_per_customer column.

-- Your SQL here

Solution:

SELECT
customer_id,
paid_at,
amount,
SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM practice_payments
ORDER BY customer_id, paid_at;

Exercise 2 (Medium): Running total per customer

Task: Add a running_total ordered by paid_at within each customer.

-- Your SQL here

Solution:

SELECT
customer_id,
paid_at,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY paid_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM practice_payments
ORDER BY customer_id, paid_at;

Exercise 3 (Advanced): Top payment per customer

Task: Return only the largest payment row per customer.

-- Your SQL here

Solution:

WITH ranked AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC, paid_at DESC) AS rn
FROM practice_payments
)
SELECT payment_id, customer_id, paid_at, amount
FROM ranked
WHERE rn = 1
ORDER BY customer_id;

Connection to Other Concepts

ConceptWhy it matters
Common Table Expressions (CTEs)structure window logic into readable steps
ORDER BY and sortingwindows depend on deterministic ordering
Indexesimprove performance for partition/order keys
GROUP BY aggregatesaggregate first, then window for advanced analytics
Temporary tablesstage intermediate aggregates before windowing

Visual Learning Diagram

flowchart LR
A[Input Rows] --> B[PARTITION BY]
B --> C[ORDER BY]
C --> D[Frame]
D --> E[Window Function]
E --> F[Output Rows\n(detail preserved)]

classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F allNodes
class E highlight

Common Pitfalls

PitfallConsequencePrevention
Using GROUP BY when you need detail rowsloses row-level informationuse a window function instead
Missing tie-breakers in ORDER BYunstable top-N/dedup resultsadd a deterministic column (id/timestamp)
Relying on implicit framessurprising running totals with tiesspecify ROWS BETWEEN ...
Windowing over raw fact tables unnecessarilyslow queriespre-aggregate (daily/customer) first
Trying to use window results in WHEREerrorswrap in a subquery/CTE

Quick Reference

SUM(x) OVER (PARTITION BY k)
SUM(x) OVER (PARTITION BY k ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ROW_NUMBER() OVER (PARTITION BY k ORDER BY metric DESC, id DESC)
LAG(x, 1) OVER (PARTITION BY k ORDER BY ts)
NTILE(4) OVER (ORDER BY metric DESC)

What's Next