Window Functions

Learning Focus

Use this lesson to apply PostgreSQL window functions (OVER) to compute analytics while keeping row-level detail.

Concept Overview

Window functions compute a value for each row using a window (a related set of rows).

Unlike GROUP BY, window functions do not collapse rows; you keep the original row granularity.

Why is it important?

Analytics without losing detail: totals, ranks, and deltas per row
Top-N per group: common reporting pattern without awkward self-joins
Deduplication: pick “latest per key” reliably with row_number()
Time-series insights: running totals, moving averages, period-over-period changes

Where does it fit?

Window functions are an advanced SELECT feature often combined with:

CTE pipelines (WITH) for readable multi-step analytics
JOINs for dimensional enrichment
Indexes (especially on (partition_key, order_key)) for better ordering performance

Syntax & Rules

Core Syntax

SELECT
  customer_id,
  amount,
  SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM orders;

General form:

<window_function>() OVER (
  [PARTITION BY expr [, ...]]
  [ORDER BY expr [ASC|DESC] [NULLS FIRST|LAST] [, ...]]
  [frame_clause]
)

Available Options / Parameters

Component	Example	Meaning / Notes
`PARTITION BY`	`PARTITION BY customer_id`	splits rows into independent groups (windows)
`ORDER BY`	`ORDER BY created_at`	defines row order within a partition; required for many “running” metrics
Named window	`WINDOW w AS (...)`	reuse the same window spec across multiple functions
Frame type	`ROWS` / `RANGE` / `GROUPS`	chooses how the “current row’s frame” is defined
Frame bounds	`UNBOUNDED PRECEDING`, `CURRENT ROW`, `n PRECEDING/FOLLOWING`	limits which rows are included for that row’s calculation

Key Rules and Considerations

If order matters (running total, moving average), include ORDER BY in the window.
If you omit a frame clause and include ORDER BY, PostgreSQL uses a default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) which can surprise you with ties. Use explicit ROWS for per-row running calculations.
You cannot filter on a window function in WHERE. Use a subquery/CTE and filter in an outer query.
GROUP BY and window functions can coexist: aggregate first, then window over aggregated results.

Common window functions include:

row_number(), rank(), dense_rank()
lag(), lead()
sum(), avg(), min(), max() over a window
first_value(), last_value(), nth_value()
ntile(n), percent_rank(), cume_dist()

Step-by-Step Examples

Use this setup for the examples:

DROP TABLE IF EXISTS orders;

CREATE TABLE orders (
  order_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id bigint NOT NULL,
  status      text NOT NULL,
  created_at  timestamptz NOT NULL,
  amount      numeric(12,2) NOT NULL CHECK (amount >= 0)
);

INSERT INTO orders (customer_id, status, created_at, amount) VALUES
  (1, 'paid',   '2026-01-02 09:00+00', 10.00),
  (1, 'paid',   '2026-01-05 12:00+00', 25.00),
  (1, 'failed', '2026-01-10 08:00+00',  5.00),
  (2, 'paid',   '2026-01-03 10:00+00', 50.00),
  (2, 'paid',   '2026-01-07 10:00+00', 20.00),
  (3, 'paid',   '2026-01-04 11:00+00', 15.00);

Example 1: Total Per Customer Without Collapsing Rows (Beginner)

SELECT
  customer_id,
  created_at,
  amount,
  SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM orders
WHERE status = 'paid'
ORDER BY customer_id, created_at;

Expected output:

 customer_id |       created_at       | amount | total_per_customer
-------------+------------------------+--------+-------------------
 1           | 2026-01-02 09:00:00+00 | 10.00  | 35.00
 1           | 2026-01-05 12:00:00+00 | 25.00  | 35.00
 2           | 2026-01-03 10:00:00+00 | 50.00  | 70.00
 2           | 2026-01-07 10:00:00+00 | 20.00  | 70.00
 3           | 2026-01-04 11:00:00+00 | 15.00  | 15.00
(5 rows)

Explanation:

PARTITION BY customer_id creates one window per customer.
The sum is repeated on each row because window functions preserve row detail.

Example 2: Running Total With an Explicit Frame (Intermediate)

SELECT
  customer_id,
  created_at,
  amount,
  SUM(amount) OVER (
    PARTITION BY customer_id
    ORDER BY created_at
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM orders
WHERE status = 'paid'
ORDER BY customer_id, created_at;

Expected output:

 customer_id |       created_at       | amount | running_total
-------------+------------------------+--------+--------------
 1           | 2026-01-02 09:00:00+00 | 10.00  | 10.00
 1           | 2026-01-05 12:00:00+00 | 25.00  | 35.00
 2           | 2026-01-03 10:00:00+00 | 50.00  | 50.00
 2           | 2026-01-07 10:00:00+00 | 20.00  | 70.00
 3           | 2026-01-04 11:00:00+00 | 15.00  | 15.00
(5 rows)

Explanation:

ORDER BY created_at defines a sequence inside each customer.
The ROWS ... frame makes this a true per-row running sum (not a tie-sensitive RANGE default).

Example 3: Top-N Per Group With `row_number()` (Intermediate)

WITH ranked AS (
  SELECT
    order_id,
    customer_id,
    created_at,
    amount,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY amount DESC, created_at DESC
    ) AS rn
  FROM orders o
  WHERE status = 'paid'
)
SELECT order_id, customer_id, amount, rn
FROM ranked
WHERE rn <= 1
ORDER BY customer_id, rn;

Expected output:

 order_id | customer_id | amount | rn
----------+-------------+--------+----
 2        | 1           | 25.00  | 1
 4        | 2           | 50.00  | 1
 6        | 3           | 15.00  | 1
(3 rows)

Explanation:

row_number() assigns a unique sequence per customer.
Use a deterministic tie-breaker (here created_at DESC) so the “top row” is stable.
PostgreSQL has no QUALIFY, so you filter rn in an outer SELECT.

Example 4: Differences With `lag()` (Advanced)

SELECT
  customer_id,
  created_at,
  amount,
  LAG(amount) OVER (
    PARTITION BY customer_id
    ORDER BY created_at
  ) AS prev_amount,
  amount - LAG(amount) OVER (
    PARTITION BY customer_id
    ORDER BY created_at
  ) AS delta
FROM orders
WHERE status = 'paid'
ORDER BY customer_id, created_at;

Expected output:

 customer_id |       created_at       | amount | prev_amount | delta
-------------+------------------------+--------+-------------+-------
 1           | 2026-01-02 09:00:00+00 | 10.00  |             |
 1           | 2026-01-05 12:00:00+00 | 25.00  | 10.00       | 15.00
 2           | 2026-01-03 10:00:00+00 | 50.00  |             |
 2           | 2026-01-07 10:00:00+00 | 20.00  | 50.00       | -30.00
 3           | 2026-01-04 11:00:00+00 | 15.00  |             |
(5 rows)

Explanation:

lag() looks “back” within each partition.
The first row in each partition has no previous value, so prev_amount is NULL.

Practical Use Cases

1) Latest Record Per Key (Dedup)

Context: Keep the most recent profile update per user.

WITH ranked AS (
  SELECT
    user_id,
    updated_at,
    payload,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) AS rn
  FROM user_profiles
)
SELECT user_id, updated_at, payload
FROM ranked
WHERE rn = 1;

2) Percentiles / Buckets For Reporting

Context: Split customers into spend quartiles.

WITH spend AS (
  SELECT customer_id, SUM(amount) AS total_spend
  FROM orders
  WHERE status = 'paid'
  GROUP BY customer_id
)
SELECT
  customer_id,
  total_spend,
  NTILE(4) OVER (ORDER BY total_spend DESC) AS spend_quartile
FROM spend
ORDER BY total_spend DESC;

3) Moving Average

Context: Smooth daily revenue with a 7-day trailing average.

WITH daily AS (
  SELECT
    date_trunc('day', created_at)::date AS day,
    SUM(amount) AS revenue
  FROM orders
  WHERE status = 'paid'
  GROUP BY 1
)
SELECT
  day,
  revenue,
  AVG(revenue) OVER (
    ORDER BY day
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS revenue_7d_avg
FROM daily
ORDER BY day;

Common Mistakes & Troubleshooting

Mistake 1: Filtering On a Window Function In `WHERE`

Wrong:

SELECT *
FROM orders
WHERE row_number() OVER (PARTITION BY customer_id ORDER BY created_at) = 1;

What happens: PostgreSQL errors because window functions are not allowed in WHERE.

Fix (wrap in a subquery/CTE):

WITH ranked AS (
  SELECT
    *,
    row_number() OVER (PARTITION BY customer_id ORDER BY created_at) AS rn
  FROM orders
)
SELECT *
FROM ranked
WHERE rn = 1;

Mistake 2: Implicit Default Frame With `ORDER BY`

Wrong (tie-sensitive when values repeat):

SELECT
  created_at,
  amount,
  SUM(amount) OVER (ORDER BY created_at) AS running_total
FROM orders;

What happens: the default frame can behave unexpectedly with ties; results may include “peer” rows.

Fix (use explicit ROWS frame):

SELECT
  created_at,
  amount,
  SUM(amount) OVER (ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM orders;

Mistake 3: Non-Deterministic Ordering

Wrong:

ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC)

What happens: if two rows have the same amount, the “winner” can change between runs.

Fix (add a stable tie-breaker):

ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC, order_id DESC)

Debugging tips:

Start by SELECT ... ORDER BY partition_key, order_key without window functions to confirm the input ordering.
Add the window function columns one at a time to validate each computation.
For surprising frames, print row_number() and a few key columns to understand boundaries.
When performance is slow, check indexes on (partition_key, order_key) and consider pre-aggregating.

Best Practices

✅ Use ROWS BETWEEN ... for running totals and moving averages; ❌ rely on implicit default frames.
✅ Add deterministic tie-breakers in ORDER BY; ❌ assume equal values will pick a stable row.
✅ Use WINDOW w AS (...) when repeating the same window spec; ❌ copy/paste long OVER(...) blocks everywhere.
✅ Pre-aggregate then window over the smaller result when possible; ❌ window over a huge raw table when you only need daily totals.
✅ Keep partitions small and meaningful; ❌ partition by high-cardinality keys unless needed.

Hands-On Practice

Use this setup for the exercises:

DROP TABLE IF EXISTS practice_payments;

CREATE TABLE practice_payments (
  payment_id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id bigint NOT NULL,
  paid_at     timestamptz NOT NULL,
  amount      numeric(12,2) NOT NULL CHECK (amount >= 0)
);

INSERT INTO practice_payments (customer_id, paid_at, amount) VALUES
  (1, '2026-02-01 10:00+00', 10.00),
  (1, '2026-02-03 10:00+00', 15.00),
  (2, '2026-02-02 10:00+00', 20.00),
  (2, '2026-02-05 10:00+00',  5.00);

Exercise 1 (Easy): Total per customer

Task: Show each payment with a total_per_customer column.

-- Your SQL here

Solution:

SELECT
  customer_id,
  paid_at,
  amount,
  SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM practice_payments
ORDER BY customer_id, paid_at;

Exercise 2 (Medium): Running total per customer

Task: Add a running_total ordered by paid_at within each customer.

-- Your SQL here

Solution:

SELECT
  customer_id,
  paid_at,
  amount,
  SUM(amount) OVER (
    PARTITION BY customer_id
    ORDER BY paid_at
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM practice_payments
ORDER BY customer_id, paid_at;

Exercise 3 (Advanced): Top payment per customer

Task: Return only the largest payment row per customer.

-- Your SQL here

Solution:

WITH ranked AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC, paid_at DESC) AS rn
  FROM practice_payments
)
SELECT payment_id, customer_id, paid_at, amount
FROM ranked
WHERE rn = 1
ORDER BY customer_id;

Connection to Other Concepts

Concept	Why it matters
Common Table Expressions (CTEs)	structure window logic into readable steps
`ORDER BY` and sorting	windows depend on deterministic ordering
Indexes	improve performance for partition/order keys
GROUP BY aggregates	aggregate first, then window for advanced analytics
Temporary tables	stage intermediate aggregates before windowing

Visual Learning Diagram

flowchart LR
  A[Input Rows] --> B[PARTITION BY]
  B --> C[ORDER BY]
  C --> D[Frame]
  D --> E[Window Function]
  E --> F[Output Rows\n(detail preserved)]

  classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
  classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
  class A,B,C,D,E,F allNodes
  class E highlight

Common Pitfalls

Pitfall	Consequence	Prevention
Using `GROUP BY` when you need detail rows	loses row-level information	use a window function instead
Missing tie-breakers in `ORDER BY`	unstable top-N/dedup results	add a deterministic column (id/timestamp)
Relying on implicit frames	surprising running totals with ties	specify `ROWS BETWEEN ...`
Windowing over raw fact tables unnecessarily	slow queries	pre-aggregate (daily/customer) first
Trying to use window results in `WHERE`	errors	wrap in a subquery/CTE

Quick Reference

SUM(x) OVER (PARTITION BY k)
SUM(x) OVER (PARTITION BY k ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ROW_NUMBER() OVER (PARTITION BY k ORDER BY metric DESC, id DESC)
LAG(x, 1) OVER (PARTITION BY k ORDER BY ts)
NTILE(4) OVER (ORDER BY metric DESC)

What's Next

Previous: Temporary Tables - Stage data when you need multi-statement reuse.
Next: JSONB Querying and Indexing - Query semi-structured data efficiently in PostgreSQL.
Module Overview - Return to the Advanced SQL Features index.

Concept Overview​

Why is it important?​

Where does it fit?​

Syntax & Rules​

Core Syntax​

Available Options / Parameters​

Key Rules and Considerations​

Step-by-Step Examples​

Example 1: Total Per Customer Without Collapsing Rows (Beginner)​

Example 2: Running Total With an Explicit Frame (Intermediate)​

Example 3: Top-N Per Group With row_number() (Intermediate)​

Example 4: Differences With lag() (Advanced)​

Practical Use Cases​

1) Latest Record Per Key (Dedup)​

2) Percentiles / Buckets For Reporting​

3) Moving Average​

Common Mistakes & Troubleshooting​

Mistake 1: Filtering On a Window Function In WHERE​

Mistake 2: Implicit Default Frame With ORDER BY​

Mistake 3: Non-Deterministic Ordering​

Best Practices​

Hands-On Practice​

Exercise 1 (Easy): Total per customer​

Exercise 2 (Medium): Running total per customer​

Exercise 3 (Advanced): Top payment per customer​

Connection to Other Concepts​

Visual Learning Diagram​

Common Pitfalls​

Quick Reference​

What's Next​

Concept Overview

Why is it important?

Where does it fit?

Syntax & Rules

Core Syntax

Available Options / Parameters

Key Rules and Considerations

Step-by-Step Examples

Example 1: Total Per Customer Without Collapsing Rows (Beginner)

Example 2: Running Total With an Explicit Frame (Intermediate)

Example 3: Top-N Per Group With `row_number()` (Intermediate)

Example 4: Differences With `lag()` (Advanced)

Practical Use Cases

1) Latest Record Per Key (Dedup)

2) Percentiles / Buckets For Reporting

3) Moving Average

Common Mistakes & Troubleshooting

Mistake 1: Filtering On a Window Function In `WHERE`

Mistake 2: Implicit Default Frame With `ORDER BY`

Mistake 3: Non-Deterministic Ordering

Best Practices

Hands-On Practice

Exercise 1 (Easy): Total per customer

Exercise 2 (Medium): Running total per customer

Exercise 3 (Advanced): Top payment per customer

Connection to Other Concepts

Visual Learning Diagram

Common Pitfalls

Quick Reference

What's Next