Index Strategy and Maintenance

Learning Focus

Use this lesson to choose the right PostgreSQL index type, build indexes that match query predicates and ordering, and verify improvements with EXPLAIN (ANALYZE, BUFFERS).

Concept Overview

Indexes provide faster access paths for reads, but they add overhead to writes and storage.

Good indexing is evidence-driven:

Identify real queries (slow logs, metrics, pg_stat_statements).
Inspect execution plans.
Add/adjust indexes to match the query shape.
Measure again.

Why is it important?

Performance: avoid full table scans for selective queries
Predictability: stable response times under load
Scalability: indexes can turn O(n) scans into O(log n) lookups
Operational hygiene: maintenance prevents bloat and keeps planner stats accurate

Where does it fit?

Indexing connects schema design to query performance:

PK/UNIQUE constraints create indexes automatically
FK columns often need manual indexing
analytics queries benefit from composite and covering indexes

Syntax & Rules

Core Syntax

CREATE INDEX index_name ON table_name (column_name);
DROP INDEX index_name;

Common variations:

-- Composite index
CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at DESC);

-- Expression index
CREATE INDEX idx_users_email_lower ON users ((lower(email)));

-- Partial index
CREATE INDEX idx_orders_paid ON orders (created_at) WHERE status = 'paid';

-- Covering index (include extra columns)
CREATE INDEX idx_orders_customer ON orders (customer_id) INCLUDE (amount, status);

-- Concurrent build (minimize blocking writes)
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);

Available Options / Parameters

Feature	Example	Meaning / Notes
`UNIQUE`	`CREATE UNIQUE INDEX ...`	enforce uniqueness at index level
`CONCURRENTLY`	`CREATE INDEX CONCURRENTLY ...`	builds index with less blocking; cannot run in a transaction block
`IF NOT EXISTS`	`CREATE INDEX IF NOT EXISTS ...`	avoid error if already present
Sort order	`(created_at DESC)`	helps `ORDER BY` + `LIMIT` patterns
`INCLUDE`	`INCLUDE (amount)`	covering columns for index-only scans
`WHERE` (partial)	`WHERE status='paid'`	index only matching rows
Index method	`USING gin (...)`	choose type: btree (default), gin, brin, gist, etc.

Key Rules and Considerations

The index must match the query predicate and ordering to be useful.
Composite indexes are order-sensitive: put equality filters first, then range/order columns.
Partial and expression indexes only help queries that match their predicate/expression.
CREATE INDEX CONCURRENTLY is slower but reduces write blocking; it cannot run inside BEGIN/COMMIT.
Indexes are not free: each additional index increases insert/update/delete cost.

Step-by-Step Examples

Use this setup for the examples:

DROP TABLE IF EXISTS orders;

CREATE TABLE orders (
  order_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id bigint NOT NULL,
  status      text NOT NULL,
  created_at  timestamptz NOT NULL,
  amount      numeric(12,2) NOT NULL CHECK (amount >= 0)
);

INSERT INTO orders (customer_id, status, created_at, amount) VALUES
  (1, 'paid',   '2026-03-01 10:00+00', 10.00),
  (1, 'paid',   '2026-03-02 10:00+00', 25.00),
  (1, 'failed', '2026-03-03 10:00+00',  5.00),
  (2, 'paid',   '2026-03-01 12:00+00', 50.00),
  (2, 'paid',   '2026-03-05 09:00+00', 20.00),
  (3, 'paid',   '2026-03-02 08:00+00', 15.00);

Example 1: Turn a Seq Scan Into an Index Scan (Beginner)

EXPLAIN
SELECT order_id, amount
FROM orders
WHERE customer_id = 2;

Expected output (example):

 Seq Scan on orders  (cost=0.00..1.07 rows=2 width=16)
   Filter: (customer_id = 2)

Add an index and re-check:

CREATE INDEX idx_orders_customer_id ON orders (customer_id);

EXPLAIN
SELECT order_id, amount
FROM orders
WHERE customer_id = 2;

Expected output (example):

 Index Scan using idx_orders_customer_id on orders  (cost=0.13..8.15 rows=2 width=16)
   Index Cond: (customer_id = 2)

Explanation:

PostgreSQL can use the index to find matching rows faster when the predicate is selective.

Example 2: Match `WHERE` + `ORDER BY` + `LIMIT` With a Composite Index (Intermediate)

Query: "latest 2 paid orders for customer 1":

EXPLAIN
SELECT order_id, created_at, amount
FROM orders
WHERE customer_id = 1 AND status = 'paid'
ORDER BY created_at DESC
LIMIT 2;

Expected output (example):

 Sort  (cost=...)
   Sort Key: created_at DESC
   ->  Seq Scan on orders  (cost=...)

Create a composite index aligned with the query:

CREATE INDEX idx_orders_cust_status_created
ON orders (customer_id, status, created_at DESC);

EXPLAIN
SELECT order_id, created_at, amount
FROM orders
WHERE customer_id = 1 AND status = 'paid'
ORDER BY created_at DESC
LIMIT 2;

Expected output (example):

 Limit  (cost=...)
   ->  Index Scan using idx_orders_cust_status_created on orders  (cost=...)
         Index Cond: ((customer_id = 1) AND (status = 'paid'))

Explanation:

With the right index order, PostgreSQL can return the newest rows without a separate sort.

Example 3: Partial Index for Skewed Predicates (Intermediate)

If most orders are not paid, a partial index can be smaller and faster:

CREATE INDEX idx_orders_paid_created
ON orders (created_at DESC)
WHERE status = 'paid';

EXPLAIN
SELECT order_id, created_at
FROM orders
WHERE status = 'paid'
ORDER BY created_at DESC
LIMIT 3;

Expected output (example):

 Limit  (cost=...)
   ->  Index Scan using idx_orders_paid_created on orders  (cost=...)
         Filter: (status = 'paid')

Explanation:

A partial index only contains rows where status='paid'.

Example 4: Expression Index for Case-Insensitive Search (Advanced)

Setup:

DROP TABLE IF EXISTS users;

CREATE TABLE users (
  user_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email text NOT NULL
);

INSERT INTO users (email) VALUES
  ('Alice@Example.com'),
  ('bob@example.com');

Query:

CREATE INDEX idx_users_email_lower ON users ((lower(email)));

EXPLAIN
SELECT user_id
FROM users
WHERE lower(email) = lower('ALICE@example.com');

Expected output (example):

 Index Scan using idx_users_email_lower on users  (cost=...)
   Index Cond: (lower(email) = 'alice@example.com'::text)

Explanation:

An expression index works only if the query uses the same expression.

Practical Use Cases

1) Top-N Per Customer

Context: recent orders per customer.

CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);

2) Soft Delete and Active-Only Lookups

Context: most queries ignore deleted rows.

CREATE INDEX idx_users_active_email
ON users (email)
WHERE deleted_at IS NULL;

3) JSONB Containment Queries

Context: frequent @> searches on a jsonb payload.

CREATE INDEX idx_events_payload_gin
ON events
USING gin (payload);

4) Huge Time-Series Tables

Context: data is naturally ordered by time.

CREATE INDEX idx_metrics_created_brin
ON metrics
USING brin (created_at);

Common Mistakes & Troubleshooting

Mistake 1: Indexing Every Column

What happens: writes get slower and disk grows.

Fix: index for real query predicates and join keys.

Mistake 2: Wrong Composite Index Order

Wrong:

CREATE INDEX idx_orders_created_customer
ON orders (created_at, customer_id);

What happens: this is often less useful for WHERE customer_id = ... ORDER BY created_at.

Fix:

CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);

Mistake 3: Expecting an Index to Help Low-Selectivity Filters

What happens: if most rows match, a sequential scan can be faster.

Fix: consider partial indexes, better predicates, or pre-aggregation.

Mistake 4: Using `CREATE INDEX CONCURRENTLY` Inside a Transaction

Wrong:

BEGIN;
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
COMMIT;

Typical error:

ERROR:  CREATE INDEX CONCURRENTLY cannot run inside a transaction block

Fix: run CREATE INDEX CONCURRENTLY as a standalone statement.

Debugging tips:

Always start with EXPLAIN (ANALYZE, BUFFERS) on the real query.
Confirm the predicate matches the index definition (expression/partial indexes must match exactly).
Run ANALYZE after large data changes so planner statistics are accurate.
Use \d+ table in psql to list existing indexes.

Best Practices

✅ Design indexes for queries, not tables; ❌ add indexes "just in case".
✅ Use composite indexes that match WHERE + ORDER BY; ❌ rely on separate single-column indexes for every pattern.
✅ Use partial and expression indexes for targeted workloads; ❌ create huge general indexes when only a subset matters.
✅ Verify improvements with EXPLAIN (ANALYZE, BUFFERS); ❌ assume an index is used.
✅ Keep index count reasonable; ❌ let every feature add multiple indexes without review.

Hands-On Practice

Use this setup for the exercises:

DROP TABLE IF EXISTS practice_logs;

CREATE TABLE practice_logs (
  log_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  level text NOT NULL,
  created_at timestamptz NOT NULL,
  message text NOT NULL
);

INSERT INTO practice_logs (level, created_at, message) VALUES
  ('info',  '2026-03-01 10:00+00', 'started'),
  ('error', '2026-03-01 10:05+00', 'failed'),
  ('info',  '2026-03-01 10:06+00', 'retry');

Exercise 1 (Easy): Add a simple index

Task: Create an index to speed up WHERE level = 'error'.

-- Your SQL here

Solution:

CREATE INDEX idx_practice_logs_level
ON practice_logs (level);

Exercise 2 (Medium): Create a composite index for ordering

Task: Create an index that helps WHERE level = ... ORDER BY created_at DESC LIMIT ....

-- Your SQL here

Solution:

CREATE INDEX idx_practice_logs_level_created
ON practice_logs (level, created_at DESC);

Exercise 3 (Advanced): Create a partial index

Task: Create a partial index that only indexes error logs.

-- Your SQL here

Solution:

CREATE INDEX idx_practice_logs_error_created
ON practice_logs (created_at DESC)
WHERE level = 'error';

Connection to Other Concepts

Concept	Why it matters
Primary/foreign keys	PK/UNIQUE create indexes; FK columns often need indexes
Query planning	indexes influence plans; validate with `EXPLAIN`
JSONB	GIN indexes make containment queries fast
Vacuum/analyze	maintenance affects visibility maps and planner stats
Transactions	concurrent index builds cannot run in a transaction block

Visual Learning Diagram

flowchart LR
  A[Query Pattern] --> B[Predicate + Order]
  B --> C[Choose Index Type]
  C --> D[B-tree\n(equality/range/order)]
  C --> E[GIN\n(jsonb/arrays)]
  C --> F[BRIN\n(huge ordered tables)]
  B --> G[Composite/Partial/Expression]
  G --> H[EXPLAIN (ANALYZE) verify]

  classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
  classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
  class A,B,C,D,E,F,G,H allNodes
  class H highlight

Common Pitfalls

Pitfall	Consequence	Prevention
Index doesn't match query	planner ignores it	align columns, order, and expressions
Too many indexes	slow writes, big disk	review indexes periodically
Missing ANALYZE	bad plans	ensure stats are up to date
Wrong composite order	limited usefulness	equality columns first, then range/order
Concurrent index in transaction	immediate error	run concurrently builds standalone

Quick Reference

CREATE INDEX ON t (col)
CREATE INDEX ON t (a, b DESC)
CREATE INDEX ON t ((lower(col)))
CREATE INDEX ON t (col) WHERE active = true
EXPLAIN (ANALYZE, BUFFERS) SELECT ...

What's Next

Previous: Column and Table Constraints - Enforce data quality with constraints.
Continue to 11. Date and Time Functions - Work with timestamps, intervals, and time bucketing.
Module Overview - Return to Constraints and Indexes overview.

Concept Overview​

Why is it important?​

Where does it fit?​

Syntax & Rules​

Core Syntax​

Available Options / Parameters​

Key Rules and Considerations​

Step-by-Step Examples​

Example 1: Turn a Seq Scan Into an Index Scan (Beginner)​

Example 2: Match WHERE + ORDER BY + LIMIT With a Composite Index (Intermediate)​

Example 3: Partial Index for Skewed Predicates (Intermediate)​

Example 4: Expression Index for Case-Insensitive Search (Advanced)​

Practical Use Cases​

1) Top-N Per Customer​

2) Soft Delete and Active-Only Lookups​

3) JSONB Containment Queries​

4) Huge Time-Series Tables​

Common Mistakes & Troubleshooting​

Mistake 1: Indexing Every Column​

Mistake 2: Wrong Composite Index Order​

Mistake 3: Expecting an Index to Help Low-Selectivity Filters​

Mistake 4: Using CREATE INDEX CONCURRENTLY Inside a Transaction​

Best Practices​

Hands-On Practice​

Exercise 1 (Easy): Add a simple index​

Exercise 2 (Medium): Create a composite index for ordering​

Exercise 3 (Advanced): Create a partial index​

Connection to Other Concepts​

Visual Learning Diagram​

Common Pitfalls​

Quick Reference​

What's Next​

Concept Overview

Why is it important?

Where does it fit?

Syntax & Rules

Core Syntax

Available Options / Parameters

Key Rules and Considerations

Step-by-Step Examples

Example 1: Turn a Seq Scan Into an Index Scan (Beginner)

Example 2: Match `WHERE` + `ORDER BY` + `LIMIT` With a Composite Index (Intermediate)

Example 3: Partial Index for Skewed Predicates (Intermediate)

Example 4: Expression Index for Case-Insensitive Search (Advanced)

Practical Use Cases

1) Top-N Per Customer

2) Soft Delete and Active-Only Lookups

3) JSONB Containment Queries

4) Huge Time-Series Tables

Common Mistakes & Troubleshooting

Mistake 1: Indexing Every Column

Mistake 2: Wrong Composite Index Order

Mistake 3: Expecting an Index to Help Low-Selectivity Filters

Mistake 4: Using `CREATE INDEX CONCURRENTLY` Inside a Transaction

Best Practices

Hands-On Practice

Exercise 1 (Easy): Add a simple index

Exercise 2 (Medium): Create a composite index for ordering

Exercise 3 (Advanced): Create a partial index

Connection to Other Concepts

Visual Learning Diagram

Common Pitfalls

Quick Reference

What's Next