Skip to main content

Index Strategy and Maintenance

Learning Focus

Use this lesson to choose the right PostgreSQL index type, build indexes that match query predicates and ordering, and verify improvements with EXPLAIN (ANALYZE, BUFFERS).

Concept Overview

Indexes provide faster access paths for reads, but they add overhead to writes and storage.

Good indexing is evidence-driven:

  1. Identify real queries (slow logs, metrics, pg_stat_statements).
  2. Inspect execution plans.
  3. Add/adjust indexes to match the query shape.
  4. Measure again.

Why is it important?

  • Performance: avoid full table scans for selective queries
  • Predictability: stable response times under load
  • Scalability: indexes can turn O(n) scans into O(log n) lookups
  • Operational hygiene: maintenance prevents bloat and keeps planner stats accurate

Where does it fit?

Indexing connects schema design to query performance:

  • PK/UNIQUE constraints create indexes automatically
  • FK columns often need manual indexing
  • analytics queries benefit from composite and covering indexes

Syntax & Rules

Core Syntax

CREATE INDEX index_name ON table_name (column_name);
DROP INDEX index_name;

Common variations:

-- Composite index
CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at DESC);

-- Expression index
CREATE INDEX idx_users_email_lower ON users ((lower(email)));

-- Partial index
CREATE INDEX idx_orders_paid ON orders (created_at) WHERE status = 'paid';

-- Covering index (include extra columns)
CREATE INDEX idx_orders_customer ON orders (customer_id) INCLUDE (amount, status);

-- Concurrent build (minimize blocking writes)
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);

Available Options / Parameters

FeatureExampleMeaning / Notes
UNIQUECREATE UNIQUE INDEX ...enforce uniqueness at index level
CONCURRENTLYCREATE INDEX CONCURRENTLY ...builds index with less blocking; cannot run in a transaction block
IF NOT EXISTSCREATE INDEX IF NOT EXISTS ...avoid error if already present
Sort order(created_at DESC)helps ORDER BY + LIMIT patterns
INCLUDEINCLUDE (amount)covering columns for index-only scans
WHERE (partial)WHERE status='paid'index only matching rows
Index methodUSING gin (...)choose type: btree (default), gin, brin, gist, etc.

Key Rules and Considerations

  • The index must match the query predicate and ordering to be useful.
  • Composite indexes are order-sensitive: put equality filters first, then range/order columns.
  • Partial and expression indexes only help queries that match their predicate/expression.
  • CREATE INDEX CONCURRENTLY is slower but reduces write blocking; it cannot run inside BEGIN/COMMIT.
  • Indexes are not free: each additional index increases insert/update/delete cost.

Step-by-Step Examples

Use this setup for the examples:

DROP TABLE IF EXISTS orders;

CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
status text NOT NULL,
created_at timestamptz NOT NULL,
amount numeric(12,2) NOT NULL CHECK (amount >= 0)
);

INSERT INTO orders (customer_id, status, created_at, amount) VALUES
(1, 'paid', '2026-03-01 10:00+00', 10.00),
(1, 'paid', '2026-03-02 10:00+00', 25.00),
(1, 'failed', '2026-03-03 10:00+00', 5.00),
(2, 'paid', '2026-03-01 12:00+00', 50.00),
(2, 'paid', '2026-03-05 09:00+00', 20.00),
(3, 'paid', '2026-03-02 08:00+00', 15.00);

Example 1: Turn a Seq Scan Into an Index Scan (Beginner)

EXPLAIN
SELECT order_id, amount
FROM orders
WHERE customer_id = 2;

Expected output (example):

 Seq Scan on orders  (cost=0.00..1.07 rows=2 width=16)
Filter: (customer_id = 2)

Add an index and re-check:

CREATE INDEX idx_orders_customer_id ON orders (customer_id);

EXPLAIN
SELECT order_id, amount
FROM orders
WHERE customer_id = 2;

Expected output (example):

 Index Scan using idx_orders_customer_id on orders  (cost=0.13..8.15 rows=2 width=16)
Index Cond: (customer_id = 2)

Explanation:

  • PostgreSQL can use the index to find matching rows faster when the predicate is selective.

Example 2: Match WHERE + ORDER BY + LIMIT With a Composite Index (Intermediate)

Query: "latest 2 paid orders for customer 1":

EXPLAIN
SELECT order_id, created_at, amount
FROM orders
WHERE customer_id = 1 AND status = 'paid'
ORDER BY created_at DESC
LIMIT 2;

Expected output (example):

 Sort  (cost=...)
Sort Key: created_at DESC
-> Seq Scan on orders (cost=...)

Create a composite index aligned with the query:

CREATE INDEX idx_orders_cust_status_created
ON orders (customer_id, status, created_at DESC);

EXPLAIN
SELECT order_id, created_at, amount
FROM orders
WHERE customer_id = 1 AND status = 'paid'
ORDER BY created_at DESC
LIMIT 2;

Expected output (example):

 Limit  (cost=...)
-> Index Scan using idx_orders_cust_status_created on orders (cost=...)
Index Cond: ((customer_id = 1) AND (status = 'paid'))

Explanation:

  • With the right index order, PostgreSQL can return the newest rows without a separate sort.

Example 3: Partial Index for Skewed Predicates (Intermediate)

If most orders are not paid, a partial index can be smaller and faster:

CREATE INDEX idx_orders_paid_created
ON orders (created_at DESC)
WHERE status = 'paid';

EXPLAIN
SELECT order_id, created_at
FROM orders
WHERE status = 'paid'
ORDER BY created_at DESC
LIMIT 3;

Expected output (example):

 Limit  (cost=...)
-> Index Scan using idx_orders_paid_created on orders (cost=...)
Filter: (status = 'paid')

Explanation:

  • A partial index only contains rows where status='paid'.

Example 4: Expression Index for Case-Insensitive Search (Advanced)

Setup:

DROP TABLE IF EXISTS users;

CREATE TABLE users (
user_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email text NOT NULL
);

INSERT INTO users (email) VALUES
('Alice@Example.com'),
('bob@example.com');

Query:

CREATE INDEX idx_users_email_lower ON users ((lower(email)));

EXPLAIN
SELECT user_id
FROM users
WHERE lower(email) = lower('ALICE@example.com');

Expected output (example):

 Index Scan using idx_users_email_lower on users  (cost=...)
Index Cond: (lower(email) = 'alice@example.com'::text)

Explanation:

  • An expression index works only if the query uses the same expression.

Practical Use Cases

1) Top-N Per Customer

Context: recent orders per customer.

CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);

2) Soft Delete and Active-Only Lookups

Context: most queries ignore deleted rows.

CREATE INDEX idx_users_active_email
ON users (email)
WHERE deleted_at IS NULL;

3) JSONB Containment Queries

Context: frequent @> searches on a jsonb payload.

CREATE INDEX idx_events_payload_gin
ON events
USING gin (payload);

4) Huge Time-Series Tables

Context: data is naturally ordered by time.

CREATE INDEX idx_metrics_created_brin
ON metrics
USING brin (created_at);

Common Mistakes & Troubleshooting

Mistake 1: Indexing Every Column

What happens: writes get slower and disk grows.

Fix: index for real query predicates and join keys.


Mistake 2: Wrong Composite Index Order

Wrong:

CREATE INDEX idx_orders_created_customer
ON orders (created_at, customer_id);

What happens: this is often less useful for WHERE customer_id = ... ORDER BY created_at.

Fix:

CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);

Mistake 3: Expecting an Index to Help Low-Selectivity Filters

What happens: if most rows match, a sequential scan can be faster.

Fix: consider partial indexes, better predicates, or pre-aggregation.


Mistake 4: Using CREATE INDEX CONCURRENTLY Inside a Transaction

Wrong:

BEGIN;
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
COMMIT;

Typical error:

ERROR:  CREATE INDEX CONCURRENTLY cannot run inside a transaction block

Fix: run CREATE INDEX CONCURRENTLY as a standalone statement.

Debugging tips:

  1. Always start with EXPLAIN (ANALYZE, BUFFERS) on the real query.
  2. Confirm the predicate matches the index definition (expression/partial indexes must match exactly).
  3. Run ANALYZE after large data changes so planner statistics are accurate.
  4. Use \d+ table in psql to list existing indexes.

Best Practices

  • ✅ Design indexes for queries, not tables; ❌ add indexes "just in case".
  • ✅ Use composite indexes that match WHERE + ORDER BY; ❌ rely on separate single-column indexes for every pattern.
  • ✅ Use partial and expression indexes for targeted workloads; ❌ create huge general indexes when only a subset matters.
  • ✅ Verify improvements with EXPLAIN (ANALYZE, BUFFERS); ❌ assume an index is used.
  • ✅ Keep index count reasonable; ❌ let every feature add multiple indexes without review.

Hands-On Practice

Use this setup for the exercises:

DROP TABLE IF EXISTS practice_logs;

CREATE TABLE practice_logs (
log_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
level text NOT NULL,
created_at timestamptz NOT NULL,
message text NOT NULL
);

INSERT INTO practice_logs (level, created_at, message) VALUES
('info', '2026-03-01 10:00+00', 'started'),
('error', '2026-03-01 10:05+00', 'failed'),
('info', '2026-03-01 10:06+00', 'retry');

Exercise 1 (Easy): Add a simple index

Task: Create an index to speed up WHERE level = 'error'.

-- Your SQL here

Solution:

CREATE INDEX idx_practice_logs_level
ON practice_logs (level);

Exercise 2 (Medium): Create a composite index for ordering

Task: Create an index that helps WHERE level = ... ORDER BY created_at DESC LIMIT ....

-- Your SQL here

Solution:

CREATE INDEX idx_practice_logs_level_created
ON practice_logs (level, created_at DESC);

Exercise 3 (Advanced): Create a partial index

Task: Create a partial index that only indexes error logs.

-- Your SQL here

Solution:

CREATE INDEX idx_practice_logs_error_created
ON practice_logs (created_at DESC)
WHERE level = 'error';

Connection to Other Concepts

ConceptWhy it matters
Primary/foreign keysPK/UNIQUE create indexes; FK columns often need indexes
Query planningindexes influence plans; validate with EXPLAIN
JSONBGIN indexes make containment queries fast
Vacuum/analyzemaintenance affects visibility maps and planner stats
Transactionsconcurrent index builds cannot run in a transaction block

Visual Learning Diagram

flowchart LR
A[Query Pattern] --> B[Predicate + Order]
B --> C[Choose Index Type]
C --> D[B-tree\n(equality/range/order)]
C --> E[GIN\n(jsonb/arrays)]
C --> F[BRIN\n(huge ordered tables)]
B --> G[Composite/Partial/Expression]
G --> H[EXPLAIN (ANALYZE) verify]

classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F,G,H allNodes
class H highlight

Common Pitfalls

PitfallConsequencePrevention
Index doesn't match queryplanner ignores italign columns, order, and expressions
Too many indexesslow writes, big diskreview indexes periodically
Missing ANALYZEbad plansensure stats are up to date
Wrong composite orderlimited usefulnessequality columns first, then range/order
Concurrent index in transactionimmediate errorrun concurrently builds standalone

Quick Reference

CREATE INDEX ON t (col)
CREATE INDEX ON t (a, b DESC)
CREATE INDEX ON t ((lower(col)))
CREATE INDEX ON t (col) WHERE active = true
EXPLAIN (ANALYZE, BUFFERS) SELECT ...

What's Next