Index Strategy and Maintenance
Use this lesson to choose the right PostgreSQL index type, build indexes that match query predicates and ordering, and verify improvements with EXPLAIN (ANALYZE, BUFFERS).
Concept Overview
Indexes provide faster access paths for reads, but they add overhead to writes and storage.
Good indexing is evidence-driven:
- Identify real queries (slow logs, metrics,
pg_stat_statements). - Inspect execution plans.
- Add/adjust indexes to match the query shape.
- Measure again.
Why is it important?
- Performance: avoid full table scans for selective queries
- Predictability: stable response times under load
- Scalability: indexes can turn O(n) scans into O(log n) lookups
- Operational hygiene: maintenance prevents bloat and keeps planner stats accurate
Where does it fit?
Indexing connects schema design to query performance:
- PK/UNIQUE constraints create indexes automatically
- FK columns often need manual indexing
- analytics queries benefit from composite and covering indexes
Syntax & Rules
Core Syntax
CREATE INDEX index_name ON table_name (column_name);
DROP INDEX index_name;
Common variations:
-- Composite index
CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at DESC);
-- Expression index
CREATE INDEX idx_users_email_lower ON users ((lower(email)));
-- Partial index
CREATE INDEX idx_orders_paid ON orders (created_at) WHERE status = 'paid';
-- Covering index (include extra columns)
CREATE INDEX idx_orders_customer ON orders (customer_id) INCLUDE (amount, status);
-- Concurrent build (minimize blocking writes)
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
Available Options / Parameters
| Feature | Example | Meaning / Notes |
|---|---|---|
UNIQUE | CREATE UNIQUE INDEX ... | enforce uniqueness at index level |
CONCURRENTLY | CREATE INDEX CONCURRENTLY ... | builds index with less blocking; cannot run in a transaction block |
IF NOT EXISTS | CREATE INDEX IF NOT EXISTS ... | avoid error if already present |
| Sort order | (created_at DESC) | helps ORDER BY + LIMIT patterns |
INCLUDE | INCLUDE (amount) | covering columns for index-only scans |
WHERE (partial) | WHERE status='paid' | index only matching rows |
| Index method | USING gin (...) | choose type: btree (default), gin, brin, gist, etc. |
Key Rules and Considerations
- The index must match the query predicate and ordering to be useful.
- Composite indexes are order-sensitive: put equality filters first, then range/order columns.
- Partial and expression indexes only help queries that match their predicate/expression.
CREATE INDEX CONCURRENTLYis slower but reduces write blocking; it cannot run insideBEGIN/COMMIT.- Indexes are not free: each additional index increases insert/update/delete cost.
Step-by-Step Examples
Use this setup for the examples:
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
status text NOT NULL,
created_at timestamptz NOT NULL,
amount numeric(12,2) NOT NULL CHECK (amount >= 0)
);
INSERT INTO orders (customer_id, status, created_at, amount) VALUES
(1, 'paid', '2026-03-01 10:00+00', 10.00),
(1, 'paid', '2026-03-02 10:00+00', 25.00),
(1, 'failed', '2026-03-03 10:00+00', 5.00),
(2, 'paid', '2026-03-01 12:00+00', 50.00),
(2, 'paid', '2026-03-05 09:00+00', 20.00),
(3, 'paid', '2026-03-02 08:00+00', 15.00);
Example 1: Turn a Seq Scan Into an Index Scan (Beginner)
EXPLAIN
SELECT order_id, amount
FROM orders
WHERE customer_id = 2;
Expected output (example):
Seq Scan on orders (cost=0.00..1.07 rows=2 width=16)
Filter: (customer_id = 2)
Add an index and re-check:
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
EXPLAIN
SELECT order_id, amount
FROM orders
WHERE customer_id = 2;
Expected output (example):
Index Scan using idx_orders_customer_id on orders (cost=0.13..8.15 rows=2 width=16)
Index Cond: (customer_id = 2)
Explanation:
- PostgreSQL can use the index to find matching rows faster when the predicate is selective.
Example 2: Match WHERE + ORDER BY + LIMIT With a Composite Index (Intermediate)
Query: "latest 2 paid orders for customer 1":
EXPLAIN
SELECT order_id, created_at, amount
FROM orders
WHERE customer_id = 1 AND status = 'paid'
ORDER BY created_at DESC
LIMIT 2;
Expected output (example):
Sort (cost=...)
Sort Key: created_at DESC
-> Seq Scan on orders (cost=...)
Create a composite index aligned with the query:
CREATE INDEX idx_orders_cust_status_created
ON orders (customer_id, status, created_at DESC);
EXPLAIN
SELECT order_id, created_at, amount
FROM orders
WHERE customer_id = 1 AND status = 'paid'
ORDER BY created_at DESC
LIMIT 2;
Expected output (example):
Limit (cost=...)
-> Index Scan using idx_orders_cust_status_created on orders (cost=...)
Index Cond: ((customer_id = 1) AND (status = 'paid'))
Explanation:
- With the right index order, PostgreSQL can return the newest rows without a separate sort.
Example 3: Partial Index for Skewed Predicates (Intermediate)
If most orders are not paid, a partial index can be smaller and faster:
CREATE INDEX idx_orders_paid_created
ON orders (created_at DESC)
WHERE status = 'paid';
EXPLAIN
SELECT order_id, created_at
FROM orders
WHERE status = 'paid'
ORDER BY created_at DESC
LIMIT 3;
Expected output (example):
Limit (cost=...)
-> Index Scan using idx_orders_paid_created on orders (cost=...)
Filter: (status = 'paid')
Explanation:
- A partial index only contains rows where
status='paid'.
Example 4: Expression Index for Case-Insensitive Search (Advanced)
Setup:
DROP TABLE IF EXISTS users;
CREATE TABLE users (
user_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email text NOT NULL
);
INSERT INTO users (email) VALUES
('Alice@Example.com'),
('bob@example.com');
Query:
CREATE INDEX idx_users_email_lower ON users ((lower(email)));
EXPLAIN
SELECT user_id
FROM users
WHERE lower(email) = lower('ALICE@example.com');
Expected output (example):
Index Scan using idx_users_email_lower on users (cost=...)
Index Cond: (lower(email) = 'alice@example.com'::text)
Explanation:
- An expression index works only if the query uses the same expression.
Practical Use Cases
1) Top-N Per Customer
Context: recent orders per customer.
CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);
2) Soft Delete and Active-Only Lookups
Context: most queries ignore deleted rows.
CREATE INDEX idx_users_active_email
ON users (email)
WHERE deleted_at IS NULL;
3) JSONB Containment Queries
Context: frequent @> searches on a jsonb payload.
CREATE INDEX idx_events_payload_gin
ON events
USING gin (payload);
4) Huge Time-Series Tables
Context: data is naturally ordered by time.
CREATE INDEX idx_metrics_created_brin
ON metrics
USING brin (created_at);
Common Mistakes & Troubleshooting
Mistake 1: Indexing Every Column
What happens: writes get slower and disk grows.
Fix: index for real query predicates and join keys.
Mistake 2: Wrong Composite Index Order
Wrong:
CREATE INDEX idx_orders_created_customer
ON orders (created_at, customer_id);
What happens: this is often less useful for WHERE customer_id = ... ORDER BY created_at.
Fix:
CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);
Mistake 3: Expecting an Index to Help Low-Selectivity Filters
What happens: if most rows match, a sequential scan can be faster.
Fix: consider partial indexes, better predicates, or pre-aggregation.
Mistake 4: Using CREATE INDEX CONCURRENTLY Inside a Transaction
Wrong:
BEGIN;
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
COMMIT;
Typical error:
ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
Fix: run CREATE INDEX CONCURRENTLY as a standalone statement.
Debugging tips:
- Always start with
EXPLAIN (ANALYZE, BUFFERS)on the real query. - Confirm the predicate matches the index definition (expression/partial indexes must match exactly).
- Run
ANALYZEafter large data changes so planner statistics are accurate. - Use
\d+ tableinpsqlto list existing indexes.
Best Practices
- ✅ Design indexes for queries, not tables; ❌ add indexes "just in case".
- ✅ Use composite indexes that match
WHERE+ORDER BY; ❌ rely on separate single-column indexes for every pattern. - ✅ Use partial and expression indexes for targeted workloads; ❌ create huge general indexes when only a subset matters.
- ✅ Verify improvements with
EXPLAIN (ANALYZE, BUFFERS); ❌ assume an index is used. - ✅ Keep index count reasonable; ❌ let every feature add multiple indexes without review.
Hands-On Practice
Use this setup for the exercises:
DROP TABLE IF EXISTS practice_logs;
CREATE TABLE practice_logs (
log_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
level text NOT NULL,
created_at timestamptz NOT NULL,
message text NOT NULL
);
INSERT INTO practice_logs (level, created_at, message) VALUES
('info', '2026-03-01 10:00+00', 'started'),
('error', '2026-03-01 10:05+00', 'failed'),
('info', '2026-03-01 10:06+00', 'retry');
Exercise 1 (Easy): Add a simple index
Task: Create an index to speed up WHERE level = 'error'.
-- Your SQL here
Solution:
CREATE INDEX idx_practice_logs_level
ON practice_logs (level);
Exercise 2 (Medium): Create a composite index for ordering
Task: Create an index that helps WHERE level = ... ORDER BY created_at DESC LIMIT ....
-- Your SQL here
Solution:
CREATE INDEX idx_practice_logs_level_created
ON practice_logs (level, created_at DESC);
Exercise 3 (Advanced): Create a partial index
Task: Create a partial index that only indexes error logs.
-- Your SQL here
Solution:
CREATE INDEX idx_practice_logs_error_created
ON practice_logs (created_at DESC)
WHERE level = 'error';
Connection to Other Concepts
| Concept | Why it matters |
|---|---|
| Primary/foreign keys | PK/UNIQUE create indexes; FK columns often need indexes |
| Query planning | indexes influence plans; validate with EXPLAIN |
| JSONB | GIN indexes make containment queries fast |
| Vacuum/analyze | maintenance affects visibility maps and planner stats |
| Transactions | concurrent index builds cannot run in a transaction block |
Visual Learning Diagram
flowchart LR
A[Query Pattern] --> B[Predicate + Order]
B --> C[Choose Index Type]
C --> D[B-tree\n(equality/range/order)]
C --> E[GIN\n(jsonb/arrays)]
C --> F[BRIN\n(huge ordered tables)]
B --> G[Composite/Partial/Expression]
G --> H[EXPLAIN (ANALYZE) verify]
classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F,G,H allNodes
class H highlight
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
| Index doesn't match query | planner ignores it | align columns, order, and expressions |
| Too many indexes | slow writes, big disk | review indexes periodically |
| Missing ANALYZE | bad plans | ensure stats are up to date |
| Wrong composite order | limited usefulness | equality columns first, then range/order |
| Concurrent index in transaction | immediate error | run concurrently builds standalone |
Quick Reference
CREATE INDEX ON t (col)
CREATE INDEX ON t (a, b DESC)
CREATE INDEX ON t ((lower(col)))
CREATE INDEX ON t (col) WHERE active = true
EXPLAIN (ANALYZE, BUFFERS) SELECT ...
What's Next
- Previous: Column and Table Constraints - Enforce data quality with constraints.
- Continue to 11. Date and Time Functions - Work with timestamps, intervals, and time bucketing.
- Module Overview - Return to Constraints and Indexes overview.