Deadlocks and Retry Patterns
Use this lesson to understand deadlocks and serialization failures in PostgreSQL, learn how to diagnose them with lock views and logs, and implement safe retry patterns.
Concept Overview
A deadlock occurs when two or more transactions wait for each other’s locks, forming a circular dependency that cannot resolve.
PostgreSQL has a built-in deadlock detector. When it detects a deadlock, it:
- chooses a "victim" transaction
- aborts that transaction
- returns SQLSTATE
40P01(deadlock_detected) to the client - allows the other transaction(s) to continue
PostgreSQL can also abort transactions in high-isolation mode even without a deadlock:
- in SERIALIZABLE isolation, it may abort with SQLSTATE
40001(serialization_failure) - this is expected under contention; the error hint typically says the transaction might succeed if retried
Why is it important?
- Correctness: deadlocks and serialization failures abort whole transactions; without retries, you can lose legitimate user actions.
- Reliability: safe retries turn transient conflicts into successful outcomes.
- Performance: stable locking patterns reduce wasted work and tail latency.
- Operability: diagnosis skills help you fix hotspots (hot rows, missing indexes, long-running transactions).
Where does it fit?
You use these patterns when building:
- money transfers, reservation systems, and multi-row updates
- job queues with multiple workers
- SERIALIZABLE workflows that must behave like transactions ran one-at-a-time
- write-heavy systems where contention is normal
Syntax & Rules
Retryable Transaction Conflicts (Common SQLSTATEs)
| Condition | SQLSTATE | When you see it | Typical response |
|---|---|---|---|
deadlock_detected | 40P01 | deadlock detector aborts your transaction | retry whole transaction |
serialization_failure | 40001 | SERIALIZABLE abort to prevent anomalies | retry whole transaction |
lock_not_available | 55P03 | NOWAIT or lock_timeout | retry or degrade (skip/queue) |
Timeouts and Logging Knobs
These settings don’t prevent deadlocks, but they help you fail fast and diagnose contention.
-- Fail if a statement waits too long on a lock
SET lock_timeout = '2s';
-- Fail if a statement runs too long
SET statement_timeout = '15s';
-- Log long lock waits (cluster setting; shown here for awareness)
-- SET log_lock_waits = on;
Available options (conceptual):
| Setting | Scope | Example | Meaning |
|---|---|---|---|
deadlock_timeout | server / session | SET deadlock_timeout = '200ms' | wait before checking for deadlock cycles |
lock_timeout | session / transaction (SET LOCAL) | SET LOCAL lock_timeout = '1s' | cancel when waiting too long for locks |
statement_timeout | session / transaction (SET LOCAL) | SET LOCAL statement_timeout = '10s' | cancel long-running statements |
log_lock_waits | server | log_lock_waits = on | log lock waits beyond deadlock_timeout |
Key Rules and Considerations
- A deadlock aborts one whole transaction, not just a single statement.
- A lock wait is different: it can resolve when the blocker commits/rolls back.
- SERIALIZABLE aborts (SQLSTATE
40001) are expected under contention; design your code to retry. - Retries must be safe: either the operation is idempotent, or you guard side effects with idempotency keys / outbox.
- The most effective deadlock prevention technique is stable lock ordering: lock or update rows in a deterministic order across code paths.
Step-by-Step Examples
These examples use multiple sessions. In psql, open two terminals:
- Session A
- Session B
Example 1: Create a Deadlock (Intentionally)
Setup (run once):
DROP TABLE IF EXISTS accounts;
CREATE TABLE accounts (
account_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
balance_cents integer NOT NULL CHECK (balance_cents >= 0)
);
INSERT INTO accounts (balance_cents)
VALUES (10000), (10000);
SELECT account_id, balance_cents
FROM accounts
ORDER BY account_id;
Expected output:
account_id | balance_cents
------------+--------------
1 | 10000
2 | 10000
(2 rows)
Session A:
BEGIN;
-- Lock account 1
UPDATE accounts
SET balance_cents = balance_cents - 100
WHERE account_id = 1;
Expected output (Session A):
BEGIN
UPDATE 1
Session B:
BEGIN;
-- Lock account 2
UPDATE accounts
SET balance_cents = balance_cents - 200
WHERE account_id = 2;
-- Now try to lock account 1 (held by Session A)
UPDATE accounts
SET balance_cents = balance_cents + 200
WHERE account_id = 1;
-- this statement waits...
Expected output up to the wait (Session B):
BEGIN
UPDATE 1
-- second UPDATE is waiting
Back to Session A (try to lock account 2, held by Session B):
UPDATE accounts
SET balance_cents = balance_cents + 100
WHERE account_id = 2;
Expected output (one session becomes the victim; example shown for Session A):
ERROR: deadlock detected
DETAIL: Process ... waits for ShareLock on transaction ...; blocked by process ...
HINT: See server log for query details.
Explanation:
- Each session holds one row lock and waits on the other.
- PostgreSQL detects the cycle and aborts one transaction (SQLSTATE
40P01). - The surviving transaction can continue after the victim rolls back.
Cleanup:
-- In the victim session
ROLLBACK;
-- In the surviving session
COMMIT;
Example 2: Lock Wait vs Deadlock (NOWAIT and lock_timeout)
Not all lock contention is a deadlock. Sometimes you simply wait.
Session A (hold a lock):
BEGIN;
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 1;
-- keep the transaction open
Session B (fail fast instead of waiting):
BEGIN;
-- Option A: NOWAIT
SELECT *
FROM accounts
WHERE account_id = 1
FOR UPDATE NOWAIT;
Expected output (Session B):
ERROR: could not obtain lock on row in relation "accounts"
Session B (alternative: time-based wait limit):
SET LOCAL lock_timeout = '500ms';
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 1;
Expected output (Session B):
ERROR: canceling statement due to lock timeout
Finish the demo:
-- Session A
COMMIT;
-- Session B
ROLLBACK;
Explanation:
- A lock wait may resolve if the blocker commits.
NOWAITandlock_timeoutconvert waits into fast failures (often preferable in APIs).
Example 3: Observe Blocking in pg_stat_activity
While Session B is waiting on a lock (or failing with timeouts), you can inspect blocking relationships.
Run this in any session (often a third session):
SELECT
a.pid,
a.usename,
a.state,
a.wait_event_type,
a.wait_event,
pg_blocking_pids(a.pid) AS blocking_pids,
left(a.query, 70) AS query_prefix
FROM pg_stat_activity a
WHERE a.datname = current_database()
AND a.pid <> pg_backend_pid()
ORDER BY a.pid;
Expected output (example shape):
pid | usename | state | wait_event_type | wait_event | blocking_pids | query_prefix
------+---------+---------------------+-----------------+---------------+---------------+------------------------------
4101 | app | idle in transaction | | | {} | UPDATE accounts SET balance...
4123 | app | active | Lock | transactionid | {4101} | UPDATE accounts SET balance...
Explanation:
blocking_pidstells you which backend(s) block a given session.idle in transactiontypically means a transaction is holding locks while doing no work (a common cause of lock storms).
Example 4: SERIALIZABLE Retry (Serialization Failure)
SERIALIZABLE provides strict guarantees, but it may abort transactions under contention.
Setup:
DROP TABLE IF EXISTS inventory;
CREATE TABLE inventory (
sku text PRIMARY KEY,
available integer NOT NULL CHECK (available >= 0)
);
INSERT INTO inventory (sku, available)
VALUES ('wrench', 1);
Session A:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT available FROM inventory WHERE sku = 'wrench';
UPDATE inventory SET available = available - 1 WHERE sku = 'wrench';
COMMIT;
Session B (run concurrently):
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT available FROM inventory WHERE sku = 'wrench';
UPDATE inventory SET available = available - 1 WHERE sku = 'wrench';
COMMIT;
Expected output (one session succeeds; the other aborts):
ERROR: could not serialize access due to read/write dependencies among transactions
HINT: The transaction might succeed if retried.
Explanation:
- SQLSTATE
40001is PostgreSQL telling you it prevented a SERIALIZABLE anomaly. - The correct response is to retry the whole transaction (often 3-5 times with backoff + jitter).
Practical Use Cases
1. Two-Row Transfers (Canonical Deadlock Hotspot)
Lock both rows in stable order, then update.
BEGIN;
SELECT account_id
FROM accounts
WHERE account_id IN (1, 2)
ORDER BY account_id
FOR UPDATE;
UPDATE accounts SET balance_cents = balance_cents - 500 WHERE account_id = 1;
UPDATE accounts SET balance_cents = balance_cents + 500 WHERE account_id = 2;
COMMIT;
2. Inventory Reservations With Safe Retries
Treat 40001 and 40P01 as retryable and keep each attempt short.
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE inventory
SET available = available - 1
WHERE sku = 'wrench' AND available > 0
RETURNING sku, available;
COMMIT;
3. Work Queue Processing (Reduce Blocking)
Workers claim different rows without waiting.
-- jobs(job_id, status, created_at, started_at)
WITH next_job AS (
SELECT job_id
FROM jobs
WHERE status = 'queued'
ORDER BY created_at
FOR UPDATE SKIP LOCKED
LIMIT 1
)
UPDATE jobs j
SET status = 'running', started_at = now()
FROM next_job
WHERE j.job_id = next_job.job_id
RETURNING j.job_id;
4. Batch Updates in Small, Ordered Chunks
Reduce time holding locks and reduce deadlock surface area.
WITH batch AS (
SELECT order_id
FROM orders
WHERE status = 'pending'
ORDER BY order_id
LIMIT 500
)
UPDATE orders o
SET status = 'processing'
FROM batch
WHERE o.order_id = batch.order_id;
5. Idempotency for Retry Safety
Use an idempotency key to ensure retries don’t double-apply external side effects.
-- payments(idempotency_key primary key, ...)
INSERT INTO payments (idempotency_key, amount_cents)
VALUES ('req_123', 500)
ON CONFLICT (idempotency_key) DO NOTHING;
Common Mistakes & Troubleshooting
Mistake 1: Not Retrying SQLSTATE 40P01 / 40001
Wrong approach: treat deadlock/serialization errors as hard failures.
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... do work ...
COMMIT;
Bad outcome:
ERROR: deadlock detected
-- or
ERROR: could not serialize access due to read/write dependencies among transactions
Fix: retry the whole transaction on 40P01 and 40001 (bounded attempts).
Mistake 2: Inconsistent Lock Ordering Across Code Paths
Wrong (input order varies):
BEGIN;
UPDATE accounts SET balance_cents = balance_cents - 10 WHERE account_id = 2;
UPDATE accounts SET balance_cents = balance_cents + 10 WHERE account_id = 1;
COMMIT;
Fix (lock deterministically):
BEGIN;
SELECT account_id
FROM accounts
WHERE account_id IN (1, 2)
ORDER BY account_id
FOR UPDATE;
-- then update
COMMIT;
Mistake 3: Retrying Non-Idempotent Side Effects
Wrong outcome pattern: you perform external side effects (charge, email) and then the DB transaction fails and is retried.
-- DB transaction retry happens, but the side effect already occurred
BEGIN;
-- record something about the side effect
COMMIT;
Fix: use idempotency keys, outbox tables, or move side effects after commit.
Mistake 4: Leaving Transactions Open ("Idle in Transaction")
Wrong:
BEGIN;
SELECT * FROM accounts WHERE account_id = 1 FOR UPDATE;
-- pause for user input / network IO
COMMIT;
Bad outcome: long lock holds increase deadlock frequency and lock timeout failures.
Fix: do user interaction and network calls outside the transaction. Keep the transaction window tight.
Debugging Tips (Practical Checklist)
- In your app logs, record SQLSTATE for aborted transactions (
40P01,40001,55P03). - When you see waits, query
pg_stat_activityand checkpg_blocking_pids(pid). - Look for long-running transactions and
idle in transactionsessions. - Check whether the conflicting statements touch rows in inconsistent order.
- Add missing indexes so updates/select-for-update target fewer rows and complete faster.
Best Practices
✅ Treat SQLSTATE 40P01 (deadlock_detected) as retryable
❌ Don’t surface deadlocks as unexplained user errors
✅ Treat SQLSTATE 40001 (serialization_failure) as retryable in SERIALIZABLE
❌ Don’t enable SERIALIZABLE without bounded retries
✅ Lock/update rows in a stable order (ORDER BY primary key)
❌ Don’t lock in user-provided order when it can vary
✅ Keep transactions short and purely database-focused
❌ Don’t hold locks while waiting on user input, HTTP calls, or long computations
✅ Use backoff + jitter for retries (avoid retry storms)
❌ Don’t retry immediately in a tight loop
✅ Use `lock_timeout` / `statement_timeout` in latency-sensitive endpoints
❌ Don’t allow unbounded lock waits
✅ Make retries safe with idempotency keys or outbox patterns
❌ Don’t retry when side effects are not protected from duplication
Hands-On Practice
Exercise 1 (Easy): Trigger and Recognize a Deadlock
Task: in two sessions, reproduce the deadlock from Example 1 and capture the error text in the victim session.
-- Your work here
Solution:
-- Session A
BEGIN;
UPDATE accounts SET balance_cents = balance_cents - 1 WHERE account_id = 1;
-- Session B
BEGIN;
UPDATE accounts SET balance_cents = balance_cents - 1 WHERE account_id = 2;
-- Session A
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 2;
-- Session B
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 1;
Exercise 2 (Medium): Prevent the Deadlock with Stable Ordering
Task: write a single transaction that locks accounts 1 and 2 in ascending order and then performs a transfer.
-- Your work here
Solution:
BEGIN;
SELECT account_id
FROM accounts
WHERE account_id IN (1, 2)
ORDER BY account_id
FOR UPDATE;
UPDATE accounts SET balance_cents = balance_cents - 1 WHERE account_id = 1;
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 2;
COMMIT;
Exercise 3 (Advanced): Retryable Transfer Function
Task: create a PL/pgSQL function that retries a transfer up to 3 times when it hits deadlock_detected or serialization_failure.
-- Your work here
Solution:
CREATE OR REPLACE FUNCTION safe_transfer(
p_from integer,
p_to integer,
p_amount integer
)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
v_attempt integer := 0;
BEGIN
IF p_from = p_to THEN
RAISE EXCEPTION 'from and to must differ';
END IF;
LOOP
v_attempt := v_attempt + 1;
BEGIN
-- Stable ordering lock of both accounts
PERFORM 1
FROM accounts
WHERE account_id IN (LEAST(p_from, p_to), GREATEST(p_from, p_to))
ORDER BY account_id
FOR UPDATE;
UPDATE accounts SET balance_cents = balance_cents - p_amount WHERE account_id = p_from;
UPDATE accounts SET balance_cents = balance_cents + p_amount WHERE account_id = p_to;
RETURN;
EXCEPTION
WHEN deadlock_detected OR serialization_failure THEN
IF v_attempt >= 3 THEN
RAISE;
END IF;
-- Backoff + jitter (simple)
PERFORM pg_sleep(0.05 * v_attempt + random() * 0.05);
END;
END LOOP;
END;
$$;
Connection to Other Concepts
| Related concept | Why it matters |
|---|---|
| ACID and Transaction Control | Deadlocks/serialization failures abort whole transactions; you need to understand rollback scope and savepoints. |
| Isolation Levels and Locking | SERIALIZABLE requires retries, and row lock patterns (FOR UPDATE, NOWAIT, SKIP LOCKED) shape contention. |
| Index strategy and maintenance | Missing indexes can widen lock scope and extend lock hold time, increasing conflicts. |
| Background jobs and scheduling | Worker concurrency often needs SKIP LOCKED patterns to avoid blocking piles. |
Visual Learning Diagram
flowchart TD
A["Session A\nlocks Row 1"] --> B["Session B\nlocks Row 2"]
B --> C["Session A\nwaits for Row 2"]
A --> D["Session B\nwaits for Row 1"]
C --> E["Circular wait\n= deadlock"]
D --> E
E --> F["PostgreSQL aborts\nvictim txn\nSQLSTATE 40P01"]
F --> G["App retries\n(backoff + jitter)"]
G --> H["Stable ordering\nreduces frequency"]
classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F,G,H allNodes
class E,F,G highlight
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
| Retrying without idempotency | duplicate charges/emails/tasks | idempotency keys, outbox pattern, side effects after commit |
| Inconsistent lock ordering | repeated deadlocks between the same code paths | lock rows deterministically (ORDER BY PK) |
| Long transactions | more lock hold time; increased conflicts | short transactions; avoid "idle in transaction" |
| Missing indexes on hot WHERE clauses | slow updates under lock; bigger conflict window | add indexes; tighten predicates |
| No timeouts | requests hang and pile up under contention | set lock_timeout and statement_timeout |
| Aggressive retries | retry storms and latency spikes | bounded retries + exponential backoff + jitter |
Quick Reference
SELECT pid, pg_blocking_pids(pid) AS blocking_pids FROM pg_stat_activity WHERE wait_event_type = 'Lock';
SET lock_timeout = '2s';
SET statement_timeout = '15s';
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT job_id FROM jobs WHERE status = 'queued' ORDER BY created_at FOR UPDATE SKIP LOCKED LIMIT 1;
What's Next
- Previous: Isolation Levels and Locking - Review how locks and SERIALIZABLE behavior create retryable conflicts.
- Next: 17. User Management and Security - Secure access to data and control privileges in PostgreSQL.
- Module Overview - Return to this module index and choose another related lesson.