Skip to main content

Deadlocks and Retry Patterns

Learning Focus

Use this lesson to understand deadlocks and serialization failures in PostgreSQL, learn how to diagnose them with lock views and logs, and implement safe retry patterns.

Concept Overview

A deadlock occurs when two or more transactions wait for each other’s locks, forming a circular dependency that cannot resolve.

PostgreSQL has a built-in deadlock detector. When it detects a deadlock, it:

  1. chooses a "victim" transaction
  2. aborts that transaction
  3. returns SQLSTATE 40P01 (deadlock_detected) to the client
  4. allows the other transaction(s) to continue

PostgreSQL can also abort transactions in high-isolation mode even without a deadlock:

  • in SERIALIZABLE isolation, it may abort with SQLSTATE 40001 (serialization_failure)
  • this is expected under contention; the error hint typically says the transaction might succeed if retried

Why is it important?

  • Correctness: deadlocks and serialization failures abort whole transactions; without retries, you can lose legitimate user actions.
  • Reliability: safe retries turn transient conflicts into successful outcomes.
  • Performance: stable locking patterns reduce wasted work and tail latency.
  • Operability: diagnosis skills help you fix hotspots (hot rows, missing indexes, long-running transactions).

Where does it fit?

You use these patterns when building:

  • money transfers, reservation systems, and multi-row updates
  • job queues with multiple workers
  • SERIALIZABLE workflows that must behave like transactions ran one-at-a-time
  • write-heavy systems where contention is normal

Syntax & Rules

Retryable Transaction Conflicts (Common SQLSTATEs)

ConditionSQLSTATEWhen you see itTypical response
deadlock_detected40P01deadlock detector aborts your transactionretry whole transaction
serialization_failure40001SERIALIZABLE abort to prevent anomaliesretry whole transaction
lock_not_available55P03NOWAIT or lock_timeoutretry or degrade (skip/queue)

Timeouts and Logging Knobs

These settings don’t prevent deadlocks, but they help you fail fast and diagnose contention.

-- Fail if a statement waits too long on a lock
SET lock_timeout = '2s';

-- Fail if a statement runs too long
SET statement_timeout = '15s';

-- Log long lock waits (cluster setting; shown here for awareness)
-- SET log_lock_waits = on;

Available options (conceptual):

SettingScopeExampleMeaning
deadlock_timeoutserver / sessionSET deadlock_timeout = '200ms'wait before checking for deadlock cycles
lock_timeoutsession / transaction (SET LOCAL)SET LOCAL lock_timeout = '1s'cancel when waiting too long for locks
statement_timeoutsession / transaction (SET LOCAL)SET LOCAL statement_timeout = '10s'cancel long-running statements
log_lock_waitsserverlog_lock_waits = onlog lock waits beyond deadlock_timeout

Key Rules and Considerations

  • A deadlock aborts one whole transaction, not just a single statement.
  • A lock wait is different: it can resolve when the blocker commits/rolls back.
  • SERIALIZABLE aborts (SQLSTATE 40001) are expected under contention; design your code to retry.
  • Retries must be safe: either the operation is idempotent, or you guard side effects with idempotency keys / outbox.
  • The most effective deadlock prevention technique is stable lock ordering: lock or update rows in a deterministic order across code paths.

Step-by-Step Examples

These examples use multiple sessions. In psql, open two terminals:

  • Session A
  • Session B

Example 1: Create a Deadlock (Intentionally)

Setup (run once):

DROP TABLE IF EXISTS accounts;

CREATE TABLE accounts (
account_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
balance_cents integer NOT NULL CHECK (balance_cents >= 0)
);

INSERT INTO accounts (balance_cents)
VALUES (10000), (10000);

SELECT account_id, balance_cents
FROM accounts
ORDER BY account_id;

Expected output:

 account_id | balance_cents
------------+--------------
1 | 10000
2 | 10000
(2 rows)

Session A:

BEGIN;

-- Lock account 1
UPDATE accounts
SET balance_cents = balance_cents - 100
WHERE account_id = 1;

Expected output (Session A):

BEGIN
UPDATE 1

Session B:

BEGIN;

-- Lock account 2
UPDATE accounts
SET balance_cents = balance_cents - 200
WHERE account_id = 2;

-- Now try to lock account 1 (held by Session A)
UPDATE accounts
SET balance_cents = balance_cents + 200
WHERE account_id = 1;
-- this statement waits...

Expected output up to the wait (Session B):

BEGIN
UPDATE 1
-- second UPDATE is waiting

Back to Session A (try to lock account 2, held by Session B):

UPDATE accounts
SET balance_cents = balance_cents + 100
WHERE account_id = 2;

Expected output (one session becomes the victim; example shown for Session A):

ERROR:  deadlock detected
DETAIL: Process ... waits for ShareLock on transaction ...; blocked by process ...
HINT: See server log for query details.

Explanation:

  • Each session holds one row lock and waits on the other.
  • PostgreSQL detects the cycle and aborts one transaction (SQLSTATE 40P01).
  • The surviving transaction can continue after the victim rolls back.

Cleanup:

-- In the victim session
ROLLBACK;

-- In the surviving session
COMMIT;

Example 2: Lock Wait vs Deadlock (NOWAIT and lock_timeout)

Not all lock contention is a deadlock. Sometimes you simply wait.

Session A (hold a lock):

BEGIN;
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 1;
-- keep the transaction open

Session B (fail fast instead of waiting):

BEGIN;

-- Option A: NOWAIT
SELECT *
FROM accounts
WHERE account_id = 1
FOR UPDATE NOWAIT;

Expected output (Session B):

ERROR:  could not obtain lock on row in relation "accounts"

Session B (alternative: time-based wait limit):

SET LOCAL lock_timeout = '500ms';

UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 1;

Expected output (Session B):

ERROR:  canceling statement due to lock timeout

Finish the demo:

-- Session A
COMMIT;

-- Session B
ROLLBACK;

Explanation:

  • A lock wait may resolve if the blocker commits.
  • NOWAIT and lock_timeout convert waits into fast failures (often preferable in APIs).

Example 3: Observe Blocking in pg_stat_activity

While Session B is waiting on a lock (or failing with timeouts), you can inspect blocking relationships.

Run this in any session (often a third session):

SELECT
a.pid,
a.usename,
a.state,
a.wait_event_type,
a.wait_event,
pg_blocking_pids(a.pid) AS blocking_pids,
left(a.query, 70) AS query_prefix
FROM pg_stat_activity a
WHERE a.datname = current_database()
AND a.pid <> pg_backend_pid()
ORDER BY a.pid;

Expected output (example shape):

 pid  | usename |        state        | wait_event_type |  wait_event   | blocking_pids | query_prefix
------+---------+---------------------+-----------------+---------------+---------------+------------------------------
4101 | app | idle in transaction | | | {} | UPDATE accounts SET balance...
4123 | app | active | Lock | transactionid | {4101} | UPDATE accounts SET balance...

Explanation:

  • blocking_pids tells you which backend(s) block a given session.
  • idle in transaction typically means a transaction is holding locks while doing no work (a common cause of lock storms).

Example 4: SERIALIZABLE Retry (Serialization Failure)

SERIALIZABLE provides strict guarantees, but it may abort transactions under contention.

Setup:

DROP TABLE IF EXISTS inventory;

CREATE TABLE inventory (
sku text PRIMARY KEY,
available integer NOT NULL CHECK (available >= 0)
);

INSERT INTO inventory (sku, available)
VALUES ('wrench', 1);

Session A:

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT available FROM inventory WHERE sku = 'wrench';
UPDATE inventory SET available = available - 1 WHERE sku = 'wrench';
COMMIT;

Session B (run concurrently):

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT available FROM inventory WHERE sku = 'wrench';
UPDATE inventory SET available = available - 1 WHERE sku = 'wrench';
COMMIT;

Expected output (one session succeeds; the other aborts):

ERROR:  could not serialize access due to read/write dependencies among transactions
HINT: The transaction might succeed if retried.

Explanation:

  • SQLSTATE 40001 is PostgreSQL telling you it prevented a SERIALIZABLE anomaly.
  • The correct response is to retry the whole transaction (often 3-5 times with backoff + jitter).

Practical Use Cases

1. Two-Row Transfers (Canonical Deadlock Hotspot)

Lock both rows in stable order, then update.

BEGIN;

SELECT account_id
FROM accounts
WHERE account_id IN (1, 2)
ORDER BY account_id
FOR UPDATE;

UPDATE accounts SET balance_cents = balance_cents - 500 WHERE account_id = 1;
UPDATE accounts SET balance_cents = balance_cents + 500 WHERE account_id = 2;

COMMIT;

2. Inventory Reservations With Safe Retries

Treat 40001 and 40P01 as retryable and keep each attempt short.

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

UPDATE inventory
SET available = available - 1
WHERE sku = 'wrench' AND available > 0
RETURNING sku, available;

COMMIT;

3. Work Queue Processing (Reduce Blocking)

Workers claim different rows without waiting.

-- jobs(job_id, status, created_at, started_at)
WITH next_job AS (
SELECT job_id
FROM jobs
WHERE status = 'queued'
ORDER BY created_at
FOR UPDATE SKIP LOCKED
LIMIT 1
)
UPDATE jobs j
SET status = 'running', started_at = now()
FROM next_job
WHERE j.job_id = next_job.job_id
RETURNING j.job_id;

4. Batch Updates in Small, Ordered Chunks

Reduce time holding locks and reduce deadlock surface area.

WITH batch AS (
SELECT order_id
FROM orders
WHERE status = 'pending'
ORDER BY order_id
LIMIT 500
)
UPDATE orders o
SET status = 'processing'
FROM batch
WHERE o.order_id = batch.order_id;

5. Idempotency for Retry Safety

Use an idempotency key to ensure retries don’t double-apply external side effects.

-- payments(idempotency_key primary key, ...)
INSERT INTO payments (idempotency_key, amount_cents)
VALUES ('req_123', 500)
ON CONFLICT (idempotency_key) DO NOTHING;

Common Mistakes & Troubleshooting

Mistake 1: Not Retrying SQLSTATE 40P01 / 40001

Wrong approach: treat deadlock/serialization errors as hard failures.

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... do work ...
COMMIT;

Bad outcome:

ERROR:  deadlock detected
-- or
ERROR: could not serialize access due to read/write dependencies among transactions

Fix: retry the whole transaction on 40P01 and 40001 (bounded attempts).


Mistake 2: Inconsistent Lock Ordering Across Code Paths

Wrong (input order varies):

BEGIN;
UPDATE accounts SET balance_cents = balance_cents - 10 WHERE account_id = 2;
UPDATE accounts SET balance_cents = balance_cents + 10 WHERE account_id = 1;
COMMIT;

Fix (lock deterministically):

BEGIN;
SELECT account_id
FROM accounts
WHERE account_id IN (1, 2)
ORDER BY account_id
FOR UPDATE;
-- then update
COMMIT;

Mistake 3: Retrying Non-Idempotent Side Effects

Wrong outcome pattern: you perform external side effects (charge, email) and then the DB transaction fails and is retried.

-- DB transaction retry happens, but the side effect already occurred
BEGIN;
-- record something about the side effect
COMMIT;

Fix: use idempotency keys, outbox tables, or move side effects after commit.


Mistake 4: Leaving Transactions Open ("Idle in Transaction")

Wrong:

BEGIN;
SELECT * FROM accounts WHERE account_id = 1 FOR UPDATE;
-- pause for user input / network IO
COMMIT;

Bad outcome: long lock holds increase deadlock frequency and lock timeout failures.

Fix: do user interaction and network calls outside the transaction. Keep the transaction window tight.


Debugging Tips (Practical Checklist)

  1. In your app logs, record SQLSTATE for aborted transactions (40P01, 40001, 55P03).
  2. When you see waits, query pg_stat_activity and check pg_blocking_pids(pid).
  3. Look for long-running transactions and idle in transaction sessions.
  4. Check whether the conflicting statements touch rows in inconsistent order.
  5. Add missing indexes so updates/select-for-update target fewer rows and complete faster.

Best Practices

✅ Treat SQLSTATE 40P01 (deadlock_detected) as retryable
❌ Don’t surface deadlocks as unexplained user errors

✅ Treat SQLSTATE 40001 (serialization_failure) as retryable in SERIALIZABLE
❌ Don’t enable SERIALIZABLE without bounded retries

✅ Lock/update rows in a stable order (ORDER BY primary key)
❌ Don’t lock in user-provided order when it can vary

✅ Keep transactions short and purely database-focused
❌ Don’t hold locks while waiting on user input, HTTP calls, or long computations

✅ Use backoff + jitter for retries (avoid retry storms)
❌ Don’t retry immediately in a tight loop

✅ Use `lock_timeout` / `statement_timeout` in latency-sensitive endpoints
❌ Don’t allow unbounded lock waits

✅ Make retries safe with idempotency keys or outbox patterns
❌ Don’t retry when side effects are not protected from duplication

Hands-On Practice

Exercise 1 (Easy): Trigger and Recognize a Deadlock

Task: in two sessions, reproduce the deadlock from Example 1 and capture the error text in the victim session.

-- Your work here

Solution:

-- Session A
BEGIN;
UPDATE accounts SET balance_cents = balance_cents - 1 WHERE account_id = 1;

-- Session B
BEGIN;
UPDATE accounts SET balance_cents = balance_cents - 1 WHERE account_id = 2;

-- Session A
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 2;

-- Session B
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 1;

Exercise 2 (Medium): Prevent the Deadlock with Stable Ordering

Task: write a single transaction that locks accounts 1 and 2 in ascending order and then performs a transfer.

-- Your work here

Solution:

BEGIN;

SELECT account_id
FROM accounts
WHERE account_id IN (1, 2)
ORDER BY account_id
FOR UPDATE;

UPDATE accounts SET balance_cents = balance_cents - 1 WHERE account_id = 1;
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 2;

COMMIT;

Exercise 3 (Advanced): Retryable Transfer Function

Task: create a PL/pgSQL function that retries a transfer up to 3 times when it hits deadlock_detected or serialization_failure.

-- Your work here

Solution:

CREATE OR REPLACE FUNCTION safe_transfer(
p_from integer,
p_to integer,
p_amount integer
)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
v_attempt integer := 0;
BEGIN
IF p_from = p_to THEN
RAISE EXCEPTION 'from and to must differ';
END IF;

LOOP
v_attempt := v_attempt + 1;

BEGIN
-- Stable ordering lock of both accounts
PERFORM 1
FROM accounts
WHERE account_id IN (LEAST(p_from, p_to), GREATEST(p_from, p_to))
ORDER BY account_id
FOR UPDATE;

UPDATE accounts SET balance_cents = balance_cents - p_amount WHERE account_id = p_from;
UPDATE accounts SET balance_cents = balance_cents + p_amount WHERE account_id = p_to;

RETURN;
EXCEPTION
WHEN deadlock_detected OR serialization_failure THEN
IF v_attempt >= 3 THEN
RAISE;
END IF;

-- Backoff + jitter (simple)
PERFORM pg_sleep(0.05 * v_attempt + random() * 0.05);
END;
END LOOP;
END;
$$;

Connection to Other Concepts

Related conceptWhy it matters
ACID and Transaction ControlDeadlocks/serialization failures abort whole transactions; you need to understand rollback scope and savepoints.
Isolation Levels and LockingSERIALIZABLE requires retries, and row lock patterns (FOR UPDATE, NOWAIT, SKIP LOCKED) shape contention.
Index strategy and maintenanceMissing indexes can widen lock scope and extend lock hold time, increasing conflicts.
Background jobs and schedulingWorker concurrency often needs SKIP LOCKED patterns to avoid blocking piles.

Visual Learning Diagram

flowchart TD
A["Session A\nlocks Row 1"] --> B["Session B\nlocks Row 2"]
B --> C["Session A\nwaits for Row 2"]
A --> D["Session B\nwaits for Row 1"]
C --> E["Circular wait\n= deadlock"]
D --> E
E --> F["PostgreSQL aborts\nvictim txn\nSQLSTATE 40P01"]
F --> G["App retries\n(backoff + jitter)"]
G --> H["Stable ordering\nreduces frequency"]

classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F,G,H allNodes
class E,F,G highlight

Common Pitfalls

PitfallConsequencePrevention
Retrying without idempotencyduplicate charges/emails/tasksidempotency keys, outbox pattern, side effects after commit
Inconsistent lock orderingrepeated deadlocks between the same code pathslock rows deterministically (ORDER BY PK)
Long transactionsmore lock hold time; increased conflictsshort transactions; avoid "idle in transaction"
Missing indexes on hot WHERE clausesslow updates under lock; bigger conflict windowadd indexes; tighten predicates
No timeoutsrequests hang and pile up under contentionset lock_timeout and statement_timeout
Aggressive retriesretry storms and latency spikesbounded retries + exponential backoff + jitter

Quick Reference

SELECT pid, pg_blocking_pids(pid) AS blocking_pids FROM pg_stat_activity WHERE wait_event_type = 'Lock';
SET lock_timeout = '2s';
SET statement_timeout = '15s';
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT job_id FROM jobs WHERE status = 'queued' ORDER BY created_at FOR UPDATE SKIP LOCKED LIMIT 1;

What's Next