Deadlocks and Retry Patterns

Learning Focus

Use this lesson to understand deadlocks and serialization failures in PostgreSQL, learn how to diagnose them with lock views and logs, and implement safe retry patterns.

Concept Overview

A deadlock occurs when two or more transactions wait for each other’s locks, forming a circular dependency that cannot resolve.

PostgreSQL has a built-in deadlock detector. When it detects a deadlock, it:

chooses a "victim" transaction
aborts that transaction
returns SQLSTATE 40P01 (deadlock_detected) to the client
allows the other transaction(s) to continue

PostgreSQL can also abort transactions in high-isolation mode even without a deadlock:

in SERIALIZABLE isolation, it may abort with SQLSTATE 40001 (serialization_failure)
this is expected under contention; the error hint typically says the transaction might succeed if retried

Why is it important?

Correctness: deadlocks and serialization failures abort whole transactions; without retries, you can lose legitimate user actions.
Reliability: safe retries turn transient conflicts into successful outcomes.
Performance: stable locking patterns reduce wasted work and tail latency.
Operability: diagnosis skills help you fix hotspots (hot rows, missing indexes, long-running transactions).

Where does it fit?

You use these patterns when building:

money transfers, reservation systems, and multi-row updates
job queues with multiple workers
SERIALIZABLE workflows that must behave like transactions ran one-at-a-time
write-heavy systems where contention is normal

Syntax & Rules

Retryable Transaction Conflicts (Common SQLSTATEs)

Condition	SQLSTATE	When you see it	Typical response
`deadlock_detected`	`40P01`	deadlock detector aborts your transaction	retry whole transaction
`serialization_failure`	`40001`	SERIALIZABLE abort to prevent anomalies	retry whole transaction
`lock_not_available`	`55P03`	`NOWAIT` or `lock_timeout`	retry or degrade (skip/queue)

Timeouts and Logging Knobs

These settings don’t prevent deadlocks, but they help you fail fast and diagnose contention.

-- Fail if a statement waits too long on a lock
SET lock_timeout = '2s';

-- Fail if a statement runs too long
SET statement_timeout = '15s';

-- Log long lock waits (cluster setting; shown here for awareness)
-- SET log_lock_waits = on;

Available options (conceptual):

Setting	Scope	Example	Meaning
`deadlock_timeout`	server / session	`SET deadlock_timeout = '200ms'`	wait before checking for deadlock cycles
`lock_timeout`	session / transaction (`SET LOCAL`)	`SET LOCAL lock_timeout = '1s'`	cancel when waiting too long for locks
`statement_timeout`	session / transaction (`SET LOCAL`)	`SET LOCAL statement_timeout = '10s'`	cancel long-running statements
`log_lock_waits`	server	`log_lock_waits = on`	log lock waits beyond `deadlock_timeout`

Key Rules and Considerations

A deadlock aborts one whole transaction, not just a single statement.
A lock wait is different: it can resolve when the blocker commits/rolls back.
SERIALIZABLE aborts (SQLSTATE 40001) are expected under contention; design your code to retry.
Retries must be safe: either the operation is idempotent, or you guard side effects with idempotency keys / outbox.
The most effective deadlock prevention technique is stable lock ordering: lock or update rows in a deterministic order across code paths.

Step-by-Step Examples

These examples use multiple sessions. In psql, open two terminals:

Session A
Session B

Example 1: Create a Deadlock (Intentionally)

Setup (run once):

DROP TABLE IF EXISTS accounts;

CREATE TABLE accounts (
  account_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  balance_cents integer NOT NULL CHECK (balance_cents >= 0)
);

INSERT INTO accounts (balance_cents)
VALUES (10000), (10000);

SELECT account_id, balance_cents
FROM accounts
ORDER BY account_id;

Expected output:

 account_id | balance_cents
------------+--------------
          1 |        10000
          2 |        10000
(2 rows)

Session A:

BEGIN;

-- Lock account 1
UPDATE accounts
SET balance_cents = balance_cents - 100
WHERE account_id = 1;

Expected output (Session A):

BEGIN
UPDATE 1

Session B:

BEGIN;

-- Lock account 2
UPDATE accounts
SET balance_cents = balance_cents - 200
WHERE account_id = 2;

-- Now try to lock account 1 (held by Session A)
UPDATE accounts
SET balance_cents = balance_cents + 200
WHERE account_id = 1;
-- this statement waits...

Expected output up to the wait (Session B):

BEGIN
UPDATE 1
-- second UPDATE is waiting

Back to Session A (try to lock account 2, held by Session B):

UPDATE accounts
SET balance_cents = balance_cents + 100
WHERE account_id = 2;

Expected output (one session becomes the victim; example shown for Session A):

ERROR:  deadlock detected
DETAIL:  Process ... waits for ShareLock on transaction ...; blocked by process ...
HINT:  See server log for query details.

Explanation:

Each session holds one row lock and waits on the other.
PostgreSQL detects the cycle and aborts one transaction (SQLSTATE 40P01).
The surviving transaction can continue after the victim rolls back.

Cleanup:

-- In the victim session
ROLLBACK;

-- In the surviving session
COMMIT;

Example 2: Lock Wait vs Deadlock (NOWAIT and lock_timeout)

Not all lock contention is a deadlock. Sometimes you simply wait.

Session A (hold a lock):

BEGIN;
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 1;
-- keep the transaction open

Session B (fail fast instead of waiting):

BEGIN;

-- Option A: NOWAIT
SELECT *
FROM accounts
WHERE account_id = 1
FOR UPDATE NOWAIT;

Expected output (Session B):

ERROR:  could not obtain lock on row in relation "accounts"

Session B (alternative: time-based wait limit):

SET LOCAL lock_timeout = '500ms';

UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 1;

Expected output (Session B):

ERROR:  canceling statement due to lock timeout

Finish the demo:

-- Session A
COMMIT;

-- Session B
ROLLBACK;

Explanation:

A lock wait may resolve if the blocker commits.
NOWAIT and lock_timeout convert waits into fast failures (often preferable in APIs).

Example 3: Observe Blocking in pg_stat_activity

While Session B is waiting on a lock (or failing with timeouts), you can inspect blocking relationships.

Run this in any session (often a third session):

SELECT
  a.pid,
  a.usename,
  a.state,
  a.wait_event_type,
  a.wait_event,
  pg_blocking_pids(a.pid) AS blocking_pids,
  left(a.query, 70) AS query_prefix
FROM pg_stat_activity a
WHERE a.datname = current_database()
  AND a.pid <> pg_backend_pid()
ORDER BY a.pid;

Expected output (example shape):

 pid  | usename |        state        | wait_event_type |  wait_event   | blocking_pids | query_prefix
------+---------+---------------------+-----------------+---------------+---------------+------------------------------
 4101 | app     | idle in transaction |                 |               | {}            | UPDATE accounts SET balance...
 4123 | app     | active              | Lock            | transactionid | {4101}        | UPDATE accounts SET balance...

Explanation:

blocking_pids tells you which backend(s) block a given session.
idle in transaction typically means a transaction is holding locks while doing no work (a common cause of lock storms).

Example 4: SERIALIZABLE Retry (Serialization Failure)

SERIALIZABLE provides strict guarantees, but it may abort transactions under contention.

Setup:

DROP TABLE IF EXISTS inventory;

CREATE TABLE inventory (
  sku text PRIMARY KEY,
  available integer NOT NULL CHECK (available >= 0)
);

INSERT INTO inventory (sku, available)
VALUES ('wrench', 1);

Session A:

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT available FROM inventory WHERE sku = 'wrench';
UPDATE inventory SET available = available - 1 WHERE sku = 'wrench';
COMMIT;

Session B (run concurrently):

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT available FROM inventory WHERE sku = 'wrench';
UPDATE inventory SET available = available - 1 WHERE sku = 'wrench';
COMMIT;

Expected output (one session succeeds; the other aborts):

ERROR:  could not serialize access due to read/write dependencies among transactions
HINT:  The transaction might succeed if retried.

Explanation:

SQLSTATE 40001 is PostgreSQL telling you it prevented a SERIALIZABLE anomaly.
The correct response is to retry the whole transaction (often 3-5 times with backoff + jitter).

Practical Use Cases

1. Two-Row Transfers (Canonical Deadlock Hotspot)

Lock both rows in stable order, then update.

BEGIN;

SELECT account_id
FROM accounts
WHERE account_id IN (1, 2)
ORDER BY account_id
FOR UPDATE;

UPDATE accounts SET balance_cents = balance_cents - 500 WHERE account_id = 1;
UPDATE accounts SET balance_cents = balance_cents + 500 WHERE account_id = 2;

COMMIT;

2. Inventory Reservations With Safe Retries

Treat 40001 and 40P01 as retryable and keep each attempt short.

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

UPDATE inventory
SET available = available - 1
WHERE sku = 'wrench' AND available > 0
RETURNING sku, available;

COMMIT;

3. Work Queue Processing (Reduce Blocking)

Workers claim different rows without waiting.

-- jobs(job_id, status, created_at, started_at)
WITH next_job AS (
  SELECT job_id
  FROM jobs
  WHERE status = 'queued'
  ORDER BY created_at
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
UPDATE jobs j
SET status = 'running', started_at = now()
FROM next_job
WHERE j.job_id = next_job.job_id
RETURNING j.job_id;

4. Batch Updates in Small, Ordered Chunks

Reduce time holding locks and reduce deadlock surface area.

WITH batch AS (
  SELECT order_id
  FROM orders
  WHERE status = 'pending'
  ORDER BY order_id
  LIMIT 500
)
UPDATE orders o
SET status = 'processing'
FROM batch
WHERE o.order_id = batch.order_id;

5. Idempotency for Retry Safety

Use an idempotency key to ensure retries don’t double-apply external side effects.

-- payments(idempotency_key primary key, ...)
INSERT INTO payments (idempotency_key, amount_cents)
VALUES ('req_123', 500)
ON CONFLICT (idempotency_key) DO NOTHING;

Common Mistakes & Troubleshooting

Mistake 1: Not Retrying SQLSTATE 40P01 / 40001

Wrong approach: treat deadlock/serialization errors as hard failures.

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... do work ...
COMMIT;

Bad outcome:

ERROR:  deadlock detected
-- or
ERROR:  could not serialize access due to read/write dependencies among transactions

Fix: retry the whole transaction on 40P01 and 40001 (bounded attempts).

Mistake 2: Inconsistent Lock Ordering Across Code Paths

Wrong (input order varies):

BEGIN;
UPDATE accounts SET balance_cents = balance_cents - 10 WHERE account_id = 2;
UPDATE accounts SET balance_cents = balance_cents + 10 WHERE account_id = 1;
COMMIT;

Fix (lock deterministically):

BEGIN;
SELECT account_id
FROM accounts
WHERE account_id IN (1, 2)
ORDER BY account_id
FOR UPDATE;
-- then update
COMMIT;

Mistake 3: Retrying Non-Idempotent Side Effects

Wrong outcome pattern: you perform external side effects (charge, email) and then the DB transaction fails and is retried.

-- DB transaction retry happens, but the side effect already occurred
BEGIN;
-- record something about the side effect
COMMIT;

Fix: use idempotency keys, outbox tables, or move side effects after commit.

Mistake 4: Leaving Transactions Open ("Idle in Transaction")

Wrong:

BEGIN;
SELECT * FROM accounts WHERE account_id = 1 FOR UPDATE;
-- pause for user input / network IO
COMMIT;

Bad outcome: long lock holds increase deadlock frequency and lock timeout failures.

Fix: do user interaction and network calls outside the transaction. Keep the transaction window tight.

Debugging Tips (Practical Checklist)

In your app logs, record SQLSTATE for aborted transactions (40P01, 40001, 55P03).
When you see waits, query pg_stat_activity and check pg_blocking_pids(pid).
Look for long-running transactions and idle in transaction sessions.
Check whether the conflicting statements touch rows in inconsistent order.
Add missing indexes so updates/select-for-update target fewer rows and complete faster.

Best Practices

✅ Treat SQLSTATE 40P01 (deadlock_detected) as retryable
❌ Don’t surface deadlocks as unexplained user errors

✅ Treat SQLSTATE 40001 (serialization_failure) as retryable in SERIALIZABLE
❌ Don’t enable SERIALIZABLE without bounded retries

✅ Lock/update rows in a stable order (ORDER BY primary key)
❌ Don’t lock in user-provided order when it can vary

✅ Keep transactions short and purely database-focused
❌ Don’t hold locks while waiting on user input, HTTP calls, or long computations

✅ Use backoff + jitter for retries (avoid retry storms)
❌ Don’t retry immediately in a tight loop

✅ Use `lock_timeout` / `statement_timeout` in latency-sensitive endpoints
❌ Don’t allow unbounded lock waits

✅ Make retries safe with idempotency keys or outbox patterns
❌ Don’t retry when side effects are not protected from duplication

Hands-On Practice

Exercise 1 (Easy): Trigger and Recognize a Deadlock

Task: in two sessions, reproduce the deadlock from Example 1 and capture the error text in the victim session.

-- Your work here

Solution:

-- Session A
BEGIN;
UPDATE accounts SET balance_cents = balance_cents - 1 WHERE account_id = 1;

-- Session B
BEGIN;
UPDATE accounts SET balance_cents = balance_cents - 1 WHERE account_id = 2;

-- Session A
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 2;

-- Session B
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 1;

Exercise 2 (Medium): Prevent the Deadlock with Stable Ordering

Task: write a single transaction that locks accounts 1 and 2 in ascending order and then performs a transfer.

-- Your work here

Solution:

BEGIN;

SELECT account_id
FROM accounts
WHERE account_id IN (1, 2)
ORDER BY account_id
FOR UPDATE;

UPDATE accounts SET balance_cents = balance_cents - 1 WHERE account_id = 1;
UPDATE accounts SET balance_cents = balance_cents + 1 WHERE account_id = 2;

COMMIT;

Exercise 3 (Advanced): Retryable Transfer Function

Task: create a PL/pgSQL function that retries a transfer up to 3 times when it hits deadlock_detected or serialization_failure.

-- Your work here

Solution:

CREATE OR REPLACE FUNCTION safe_transfer(
  p_from integer,
  p_to integer,
  p_amount integer
)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
  v_attempt integer := 0;
BEGIN
  IF p_from = p_to THEN
    RAISE EXCEPTION 'from and to must differ';
  END IF;

  LOOP
    v_attempt := v_attempt + 1;

    BEGIN
      -- Stable ordering lock of both accounts
      PERFORM 1
      FROM accounts
      WHERE account_id IN (LEAST(p_from, p_to), GREATEST(p_from, p_to))
      ORDER BY account_id
      FOR UPDATE;

      UPDATE accounts SET balance_cents = balance_cents - p_amount WHERE account_id = p_from;
      UPDATE accounts SET balance_cents = balance_cents + p_amount WHERE account_id = p_to;

      RETURN;
    EXCEPTION
      WHEN deadlock_detected OR serialization_failure THEN
        IF v_attempt >= 3 THEN
          RAISE;
        END IF;

        -- Backoff + jitter (simple)
        PERFORM pg_sleep(0.05 * v_attempt + random() * 0.05);
    END;
  END LOOP;
END;
$$;

Connection to Other Concepts

Related concept	Why it matters
ACID and Transaction Control	Deadlocks/serialization failures abort whole transactions; you need to understand rollback scope and savepoints.
Isolation Levels and Locking	SERIALIZABLE requires retries, and row lock patterns (`FOR UPDATE`, `NOWAIT`, `SKIP LOCKED`) shape contention.
Index strategy and maintenance	Missing indexes can widen lock scope and extend lock hold time, increasing conflicts.
Background jobs and scheduling	Worker concurrency often needs `SKIP LOCKED` patterns to avoid blocking piles.

Visual Learning Diagram

flowchart TD
  A["Session A\nlocks Row 1"] --> B["Session B\nlocks Row 2"]
  B --> C["Session A\nwaits for Row 2"]
  A --> D["Session B\nwaits for Row 1"]
  C --> E["Circular wait\n= deadlock"]
  D --> E
  E --> F["PostgreSQL aborts\nvictim txn\nSQLSTATE 40P01"]
  F --> G["App retries\n(backoff + jitter)"]
  G --> H["Stable ordering\nreduces frequency"]

  classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
  classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
  class A,B,C,D,E,F,G,H allNodes
  class E,F,G highlight

Common Pitfalls

Pitfall	Consequence	Prevention
Retrying without idempotency	duplicate charges/emails/tasks	idempotency keys, outbox pattern, side effects after commit
Inconsistent lock ordering	repeated deadlocks between the same code paths	lock rows deterministically (ORDER BY PK)
Long transactions	more lock hold time; increased conflicts	short transactions; avoid "idle in transaction"
Missing indexes on hot WHERE clauses	slow updates under lock; bigger conflict window	add indexes; tighten predicates
No timeouts	requests hang and pile up under contention	set `lock_timeout` and `statement_timeout`
Aggressive retries	retry storms and latency spikes	bounded retries + exponential backoff + jitter

Quick Reference

SELECT pid, pg_blocking_pids(pid) AS blocking_pids FROM pg_stat_activity WHERE wait_event_type = 'Lock';
SET lock_timeout = '2s';
SET statement_timeout = '15s';
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT job_id FROM jobs WHERE status = 'queued' ORDER BY created_at FOR UPDATE SKIP LOCKED LIMIT 1;

What's Next

Previous: Isolation Levels and Locking - Review how locks and SERIALIZABLE behavior create retryable conflicts.
Next: 17. User Management and Security - Secure access to data and control privileges in PostgreSQL.
Module Overview - Return to this module index and choose another related lesson.

Concept Overview​

Why is it important?​

Where does it fit?​

Syntax & Rules​

Retryable Transaction Conflicts (Common SQLSTATEs)​

Timeouts and Logging Knobs​

Key Rules and Considerations​

Step-by-Step Examples​

Example 1: Create a Deadlock (Intentionally)​

Example 2: Lock Wait vs Deadlock (NOWAIT and lock_timeout)​

Example 3: Observe Blocking in pg_stat_activity​

Example 4: SERIALIZABLE Retry (Serialization Failure)​

Practical Use Cases​

1. Two-Row Transfers (Canonical Deadlock Hotspot)​

2. Inventory Reservations With Safe Retries​

3. Work Queue Processing (Reduce Blocking)​

4. Batch Updates in Small, Ordered Chunks​

5. Idempotency for Retry Safety​

Common Mistakes & Troubleshooting​

Mistake 1: Not Retrying SQLSTATE 40P01 / 40001​

Mistake 2: Inconsistent Lock Ordering Across Code Paths​

Mistake 3: Retrying Non-Idempotent Side Effects​

Mistake 4: Leaving Transactions Open ("Idle in Transaction")​

Debugging Tips (Practical Checklist)​

Best Practices​

Hands-On Practice​

Exercise 1 (Easy): Trigger and Recognize a Deadlock​

Exercise 2 (Medium): Prevent the Deadlock with Stable Ordering​

Exercise 3 (Advanced): Retryable Transfer Function​

Connection to Other Concepts​

Visual Learning Diagram​

Common Pitfalls​

Quick Reference​

What's Next​

Concept Overview

Why is it important?

Where does it fit?

Syntax & Rules

Retryable Transaction Conflicts (Common SQLSTATEs)

Timeouts and Logging Knobs

Key Rules and Considerations

Step-by-Step Examples

Example 1: Create a Deadlock (Intentionally)

Example 2: Lock Wait vs Deadlock (NOWAIT and lock_timeout)

Example 3: Observe Blocking in pg_stat_activity

Example 4: SERIALIZABLE Retry (Serialization Failure)

Practical Use Cases

1. Two-Row Transfers (Canonical Deadlock Hotspot)

2. Inventory Reservations With Safe Retries

3. Work Queue Processing (Reduce Blocking)

4. Batch Updates in Small, Ordered Chunks

5. Idempotency for Retry Safety

Common Mistakes & Troubleshooting

Mistake 1: Not Retrying SQLSTATE 40P01 / 40001

Mistake 2: Inconsistent Lock Ordering Across Code Paths

Mistake 3: Retrying Non-Idempotent Side Effects

Mistake 4: Leaving Transactions Open ("Idle in Transaction")

Debugging Tips (Practical Checklist)

Best Practices

Hands-On Practice

Exercise 1 (Easy): Trigger and Recognize a Deadlock

Exercise 2 (Medium): Prevent the Deadlock with Stable Ordering

Exercise 3 (Advanced): Retryable Transfer Function

Connection to Other Concepts

Visual Learning Diagram

Common Pitfalls

Quick Reference

What's Next