Skip to main content

Primary and Foreign Keys

Learning Focus

Use this lesson to design primary keys and foreign keys in PostgreSQL, choose safe referential actions, and avoid performance traps (especially missing FK indexes).

Concept Overview

Keys define identity and relationships:

  • A primary key (PK) uniquely identifies each row in a table.
  • A foreign key (FK) enforces that a column (or set of columns) references an existing row in another table.

In PostgreSQL, a primary key is implemented as a UNIQUE + NOT NULL constraint (and it creates a unique B-tree index).

Important PostgreSQL behavior: creating a foreign key does not automatically create an index on the referencing column(s). If you join on the FK, or frequently delete/update parent rows, you typically want an index on the FK column.

Why is it important?

  • Correctness: stop orphaned rows and duplicated identities
  • Data modeling: represent one-to-many and many-to-many relationships
  • Performance: good keys + supporting indexes make joins and deletes predictable
  • Operational safety: enforce rules in the database, not only in app code

Where does it fit?

Keys are foundational schema tools used in:

  • CREATE TABLE designs (define PK/FKs up front)
  • migrations (ALTER TABLE ... ADD CONSTRAINT)
  • indexing strategy (PK index + often FK indexes)
  • data cleanup and audits (find duplicates/orphans)

Syntax & Rules

Core Syntax

Primary key (inline):

CREATE TABLE customers (
customer_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email text NOT NULL UNIQUE
);

Primary key (table constraint):

CREATE TABLE customers (
customer_id bigint GENERATED ALWAYS AS IDENTITY,
email text NOT NULL,
CONSTRAINT customers_pkey PRIMARY KEY (customer_id)
);

Foreign key (table constraint with a name):

CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
amount numeric(12,2) NOT NULL,
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
);

Available Options / Parameters

FeatureExampleMeaning / Notes
PK columnsPRIMARY KEY (a, b)composite key across multiple columns
FK targetREFERENCES customers(customer_id)referenced table + columns must be PK/unique
Named constraintCONSTRAINT fk_orders_customer ...makes errors and migrations easier to manage
ON DELETEON DELETE RESTRICT / CASCADE / SET NULLaction when parent row is deleted
ON UPDATEON UPDATE CASCADEaction when referenced key changes (rare if PK stable)
DeferrableDEFERRABLE INITIALLY DEFERREDcheck at commit time (bulk loads/complex transactions)
Validation controlNOT VALID + VALIDATE CONSTRAINTadd constraints without immediate full validation

Key Rules and Considerations

  • FK columns and referenced columns must have matching types.
  • The referenced columns must have a PRIMARY KEY or UNIQUE constraint.
  • PostgreSQL creates indexes automatically for PRIMARY KEY and UNIQUE constraints.
  • PostgreSQL does not automatically index FK columns; add indexes intentionally.
  • Prefer stable PKs (surrogate identity or immutable natural key) to avoid cascading updates.

Step-by-Step Examples

Use this setup for the examples:

DROP TABLE IF EXISTS order_items;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;

CREATE TABLE customers (
customer_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email text NOT NULL UNIQUE
);

CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
status text NOT NULL,
amount numeric(12,2) NOT NULL CHECK (amount >= 0),
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
ON DELETE RESTRICT
);

INSERT INTO customers (email) VALUES
('a@example.com'),
('b@example.com');

INSERT INTO orders (customer_id, status, amount) VALUES
(1, 'paid', 10.00),
(1, 'paid', 25.00),
(2, 'failed', 5.00);

Example 1: Primary Key Uniqueness (Beginner)

SELECT customer_id, email
FROM customers
ORDER BY customer_id;

Expected output:

 customer_id |     email
-------------+---------------
1 | a@example.com
2 | b@example.com
(2 rows)

Explanation:

  • Each row gets a unique customer_id via the identity column.

Example 2: Foreign Key Prevents Orphans (Beginner)

Try inserting an order for a non-existent customer:

INSERT INTO orders (customer_id, status, amount)
VALUES (999, 'paid', 20.00);

Expected outcome:

ERROR:  insert or update on table "orders" violates foreign key constraint "fk_orders_customer"
DETAIL: Key (customer_id)=(999) is not present in table "customers".

Fix: insert the parent row first (or use a valid id).


Example 3: FK Indexing Matters (Intermediate)

Querying by FK is common:

EXPLAIN
SELECT order_id, amount
FROM orders
WHERE customer_id = 1;

Expected output (example):

 Seq Scan on orders  (cost=0.00..1.05 rows=1 width=16)
Filter: (customer_id = 1)

Add an index on the FK column:

CREATE INDEX idx_orders_customer_id ON orders (customer_id);

EXPLAIN
SELECT order_id, amount
FROM orders
WHERE customer_id = 1;

Expected output (example):

 Index Scan using idx_orders_customer_id on orders  (cost=0.13..8.15 rows=1 width=16)
Index Cond: (customer_id = 1)

Explanation:

  • Without the index, PostgreSQL may scan the whole table.
  • With the index, it can use an index scan for selective lookups.

Example 4: Referential Actions and Safety (Advanced)

With ON DELETE RESTRICT, deleting a parent with child rows fails:

DELETE FROM customers WHERE customer_id = 1;

Expected outcome:

ERROR:  update or delete on table "customers" violates foreign key constraint "fk_orders_customer" on table "orders"
DETAIL: Key (customer_id)=(1) is still referenced from table "orders".

If your business rule is “delete customer deletes orders”, you can choose CASCADE:

ALTER TABLE orders
DROP CONSTRAINT fk_orders_customer;

ALTER TABLE orders
ADD CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
ON DELETE CASCADE;

Expected output:

ALTER TABLE
ALTER TABLE

Explanation:

  • CASCADE is powerful and risky. Use it only when you truly want cascading deletes.

Practical Use Cases

1) Many-to-Many With a Join Table

Context: users can have many roles; roles can belong to many users.

CREATE TABLE roles (
role_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL UNIQUE
);

CREATE TABLE user_roles (
user_id bigint NOT NULL REFERENCES users(user_id),
role_id bigint NOT NULL REFERENCES roles(role_id),
PRIMARY KEY (user_id, role_id)
);

2) Prevent Deletes That Would Orphan Critical Records

Context: keep invoices immutable; don't allow deleting customers with invoices.

FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
ON DELETE RESTRICT

3) Optional Relationships With SET NULL

Context: a ticket may optionally reference an assignee.

CREATE TABLE tickets (
ticket_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
assignee_id bigint REFERENCES users(user_id) ON DELETE SET NULL,
title text NOT NULL
);

4) Add Constraints Safely on Large Existing Tables

Context: introduce a foreign key without blocking for a full scan immediately.

ALTER TABLE orders
ADD CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
NOT VALID;

ALTER TABLE orders
VALIDATE CONSTRAINT fk_orders_customer;

Common Mistakes & Troubleshooting

Mistake 1: Forgetting to Index the FK Column

Wrong assumption: the FK constraint creates an index on orders.customer_id.

What happens: slow joins and slow deletes/updates on parent rows.

Fix:

CREATE INDEX idx_orders_customer_id ON orders (customer_id);

Mistake 2: Type Mismatch Between PK and FK

Wrong:

-- customers.customer_id is bigint
-- orders.customer_id is integer

What happens: the FK may fail to create or force casts that hurt performance.

Fix: match types exactly.


Mistake 3: Using CASCADE Casually

What happens: deletes can wipe large parts of your data graph.

Fix: default to RESTRICT/NO ACTION and document any cascades.

Debugging tips:

  1. When an FK insert fails, read the DETAIL line to find the missing parent key.
  2. For delete failures, identify child rows: SELECT * FROM orders WHERE customer_id = 1;.
  3. Use \d+ table_name in psql to inspect constraints.
  4. For performance issues, EXPLAIN the join/filter and add indexes to match predicates.

Best Practices

  • ✅ Use stable primary keys; ❌ choose a PK that changes frequently.
  • ✅ Name constraints (fk_orders_customer); ❌ rely on auto-generated names.
  • ✅ Index foreign key columns used in joins/filters; ❌ assume PostgreSQL creates FK indexes automatically.
  • ✅ Prefer RESTRICT/NO ACTION unless you truly need cascade; ❌ use CASCADE as a shortcut.
  • ✅ Validate large-table constraints safely with NOT VALID + VALIDATE; ❌ add heavyweight constraints without a rollout plan.

Hands-On Practice

Use this setup for the exercises:

DROP TABLE IF EXISTS practice_orders;
DROP TABLE IF EXISTS practice_customers;

CREATE TABLE practice_customers (
customer_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
email text NOT NULL UNIQUE
);

INSERT INTO practice_customers (email) VALUES ('p1@example.com'), ('p2@example.com');

Exercise 1 (Easy): Create a table with a primary key

Task: Create practice_orders with order_id as an identity primary key.

-- Your SQL here

Solution:

CREATE TABLE practice_orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
);

Exercise 2 (Medium): Add a foreign key

Task: Add customer_id bigint NOT NULL to practice_orders and create a foreign key to practice_customers(customer_id).

-- Your SQL here

Solution:

ALTER TABLE practice_orders
ADD COLUMN customer_id bigint NOT NULL;

ALTER TABLE practice_orders
ADD CONSTRAINT fk_practice_orders_customer
FOREIGN KEY (customer_id)
REFERENCES practice_customers(customer_id);

Exercise 3 (Advanced): Add an index for the FK

Task: Create an index that speeds up joins and deletes for the FK relationship.

-- Your SQL here

Solution:

CREATE INDEX idx_practice_orders_customer_id
ON practice_orders (customer_id);

Connection to Other Concepts

ConceptWhy it matters
CREATE TABLEkeys are usually defined at table creation
ALTER TABLEadd/validate constraints during migrations
Index strategyFK indexes are a common performance requirement
Transactionsdeferrable constraints can change when checks happen
JoinsPK/FK relationships drive common join patterns

Visual Learning Diagram

flowchart LR
A[customers\nPK: customer_id] -->|FK orders.customer_id| B[orders\nFK: customer_id]
B --> C[Index on orders.customer_id\n(recommended)]
A --> D[ON DELETE action\nRESTRICT/CASCADE/SET NULL]

classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D allNodes
class B highlight

Common Pitfalls

PitfallConsequencePrevention
No index on FK columnsslow joins, slow parent deletes/updatescreate INDEX ON child(fk)
Cascade deletes everywhereaccidental mass deletionsdefault to restrict/no action
Mismatched typesfailed constraints, poor plansmatch PK/FK types exactly
Unstable PK valuescascading updates, complexitykeep PK immutable
Anonymous constraint nameshard debuggingname constraints consistently

Quick Reference

PRIMARY KEY (id)
FOREIGN KEY (parent_id) REFERENCES parent(id)
CREATE INDEX ON child(parent_id)
ALTER TABLE t ADD CONSTRAINT fk_name FOREIGN KEY (c) REFERENCES p(id)
ALTER TABLE t VALIDATE CONSTRAINT fk_name

What's Next