SELF JOIN

Learning Focus

Use this lesson to understand SELF JOIN with practical syntax and examples.

Concept Overview

A self join is a join where both sides refer to the same table. You use it when the relationship you want to model is stored inside one table (for example, an employee row storing manager_id pointing to another employee).

Self joins are “normal” joins (INNER/LEFT/RIGHT); the defining feature is that you must use aliases to distinguish the two instances.

Why is it important?

Hierarchies: employee-manager, category trees, referral networks
Comparisons: compare rows within the same table (pairs, duplicates, peers)
Data quality: detect missing parents, broken references, or inconsistent relationships

Where does it fit?

Self joins use the same join concepts as INNER JOIN and LEFT JOIN, plus careful aliasing. For deeper hierarchies, PostgreSQL also offers recursive CTEs (WITH RECURSIVE), which build on the same idea.

Syntax & Rules

Core Syntax

SELECT ...
FROM table_name a
JOIN table_name b
  ON a.some_fk = b.some_pk;

Available Options / Parameters

Choice	What it does	Notes
Table aliases	distinguish instances	required for clarity and to avoid ambiguity
`INNER JOIN`	only rows with a match	excludes top-level rows (manager_id NULL)
`LEFT JOIN`	keep all left rows	useful for “employee with optional manager”
Pair filters	avoid duplicates	use `a.id < b.id` or `a.id <> b.id` as needed

Key Rules and Considerations

Always alias the table (employees e, employees m).
Be explicit about whether you want to keep top-level rows (use LEFT JOIN).
Self joins can multiply rows quickly if the join condition is not selective.
For deep hierarchies (manager -> manager -> manager), consider recursive CTEs.

Step-by-Step Examples

Example 1: Employees and Their Managers (Beginner)

CREATE TABLE staff (
  staff_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  full_name   text NOT NULL,
  manager_id  bigint REFERENCES staff(staff_id)
);

INSERT INTO staff (full_name, manager_id) VALUES
  ('Alice Manager', NULL),
  ('Bob Engineer', 1),
  ('Carol Engineer', 1),
  ('Drew Intern', 2);

SELECT
  e.staff_id,
  e.full_name AS employee,
  m.full_name AS manager
FROM staff e
LEFT JOIN staff m ON m.staff_id = e.manager_id
ORDER BY e.staff_id;

Expected output:

 staff_id | employee        | manager
----------+-----------------+---------------
 1        | Alice Manager   |
 2        | Bob Engineer    | Alice Manager
 3        | Carol Engineer  | Alice Manager
 4        | Drew Intern     | Bob Engineer
(4 rows)

Explanation:

LEFT JOIN keeps Alice even though she has no manager.
The manager side (m) is another instance of staff.

Example 2: Only Employees With Managers (INNER SELF JOIN) (Intermediate)

SELECT
  e.full_name AS employee,
  m.full_name AS manager
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id
ORDER BY employee;

Expected output:

 employee        | manager
----------------+---------------
 Bob Engineer    | Alice Manager
 Carol Engineer  | Alice Manager
 Drew Intern     | Bob Engineer
(3 rows)

Explanation:

Alice is excluded because she does not match a manager row.

Example 3: Pairing Rows Within the Same Group (Avoid Duplicates) (Advanced)

Self joins are also useful for generating pairs, such as “employees in the same manager group”.

-- Pair reportees under the same manager.
-- Use < to avoid duplicates and self-pairs.

SELECT
  a.full_name AS employee_a,
  b.full_name AS employee_b,
  a.manager_id
FROM staff a
JOIN staff b
  ON a.manager_id = b.manager_id
 AND a.staff_id < b.staff_id
WHERE a.manager_id IS NOT NULL
ORDER BY a.manager_id, employee_a, employee_b;

Expected output:

 employee_a     | employee_b     | manager_id
----------------+----------------+------------
 Bob Engineer    | Carol Engineer | 1
(1 row)

Explanation:

Bob and Carol share manager 1.
a.staff_id < b.staff_id avoids listing (Carol, Bob) as a second row.

Example 4: Managers with More Than One Direct Report (Advanced)

SELECT
  m.full_name AS manager,
  COUNT(*) AS direct_reports
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id
GROUP BY m.full_name
HAVING COUNT(*) > 1
ORDER BY direct_reports DESC, manager;

Expected output:

 manager        | direct_reports
----------------+---------------
 Alice Manager  | 2
(1 row)

Explanation:

This is a common org-chart metric.

Practical Use Cases

1) Organization charts

SELECT e.full_name AS employee, m.full_name AS manager
FROM employees e
LEFT JOIN employees m ON m.employee_id = e.manager_id;

2) Referral programs

SELECT u.email AS referred, r.email AS referrer
FROM users u
LEFT JOIN users r ON r.user_id = u.referrer_user_id;

3) Duplicate detection (same email, different ids)

SELECT a.user_id, b.user_id, a.email
FROM users a
JOIN users b ON a.email = b.email AND a.user_id < b.user_id;

4) Compatibility/bundling within categories

SELECT a.product_id, b.product_id
FROM products a
JOIN products b ON a.category_id = b.category_id AND a.product_id < b.product_id;

5) Compare “current vs previous” rows per key (basic pattern)

SELECT c.id, c.value AS current_value, p.value AS previous_value
FROM snapshots c
JOIN snapshots p ON p.id = c.id AND p.day = c.day - 1;

Common Mistakes & Troubleshooting

1) Forgetting aliases

Wrong SQL:

SELECT staff.full_name
FROM staff
JOIN staff ON staff.manager_id = staff.staff_id;

Bad outcome:

Ambiguous references and unreadable logic.

Fix:

SELECT e.full_name, m.full_name
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id;

2) Missing join condition (accidental cross join)

Wrong SQL:

SELECT *
FROM staff e
JOIN staff m;

Bad outcome:

Produces all combinations.

Fix:

JOIN staff m ON m.staff_id = e.manager_id

3) Duplicates when generating pairs

Wrong SQL:

SELECT a.full_name, b.full_name
FROM staff a
JOIN staff b ON a.manager_id = b.manager_id;

Bad outcome:

Includes self-pairs (Alice with Alice) and both orderings (Bob, Carol) and (Carol, Bob).

Fix:

... AND a.staff_id < b.staff_id

4) Using INNER JOIN when top-level rows matter

Wrong SQL:

SELECT e.full_name, m.full_name
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id;

Bad outcome:

Top-level managers disappear.

Fix:

LEFT JOIN staff m ON m.staff_id = e.manager_id

Debugging checklist:

Confirm the relationship column (manager_id) and the referenced key (staff_id).
Decide whether the relationship is optional (use LEFT JOIN) or required (INNER JOIN).
For pair generation, add a filter to avoid duplicates (a.id < b.id).
Start with a small preview (ORDER BY ... LIMIT 20) and inspect results.
If performance is slow, index the relationship column (manager_id).

Best Practices

✅ Use clear aliases that reflect roles (e = employee, m = manager). ❌ Avoid generic aliases (a, b) in hierarchy queries.
✅ Use LEFT JOIN when parent rows may be missing. ❌ Avoid losing top-level nodes by default.
✅ Add a.id < b.id for pair queries. ❌ Avoid duplicate or mirrored pairs.
✅ Index relationship columns used in self joins. ❌ Avoid repeated scans of large hierarchy tables.
✅ Consider recursive CTEs for multi-level traversal. ❌ Avoid chaining many self joins for deep trees.

Hands-On Practice

Use this setup for the exercises:

CREATE TABLE practice_staff (
  staff_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  full_name  text NOT NULL,
  manager_id bigint REFERENCES practice_staff(staff_id)
);

INSERT INTO practice_staff (full_name, manager_id) VALUES
  ('Alice', NULL),
  ('Bob', 1),
  ('Carol', 1),
  ('Drew', 2);

Exercise 1 (Easy): Employee with manager name

Task: Return every staff member with their manager name (if any).

-- Your SQL here

Solution:

SELECT e.full_name AS employee, m.full_name AS manager
FROM practice_staff e
LEFT JOIN practice_staff m ON m.staff_id = e.manager_id
ORDER BY e.staff_id;

Exercise 2 (Medium): Only employees that have a manager

Task: Return only staff members that have a manager.

-- Your SQL here

Solution:

SELECT e.full_name AS employee, m.full_name AS manager
FROM practice_staff e
JOIN practice_staff m ON m.staff_id = e.manager_id
ORDER BY e.staff_id;

Exercise 3 (Advanced): Managers with 2+ direct reports

Task: List managers that have more than one direct report.

-- Your SQL here

Solution:

SELECT m.full_name AS manager, COUNT(*) AS reportees
FROM practice_staff e
JOIN practice_staff m ON m.staff_id = e.manager_id
GROUP BY m.full_name
HAVING COUNT(*) > 1
ORDER BY reportees DESC, manager;

Connection to Other Concepts

Concept	Why it matters
Aliases (AS)	required to distinguish table instances
`LEFT JOIN`	keeps top-level nodes with no parent
`INNER JOIN`	removes rows with no matching parent
`GROUP BY` + `HAVING`	common for manager/reportee counts
CROSS JOIN	missing ON in a self join creates a Cartesian product
Recursive CTEs	iterate through multi-level hierarchies

Visual Learning Diagram

flowchart TD
  A[One Table] --> B[SELF JOIN]
  B --> C[Aliases Required]
  B --> D[Hierarchy: manager_id]
  D --> E[LEFT JOIN to keep top-level]
  D --> F[INNER JOIN for required parent]
  B --> G[Pairs Within Group]
  G --> H[Add a.id < b.id]
  D --> I[Deep Trees]
  I --> J[Recursive CTE]

  classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
  classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
  class A,B,C,D,E,F,G,H,I,J allNodes
  class B highlight

Common Pitfalls

Pitfall	Consequence	Prevention
No aliases	ambiguous/unreadable SQL	always alias both instances
Missing join condition	Cartesian product	always add `ON`
Pair queries produce duplicates	mirrored pairs	use `a.id < b.id`
Using INNER JOIN for optional parents	missing top-level rows	use LEFT JOIN
Deep hierarchy with many joins	complex and slow SQL	use recursive CTEs

Quick Reference

SELECT e.col, p.col FROM t e JOIN t p ON p.id = e.parent_id;
SELECT e.col, p.col FROM t e LEFT JOIN t p ON p.id = e.parent_id;
SELECT a.id, b.id FROM t a JOIN t b ON a.group_id = b.group_id AND a.id < b.id;
SELECT m.name, COUNT(*) FROM t e JOIN t m ON m.id = e.parent_id GROUP BY m.name HAVING COUNT(*) > 1;
WITH RECURSIVE ...

What's Next

Previous: CROSS JOIN - Review the previous lesson to reinforce context.
Next: UNION ALL - Continue to the next concept with incremental complexity.
Module Overview - Return to this module index and choose another related lesson.

Concept Overview​

Why is it important?​

Where does it fit?​

Syntax & Rules​

Core Syntax​

Available Options / Parameters​

Key Rules and Considerations​

Step-by-Step Examples​

Example 1: Employees and Their Managers (Beginner)​

Example 2: Only Employees With Managers (INNER SELF JOIN) (Intermediate)​

Example 3: Pairing Rows Within the Same Group (Avoid Duplicates) (Advanced)​

Example 4: Managers with More Than One Direct Report (Advanced)​

Practical Use Cases​

1) Organization charts​

2) Referral programs​

3) Duplicate detection (same email, different ids)​

4) Compatibility/bundling within categories​

5) Compare “current vs previous” rows per key (basic pattern)​

Common Mistakes & Troubleshooting​

1) Forgetting aliases​

2) Missing join condition (accidental cross join)​

3) Duplicates when generating pairs​

4) Using INNER JOIN when top-level rows matter​

Best Practices​

Hands-On Practice​

Exercise 1 (Easy): Employee with manager name​

Exercise 2 (Medium): Only employees that have a manager​

Exercise 3 (Advanced): Managers with 2+ direct reports​

Connection to Other Concepts​

Visual Learning Diagram​

Common Pitfalls​

Quick Reference​

What's Next​

Concept Overview

Why is it important?

Where does it fit?

Syntax & Rules

Core Syntax

Available Options / Parameters

Key Rules and Considerations

Step-by-Step Examples

Example 1: Employees and Their Managers (Beginner)

Example 2: Only Employees With Managers (INNER SELF JOIN) (Intermediate)

Example 3: Pairing Rows Within the Same Group (Avoid Duplicates) (Advanced)

Example 4: Managers with More Than One Direct Report (Advanced)

Practical Use Cases

1) Organization charts

2) Referral programs

3) Duplicate detection (same email, different ids)

4) Compatibility/bundling within categories

5) Compare “current vs previous” rows per key (basic pattern)

Common Mistakes & Troubleshooting

1) Forgetting aliases

2) Missing join condition (accidental cross join)

3) Duplicates when generating pairs

4) Using INNER JOIN when top-level rows matter

Best Practices

Hands-On Practice

Exercise 1 (Easy): Employee with manager name

Exercise 2 (Medium): Only employees that have a manager

Exercise 3 (Advanced): Managers with 2+ direct reports

Connection to Other Concepts

Visual Learning Diagram

Common Pitfalls

Quick Reference

What's Next