Skip to main content

SELF JOIN

Learning Focus

Use this lesson to understand SELF JOIN with practical syntax and examples.

Concept Overview

A self join is a join where both sides refer to the same table. You use it when the relationship you want to model is stored inside one table (for example, an employee row storing manager_id pointing to another employee).

Self joins are “normal” joins (INNER/LEFT/RIGHT); the defining feature is that you must use aliases to distinguish the two instances.

Why is it important?

  • Hierarchies: employee-manager, category trees, referral networks
  • Comparisons: compare rows within the same table (pairs, duplicates, peers)
  • Data quality: detect missing parents, broken references, or inconsistent relationships

Where does it fit?

Self joins use the same join concepts as INNER JOIN and LEFT JOIN, plus careful aliasing. For deeper hierarchies, PostgreSQL also offers recursive CTEs (WITH RECURSIVE), which build on the same idea.


Syntax & Rules

Core Syntax

SELECT ...
FROM table_name a
JOIN table_name b
ON a.some_fk = b.some_pk;

Available Options / Parameters

ChoiceWhat it doesNotes
Table aliasesdistinguish instancesrequired for clarity and to avoid ambiguity
INNER JOINonly rows with a matchexcludes top-level rows (manager_id NULL)
LEFT JOINkeep all left rowsuseful for “employee with optional manager”
Pair filtersavoid duplicatesuse a.id < b.id or a.id <> b.id as needed

Key Rules and Considerations

  • Always alias the table (employees e, employees m).
  • Be explicit about whether you want to keep top-level rows (use LEFT JOIN).
  • Self joins can multiply rows quickly if the join condition is not selective.
  • For deep hierarchies (manager -> manager -> manager), consider recursive CTEs.

Step-by-Step Examples

Example 1: Employees and Their Managers (Beginner)

CREATE TABLE staff (
staff_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
full_name text NOT NULL,
manager_id bigint REFERENCES staff(staff_id)
);

INSERT INTO staff (full_name, manager_id) VALUES
('Alice Manager', NULL),
('Bob Engineer', 1),
('Carol Engineer', 1),
('Drew Intern', 2);

SELECT
e.staff_id,
e.full_name AS employee,
m.full_name AS manager
FROM staff e
LEFT JOIN staff m ON m.staff_id = e.manager_id
ORDER BY e.staff_id;

Expected output:

 staff_id | employee        | manager
----------+-----------------+---------------
1 | Alice Manager |
2 | Bob Engineer | Alice Manager
3 | Carol Engineer | Alice Manager
4 | Drew Intern | Bob Engineer
(4 rows)

Explanation:

  • LEFT JOIN keeps Alice even though she has no manager.
  • The manager side (m) is another instance of staff.

Example 2: Only Employees With Managers (INNER SELF JOIN) (Intermediate)

SELECT
e.full_name AS employee,
m.full_name AS manager
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id
ORDER BY employee;

Expected output:

 employee        | manager
----------------+---------------
Bob Engineer | Alice Manager
Carol Engineer | Alice Manager
Drew Intern | Bob Engineer
(3 rows)

Explanation:

  • Alice is excluded because she does not match a manager row.

Example 3: Pairing Rows Within the Same Group (Avoid Duplicates) (Advanced)

Self joins are also useful for generating pairs, such as “employees in the same manager group”.

-- Pair reportees under the same manager.
-- Use < to avoid duplicates and self-pairs.

SELECT
a.full_name AS employee_a,
b.full_name AS employee_b,
a.manager_id
FROM staff a
JOIN staff b
ON a.manager_id = b.manager_id
AND a.staff_id < b.staff_id
WHERE a.manager_id IS NOT NULL
ORDER BY a.manager_id, employee_a, employee_b;

Expected output:

 employee_a     | employee_b     | manager_id
----------------+----------------+------------
Bob Engineer | Carol Engineer | 1
(1 row)

Explanation:

  • Bob and Carol share manager 1.
  • a.staff_id < b.staff_id avoids listing (Carol, Bob) as a second row.

Example 4: Managers with More Than One Direct Report (Advanced)

SELECT
m.full_name AS manager,
COUNT(*) AS direct_reports
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id
GROUP BY m.full_name
HAVING COUNT(*) > 1
ORDER BY direct_reports DESC, manager;

Expected output:

 manager        | direct_reports
----------------+---------------
Alice Manager | 2
(1 row)

Explanation:

  • This is a common org-chart metric.

Practical Use Cases

1) Organization charts

SELECT e.full_name AS employee, m.full_name AS manager
FROM employees e
LEFT JOIN employees m ON m.employee_id = e.manager_id;

2) Referral programs

SELECT u.email AS referred, r.email AS referrer
FROM users u
LEFT JOIN users r ON r.user_id = u.referrer_user_id;

3) Duplicate detection (same email, different ids)

SELECT a.user_id, b.user_id, a.email
FROM users a
JOIN users b ON a.email = b.email AND a.user_id < b.user_id;

4) Compatibility/bundling within categories

SELECT a.product_id, b.product_id
FROM products a
JOIN products b ON a.category_id = b.category_id AND a.product_id < b.product_id;

5) Compare “current vs previous” rows per key (basic pattern)

SELECT c.id, c.value AS current_value, p.value AS previous_value
FROM snapshots c
JOIN snapshots p ON p.id = c.id AND p.day = c.day - 1;

Common Mistakes & Troubleshooting

1) Forgetting aliases

Wrong SQL:

SELECT staff.full_name
FROM staff
JOIN staff ON staff.manager_id = staff.staff_id;

Bad outcome:

  • Ambiguous references and unreadable logic.

Fix:

SELECT e.full_name, m.full_name
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id;

2) Missing join condition (accidental cross join)

Wrong SQL:

SELECT *
FROM staff e
JOIN staff m;

Bad outcome:

  • Produces all combinations.

Fix:

JOIN staff m ON m.staff_id = e.manager_id

3) Duplicates when generating pairs

Wrong SQL:

SELECT a.full_name, b.full_name
FROM staff a
JOIN staff b ON a.manager_id = b.manager_id;

Bad outcome:

  • Includes self-pairs (Alice with Alice) and both orderings (Bob, Carol) and (Carol, Bob).

Fix:

... AND a.staff_id < b.staff_id

4) Using INNER JOIN when top-level rows matter

Wrong SQL:

SELECT e.full_name, m.full_name
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id;

Bad outcome:

  • Top-level managers disappear.

Fix:

LEFT JOIN staff m ON m.staff_id = e.manager_id

Debugging checklist:

  1. Confirm the relationship column (manager_id) and the referenced key (staff_id).
  2. Decide whether the relationship is optional (use LEFT JOIN) or required (INNER JOIN).
  3. For pair generation, add a filter to avoid duplicates (a.id < b.id).
  4. Start with a small preview (ORDER BY ... LIMIT 20) and inspect results.
  5. If performance is slow, index the relationship column (manager_id).

Best Practices

  • ✅ Use clear aliases that reflect roles (e = employee, m = manager). ❌ Avoid generic aliases (a, b) in hierarchy queries.
  • ✅ Use LEFT JOIN when parent rows may be missing. ❌ Avoid losing top-level nodes by default.
  • ✅ Add a.id < b.id for pair queries. ❌ Avoid duplicate or mirrored pairs.
  • ✅ Index relationship columns used in self joins. ❌ Avoid repeated scans of large hierarchy tables.
  • ✅ Consider recursive CTEs for multi-level traversal. ❌ Avoid chaining many self joins for deep trees.

Hands-On Practice

Use this setup for the exercises:

CREATE TABLE practice_staff (
staff_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
full_name text NOT NULL,
manager_id bigint REFERENCES practice_staff(staff_id)
);

INSERT INTO practice_staff (full_name, manager_id) VALUES
('Alice', NULL),
('Bob', 1),
('Carol', 1),
('Drew', 2);

Exercise 1 (Easy): Employee with manager name

Task: Return every staff member with their manager name (if any).

-- Your SQL here

Solution:

SELECT e.full_name AS employee, m.full_name AS manager
FROM practice_staff e
LEFT JOIN practice_staff m ON m.staff_id = e.manager_id
ORDER BY e.staff_id;

Exercise 2 (Medium): Only employees that have a manager

Task: Return only staff members that have a manager.

-- Your SQL here

Solution:

SELECT e.full_name AS employee, m.full_name AS manager
FROM practice_staff e
JOIN practice_staff m ON m.staff_id = e.manager_id
ORDER BY e.staff_id;

Exercise 3 (Advanced): Managers with 2+ direct reports

Task: List managers that have more than one direct report.

-- Your SQL here

Solution:

SELECT m.full_name AS manager, COUNT(*) AS reportees
FROM practice_staff e
JOIN practice_staff m ON m.staff_id = e.manager_id
GROUP BY m.full_name
HAVING COUNT(*) > 1
ORDER BY reportees DESC, manager;

Connection to Other Concepts

ConceptWhy it matters
Aliases (AS)required to distinguish table instances
LEFT JOINkeeps top-level nodes with no parent
INNER JOINremoves rows with no matching parent
GROUP BY + HAVINGcommon for manager/reportee counts
CROSS JOINmissing ON in a self join creates a Cartesian product
Recursive CTEsiterate through multi-level hierarchies

Visual Learning Diagram

flowchart TD
A[One Table] --> B[SELF JOIN]
B --> C[Aliases Required]
B --> D[Hierarchy: manager_id]
D --> E[LEFT JOIN to keep top-level]
D --> F[INNER JOIN for required parent]
B --> G[Pairs Within Group]
G --> H[Add a.id < b.id]
D --> I[Deep Trees]
I --> J[Recursive CTE]

classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F,G,H,I,J allNodes
class B highlight

Common Pitfalls

PitfallConsequencePrevention
No aliasesambiguous/unreadable SQLalways alias both instances
Missing join conditionCartesian productalways add ON
Pair queries produce duplicatesmirrored pairsuse a.id < b.id
Using INNER JOIN for optional parentsmissing top-level rowsuse LEFT JOIN
Deep hierarchy with many joinscomplex and slow SQLuse recursive CTEs

Quick Reference

SELECT e.col, p.col FROM t e JOIN t p ON p.id = e.parent_id;
SELECT e.col, p.col FROM t e LEFT JOIN t p ON p.id = e.parent_id;
SELECT a.id, b.id FROM t a JOIN t b ON a.group_id = b.group_id AND a.id < b.id;
SELECT m.name, COUNT(*) FROM t e JOIN t m ON m.id = e.parent_id GROUP BY m.name HAVING COUNT(*) > 1;
WITH RECURSIVE ...

What's Next