SELF JOIN
Use this lesson to understand SELF JOIN with practical syntax and examples.
Concept Overview
A self join is a join where both sides refer to the same table. You use it when the relationship you want to model is stored inside one table (for example, an employee row storing manager_id pointing to another employee).
Self joins are “normal” joins (INNER/LEFT/RIGHT); the defining feature is that you must use aliases to distinguish the two instances.
Why is it important?
- Hierarchies: employee-manager, category trees, referral networks
- Comparisons: compare rows within the same table (pairs, duplicates, peers)
- Data quality: detect missing parents, broken references, or inconsistent relationships
Where does it fit?
Self joins use the same join concepts as INNER JOIN and LEFT JOIN, plus careful aliasing. For deeper hierarchies, PostgreSQL also offers recursive CTEs (WITH RECURSIVE), which build on the same idea.
Syntax & Rules
Core Syntax
SELECT ...
FROM table_name a
JOIN table_name b
ON a.some_fk = b.some_pk;
Available Options / Parameters
| Choice | What it does | Notes |
|---|---|---|
| Table aliases | distinguish instances | required for clarity and to avoid ambiguity |
INNER JOIN | only rows with a match | excludes top-level rows (manager_id NULL) |
LEFT JOIN | keep all left rows | useful for “employee with optional manager” |
| Pair filters | avoid duplicates | use a.id < b.id or a.id <> b.id as needed |
Key Rules and Considerations
- Always alias the table (
employees e,employees m). - Be explicit about whether you want to keep top-level rows (use
LEFT JOIN). - Self joins can multiply rows quickly if the join condition is not selective.
- For deep hierarchies (manager -> manager -> manager), consider recursive CTEs.
Step-by-Step Examples
Example 1: Employees and Their Managers (Beginner)
CREATE TABLE staff (
staff_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
full_name text NOT NULL,
manager_id bigint REFERENCES staff(staff_id)
);
INSERT INTO staff (full_name, manager_id) VALUES
('Alice Manager', NULL),
('Bob Engineer', 1),
('Carol Engineer', 1),
('Drew Intern', 2);
SELECT
e.staff_id,
e.full_name AS employee,
m.full_name AS manager
FROM staff e
LEFT JOIN staff m ON m.staff_id = e.manager_id
ORDER BY e.staff_id;
Expected output:
staff_id | employee | manager
----------+-----------------+---------------
1 | Alice Manager |
2 | Bob Engineer | Alice Manager
3 | Carol Engineer | Alice Manager
4 | Drew Intern | Bob Engineer
(4 rows)
Explanation:
LEFT JOINkeeps Alice even though she has no manager.- The manager side (
m) is another instance ofstaff.
Example 2: Only Employees With Managers (INNER SELF JOIN) (Intermediate)
SELECT
e.full_name AS employee,
m.full_name AS manager
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id
ORDER BY employee;
Expected output:
employee | manager
----------------+---------------
Bob Engineer | Alice Manager
Carol Engineer | Alice Manager
Drew Intern | Bob Engineer
(3 rows)
Explanation:
- Alice is excluded because she does not match a manager row.
Example 3: Pairing Rows Within the Same Group (Avoid Duplicates) (Advanced)
Self joins are also useful for generating pairs, such as “employees in the same manager group”.
-- Pair reportees under the same manager.
-- Use < to avoid duplicates and self-pairs.
SELECT
a.full_name AS employee_a,
b.full_name AS employee_b,
a.manager_id
FROM staff a
JOIN staff b
ON a.manager_id = b.manager_id
AND a.staff_id < b.staff_id
WHERE a.manager_id IS NOT NULL
ORDER BY a.manager_id, employee_a, employee_b;
Expected output:
employee_a | employee_b | manager_id
----------------+----------------+------------
Bob Engineer | Carol Engineer | 1
(1 row)
Explanation:
- Bob and Carol share manager 1.
a.staff_id < b.staff_idavoids listing (Carol, Bob) as a second row.
Example 4: Managers with More Than One Direct Report (Advanced)
SELECT
m.full_name AS manager,
COUNT(*) AS direct_reports
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id
GROUP BY m.full_name
HAVING COUNT(*) > 1
ORDER BY direct_reports DESC, manager;
Expected output:
manager | direct_reports
----------------+---------------
Alice Manager | 2
(1 row)
Explanation:
- This is a common org-chart metric.
Practical Use Cases
1) Organization charts
SELECT e.full_name AS employee, m.full_name AS manager
FROM employees e
LEFT JOIN employees m ON m.employee_id = e.manager_id;
2) Referral programs
SELECT u.email AS referred, r.email AS referrer
FROM users u
LEFT JOIN users r ON r.user_id = u.referrer_user_id;
3) Duplicate detection (same email, different ids)
SELECT a.user_id, b.user_id, a.email
FROM users a
JOIN users b ON a.email = b.email AND a.user_id < b.user_id;
4) Compatibility/bundling within categories
SELECT a.product_id, b.product_id
FROM products a
JOIN products b ON a.category_id = b.category_id AND a.product_id < b.product_id;
5) Compare “current vs previous” rows per key (basic pattern)
SELECT c.id, c.value AS current_value, p.value AS previous_value
FROM snapshots c
JOIN snapshots p ON p.id = c.id AND p.day = c.day - 1;
Common Mistakes & Troubleshooting
1) Forgetting aliases
Wrong SQL:
SELECT staff.full_name
FROM staff
JOIN staff ON staff.manager_id = staff.staff_id;
Bad outcome:
- Ambiguous references and unreadable logic.
Fix:
SELECT e.full_name, m.full_name
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id;
2) Missing join condition (accidental cross join)
Wrong SQL:
SELECT *
FROM staff e
JOIN staff m;
Bad outcome:
- Produces all combinations.
Fix:
JOIN staff m ON m.staff_id = e.manager_id
3) Duplicates when generating pairs
Wrong SQL:
SELECT a.full_name, b.full_name
FROM staff a
JOIN staff b ON a.manager_id = b.manager_id;
Bad outcome:
- Includes self-pairs (Alice with Alice) and both orderings (Bob, Carol) and (Carol, Bob).
Fix:
... AND a.staff_id < b.staff_id
4) Using INNER JOIN when top-level rows matter
Wrong SQL:
SELECT e.full_name, m.full_name
FROM staff e
JOIN staff m ON m.staff_id = e.manager_id;
Bad outcome:
- Top-level managers disappear.
Fix:
LEFT JOIN staff m ON m.staff_id = e.manager_id
Debugging checklist:
- Confirm the relationship column (
manager_id) and the referenced key (staff_id). - Decide whether the relationship is optional (use LEFT JOIN) or required (INNER JOIN).
- For pair generation, add a filter to avoid duplicates (
a.id < b.id). - Start with a small preview (
ORDER BY ... LIMIT 20) and inspect results. - If performance is slow, index the relationship column (
manager_id).
Best Practices
- ✅ Use clear aliases that reflect roles (
e= employee,m= manager). ❌ Avoid generic aliases (a,b) in hierarchy queries. - ✅ Use
LEFT JOINwhen parent rows may be missing. ❌ Avoid losing top-level nodes by default. - ✅ Add
a.id < b.idfor pair queries. ❌ Avoid duplicate or mirrored pairs. - ✅ Index relationship columns used in self joins. ❌ Avoid repeated scans of large hierarchy tables.
- ✅ Consider recursive CTEs for multi-level traversal. ❌ Avoid chaining many self joins for deep trees.
Hands-On Practice
Use this setup for the exercises:
CREATE TABLE practice_staff (
staff_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
full_name text NOT NULL,
manager_id bigint REFERENCES practice_staff(staff_id)
);
INSERT INTO practice_staff (full_name, manager_id) VALUES
('Alice', NULL),
('Bob', 1),
('Carol', 1),
('Drew', 2);
Exercise 1 (Easy): Employee with manager name
Task: Return every staff member with their manager name (if any).
-- Your SQL here
Solution:
SELECT e.full_name AS employee, m.full_name AS manager
FROM practice_staff e
LEFT JOIN practice_staff m ON m.staff_id = e.manager_id
ORDER BY e.staff_id;
Exercise 2 (Medium): Only employees that have a manager
Task: Return only staff members that have a manager.
-- Your SQL here
Solution:
SELECT e.full_name AS employee, m.full_name AS manager
FROM practice_staff e
JOIN practice_staff m ON m.staff_id = e.manager_id
ORDER BY e.staff_id;
Exercise 3 (Advanced): Managers with 2+ direct reports
Task: List managers that have more than one direct report.
-- Your SQL here
Solution:
SELECT m.full_name AS manager, COUNT(*) AS reportees
FROM practice_staff e
JOIN practice_staff m ON m.staff_id = e.manager_id
GROUP BY m.full_name
HAVING COUNT(*) > 1
ORDER BY reportees DESC, manager;
Connection to Other Concepts
| Concept | Why it matters |
|---|---|
| Aliases (AS) | required to distinguish table instances |
LEFT JOIN | keeps top-level nodes with no parent |
INNER JOIN | removes rows with no matching parent |
GROUP BY + HAVING | common for manager/reportee counts |
| CROSS JOIN | missing ON in a self join creates a Cartesian product |
| Recursive CTEs | iterate through multi-level hierarchies |
Visual Learning Diagram
flowchart TD
A[One Table] --> B[SELF JOIN]
B --> C[Aliases Required]
B --> D[Hierarchy: manager_id]
D --> E[LEFT JOIN to keep top-level]
D --> F[INNER JOIN for required parent]
B --> G[Pairs Within Group]
G --> H[Add a.id < b.id]
D --> I[Deep Trees]
I --> J[Recursive CTE]
classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F,G,H,I,J allNodes
class B highlight
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
| No aliases | ambiguous/unreadable SQL | always alias both instances |
| Missing join condition | Cartesian product | always add ON |
| Pair queries produce duplicates | mirrored pairs | use a.id < b.id |
| Using INNER JOIN for optional parents | missing top-level rows | use LEFT JOIN |
| Deep hierarchy with many joins | complex and slow SQL | use recursive CTEs |
Quick Reference
SELECT e.col, p.col FROM t e JOIN t p ON p.id = e.parent_id;
SELECT e.col, p.col FROM t e LEFT JOIN t p ON p.id = e.parent_id;
SELECT a.id, b.id FROM t a JOIN t b ON a.group_id = b.group_id AND a.id < b.id;
SELECT m.name, COUNT(*) FROM t e JOIN t m ON m.id = e.parent_id GROUP BY m.name HAVING COUNT(*) > 1;
WITH RECURSIVE ...
What's Next
- Previous: CROSS JOIN - Review the previous lesson to reinforce context.
- Next: UNION ALL - Continue to the next concept with incremental complexity.
- Module Overview - Return to this module index and choose another related lesson.