SELECT DISTINCT
Use this lesson to understand SELECT DISTINCT with practical syntax and examples.
Concept Overview
SELECT DISTINCT removes duplicate rows from a query result and returns only unique values for the selected columns.
This is useful when your source data contains repeated values (either naturally, or because your query introduces duplicates through joins). DISTINCT helps you answer questions like "Which departments exist?" or "Which customers placed at least one paid order?" without writing a full aggregation.
Why is it important?
- Data quality: de-duplicates results so reports and downstream logic are not inflated by repetition
- Analysis accuracy: enables correct unique lists and unique counts (
COUNT(DISTINCT ...)) - Performance: can reduce result size and network transfer (but DISTINCT itself has a cost)
- Business intelligence: common for unique customers, unique categories, unique regions
Where does it fit?
SELECT DISTINCT is part of DQL (Data Query Language). It sits between basic SELECT and advanced aggregation (GROUP BY):
- If you only need unique values, DISTINCT is often the simplest tool.
- If you need metrics per group (counts, sums, averages), use
GROUP BY.
PostgreSQL also provides a PostgreSQL-native extension called DISTINCT ON for "first row per key" patterns (covered in the advanced example).
Syntax & Rules
Basic Syntax Structure
SELECT DISTINCT column1, column2, ...
FROM table_name
WHERE condition
ORDER BY column1;
Available Options / Parameters
| Option | What it does | Notes |
|---|---|---|
DISTINCT | removes duplicate rows from the SELECT list | duplicates are based on the full selected row |
DISTINCT ON (expr, ...) | keeps the first row per key | PostgreSQL-specific; requires matching ORDER BY prefix |
COUNT(DISTINCT col) | counts unique non-NULL values | common for metrics |
SUM(DISTINCT col) | sums unique values | supported, but be careful with semantics |
Key Rules and Considerations
- DISTINCT applies to the entire selected row. With multiple columns, uniqueness is the combination.
NULLvalues are treated as equal for DISTINCT purposes (you get one NULL in the output per distinct set).- DISTINCT often requires PostgreSQL to sort or hash data. On large datasets it can be expensive.
- For
SELECT DISTINCT, PostgreSQL requires ORDER BY expressions to appear in the select list. - For
DISTINCT ON, the ORDER BY must start with the same expressions asDISTINCT ON (...).
Step-by-Step Examples
Example 1: Basic Single Column DISTINCT (Beginner)
Create a table with duplicates, then select unique values.
-- Sample table
CREATE TABLE employees (
employee_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
full_name text NOT NULL,
department text NOT NULL,
salary numeric(12,2) NOT NULL CHECK (salary >= 0)
);
-- Sample data (duplicates in department)
INSERT INTO employees (full_name, department, salary) VALUES
('John Smith', 'IT', 75000.00),
('Jane Doe', 'HR', 65000.00),
('Bob Johnson', 'IT', 80000.00),
('Alice Brown', 'Finance', 70000.00),
('Charlie Davis','IT', 75000.00);
-- Query: unique departments
SELECT DISTINCT department
FROM employees
ORDER BY department;
Expected output:
department
------------
Finance
HR
IT
(3 rows)
Explanation:
DISTINCT departmentcollapses repeated department values into one row per department.ORDER BY departmentproduces stable, readable output.
Example 2: Multiple Column DISTINCT (Intermediate)
DISTINCT applies to the combination of the selected columns.
-- Query: unique (department, salary) combinations
SELECT DISTINCT department, salary
FROM employees
ORDER BY department, salary;
Expected output:
department | salary
------------+----------
Finance | 70000.00
HR | 65000.00
IT | 75000.00
IT | 80000.00
(4 rows)
Explanation:
ITappears twice because there are two distinct salary values in IT.- This is not "distinct per column"; it is distinct per row combination.
Example 3: DISTINCT with WHERE Clause (Intermediate)
Filter first, then de-duplicate the remaining rows.
-- Query: unique departments where salary is above 70000
SELECT DISTINCT department
FROM employees
WHERE salary > 70000
ORDER BY department;
Expected output:
department
------------
IT
(1 row)
Explanation:
WHERE salary > 70000narrows the input set.- DISTINCT is applied after filtering.
Example 4: Real-World Scenario + PostgreSQL DISTINCT ON (Advanced)
DISTINCT ON solves "latest row per key" problems without a self-join. It is PostgreSQL-specific.
-- Sample orders table (multiple orders per customer)
CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
status text NOT NULL CHECK (status IN ('pending', 'paid', 'cancelled')),
created_at timestamptz NOT NULL
);
INSERT INTO orders (customer_id, status, created_at) VALUES
(101, 'paid', '2026-03-01 10:00:00+00'),
(101, 'paid', '2026-03-03 10:00:00+00'),
(102, 'pending', '2026-03-02 09:00:00+00'),
(102, 'paid', '2026-03-04 12:00:00+00'),
(103, 'cancelled','2026-03-03 08:30:00+00');
-- Query: latest order per customer
SELECT DISTINCT ON (customer_id)
customer_id,
order_id,
status,
created_at
FROM orders
ORDER BY customer_id, created_at DESC;
Expected output:
customer_id | order_id | status | created_at
-------------+----------+-----------+---------------------------
101 | 2 | paid | 2026-03-03 10:00:00+00
102 | 4 | paid | 2026-03-04 12:00:00+00
103 | 5 | cancelled | 2026-03-03 08:30:00+00
(3 rows)
Explanation:
DISTINCT ON (customer_id)groups bycustomer_idand keeps one row per customer.ORDER BY customer_id, created_at DESCselects the latest row per customer.- The ORDER BY must start with the DISTINCT ON expressions.
Example 5: Counting Unique Values (Advanced)
Counting distinct values is a common reporting requirement.
-- Query: how many unique customers have at least one paid order?
SELECT COUNT(DISTINCT customer_id) AS paid_customers
FROM orders
WHERE status = 'paid';
Expected output:
paid_customers
---------------
2
(1 row)
Explanation:
COUNT(DISTINCT customer_id)counts unique customers.WHERE status = 'paid'ensures only paid orders contribute.
Practical Use Cases
1. Customer Analysis (E-commerce)
Business problem: Find unique customers who purchased in the last 30 days.
SELECT DISTINCT customer_id
FROM orders
WHERE created_at >= now() - interval '30 days'
AND status = 'paid'
ORDER BY customer_id;
2. Product Catalog Reporting (Analytics)
Business problem: List all unique categories to build a filter menu.
SELECT DISTINCT category
FROM products
WHERE active = true
ORDER BY category;
3. Multi-Region Ops (SaaS)
Business problem: Identify unique (city, region) combinations with active customers.
SELECT DISTINCT city, region
FROM customers
WHERE deleted_at IS NULL
ORDER BY region, city;
4. Data Quality Auditing (ETL)
Business problem: Detect unexpected email domains.
SELECT DISTINCT split_part(email, '@', 2) AS email_domain
FROM customers
WHERE email IS NOT NULL
ORDER BY email_domain;
5. Latest Event Per Device (IoT)
Business problem: For each device, fetch the newest telemetry record.
SELECT DISTINCT ON (device_id)
device_id,
recorded_at,
temperature_c
FROM sensor_readings
ORDER BY device_id, recorded_at DESC;
Common Mistakes & Troubleshooting
Mistake 1: Misunderstanding multi-column DISTINCT
Wrong assumption:
-- You might think this returns unique names and unique departments separately.
SELECT DISTINCT full_name, department
FROM employees;
Bad outcome: it returns unique combinations, not per-column uniqueness.
Fix:
SELECT DISTINCT department FROM employees;
-- or
SELECT DISTINCT full_name FROM employees;
Mistake 2: Using DISTINCT to "fix" a join bug
Wrong SQL:
-- DISTINCT hides duplication caused by an incorrect join.
SELECT DISTINCT c.customer_id
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
JOIN order_items i ON i.order_id = o.order_id;
Bad outcome: query appears to work, but hides row multiplication and can still be slow.
Fix: validate join keys and aggregate at the right level.
SELECT c.customer_id
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
Mistake 3: ORDER BY column not in SELECT list (PostgreSQL DISTINCT)
Wrong SQL:
SELECT DISTINCT department
FROM employees
ORDER BY salary DESC;
Error message: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
Fix: either order by the selected column, or include the ordering expression in the SELECT list.
SELECT DISTINCT department
FROM employees
ORDER BY department;
If you truly need "department ordered by max salary", use GROUP BY:
SELECT department, MAX(salary) AS max_salary
FROM employees
GROUP BY department
ORDER BY max_salary DESC;
Mistake 4: DISTINCT ON ordering does not match
Wrong SQL:
SELECT DISTINCT ON (customer_id)
customer_id, order_id, created_at
FROM orders
ORDER BY created_at DESC, customer_id;
Error message: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Fix: make the ORDER BY start with the DISTINCT ON expressions.
SELECT DISTINCT ON (customer_id)
customer_id, order_id, created_at
FROM orders
ORDER BY customer_id, created_at DESC;
Debugging Tips
- Verify whether duplicates are in the data or introduced by joins.
- Start with
SELECT ... LIMIT 10and remove DISTINCT to see raw rows. - If you are using
DISTINCT ON, confirm that ORDER BY starts with the DISTINCT ON key. - Use
EXPLAINif DISTINCT becomes slow on large tables (see module 18).
Best Practices
✅ Always use DISTINCT when you truly need unique values (not to hide query bugs).
✅ Always use ORDER BY with DISTINCT when output order matters.
✅ Always prefer DISTINCT ON for "latest row per key" patterns (with correct ORDER BY).
✅ Always consider indexes for DISTINCT-heavy query patterns (key columns and ordering columns).
✅ Always consider schema constraints (UNIQUE) to prevent duplicates instead of cleaning them in queries.
❌ Avoid using DISTINCT as a band-aid for incorrect joins.
❌ Avoid DISTINCT on very large datasets without checking the execution plan.
❌ Avoid NOT IN patterns with NULL-producing subqueries; prefer EXISTS when excluding sets.
Hands-On Practice
Exercise 1 (Easy)
Task: Return all unique departments from employees.
-- Write your query here
Solution:
SELECT DISTINCT department
FROM employees
ORDER BY department;
Exercise 2 (Medium)
Task: Count unique customers who have at least one paid order.
-- Write your query here
Solution:
SELECT COUNT(DISTINCT customer_id) AS paid_customers
FROM orders
WHERE status = 'paid';
Exercise 3 (Advanced)
Task: Return the latest order per customer using DISTINCT ON.
-- Write your query here
Solution:
SELECT DISTINCT ON (customer_id)
customer_id, order_id, status, created_at
FROM orders
ORDER BY customer_id, created_at DESC;
Connection to Other Concepts
| Related lesson | Why it matters |
|---|---|
| SELECT Statement | DISTINCT is a modifier on SELECT results |
| WHERE Clause | Filter first to reduce the workload before de-duplication |
| GROUP BY | Use GROUP BY when you need metrics per group |
| Joins | Incorrect joins are a common source of duplicates |
| Index Strategy | Indexes can reduce DISTINCT cost for common patterns |
| Window Functions | Alternative approach for "latest per group" using row_number |
Visual Learning Diagram
flowchart TD
A[Basic SELECT] --> B{Need unique rows?}
B -->|Yes| C[SELECT DISTINCT]
B -->|No| D[Standard SELECT]
C --> E[Single column DISTINCT]
C --> F[Multi-column DISTINCT]
C --> G[DISTINCT ON (PostgreSQL)]
F --> H[COUNT(DISTINCT ...)]
C --> I[Performance check: EXPLAIN]
I --> J[Index strategy]
G --> K[Alternative: Window functions]
classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,D,E,F,G,H,I,J,K allNodes
class C highlight
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
| Using DISTINCT to hide join duplication | Metrics and performance remain wrong | Fix join keys or aggregate at correct grain |
| Assuming DISTINCT applies per column | Unexpected duplicates remain | Remember: DISTINCT applies to full row combination |
| DISTINCT on large tables without filters | Slow sorts/hashes | Filter first and add supporting indexes |
| DISTINCT ON with incorrect ORDER BY prefix | Query fails | Order by key first: ORDER BY key, ts DESC |
| ORDER BY expression not in SELECT list with DISTINCT | Query fails | Order by selected columns or use GROUP BY |
Quick Reference
SELECT DISTINCT col FROM t;
SELECT DISTINCT col1, col2 FROM t;
SELECT COUNT(DISTINCT col) FROM t;
SELECT DISTINCT ON (key) key, other FROM t ORDER BY key, ts DESC;
SELECT col, MAX(x) FROM t GROUP BY col; -- alternative to order by non-selected columns
What's Next
- Previous: SELECT Statement - Review the previous lesson to reinforce context.
- Next: WHERE Clause - Continue to the next concept with incremental complexity.
- Module Overview - Return to this module index and choose another related lesson.