IN Operator
Use this lesson to understand IN Operator with practical syntax and examples.
Concept Overview
The IN operator checks whether a value matches any value in a set. It is a readable alternative to chaining many OR conditions.
For example, these are equivalent:
-- Many ORs
WHERE status = 'queued' OR status = 'processing' OR status = 'failed'
-- IN list
WHERE status IN ('queued', 'processing', 'failed')
Why is it important?
- Readability: makes filters easier to review and maintain
- Correctness: reduces mistakes compared to many repeated
ORs - Flexibility: supports both static lists and subqueries
- Safety: encourages explicit set membership instead of ad-hoc string matching
Where does it fit?
IN is commonly used in WHERE, and sometimes in JOIN ... ON or HAVING. PostgreSQL also provides set-membership alternatives like = ANY(array) and correlated subqueries with EXISTS.
Syntax & Rules
Core Syntax
-- Static list
SELECT *
FROM t
WHERE col IN (value1, value2, value3);
-- Subquery
SELECT *
FROM t
WHERE col IN (SELECT other_col FROM other_table);
Available Options / Parameters
| Form | What it does | Notes (PostgreSQL) |
|---|---|---|
col IN (v1, v2, ...) | membership in a literal list | good for small, static sets |
col NOT IN (v1, ...) | negated membership | be careful with NULLs |
col IN (SELECT ...) | membership in a derived set | often used for “in related table” queries |
col = ANY(array) | membership in an array | useful for parameterized lists |
EXISTS (...) | tests presence of at least one matching row | often safer than IN/NOT IN with NULLs |
Key Rules and Considerations
- Keep data types consistent between
coland the list/subquery values. NOT INis dangerous when the list or subquery can produceNULL.- Very large literal lists can be hard to maintain; prefer temp tables, joins, or arrays.
- For parameterized queries (application code),
= ANY($1)is often cleaner than dynamically buildingIN (...).
Step-by-Step Examples
Example 1: Filter by a Static List (Beginner)
CREATE TABLE employees (
employee_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
full_name text NOT NULL,
department text NOT NULL
);
INSERT INTO employees (full_name, department) VALUES
('Alice', 'HR'),
('Bob', 'IT'),
('Charlie', 'Marketing'),
('Denise', 'IT'),
('Eva', 'HR');
SELECT employee_id, full_name, department
FROM employees
WHERE department IN ('IT', 'HR')
ORDER BY employee_id;
Expected output:
employee_id | full_name | department
-------------+-----------+------------
1 | Alice | HR
2 | Bob | IT
4 | Denise | IT
5 | Eva | HR
(4 rows)
Explanation:
IN ('IT','HR')is shorthand for twoORconditions.
Example 2: IN with Numeric Values (Intermediate)
SELECT employee_id, full_name, department
FROM employees
WHERE employee_id IN (1, 3, 5)
ORDER BY employee_id;
Expected output:
employee_id | full_name | department
-------------+-----------+------------
1 | Alice | HR
3 | Charlie | Marketing
5 | Eva | HR
(3 rows)
Example 3: IN with a Subquery (Intermediate)
CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
employee_id bigint NOT NULL,
status text NOT NULL
);
INSERT INTO orders (employee_id, status) VALUES
(2, 'paid'),
(2, 'paid'),
(4, 'pending');
-- Employees who have at least one order
SELECT employee_id, full_name
FROM employees
WHERE employee_id IN (
SELECT employee_id
FROM orders
)
ORDER BY employee_id;
Expected output:
employee_id | full_name
-------------+----------
2 | Bob
4 | Denise
(2 rows)
Explanation:
- The subquery produces a set of employee IDs present in
orders. - If you only care about existence,
EXISTSis often a good alternative.
Example 4: Parameter-Friendly Membership with = ANY (Advanced)
In application code, you often have a parameter that is an array of values.
-- Literal array example (application code would pass $1)
SELECT employee_id, full_name, department
FROM employees
WHERE department = ANY(ARRAY['IT', 'HR'])
ORDER BY employee_id;
Expected output:
employee_id | full_name | department
-------------+-----------+------------
1 | Alice | HR
2 | Bob | IT
4 | Denise | IT
5 | Eva | HR
(4 rows)
Explanation:
col = ANY(array)behaves likecol IN (...)but is easier to parameterize.
Example 5: The NOT IN + NULL Pitfall (Advanced)
This is the most common correctness trap with IN.
CREATE TABLE blocked_employees (
employee_id bigint
);
INSERT INTO blocked_employees (employee_id) VALUES
(3),
(NULL);
-- Attempt to find employees that are not blocked
SELECT employee_id, full_name
FROM employees
WHERE employee_id NOT IN (SELECT employee_id FROM blocked_employees)
ORDER BY employee_id;
Bad outcome:
- This query returns 0 rows because the subquery contains
NULL.
Fix (safe exclusion using NOT EXISTS):
SELECT e.employee_id, e.full_name
FROM employees e
WHERE NOT EXISTS (
SELECT 1
FROM blocked_employees b
WHERE b.employee_id = e.employee_id
)
ORDER BY e.employee_id;
Expected output (NOT EXISTS):
employee_id | full_name
-------------+----------
1 | Alice
2 | Bob
4 | Denise
5 | Eva
(4 rows)
Explanation:
NOT EXISTSis robust even when the blocking table containsNULL.
Practical Use Cases
1) Filter by allowed workflow states
SELECT job_id, status
FROM jobs
WHERE status IN ('queued', 'processing', 'retrying');
2) Segment customers by region
SELECT customer_id, region
FROM customers
WHERE region IN ('West', 'Northwest');
3) Apply a rule to a small set of IDs
UPDATE products
SET featured = true
WHERE product_id IN (1001, 1003, 1007)
RETURNING product_id, featured;
4) Filter by a dynamic set from another table
SELECT user_id, email
FROM users
WHERE user_id IN (SELECT user_id FROM beta_program);
5) Parameterized filters in application code
-- Example: $1 is a text[]
SELECT order_id, status
FROM orders
WHERE status = ANY($1);
Common Mistakes & Troubleshooting
1) Mixing data types in the IN list
Wrong SQL:
-- department is text, but one element is numeric
WHERE department IN ('IT', 2)
Typical error:
ERROR: invalid input syntax for type integer: "IT"
Fix:
WHERE department IN ('IT', 'HR')
2) Using a huge literal list
Wrong approach:
-- Thousands of IDs hard-coded into a query
WHERE user_id IN (1, 2, 3, 4, 5, ...)
Bad outcome:
- Hard to maintain and can be slower than a join.
Fix options:
-- Option A: use a staging/temp table and join
-- Option B: use a parameterized array and = ANY($1)
WHERE user_id = ANY($1)
3) Using NOT IN when NULLs are possible
Wrong SQL:
WHERE employee_id NOT IN (SELECT employee_id FROM blocked_employees)
Bad outcome:
- Can return 0 rows if the subquery includes NULL.
Fix:
WHERE NOT EXISTS (
SELECT 1
FROM blocked_employees b
WHERE b.employee_id = e.employee_id
)
4) Confusing IN with range filtering
Wrong SQL:
-- This checks membership in two values, not a range
WHERE price IN (10, 20)
Fix:
WHERE price BETWEEN 10 AND 20
-- or: WHERE price >= 10 AND price <= 20
Debugging checklist:
- Confirm the data type of the filtered column.
- Run the list/subquery alone and inspect for unexpected
NULLvalues. - If using
NOT IN, switch toNOT EXISTSwhen NULLs may appear. - For very large sets, prefer joins or
= ANY(array). - If performance is slow, use
EXPLAINand check indexes.
Best Practices
- ✅ Use
IN (...)for small, readable sets. ❌ Avoid giant literal lists that hide business logic. - ✅ Prefer
= ANY($1)for parameterized lists. ❌ Avoid buildingIN (...)strings dynamically in application code. - ✅ Use
NOT EXISTSwhen excluding against a nullable set. ❌ AvoidNOT IN (subquery)if the subquery can contain NULL. - ✅ Keep data types consistent between the column and the values. ❌ Avoid implicit casts that can surprise you or hurt performance.
- ✅ Consider indexing the filtered column when it is used frequently. ❌ Avoid relying on sequential scans for high-traffic filters.
Hands-On Practice
Use this setup for the exercises:
CREATE TABLE practice_products (
product_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL,
category text NOT NULL
);
INSERT INTO practice_products (name, category) VALUES
('Laptop', 'Electronics'),
('Desk', 'Furniture'),
('Chair', 'Furniture'),
('Headphones', 'Electronics'),
('Pen', 'Stationery');
Exercise 1 (Easy): Filter by one category using IN
Task: Select products in the Furniture category.
-- Your SQL here
Solution:
SELECT product_id, name
FROM practice_products
WHERE category IN ('Furniture')
ORDER BY product_id;
Exercise 2 (Medium): Exclude a set with NOT IN (safe data)
Task: Select products that are not in Electronics or Stationery.
-- Your SQL here
Solution:
SELECT product_id, name, category
FROM practice_products
WHERE category NOT IN ('Electronics', 'Stationery')
ORDER BY product_id;
Exercise 3 (Advanced): Safe exclusion with NOT EXISTS
Task: Create a practice_banned_products table (including a NULL row) and select products that are not banned using NOT EXISTS.
-- Your SQL here
Solution:
CREATE TABLE practice_banned_products (product_id bigint);
INSERT INTO practice_banned_products (product_id) VALUES (2), (NULL);
SELECT p.product_id, p.name
FROM practice_products p
WHERE NOT EXISTS (
SELECT 1
FROM practice_banned_products b
WHERE b.product_id = p.product_id
)
ORDER BY p.product_id;
Connection to Other Concepts
| Concept | Why it matters |
|---|---|
AND / OR / NOT | IN is often used instead of many ORs |
BETWEEN | better for ranges than IN (a, b) |
| NULL Values | explains why NOT IN + NULL can break filters |
| Subqueries | IN (SELECT ...) depends on subquery semantics |
EXISTS | common safer alternative to IN/NOT IN |
| Joins | often replace large membership lists efficiently |
Visual Learning Diagram
flowchart TD
A[WHERE Filtering] --> B[IN / NOT IN]
B --> C[Static List]
B --> D[Subquery]
B --> E[= ANY(array)]
B --> F[NOT EXISTS]
D --> G[NULL Pitfall]
G --> F
classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F,G allNodes
class B highlight
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
NOT IN (subquery) with NULLs | returns 0 rows unexpectedly | use NOT EXISTS |
| Huge literal list | hard to maintain, slow plans | use arrays, joins, or staging tables |
| Mixed data types | errors or unexpected casts | keep types consistent |
Using IN for ranges | wrong results | use BETWEEN or inequalities |
Treating IN as a substitute for joins | complex queries | join to a lookup table instead |
Quick Reference
WHERE col IN ('a', 'b', 'c');
WHERE col IN (SELECT x FROM t);
WHERE col = ANY(ARRAY[1,2,3]);
WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.x = outer.x);
WHERE col NOT IN ('x', 'y');
What's Next
- Previous: Wildcards ('%' and '_') - Review the previous lesson to reinforce context.
- Next: BETWEEN Operator - Continue to the next concept with incremental complexity.
- Module Overview - Return to this module index and choose another related lesson.