Skip to main content

IN Operator

Learning Focus

Use this lesson to understand IN Operator with practical syntax and examples.

Concept Overview

The IN operator checks whether a value matches any value in a set. It is a readable alternative to chaining many OR conditions.

For example, these are equivalent:

-- Many ORs
WHERE status = 'queued' OR status = 'processing' OR status = 'failed'

-- IN list
WHERE status IN ('queued', 'processing', 'failed')

Why is it important?

  • Readability: makes filters easier to review and maintain
  • Correctness: reduces mistakes compared to many repeated ORs
  • Flexibility: supports both static lists and subqueries
  • Safety: encourages explicit set membership instead of ad-hoc string matching

Where does it fit?

IN is commonly used in WHERE, and sometimes in JOIN ... ON or HAVING. PostgreSQL also provides set-membership alternatives like = ANY(array) and correlated subqueries with EXISTS.


Syntax & Rules

Core Syntax

-- Static list
SELECT *
FROM t
WHERE col IN (value1, value2, value3);

-- Subquery
SELECT *
FROM t
WHERE col IN (SELECT other_col FROM other_table);

Available Options / Parameters

FormWhat it doesNotes (PostgreSQL)
col IN (v1, v2, ...)membership in a literal listgood for small, static sets
col NOT IN (v1, ...)negated membershipbe careful with NULLs
col IN (SELECT ...)membership in a derived setoften used for “in related table” queries
col = ANY(array)membership in an arrayuseful for parameterized lists
EXISTS (...)tests presence of at least one matching rowoften safer than IN/NOT IN with NULLs

Key Rules and Considerations

  • Keep data types consistent between col and the list/subquery values.
  • NOT IN is dangerous when the list or subquery can produce NULL.
  • Very large literal lists can be hard to maintain; prefer temp tables, joins, or arrays.
  • For parameterized queries (application code), = ANY($1) is often cleaner than dynamically building IN (...).

Step-by-Step Examples

Example 1: Filter by a Static List (Beginner)

CREATE TABLE employees (
employee_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
full_name text NOT NULL,
department text NOT NULL
);

INSERT INTO employees (full_name, department) VALUES
('Alice', 'HR'),
('Bob', 'IT'),
('Charlie', 'Marketing'),
('Denise', 'IT'),
('Eva', 'HR');

SELECT employee_id, full_name, department
FROM employees
WHERE department IN ('IT', 'HR')
ORDER BY employee_id;

Expected output:

 employee_id | full_name | department
-------------+-----------+------------
1 | Alice | HR
2 | Bob | IT
4 | Denise | IT
5 | Eva | HR
(4 rows)

Explanation:

  • IN ('IT','HR') is shorthand for two OR conditions.

Example 2: IN with Numeric Values (Intermediate)

SELECT employee_id, full_name, department
FROM employees
WHERE employee_id IN (1, 3, 5)
ORDER BY employee_id;

Expected output:

 employee_id | full_name | department
-------------+-----------+------------
1 | Alice | HR
3 | Charlie | Marketing
5 | Eva | HR
(3 rows)

Example 3: IN with a Subquery (Intermediate)

CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
employee_id bigint NOT NULL,
status text NOT NULL
);

INSERT INTO orders (employee_id, status) VALUES
(2, 'paid'),
(2, 'paid'),
(4, 'pending');

-- Employees who have at least one order
SELECT employee_id, full_name
FROM employees
WHERE employee_id IN (
SELECT employee_id
FROM orders
)
ORDER BY employee_id;

Expected output:

 employee_id | full_name
-------------+----------
2 | Bob
4 | Denise
(2 rows)

Explanation:

  • The subquery produces a set of employee IDs present in orders.
  • If you only care about existence, EXISTS is often a good alternative.

Example 4: Parameter-Friendly Membership with = ANY (Advanced)

In application code, you often have a parameter that is an array of values.

-- Literal array example (application code would pass $1)
SELECT employee_id, full_name, department
FROM employees
WHERE department = ANY(ARRAY['IT', 'HR'])
ORDER BY employee_id;

Expected output:

 employee_id | full_name | department
-------------+-----------+------------
1 | Alice | HR
2 | Bob | IT
4 | Denise | IT
5 | Eva | HR
(4 rows)

Explanation:

  • col = ANY(array) behaves like col IN (...) but is easier to parameterize.

Example 5: The NOT IN + NULL Pitfall (Advanced)

This is the most common correctness trap with IN.

CREATE TABLE blocked_employees (
employee_id bigint
);

INSERT INTO blocked_employees (employee_id) VALUES
(3),
(NULL);

-- Attempt to find employees that are not blocked
SELECT employee_id, full_name
FROM employees
WHERE employee_id NOT IN (SELECT employee_id FROM blocked_employees)
ORDER BY employee_id;

Bad outcome:

  • This query returns 0 rows because the subquery contains NULL.

Fix (safe exclusion using NOT EXISTS):

SELECT e.employee_id, e.full_name
FROM employees e
WHERE NOT EXISTS (
SELECT 1
FROM blocked_employees b
WHERE b.employee_id = e.employee_id
)
ORDER BY e.employee_id;

Expected output (NOT EXISTS):

 employee_id | full_name
-------------+----------
1 | Alice
2 | Bob
4 | Denise
5 | Eva
(4 rows)

Explanation:

  • NOT EXISTS is robust even when the blocking table contains NULL.

Practical Use Cases

1) Filter by allowed workflow states

SELECT job_id, status
FROM jobs
WHERE status IN ('queued', 'processing', 'retrying');

2) Segment customers by region

SELECT customer_id, region
FROM customers
WHERE region IN ('West', 'Northwest');

3) Apply a rule to a small set of IDs

UPDATE products
SET featured = true
WHERE product_id IN (1001, 1003, 1007)
RETURNING product_id, featured;

4) Filter by a dynamic set from another table

SELECT user_id, email
FROM users
WHERE user_id IN (SELECT user_id FROM beta_program);

5) Parameterized filters in application code

-- Example: $1 is a text[]
SELECT order_id, status
FROM orders
WHERE status = ANY($1);

Common Mistakes & Troubleshooting

1) Mixing data types in the IN list

Wrong SQL:

-- department is text, but one element is numeric
WHERE department IN ('IT', 2)

Typical error:

ERROR:  invalid input syntax for type integer: "IT"

Fix:

WHERE department IN ('IT', 'HR')

2) Using a huge literal list

Wrong approach:

-- Thousands of IDs hard-coded into a query
WHERE user_id IN (1, 2, 3, 4, 5, ...)

Bad outcome:

  • Hard to maintain and can be slower than a join.

Fix options:

-- Option A: use a staging/temp table and join
-- Option B: use a parameterized array and = ANY($1)
WHERE user_id = ANY($1)

3) Using NOT IN when NULLs are possible

Wrong SQL:

WHERE employee_id NOT IN (SELECT employee_id FROM blocked_employees)

Bad outcome:

  • Can return 0 rows if the subquery includes NULL.

Fix:

WHERE NOT EXISTS (
SELECT 1
FROM blocked_employees b
WHERE b.employee_id = e.employee_id
)

4) Confusing IN with range filtering

Wrong SQL:

-- This checks membership in two values, not a range
WHERE price IN (10, 20)

Fix:

WHERE price BETWEEN 10 AND 20
-- or: WHERE price >= 10 AND price <= 20

Debugging checklist:

  1. Confirm the data type of the filtered column.
  2. Run the list/subquery alone and inspect for unexpected NULL values.
  3. If using NOT IN, switch to NOT EXISTS when NULLs may appear.
  4. For very large sets, prefer joins or = ANY(array).
  5. If performance is slow, use EXPLAIN and check indexes.

Best Practices

  • ✅ Use IN (...) for small, readable sets. ❌ Avoid giant literal lists that hide business logic.
  • ✅ Prefer = ANY($1) for parameterized lists. ❌ Avoid building IN (...) strings dynamically in application code.
  • ✅ Use NOT EXISTS when excluding against a nullable set. ❌ Avoid NOT IN (subquery) if the subquery can contain NULL.
  • ✅ Keep data types consistent between the column and the values. ❌ Avoid implicit casts that can surprise you or hurt performance.
  • ✅ Consider indexing the filtered column when it is used frequently. ❌ Avoid relying on sequential scans for high-traffic filters.

Hands-On Practice

Use this setup for the exercises:

CREATE TABLE practice_products (
product_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL,
category text NOT NULL
);

INSERT INTO practice_products (name, category) VALUES
('Laptop', 'Electronics'),
('Desk', 'Furniture'),
('Chair', 'Furniture'),
('Headphones', 'Electronics'),
('Pen', 'Stationery');

Exercise 1 (Easy): Filter by one category using IN

Task: Select products in the Furniture category.

-- Your SQL here

Solution:

SELECT product_id, name
FROM practice_products
WHERE category IN ('Furniture')
ORDER BY product_id;

Exercise 2 (Medium): Exclude a set with NOT IN (safe data)

Task: Select products that are not in Electronics or Stationery.

-- Your SQL here

Solution:

SELECT product_id, name, category
FROM practice_products
WHERE category NOT IN ('Electronics', 'Stationery')
ORDER BY product_id;

Exercise 3 (Advanced): Safe exclusion with NOT EXISTS

Task: Create a practice_banned_products table (including a NULL row) and select products that are not banned using NOT EXISTS.

-- Your SQL here

Solution:

CREATE TABLE practice_banned_products (product_id bigint);
INSERT INTO practice_banned_products (product_id) VALUES (2), (NULL);

SELECT p.product_id, p.name
FROM practice_products p
WHERE NOT EXISTS (
SELECT 1
FROM practice_banned_products b
WHERE b.product_id = p.product_id
)
ORDER BY p.product_id;

Connection to Other Concepts

ConceptWhy it matters
AND / OR / NOTIN is often used instead of many ORs
BETWEENbetter for ranges than IN (a, b)
NULL Valuesexplains why NOT IN + NULL can break filters
SubqueriesIN (SELECT ...) depends on subquery semantics
EXISTScommon safer alternative to IN/NOT IN
Joinsoften replace large membership lists efficiently

Visual Learning Diagram

flowchart TD
A[WHERE Filtering] --> B[IN / NOT IN]
B --> C[Static List]
B --> D[Subquery]
B --> E[= ANY(array)]
B --> F[NOT EXISTS]
D --> G[NULL Pitfall]
G --> F

classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F,G allNodes
class B highlight

Common Pitfalls

PitfallConsequencePrevention
NOT IN (subquery) with NULLsreturns 0 rows unexpectedlyuse NOT EXISTS
Huge literal listhard to maintain, slow plansuse arrays, joins, or staging tables
Mixed data typeserrors or unexpected castskeep types consistent
Using IN for rangeswrong resultsuse BETWEEN or inequalities
Treating IN as a substitute for joinscomplex queriesjoin to a lookup table instead

Quick Reference

WHERE col IN ('a', 'b', 'c');
WHERE col IN (SELECT x FROM t);
WHERE col = ANY(ARRAY[1,2,3]);
WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.x = outer.x);
WHERE col NOT IN ('x', 'y');

What's Next