GROUP BY
Use this lesson to understand GROUP BY with practical syntax and examples.
Concept Overview
GROUP BY transforms detail rows into summary rows by grouping records that share the same key(s), then applying aggregate functions (like COUNT, SUM, AVG, MIN, MAX) to each group.
Why is it important?
- Reporting: metrics per region, per customer, per product
- Analytics: cohort summaries, conversion rates, revenue by segment
- Data quality: detect duplicates and outliers by grouping
- Performance: correct grouping reduces data movement and lets the database aggregate efficiently
Where does it fit?
GROUP BY sits between row-level filtering (WHERE) and group-level filtering (HAVING). It is the core technique for aggregation and is commonly combined with joins, conditional aggregates (FILTER), and ordering.
Syntax & Rules
Core Syntax
SELECT group_key_1, group_key_2, ...,
aggregate_fn(expr) AS metric
FROM table_name
WHERE row_filters
GROUP BY group_key_1, group_key_2, ...
ORDER BY metric DESC;
Available Options / Parameters
| Clause/Feature | What it does | PostgreSQL notes |
|---|---|---|
WHERE | filters rows before grouping | reduces work and changes results |
GROUP BY | defines grouping keys | every non-aggregated selected column must be grouped |
HAVING | filters groups after aggregation | use for aggregate predicates |
FILTER (WHERE ...) | conditional aggregate | PostgreSQL feature: SUM(x) FILTER (WHERE cond) |
ORDER BY | sorts final groups | use NULLS LAST when aggregates can be NULL |
Key Rules and Considerations
- Every selected column must either be an aggregate (
SUM(amount)) or appear inGROUP BY. - Use
WHEREto remove rows you do not want to include in the aggregates. - Use
HAVINGfor conditions on aggregated values. - Conditional aggregates are usually cleaner with
FILTERthan withCASEexpressions.
Step-by-Step Examples
Example 1: Basic Grouping (Beginner)
CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL,
region text NOT NULL,
status text NOT NULL,
amount numeric(12,2) NOT NULL CHECK (amount >= 0),
created_at date NOT NULL
);
INSERT INTO orders (customer_id, region, status, amount, created_at) VALUES
(10, 'US', 'paid', 25.00, '2026-01-01'),
(10, 'US', 'paid', 10.00, '2026-01-02'),
(11, 'EU', 'paid', 50.00, '2026-01-03'),
(12, 'EU', 'failed', 20.00, '2026-01-03'),
(13, 'APAC', 'paid', 40.00, '2026-01-04'),
(13, 'APAC', 'pending', 15.00, '2026-01-05');
SELECT region, COUNT(*) AS order_count
FROM orders
GROUP BY region
ORDER BY region;
Expected output:
region | order_count
--------+------------
APAC | 2
EU | 2
US | 2
(3 rows)
Explanation:
GROUP BY regionproduces one output row per region.COUNT(*)counts all rows in each region.
Example 2: Multiple Aggregates per Group (Intermediate)
SELECT
region,
COUNT(*) AS orders_total,
SUM(amount) AS amount_total,
AVG(amount) AS avg_amount
FROM orders
WHERE created_at >= '2026-01-01'
GROUP BY region
ORDER BY amount_total DESC;
Expected output:
region | orders_total | amount_total | avg_amount
--------+-------------+--------------+--------------------
APAC | 2 | 55.00 | 27.5000000000000000
EU | 2 | 70.00 | 35.0000000000000000
US | 2 | 35.00 | 17.5000000000000000
(3 rows)
Explanation:
WHEREfilters rows before aggregation.SUMandAVGignore NULLs, butamountis NOT NULL here.
Example 3: Conditional Aggregates with FILTER (Advanced)
PostgreSQL's FILTER lets you compute multiple conditional metrics in one pass.
SELECT
region,
COUNT(*) AS orders_total,
COUNT(*) FILTER (WHERE status = 'paid') AS orders_paid,
SUM(amount) FILTER (WHERE status = 'paid') AS revenue_paid,
SUM(amount) FILTER (WHERE status = 'failed') AS failed_amount
FROM orders
GROUP BY region
ORDER BY region;
Expected output:
region | orders_total | orders_paid | revenue_paid | failed_amount
--------+-------------+------------+-------------+--------------
APAC | 2 | 1 | 40.00 |
EU | 2 | 1 | 50.00 | 20.00
US | 2 | 2 | 35.00 |
(3 rows)
Explanation:
FILTERapplies only to the aggregate it is attached to.SUM(...)returns NULL when there are no matching rows (for example, no failed orders in US).
Example 4: Grouping by Time Buckets (Advanced)
SELECT
date_trunc('month', created_at::timestamptz) AS month,
COUNT(*) AS orders
FROM orders
GROUP BY month
ORDER BY month;
Expected output:
month | orders
------------------------+-------
2026-01-01 00:00:00+00 | 6
(1 row)
Explanation:
date_trunc()is a common way to group by month/day/hour.- Use a stable bucket definition and consistent time zone conventions for production reporting.
Practical Use Cases
1) Revenue by region for dashboards
SELECT region, SUM(amount) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY region
ORDER BY revenue DESC;
2) Customer-level activity summaries
SELECT customer_id, COUNT(*) AS orders
FROM orders
GROUP BY customer_id;
3) Detect duplicates by grouping
SELECT email, COUNT(*) AS copies
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
4) Operational monitoring (fail rate)
SELECT
region,
COUNT(*) FILTER (WHERE status = 'failed') AS failed,
COUNT(*) AS total
FROM orders
WHERE created_at >= current_date - 7
GROUP BY region;
5) Inventory totals by warehouse
SELECT warehouse_id, SUM(quantity_on_hand) AS stock
FROM inventory
GROUP BY warehouse_id;
Common Mistakes & Troubleshooting
1) Selecting a non-aggregated column that is not grouped
Wrong SQL:
SELECT region, customer_id, COUNT(*)
FROM orders
GROUP BY region;
Typical error:
ERROR: column "orders.customer_id" must appear in the GROUP BY clause or be used in an aggregate function
Fix:
SELECT region, COUNT(*)
FROM orders
GROUP BY region;
2) Using HAVING for row-level filters
Wrong approach:
SELECT region, COUNT(*)
FROM orders
GROUP BY region
HAVING created_at >= '2026-01-01';
Bad outcome:
created_atis a row-level column and this is not the right place for filtering.
Fix:
SELECT region, COUNT(*)
FROM orders
WHERE created_at >= '2026-01-01'
GROUP BY region;
3) Confusing NULL results in SUM/AVG
Bad outcome:
- Conditional
SUM(...) FILTER (...)can return NULL when no rows match.
Fix:
SELECT COALESCE(SUM(amount) FILTER (WHERE status = 'failed'), 0) AS failed_amount
FROM orders;
4) Inflated metrics after joins
Wrong approach:
-- If each order has multiple items, SUM(order.amount) after joining items overcounts.
SELECT SUM(o.amount)
FROM orders o
JOIN order_items i ON i.order_id = o.order_id;
Fix:
- Aggregate at the correct grain (orders) before joining, or aggregate item-level revenue instead.
Debugging checklist:
- Confirm the grouping keys match the business question.
- Verify
WHEREfilters are applied before grouping. - Ensure every selected non-aggregate column appears in
GROUP BY. - Check NULL behavior for conditional aggregates.
- If joins are involved, confirm the join does not multiply rows unexpectedly.
Best Practices
- ✅ Filter early with
WHEREto reduce work and improve correctness. ❌ Avoid grouping huge raw tables without a time window. - ✅ Alias aggregate outputs (
SUM(amount) AS revenue). ❌ Avoid unlabeled aggregates in reporting queries. - ✅ Use
FILTERfor conditional metrics in PostgreSQL. ❌ Avoid repeating the same query multiple times with slightly different WHERE clauses. - ✅ Validate results with a small sample and sanity checks. ❌ Avoid trusting aggregates without spot-checking inputs.
- ✅ Watch out for join multiplication when aggregating. ❌ Avoid summing fact columns after a join that multiplies rows.
Hands-On Practice
Use this setup for the exercises:
CREATE TABLE practice_orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
region text NOT NULL,
status text NOT NULL,
amount numeric(12,2) NOT NULL
);
INSERT INTO practice_orders (region, status, amount) VALUES
('US', 'paid', 10.00),
('US', 'failed', 5.00),
('EU', 'paid', 20.00),
('EU', 'paid', 30.00);
Exercise 1 (Easy): Orders per region
Task: Return the number of orders per region.
-- Your SQL here
Solution:
SELECT region, COUNT(*) AS orders
FROM practice_orders
GROUP BY region
ORDER BY region;
Exercise 2 (Medium): Paid revenue per region
Task: Return paid revenue per region.
-- Your SQL here
Solution:
SELECT region, SUM(amount) FILTER (WHERE status = 'paid') AS revenue_paid
FROM practice_orders
GROUP BY region
ORDER BY region;
Exercise 3 (Advanced): Paid order rate per region
Task: Return total orders, paid orders, and the paid ratio per region.
-- Your SQL here
Solution:
SELECT
region,
COUNT(*) AS total_orders,
COUNT(*) FILTER (WHERE status = 'paid') AS paid_orders,
(COUNT(*) FILTER (WHERE status = 'paid')::numeric / COUNT(*)) AS paid_ratio
FROM practice_orders
GROUP BY region
ORDER BY region;
Connection to Other Concepts
| Concept | Why it matters |
|---|---|
| Aggregate functions | GROUP BY is the common way to compute aggregates per key |
WHERE | filters rows before grouping |
HAVING | filters groups after aggregation |
| Joins | grouping after joins requires grain awareness |
| ROLLUP / GROUPING SETS | advanced group-by constructs for subtotals |
| NULL Values | affects conditional sums and counts |
Visual Learning Diagram
flowchart TD
A[Raw Rows] --> B[WHERE]
B --> C[GROUP BY]
C --> D[Aggregates]
D --> E[HAVING]
D --> F[ORDER BY]
C --> G[Conditional Aggregates (FILTER)]
classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F,G allNodes
class C highlight
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
| Selecting non-grouped columns | query error | group those columns or aggregate them |
| Filtering too late | slow and wrong metrics | use WHERE before GROUP BY |
| Wrong grouping key | misleading KPIs | validate with sample rows |
| Conditional SUM returns NULL | confusing dashboards | use COALESCE where appropriate |
| Join multiplication | inflated totals | aggregate at correct grain |
Quick Reference
SELECT k, COUNT(*) FROM t GROUP BY k;
SELECT k, SUM(v) FROM t WHERE status = 'paid' GROUP BY k;
SELECT a, b, AVG(score) FROM t GROUP BY a, b;
SELECT k, SUM(v) FILTER (WHERE ok) FROM t GROUP BY k;
SELECT k, COUNT(*) FROM t GROUP BY k HAVING COUNT(*) > 10;
What's Next
- Next: HAVING Clause - Continue to the next concept with incremental complexity.
- Module Overview - Return to this module index and choose another related lesson.