Skip to main content

JSONB Querying and Indexing

Learning Focus

Use this lesson to query jsonb with operators (->, ->>, #>>, @>, ?) and design indexes that match real JSON query patterns.

Concept Overview

jsonb stores JSON in a binary format that supports indexing and efficient querying.

Use jsonb for flexible attributes (metadata, optional fields, external payloads), not for core relational keys.

Why is it important?

  • Flexibility: ingest evolving payloads without constant schema migrations
  • Queryability: filter and extract JSON fields using SQL
  • Performance: the right GIN/expression/partial index can make JSON workloads fast
  • Pragmatism: keep stable join keys relational, and JSON for “edge attributes”

Where does it fit?

jsonb commonly appears in event logs, audit trails, product metadata, and integration tables. In practice, it pairs well with:

  • B-tree indexes on relational columns
  • GIN indexes for containment queries
  • expression/partial indexes for hot JSON keys

Syntax & Rules

Core Syntax

-- Extract JSON vs text
payload -> 'type' -- jsonb
payload ->> 'type' -- text

-- Nested path extraction
payload #> '{user,id}' -- jsonb
payload #>> '{user,id}' -- text

-- Containment and existence
payload @> '{"type":"purchase"}'::jsonb
payload ? 'type'

Common Operators / Functions

OperatorMeaningExample
->get JSON valuepayload -> 'type'
->>get text valuepayload ->> 'type'
#>nested JSON pathpayload #> '{user,id}'
#>>nested text pathpayload #>> '{user,id}'
@>containmentpayload @> '{"type":"purchase"}'
?key existspayload ? 'type'
`?`any key exists
?&all keys existpayload ?& array['a','b']

Useful functions:

FunctionUseExample
jsonb_typeof(jsonb)inspect typejsonb_typeof(payload->'items')
jsonb_array_elements(jsonb)unnest arrayjsonb_array_elements(payload->'items')
jsonb_build_object(...)build jsonbjsonb_build_object('k', 'v')
jsonb_set(target, path, new_value, create_missing)update jsonbjsonb_set(payload, '{k}', '"v"'::jsonb, true)

Key Rules and Considerations

  • Prefer jsonb over json for most query and indexing workloads.
  • ->> returns text. Cast when comparing/aggregating numeric or timestamp values.
  • GIN indexes help most with containment (@>) and key existence (?) patterns.
  • Equality checks on extracted text (like payload ->> 'type' = 'purchase') often benefit from an expression index.

Step-by-Step Examples

Use this setup for the examples:

DROP TABLE IF EXISTS events;

CREATE TABLE events (
event_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
payload jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);

INSERT INTO events (payload, created_at) VALUES
(
'{"type":"purchase","user":{"id":1},"amount":35.00,"items":[{"sku":"A1","qty":1},{"sku":"B2","qty":2}]}'::jsonb,
'2026-03-01 10:00+00'
),
(
'{"type":"pageview","user":{"id":1},"path":"/pricing"}'::jsonb,
'2026-03-01 10:05+00'
),
(
'{"type":"purchase","user":{"id":2},"amount":10.00,"items":[{"sku":"A1","qty":1}]}'::jsonb,
'2026-03-02 09:00+00'
);

Example 1: Extract and Filter by a Key (Beginner)

SELECT
event_id,
payload ->> 'type' AS type,
payload #>> '{user,id}' AS user_id
FROM events
WHERE payload ->> 'type' = 'purchase';

Expected output:

 event_id |   type   | user_id
----------+----------+--------
1 | purchase | 1
3 | purchase | 2
(2 rows)

Explanation:

  • Use ->> when you need text for comparisons and display.
  • #>> '{user,id}' navigates nested keys and returns text.

Example 2: Containment Search With @> (Intermediate)

SELECT event_id
FROM events
WHERE payload @> '{"type":"purchase"}'::jsonb
ORDER BY event_id;

Expected output:

 event_id
----------
1
3
(2 rows)

Explanation:

  • @> checks whether the left JSON contains the right JSON structure.
  • This pattern is typically index-friendly with a GIN index.

Example 3: Query Arrays With jsonb_array_elements (Advanced)

Goal: Find purchase events that include SKU B2.

SELECT e.event_id
FROM events e
WHERE e.payload ->> 'type' = 'purchase'
AND EXISTS (
SELECT 1
FROM jsonb_array_elements(e.payload -> 'items') AS it(item)
WHERE it.item ->> 'sku' = 'B2'
)
ORDER BY e.event_id;

Expected output:

 event_id
----------
1
(1 row)

Explanation:

  • jsonb_array_elements turns a JSON array into a set of rows.
  • EXISTS keeps the result at “one row per event” even if multiple items match.

Example 4: Update JSONB With jsonb_set (Advanced)

Goal: Mark an event as a test record.

UPDATE events
SET payload = jsonb_set(payload, '{is_test}', 'true'::jsonb, true)
WHERE event_id = 2
RETURNING event_id, payload ->> 'type' AS type, payload ->> 'is_test' AS is_test;

Expected output:

 event_id |   type   | is_test
----------+----------+---------
2 | pageview | true
(1 row)

Practical Use Cases

1) Event Routing in ETL / Streams

Context: select only purchase events for downstream processing.

SELECT event_id, created_at
FROM events
WHERE payload @> '{"type":"purchase"}'::jsonb
ORDER BY created_at;

2) Reporting: Revenue Per User

Context: extract user.id and sum numeric amounts.

SELECT
(payload #>> '{user,id}')::bigint AS user_id,
SUM((payload ->> 'amount')::numeric) AS revenue
FROM events
WHERE payload ->> 'type' = 'purchase'
GROUP BY 1
ORDER BY 1;

3) Guardrails With Constraints

Context: ensure that purchases always have an amount key.

ALTER TABLE events
ADD CONSTRAINT purchase_must_have_amount
CHECK (
payload ->> 'type' <> 'purchase'
OR (payload ? 'amount')
);

4) Promote Hot Keys to Columns

Context: if you filter by type constantly, consider a real column (or at least a generated view/materialized approach).

-- Example: store event_type as a real column for faster filtering and better constraints
ALTER TABLE events ADD COLUMN event_type text;
UPDATE events SET event_type = payload ->> 'type';
CREATE INDEX idx_events_event_type ON events (event_type);

Common Mistakes & Troubleshooting

Mistake 1: Using -> When You Need Text

Wrong:

SELECT event_id
FROM events
WHERE payload -> 'type' = 'purchase';

What happens: you are comparing JSON to text; results will be wrong or error-prone.

Fix:

SELECT event_id
FROM events
WHERE payload ->> 'type' = 'purchase';

Mistake 2: Expecting a Generic GIN Index to Speed Up ->> Equality

Symptom: WHERE payload ->> 'type' = 'purchase' stays slow even after USING gin (payload).

Fix (expression index):

CREATE INDEX idx_events_type
ON events ((payload ->> 'type'));

Mistake 3: Forgetting Casts for Numeric Aggregation

Wrong:

SELECT SUM(payload ->> 'amount') FROM events;

What happens: text cannot be summed.

Fix:

SELECT SUM((payload ->> 'amount')::numeric) FROM events;

Debugging tips:

  1. Print both payload -> 'k' and payload ->> 'k' to confirm the type you are working with.
  2. Use jsonb_typeof(payload->'k') before casting.
  3. Run EXPLAIN (ANALYZE, BUFFERS) on slow JSON filters to see whether the intended index is used.
  4. If most rows are not relevant, prefer partial indexes.

Best Practices

  • ✅ Choose JSON query patterns first, then design indexes; ❌ index everything “just in case”.
  • ✅ Use @> for containment-style filters; ❌ chain many ->> comparisons when containment fits.
  • ✅ Create expression indexes for hot extracted keys; ❌ assume USING gin (payload) covers all predicate shapes.
  • ✅ Keep stable keys as columns (and constrain them); ❌ bury join keys and timestamps only in JSON.
  • ✅ Add CHECK constraints for critical payload contracts; ❌ rely solely on application code.

Hands-On Practice

Use this setup for the exercises:

DROP TABLE IF EXISTS practice_events;

CREATE TABLE practice_events (
event_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
payload jsonb NOT NULL
);

INSERT INTO practice_events (payload) VALUES
('{"type":"purchase","user":{"id":10},"amount":12.50,"items":[{"sku":"X","qty":1}]}'::jsonb),
('{"type":"purchase","user":{"id":10},"amount":20.00,"items":[{"sku":"Y","qty":2}]}'::jsonb),
('{"type":"pageview","user":{"id":11},"path":"/home"}'::jsonb);

Exercise 1 (Easy): Filter purchases

Task: Return event_id for purchase events.

-- Your SQL here

Solution:

SELECT event_id
FROM practice_events
WHERE payload ->> 'type' = 'purchase'
ORDER BY event_id;

Exercise 2 (Medium): Revenue per user

Task: Group purchases by user.id and sum amount.

-- Your SQL here

Solution:

SELECT
(payload #>> '{user,id}')::bigint AS user_id,
SUM((payload ->> 'amount')::numeric) AS revenue
FROM practice_events
WHERE payload ->> 'type' = 'purchase'
GROUP BY 1
ORDER BY 1;

Exercise 3 (Advanced): Add an index for a hot key

Task: Create an index that helps WHERE payload ->> 'type' = ....

-- Your SQL here

Solution:

CREATE INDEX idx_practice_events_type
ON practice_events ((payload ->> 'type'));

Connection to Other Concepts

ConceptWhy it matters
IndexesJSON performance depends on matching the index to the predicate
Data types and casting->> returns text; cast for numeric/date operations
ConstraintsCHECK constraints help enforce payload contracts
Window functionsanalyze extracted JSON values (ranks, running totals, deltas)
CTEskeep multi-step JSON extraction readable

Visual Learning Diagram

flowchart TD
A[jsonb payload] --> B[Query Pattern]
B --> C[Containment\n@> / ?]
B --> D[Extracted Key\n->> equality]
C --> E[GIN index\n(payload) or jsonb_path_ops]
D --> F[Expression index\n((payload->>'k'))]
B --> G[Mostly subset?]
G --> H[Partial index\nWHERE payload->>'type'='purchase']
B --> I[Extract + Cast]
I --> J[Aggregations / Reports]

classDef allNodes fill:#3e3e3e,stroke:#ffffff,stroke-width:2px,color:#f5f5f5
classDef highlight fill:#3e3e3e,stroke:#ffffff,stroke-width:4px,color:#f5f5f5
class A,B,C,D,E,F,G,H,I,J allNodes
class B highlight

Common Pitfalls

PitfallConsequencePrevention
Using json instead of jsonbweak indexing/performanceprefer jsonb
Comparing text to JSON incorrectlywrong results/errorsuse ->> for text or cast properly
Assuming one index fits all JSON queriesslow querieschoose GIN vs expression vs partial to match predicate
Storing stable keys only in JSONcomplex joins/constraintspromote stable keys to columns
Indexing without evidencelarger writes and diskadd indexes based on real workloads

Quick Reference

payload -> 'k'
payload ->> 'k'
payload #>> '{a,b}'
payload @> '{"k":"v"}'::jsonb
payload ? 'k'

What's Next