PostgreSQL Core Concepts

Learning Focus

Use this lesson to understand PostgreSQL Core Concepts — the RDBMS architecture, key components, and operational behavior that every PostgreSQL user must know.

What is an RDBMS?

In the context of Relational Database Management Systems (RDBMS), a table is one of the most fundamental structures used to organize and store data. A table consists of rows and columns, where:

1. Columns (Fields)

Columns represent specific categories of data in the table. Each column stores data of a particular type such as text, integer, date, or boolean.
The name of each column represents the type of information it holds (e.g., customer_id, name, order_date).
In PostgreSQL, columns are strongly typed — implicit type coercions are minimal compared to other databases.

2. Rows (Records or Tuples)

A row represents a single individual record in the table. Each row contains data corresponding to each column.
For example, in a customers table, each row represents a different customer with their data across every column.

Example Table: `customers`

customer_id	customer_name	contact_name	address	city	postal_code	country
1	Alfreds Futterkiste	Maria Anders	Obere Str. 57	Berlin	12209	Germany
2	Ana Trujillo	Ana Trujillo	Avda. 2222	Mexico D.F.	05021	Mexico
3	Antonio Moreno	Antonio Moreno	Mataderos 2312	Mexico D.F.	05023	Mexico

Columns: customer_id, customer_name, contact_name, address, city, postal_code, country.
Rows: Each row represents a unique customer with their data.

PostgreSQL Object Hierarchy

PostgreSQL organizes objects in a unique hierarchy that is important to understand:

graph TD
    A[Cluster] --> B[Database 1]
    A --> C[Database 2]
    B --> D[Schema: public]
    B --> E[Schema: app]
    D --> F[Tables]
    D --> G[Views]
    E --> H[Tables]
    E --> I[Functions]

Key Terms

Term	Meaning	Why It Matters
Cluster	One PostgreSQL server instance (data directory)	Hosts multiple databases; managed by one `postmaster` process
Database	Logical container for schemas and objects	Applications usually isolate by database; cross-database queries are not directly supported
Schema	Namespace within a database (default: `public`)	Organizes tables, views, functions; controls privilege boundaries
Role	User or group identity	Handles authentication and authorization
Extension	Packaged functionality (`pg_stat_statements`, `uuid-ossp`, `PostGIS`)	Adds types, functions, indexes, and instrumentation without forking

Relationships Between Tables

Relational Databases use tables to store data, and these tables are related through keys.
For example, a customers table might be related to an orders table through a foreign key. The customer_id column in the orders table references the customer_id column in the customers table.

Key Points About Tables

Primary Key: A column (or set of columns) that uniquely identifies each row. PostgreSQL supports GENERATED ALWAYS AS IDENTITY (SQL standard) and SERIAL (legacy).
Foreign Key: A column that links to another table's primary key. PostgreSQL enforces referential integrity strictly.
Constraints: Rules applied to columns to ensure data integrity:
- NOT NULL — column cannot have NULL values
- UNIQUE — all values must be distinct
- CHECK — values must satisfy a boolean expression
- EXCLUDE — PostgreSQL-specific constraint for complex uniqueness rules

Architecture (Under the Hood)

Core Components

Component	What It Does	What You Notice
WAL (Write-Ahead Log)	Logs all changes before they are written to data files	Enables crash recovery, point-in-time recovery, streaming replication
MVCC (Multi-Version Concurrency Control)	Keeps multiple versions of each row	Readers never block writers; writers never block readers
Autovacuum	Background process that cleans up dead tuples and updates statistics	Prevents table bloat, keeps query planner accurate
Query Planner/Optimizer	Chooses the best execution plan for each query	Indexes, statistics, and table structure all affect plan choice
Shared Buffers	In-memory cache for frequently accessed data pages	Proper sizing improves read performance significantly
Background Writer	Writes dirty pages from shared buffers to disk	Reduces checkpoint I/O spikes

Process Architecture

flowchart TB
    Client1[Client 1] --> Postmaster
    Client2[Client 2] --> Postmaster
    Postmaster --> Backend1[Backend Process 1]
    Postmaster --> Backend2[Backend Process 2]
    Postmaster --> BGWriter[Background Writer]
    Postmaster --> Autovacuum[Autovacuum Launcher]
    Postmaster --> WALWriter[WAL Writer]
    Postmaster --> Checkpointer[Checkpointer]

Postmaster: The main PostgreSQL process; listens for connections and forks backend processes.
Backend Process: One per client connection; executes queries.
Background Workers: Autovacuum, WAL writer, checkpointer, and stats collector.

Key Features of PostgreSQL as an RDBMS

Tables: Data stored in tables with rows (records) and columns (attributes)
Primary Keys: Uniquely identify each record, support composite keys
Foreign Keys: Establish and enforce relationships between tables
Normalization: Organizing data to minimize redundancy
ACID Compliance: Full transactional support with WAL-based durability
Extensibility: Custom types, operators, index methods, and procedural languages

Use Cases of RDBMS (PostgreSQL)

Web Applications and SaaS
- Multi-tenant applications using schema-per-tenant isolation
- JSONB support enables flexible document storage alongside relational data
Financial and Banking Systems
- Strict ACID transactions for account balances and transfers
- Row-level security for compliance
Geospatial Applications
- PostGIS extension provides industry-leading spatial data support
- Used by OpenStreetMap, government mapping agencies
Analytics and Data Warehousing
- Window functions, CTEs, and parallel query execution
- Partitioned tables for large datasets
Content Management Systems
- Full-text search built in (tsvector, tsquery)
- JSONB for flexible content schemas

Benefits of PostgreSQL as an RDBMS

Data Integrity
- Constraints (NOT NULL, UNIQUE, CHECK, EXCLUDE, FOREIGN KEY) enforced at the database level
Data Security
- Role-based access control, row-level security (RLS), column-level permissions
Flexible Querying
- CTEs (Common Table Expressions), window functions, lateral joins, recursive queries
Extensibility
- Custom data types, operators, index access methods (GIN, GiST, BRIN, SP-GiST)
- Procedural languages: PL/pgSQL, PL/Python, PL/Perl, PL/v8
Standards Compliance
- Most SQL-standard-compliant open-source database
Scalability
- Table partitioning, parallel queries, logical and streaming replication

Limitations of RDBMS (and PostgreSQL Solutions)

1. Complexity with Large-Scale Data

Solution:
- Partitioning: Declarative table partitioning by range, list, or hash
- Indexing: B-tree, GIN (full-text, JSONB), GiST (spatial), BRIN (large sequential data)
- Caching: Use pg_bouncer for connection pooling; pair with Redis for hot data

2. Limited Handling of Unstructured Data

Solution:
- JSONB: Native binary JSON storage with indexing (GIN)
- hstore: Key-value pairs within a column
- Large Objects / bytea: Binary data storage
- External storage: Store files in S3/MinIO, keep references in PostgreSQL

3. Schema Rigidity

Solution:
- JSONB columns: Add semi-structured data alongside relational columns
- Migration tools: Use Flyway, Liquibase, or sqitch for controlled schema evolution
- Views: Abstract schema changes from application queries

4. Horizontal Scaling Challenges

Solution:
- Citus extension: Distributed PostgreSQL for horizontal scaling
- Streaming replication: Read replicas for scaling read workloads
- Logical replication: Selective table replication across clusters
- PgBouncer: Connection pooling to handle many concurrent clients

5. Performance Overhead Due to ACID Properties

Solution:
- MVCC (built-in): Non-locking reads, high concurrency by design
- Unlogged tables: Skip WAL for temporary/staging data (data lost on crash)
- COPY command: Bulk loading orders of magnitude faster than row-by-row INSERT
- Parallel queries: PostgreSQL automatically parallelizes scans, joins, and aggregations

Minimal Practice Workflow

Create a dedicated lab database:

CREATE DATABASE pg_lab;
\c pg_lab
SELECT version();

Create sample tables:

CREATE TABLE customers (
  customer_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name        text NOT NULL,
  email       text NOT NULL UNIQUE,
  created_at  timestamptz DEFAULT now()
);

CREATE TABLE orders (
  order_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id bigint NOT NULL REFERENCES customers(customer_id),
  total       numeric(10,2) NOT NULL,
  order_date  date NOT NULL DEFAULT CURRENT_DATE
);

INSERT INTO customers (name, email) VALUES
  ('Alice', 'alice@example.com'),
  ('Bob', 'bob@company.org');

INSERT INTO orders (customer_id, total) VALUES
  (1, 99.95),
  (1, 249.00),
  (2, 15.50);

SELECT c.name, o.total, o.order_date
FROM customers c
JOIN orders o USING (customer_id)
ORDER BY o.order_date DESC;

Concept Map

flowchart LR
    A[PostgreSQL Cluster] --> B[Databases]
    B --> C[Schemas]
    C --> D[Tables & Views]
    D --> E[Constraints & Keys]
    E --> F[Queries & Transactions]
    F --> G[MVCC & WAL]

Common Pitfalls

Pitfall	Consequence	Prevention
Developing as the `postgres` superuser	Dangerous; mistakes affect the entire cluster	Create dedicated roles with least privilege
Ignoring autovacuum	Table bloat, degraded query plans	Monitor `pg_stat_user_tables` for dead tuple count
Not understanding MVCC	Surprise behavior with concurrent updates	Learn `BEGIN`/`COMMIT`/`ROLLBACK` and isolation levels
Using `SERIAL` instead of `IDENTITY`	Legacy syntax, harder to manage	Use `GENERATED ALWAYS AS IDENTITY` (SQL standard)
Cross-database queries	Not supported in PostgreSQL	Use schemas within one database, or `dblink`/`postgres_fdw`

Quick Reference (psql)

\l        list databases
\c NAME   connect to database
\dn       list schemas
\dt       list tables in current schema
\d+ T     describe table T in detail
\x        toggle expanded output
\timing   show query execution time

What's Next

Previous: Relational Database Concepts — Review the previous lesson to reinforce context.
Next: PostgreSQL vs SQL — Continue to the next concept with incremental complexity.
Section Overview — Return to this section index and choose another related lesson.

What is an RDBMS?​

1. Columns (Fields)​

2. Rows (Records or Tuples)​

Example Table: customers​

PostgreSQL Object Hierarchy​

Key Terms​

Relationships Between Tables​

Key Points About Tables​

Architecture (Under the Hood)​

Core Components​

Process Architecture​

Key Features of PostgreSQL as an RDBMS​

Use Cases of RDBMS (PostgreSQL)​

Benefits of PostgreSQL as an RDBMS​

Limitations of RDBMS (and PostgreSQL Solutions)​

1. Complexity with Large-Scale Data​

2. Limited Handling of Unstructured Data​

3. Schema Rigidity​

4. Horizontal Scaling Challenges​

5. Performance Overhead Due to ACID Properties​

Minimal Practice Workflow​

Concept Map​

Common Pitfalls​

Quick Reference (psql)​

What's Next​