Skip to main content

PostgreSQL Core Concepts

Learning Focus

Use this lesson to understand PostgreSQL Core Concepts — the RDBMS architecture, key components, and operational behavior that every PostgreSQL user must know.

What is an RDBMS?

In the context of Relational Database Management Systems (RDBMS), a table is one of the most fundamental structures used to organize and store data. A table consists of rows and columns, where:

1. Columns (Fields)

  • Columns represent specific categories of data in the table. Each column stores data of a particular type such as text, integer, date, or boolean.
  • The name of each column represents the type of information it holds (e.g., customer_id, name, order_date).
  • In PostgreSQL, columns are strongly typed — implicit type coercions are minimal compared to other databases.

2. Rows (Records or Tuples)

  • A row represents a single individual record in the table. Each row contains data corresponding to each column.
  • For example, in a customers table, each row represents a different customer with their data across every column.

Example Table: customers

customer_idcustomer_namecontact_nameaddresscitypostal_codecountry
1Alfreds FutterkisteMaria AndersObere Str. 57Berlin12209Germany
2Ana TrujilloAna TrujilloAvda. 2222Mexico D.F.05021Mexico
3Antonio MorenoAntonio MorenoMataderos 2312Mexico D.F.05023Mexico
  • Columns: customer_id, customer_name, contact_name, address, city, postal_code, country.
  • Rows: Each row represents a unique customer with their data.

PostgreSQL Object Hierarchy

PostgreSQL organizes objects in a unique hierarchy that is important to understand:

graph TD
A[Cluster] --> B[Database 1]
A --> C[Database 2]
B --> D[Schema: public]
B --> E[Schema: app]
D --> F[Tables]
D --> G[Views]
E --> H[Tables]
E --> I[Functions]

Key Terms

TermMeaningWhy It Matters
ClusterOne PostgreSQL server instance (data directory)Hosts multiple databases; managed by one postmaster process
DatabaseLogical container for schemas and objectsApplications usually isolate by database; cross-database queries are not directly supported
SchemaNamespace within a database (default: public)Organizes tables, views, functions; controls privilege boundaries
RoleUser or group identityHandles authentication and authorization
ExtensionPackaged functionality (pg_stat_statements, uuid-ossp, PostGIS)Adds types, functions, indexes, and instrumentation without forking

Relationships Between Tables

  • Relational Databases use tables to store data, and these tables are related through keys.
  • For example, a customers table might be related to an orders table through a foreign key. The customer_id column in the orders table references the customer_id column in the customers table.

Key Points About Tables

  • Primary Key: A column (or set of columns) that uniquely identifies each row. PostgreSQL supports GENERATED ALWAYS AS IDENTITY (SQL standard) and SERIAL (legacy).
  • Foreign Key: A column that links to another table's primary key. PostgreSQL enforces referential integrity strictly.
  • Constraints: Rules applied to columns to ensure data integrity:
    • NOT NULL — column cannot have NULL values
    • UNIQUE — all values must be distinct
    • CHECK — values must satisfy a boolean expression
    • EXCLUDE — PostgreSQL-specific constraint for complex uniqueness rules

Architecture (Under the Hood)

Core Components

ComponentWhat It DoesWhat You Notice
WAL (Write-Ahead Log)Logs all changes before they are written to data filesEnables crash recovery, point-in-time recovery, streaming replication
MVCC (Multi-Version Concurrency Control)Keeps multiple versions of each rowReaders never block writers; writers never block readers
AutovacuumBackground process that cleans up dead tuples and updates statisticsPrevents table bloat, keeps query planner accurate
Query Planner/OptimizerChooses the best execution plan for each queryIndexes, statistics, and table structure all affect plan choice
Shared BuffersIn-memory cache for frequently accessed data pagesProper sizing improves read performance significantly
Background WriterWrites dirty pages from shared buffers to diskReduces checkpoint I/O spikes

Process Architecture

flowchart TB
Client1[Client 1] --> Postmaster
Client2[Client 2] --> Postmaster
Postmaster --> Backend1[Backend Process 1]
Postmaster --> Backend2[Backend Process 2]
Postmaster --> BGWriter[Background Writer]
Postmaster --> Autovacuum[Autovacuum Launcher]
Postmaster --> WALWriter[WAL Writer]
Postmaster --> Checkpointer[Checkpointer]
  • Postmaster: The main PostgreSQL process; listens for connections and forks backend processes.
  • Backend Process: One per client connection; executes queries.
  • Background Workers: Autovacuum, WAL writer, checkpointer, and stats collector.

Key Features of PostgreSQL as an RDBMS

  • Tables: Data stored in tables with rows (records) and columns (attributes)
  • Primary Keys: Uniquely identify each record, support composite keys
  • Foreign Keys: Establish and enforce relationships between tables
  • Normalization: Organizing data to minimize redundancy
  • ACID Compliance: Full transactional support with WAL-based durability
  • Extensibility: Custom types, operators, index methods, and procedural languages

Use Cases of RDBMS (PostgreSQL)

  1. Web Applications and SaaS
    • Multi-tenant applications using schema-per-tenant isolation
    • JSONB support enables flexible document storage alongside relational data
  2. Financial and Banking Systems
    • Strict ACID transactions for account balances and transfers
    • Row-level security for compliance
  3. Geospatial Applications
    • PostGIS extension provides industry-leading spatial data support
    • Used by OpenStreetMap, government mapping agencies
  4. Analytics and Data Warehousing
    • Window functions, CTEs, and parallel query execution
    • Partitioned tables for large datasets
  5. Content Management Systems
    • Full-text search built in (tsvector, tsquery)
    • JSONB for flexible content schemas

Benefits of PostgreSQL as an RDBMS

  1. Data Integrity
    • Constraints (NOT NULL, UNIQUE, CHECK, EXCLUDE, FOREIGN KEY) enforced at the database level
  2. Data Security
    • Role-based access control, row-level security (RLS), column-level permissions
  3. Flexible Querying
    • CTEs (Common Table Expressions), window functions, lateral joins, recursive queries
  4. Extensibility
    • Custom data types, operators, index access methods (GIN, GiST, BRIN, SP-GiST)
    • Procedural languages: PL/pgSQL, PL/Python, PL/Perl, PL/v8
  5. Standards Compliance
    • Most SQL-standard-compliant open-source database
  6. Scalability
    • Table partitioning, parallel queries, logical and streaming replication

Limitations of RDBMS (and PostgreSQL Solutions)

1. Complexity with Large-Scale Data

  • Solution:
    • Partitioning: Declarative table partitioning by range, list, or hash
    • Indexing: B-tree, GIN (full-text, JSONB), GiST (spatial), BRIN (large sequential data)
    • Caching: Use pg_bouncer for connection pooling; pair with Redis for hot data

2. Limited Handling of Unstructured Data

  • Solution:
    • JSONB: Native binary JSON storage with indexing (GIN)
    • hstore: Key-value pairs within a column
    • Large Objects / bytea: Binary data storage
    • External storage: Store files in S3/MinIO, keep references in PostgreSQL

3. Schema Rigidity

  • Solution:
    • JSONB columns: Add semi-structured data alongside relational columns
    • Migration tools: Use Flyway, Liquibase, or sqitch for controlled schema evolution
    • Views: Abstract schema changes from application queries

4. Horizontal Scaling Challenges

  • Solution:
    • Citus extension: Distributed PostgreSQL for horizontal scaling
    • Streaming replication: Read replicas for scaling read workloads
    • Logical replication: Selective table replication across clusters
    • PgBouncer: Connection pooling to handle many concurrent clients

5. Performance Overhead Due to ACID Properties

  • Solution:
    • MVCC (built-in): Non-locking reads, high concurrency by design
    • Unlogged tables: Skip WAL for temporary/staging data (data lost on crash)
    • COPY command: Bulk loading orders of magnitude faster than row-by-row INSERT
    • Parallel queries: PostgreSQL automatically parallelizes scans, joins, and aggregations

Minimal Practice Workflow

Create a dedicated lab database:

CREATE DATABASE pg_lab;
\c pg_lab
SELECT version();

Create sample tables:

CREATE TABLE customers (
customer_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL,
email text NOT NULL UNIQUE,
created_at timestamptz DEFAULT now()
);

CREATE TABLE orders (
order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id bigint NOT NULL REFERENCES customers(customer_id),
total numeric(10,2) NOT NULL,
order_date date NOT NULL DEFAULT CURRENT_DATE
);

INSERT INTO customers (name, email) VALUES
('Alice', 'alice@example.com'),
('Bob', 'bob@company.org');

INSERT INTO orders (customer_id, total) VALUES
(1, 99.95),
(1, 249.00),
(2, 15.50);

SELECT c.name, o.total, o.order_date
FROM customers c
JOIN orders o USING (customer_id)
ORDER BY o.order_date DESC;

Concept Map

flowchart LR
A[PostgreSQL Cluster] --> B[Databases]
B --> C[Schemas]
C --> D[Tables & Views]
D --> E[Constraints & Keys]
E --> F[Queries & Transactions]
F --> G[MVCC & WAL]

Common Pitfalls

PitfallConsequencePrevention
Developing as the postgres superuserDangerous; mistakes affect the entire clusterCreate dedicated roles with least privilege
Ignoring autovacuumTable bloat, degraded query plansMonitor pg_stat_user_tables for dead tuple count
Not understanding MVCCSurprise behavior with concurrent updatesLearn BEGIN/COMMIT/ROLLBACK and isolation levels
Using SERIAL instead of IDENTITYLegacy syntax, harder to manageUse GENERATED ALWAYS AS IDENTITY (SQL standard)
Cross-database queriesNot supported in PostgreSQLUse schemas within one database, or dblink/postgres_fdw

Quick Reference (psql)

\l        list databases
\c NAME connect to database
\dn list schemas
\dt list tables in current schema
\d+ T describe table T in detail
\x toggle expanded output
\timing show query execution time

What's Next