Indexes and Query Performance
A database index is a separate data structure that makes finding rows fast. Without an index, a query that filters on a column must scan every row; with the right index, the database jumps directly to the matching rows. Understanding indexes is the key to responsive applications.
How Indexes Work
Think of a book. To find references to "performance", you could read every page—slow. Or flip to the index, find "performance" with page numbers, and jump there. Databases use B-tree structures for the same purpose. They're sorted, allowing binary search to locate matching values in O(log n) time instead of O(n).
When you query SELECT * FROM users WHERE email = 'user@example.com' without an index, the database checks every row. With an index on email, it uses the index to jump directly to matching rows. For large tables, this is the difference between milliseconds and seconds.
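The difference shows up directly in the query plan. A minimal sketch using Python's built-in sqlite3 module as a stand-in for a production database (the table shape and the index name idx_users_email are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite's rough analogue of Postgres's EXPLAIN.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM users WHERE email = 'user@example.com'"
before = plan(query)   # reports a full scan of the users table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)    # reports a search using idx_users_email
```

The same before-and-after comparison works against any database that exposes its query plans.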
Primary Key Indexes
Every table should have a primary key, usually an id column, and the database automatically creates an index on it. This is why SELECT * FROM users WHERE id = 5 is always fast: the primary key index comes for free.
Single-Column Indexes
Create indexes on columns you frequently filter or sort by. If you often query SELECT * FROM orders WHERE customer_id = ?, add an index on customer_id. If you sort by created_at, add an index there.
Each index consumes storage. More importantly, indexes slow INSERT, UPDATE, and DELETE because the index must be updated. Only index columns that benefit queries you actually run.
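The sort case is easy to observe. In this sqlite3 sketch (idx_orders_created_at is an illustrative name), the plan's explicit sort step disappears once the index exists, because the index already stores rows in sorted order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, created_at TEXT)"
)

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Without an index, the database must sort the result rows itself.
before = plan("SELECT * FROM orders ORDER BY created_at")
conn.execute("CREATE INDEX idx_orders_created_at ON orders (created_at)")
# With the index, rows are read in order and the sort step disappears.
after = plan("SELECT * FROM orders ORDER BY created_at")
```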
Composite Indexes
A composite index spans multiple columns. If you frequently query WHERE status = 'shipped' AND created_at > '2024-01-01', a composite index on (status, created_at) is more efficient than two separate indexes. Order matters: the database uses the index columns left to right. An index on (status, created_at) helps queries that filter by status, alone or together with created_at, but not queries that filter only by created_at, because the index is sorted by status first.
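The left-to-right rule can be checked directly. In this sqlite3 sketch (the schema and the name idx_status_created are illustrative), the composite index serves the query with the leading status filter but not the one filtering only on created_at:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT, total REAL)"
)
conn.execute("CREATE INDEX idx_status_created ON orders (status, created_at)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Leading column present: the composite index applies.
both = plan(
    "SELECT * FROM orders WHERE status = 'shipped' AND created_at > '2024-01-01'"
)
# Leading column absent: the planner cannot use the index and scans the table.
trailing_only = plan("SELECT * FROM orders WHERE created_at > '2024-01-01'")
```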
When NOT to Add Indexes
Over-indexing is a real problem. Every index has maintenance cost. Indexes bloat storage. Write-heavy tables with many indexes become slow on INSERT and UPDATE. Index the critical query paths, not everything.
Boolean columns rarely benefit from indexes: with roughly half the table matching either value, the index barely narrows the search. Low-cardinality columns (columns with few unique values) often don't justify indexes for the same reason. High-cardinality columns (like email) do, because the index significantly reduces the rows examined.
EXPLAIN ANALYZE: Understanding Query Execution
EXPLAIN ANALYZE shows what the database actually does with a query. Running EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = ? shows the query plan: does it use an index, how many rows does it examine, how long does it take?
If you see "Seq Scan" (sequential scan), the query scans all rows. If you see "Index Scan", it's using an index. If a query you expect to be fast shows a sequential scan, add an index. This tool is essential for optimization.
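For a sense of what to look for, here is the general shape of PostgreSQL EXPLAIN ANALYZE output before and after adding an index (the cost, timing, and row counts are illustrative placeholders, not real measurements):

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Without an index:
--   Seq Scan on orders  (cost=0.00..1693.00 rows=10 width=64)
--                       (actual time=0.021..9.750 rows=8 loops=1)
--     Filter: (customer_id = 42)
--     Rows Removed by Filter: 99992

-- After CREATE INDEX idx_orders_customer_id ON orders (customer_id):
--   Index Scan using idx_orders_customer_id on orders
--                       (cost=0.29..8.45 rows=10 width=64)
--                       (actual time=0.030..0.041 rows=8 loops=1)
--     Index Cond: (customer_id = 42)
```

"Rows Removed by Filter" is the tell: a scan that discards nearly everything it reads is a scan that wants an index.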
Common Performance Problems
N+1 queries: Fetching a list then querying for each item's related data. Solved by JOINs or eager loading in the ORM.
Missing indexes: EXPLAIN ANALYZE reveals this. Add the index and the query becomes fast.
SELECT *: Fetching all columns when you need only a few wastes network bandwidth. Specify the columns you need.
Queries inside loops: If you're looping through results and querying for each item, you're doing N+1. Batch the data instead.
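The N+1 pattern and its JOIN-based fix side by side, sketched with Python's sqlite3 (the schema and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

# N+1: one query for the list, then one more query per customer.
n_plus_1 = {}
for cust_id, name in conn.execute("SELECT id, name FROM customers"):
    rows = conn.execute(
        "SELECT total FROM orders WHERE customer_id = ?", (cust_id,)
    ).fetchall()
    n_plus_1[name] = sum(t for (t,) in rows)

# Batched: a single JOIN does the same work in one round trip.
batched = dict(conn.execute("""
    SELECT c.name, COALESCE(SUM(o.total), 0)
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
"""))

assert n_plus_1 == batched  # same result, 1 query instead of N+1
```

With 2 customers the loop issues 3 queries; with 10,000 it issues 10,001. The JOIN always issues one.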
Connection Pooling
Creating a database connection is expensive. Establishing the connection, authenticating, negotiating protocol—it takes milliseconds. In a web application handling thousands of requests, creating a connection for each request is wasteful.
Connection pooling keeps a pool of open connections. Requests reuse connections instead of creating new ones. PgBouncer is the standard connection pool for PostgreSQL. Every production application using PostgreSQL should use connection pooling.
On serverless platforms (AWS Lambda, Vercel Functions), pooling is even more critical. Each function invocation can open its own connection, and without pooling, concurrent invocations quickly exhaust the database's connection limit. Use a pooler built for this environment, such as Supabase's connection pooler or PgBouncer.
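The core idea of a pool fits in a few lines. A minimal single-process sketch, not a substitute for PgBouncer (the ConnectionPool class and its methods are invented for illustration, with sqlite3 standing in for a real network database):

```python
import queue
import sqlite3

class ConnectionPool:
    """Minimal pool sketch: open connections once, reuse them per request."""

    def __init__(self, size, connect):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())  # pay the connection cost up front

    def acquire(self):
        return self._pool.get()        # blocks if every connection is in use

    def release(self, conn):
        self._pool.put(conn)           # return it to the pool, don't close it

pool = ConnectionPool(size=2, connect=lambda: sqlite3.connect(":memory:"))
conn = pool.acquire()
conn.execute("SELECT 1")
pool.release(conn)
```

A real pooler also handles health checks, timeouts, and concurrent access from many clients, which is why production systems reach for PgBouncer rather than rolling their own.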
The 80/20 Rule
Most database performance problems are solved by identifying the slowest queries and fixing them. Start with the slow query log. Most applications have a handful of queries causing 80% of the load. Fix those before worrying about micro-optimizations.
Measure first. Don't guess about which queries are slow. Use slow query logs and monitoring to identify actual bottlenecks.
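In PostgreSQL, the slow query log is enabled with a single setting in postgresql.conf (the 500 ms threshold is an illustrative starting point; tune it for your workload):

```
# postgresql.conf: log every statement that runs longer than 500 ms
log_min_duration_statement = 500
```

Once enabled, the log tells you exactly which queries to run EXPLAIN ANALYZE on first.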