Schema Design | App Dev Guide

Schema design is the practice of deciding what tables exist, what columns they have, what types those columns are, and how tables relate to each other. Good schema design prevents data quality problems and makes queries efficient. Bad design causes headaches for years.

Normalisation: The Core Principle

Normalisation is the practice of organizing data to reduce redundancy. Store a customer's address in one place, not in every order record. Store a product's price once, not duplicated across every order line item.

The normal forms (1NF, 2NF, 3NF) are formal definitions of progressively stricter normalization. You don't need to memorize them. The goal is: eliminate redundancy, ensure each column depends on the primary key, avoid data anomalies.

A redundant schema creates problems. Update a customer's address in one order record but forget another. Now you have inconsistent data. Normalized schemas prevent this by storing the data once.

Foreign Keys and Referential Integrity

A foreign key is a column that references a row in another table. An order has a customer_id that references a customer. A foreign key constraint ensures that the referenced record actually exists. You can't create an order for a non-existent customer.

This prevents orphaned data—records that reference nothing. Without foreign key constraints, you can delete a customer and leave orders dangling, referencing an ID that no longer exists. Foreign keys enforce data integrity at the database level.

Denormalisation: A Strategic Tradeoff

Normalisation is good until it's not. Excessively normalised schemas require many JOINs. SELECT order_value FROM orders JOIN order_items ON orders.id = order_items.order_id becomes complex with many tables. Sometimes you store calculated or redundant data for performance. This is denormalisation—intentional redundancy for speed.

An order might store total_amount even though you could compute it from order_items. This is denormalisation. It's a tradeoff: slightly more storage and update complexity for faster queries. Use it when you've measured that queries are slow, not speculatively.

Common Schema Mistakes

Using strings for IDs: If a record references another, use integers or UUIDs, not strings. Storing arrays in a single column: An order has many line items. Store them in a separate order_items table, not as a comma-separated string in the order row. Generic column names like value1, value2: These are meaningless. Name columns after what they contain.

Not tracking time: Every table should have created_at and updated_at timestamps. These are invaluable for debugging, auditing, and understanding when data changed. They're so common that most ORMs create them automatically.

Soft Deletes: Preserving History

Hard delete (DELETE FROM orders) removes rows permanently. Soft delete marks a row as deleted without removing it: UPDATE orders SET deleted_at = now() WHERE id = 5. The row stays in the database, archived.

Soft deletes preserve history and enable recovery from accidents. The tradeoff is complexity—most queries need WHERE deleted_at IS NULL to exclude deleted rows. For production applications, especially those with compliance requirements, soft deletes are often worth the overhead.

UUID vs Serial Integer IDs

Primary keys can be serial integers (1, 2, 3...) or UUIDs (globally unique identifiers). Integers are smaller and faster to index. UUIDs are globally unique and safe to expose in URLs—you don't leak how many users exist by seeing the IDs.

For internal databases, integers are fine. For user-facing IDs in URLs, UUIDs prevent enumeration attacks. PostgreSQL has uuid type and can generate them with the uuid-ossp extension. Use what's appropriate for your use case.

Enums vs Reference Tables

An order status is "pending", "shipped", or "delivered". You could store status as a string. Or create an order_statuses table with id and name, and store status_id in orders. The latter enforces valid values.

PostgreSQL has ENUM types, which are a middle ground. They're more efficient than reference tables but harder to modify later. For values that never change, ENUM works. For values that might—adding a new order status—use a reference table.

Relationships: One-to-Many, Many-to-Many

One customer has many orders: orders table has customer_id. Many students attend many courses: you need a enrollments table with student_id and course_id. This junction table enables many-to-many relationships.

When designing, ask: how many of these relates to how many of those? This determines the table structure and indexes you need.

Nullable vs Not Null

Columns can allow NULL (missing values) or require a value with NOT NULL. A phone number might be NULL if a user doesn't provide one. An email address might be required.

Nullability affects queries. NULL behaves differently than other values in SQL—NULL = NULL returns nothing. If you have optional data, allow NULL. If data is required, make the column NOT NULL to enforce it at the database level.

Designing for Growth

Good schema design anticipates change without over-engineering. Use proper types. Use foreign keys. Track timestamps. Use meaningful column names. Avoid tricks. Rename columns later if needed—that's what migrations are for.

Warning

Changing schemas in production requires care. Renaming a column or changing its type is not atomic. Use migrations with multiple steps to avoid downtime. Add new column, backfill data, update application, remove old column.

Tip

Most schema mistakes are discovered after you've written application code that depends on the schema. Start simple and normalize as you understand the data. Don't over-engineer upfront.