SQL Fundamentals | App Dev Guide

SQL is the standard language for working with relational databases. Every developer who works with relational data needs to understand it. Even if you rarely write raw SQL in production code, knowing SQL helps you recognize when your ORM is generating inefficient queries and work out how to optimize them.

Tables and Rows: The Basics

Think of a table like a spreadsheet tab. Each row is a record. Each column has a name and a data type. A users table might have columns: id (integer), email (text), created_at (timestamp). A products table has id, name, price, inventory. Tables are how structured data is organized.
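To make this concrete, here is a minimal sketch of the users table above. It uses Python's standard-library sqlite3 module and an in-memory database. SQLite is an assumption chosen so the example runs without a server. Type names vary between databases (SQLite, for instance, stores timestamps as text), and the sample row is made up.

```python
import sqlite3

# In-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Each column has a name and a type; the timestamp is stored as ISO-8601 text.
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
""")

# Each row is one record.
conn.execute(
    "INSERT INTO users (email, created_at) VALUES (?, ?)",
    ("ada@example.com", "2024-03-01 09:30:00"),
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(columns)  # ['id', 'email', 'created_at']
```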

SELECT: Retrieving Data

SELECT is the most common SQL operation. SELECT email, name FROM users retrieves the email and name columns from every row in the users table, and SELECT * FROM users retrieves all columns. Avoid SELECT * in production code. It transfers columns you don't use, and it can silently change what your code receives when the schema changes. Name the columns you need instead.
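Here is a short sketch of selecting named columns. It again assumes Python's built-in sqlite3 module, an in-memory database, and made-up rows. The cursor's description attribute lists the columns a query actually returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [("ada@example.com", "Ada"), ("linus@example.com", "Linus")],
)

# Name the columns you need rather than using SELECT *.
cur = conn.execute("SELECT email, name FROM users ORDER BY id")
column_names = [desc[0] for desc in cur.description]  # names of the returned columns
rows = cur.fetchall()

print(column_names)  # ['email', 'name']
print(rows)          # [('ada@example.com', 'Ada'), ('linus@example.com', 'Linus')]
```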

WHERE: Filtering Data

Without a WHERE clause, a query returns every row. WHERE adds conditions. SELECT email FROM users WHERE created_at > '2024-01-01' returns only users created after that date. WHERE active = true AND country = 'US' combines multiple conditions. WHERE email LIKE '%@gmail.com' matches a pattern, where % stands for any sequence of characters. Filtering in the database means only the matching rows are sent to your application, rather than fetching everything and filtering in application code.
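You can try the filters above with the sketch below, which uses Python's sqlite3, an in-memory database, and hypothetical data. It departs from the prose in two ways. The values are passed as ? parameters instead of being written into the SQL. The active column is compared to 1, because SQLite stores booleans as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, country TEXT, "
    "active INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO users (email, country, active, created_at) VALUES (?, ?, ?, ?)",
    [
        ("ada@gmail.com", "US", 1, "2024-03-01"),
        ("bob@example.com", "US", 0, "2024-05-10"),
        ("chen@gmail.com", "CA", 1, "2023-12-31"),
    ],
)

# Date filter: ISO-8601 date strings compare correctly as text.
recent = conn.execute(
    "SELECT email FROM users WHERE created_at > ? ORDER BY id", ("2024-01-01",)
).fetchall()

# Several conditions combined with AND.
active_us = conn.execute(
    "SELECT email FROM users WHERE active = 1 AND country = ?", ("US",)
).fetchall()

# Pattern match: % matches any sequence of characters.
gmail = conn.execute(
    "SELECT email FROM users WHERE email LIKE ? ORDER BY id", ("%@gmail.com",)
).fetchall()

print(recent)     # [('ada@gmail.com',), ('bob@example.com',)]
print(active_us)  # [('ada@gmail.com',)]
print(gmail)      # [('ada@gmail.com',), ('chen@gmail.com',)]
```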

JOIN: Combining Tables

Relational data is split across tables to avoid duplication. An order has a customer_id that references a customer in another table. JOIN brings them together. SELECT orders.id, customers.name FROM orders JOIN customers ON orders.customer_id = customers.id gets the order ID and the customer's name in a single result.

There are several types of JOIN. INNER JOIN, which is what a plain JOIN means, returns only rows that have a match in both tables. LEFT JOIN returns every row from the left table and fills the right table's columns with NULL where there is no match. Two more types exist but are less common. RIGHT JOIN is the mirror image of LEFT JOIN. FULL OUTER JOIN keeps unmatched rows from both sides. Understanding JOINs is critical: this is where relational databases demonstrate their power.
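Here is a minimal sketch contrasting INNER JOIN and LEFT JOIN. It assumes Python's sqlite3 and a hypothetical customer, Linus, who has placed no orders. SQL NULL arrives in Python as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id));
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 1);
""")

# INNER JOIN (plain JOIN): only rows that match in both tables, so Linus is absent.
inner = conn.execute("""
    SELECT orders.id, customers.name
    FROM orders JOIN customers ON orders.customer_id = customers.id
    ORDER BY orders.id
""").fetchall()

# LEFT JOIN: every customer, with NULL (None) where no order matches.
left = conn.execute("""
    SELECT customers.name, orders.id
    FROM customers LEFT JOIN orders ON orders.customer_id = customers.id
    ORDER BY customers.id, orders.id
""").fetchall()

print(inner)  # [(10, 'Ada'), (11, 'Ada')]
print(left)   # [('Ada', 10), ('Ada', 11), ('Linus', None)]
```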

INSERT, UPDATE, DELETE: Modifying Data

INSERT INTO users (email, name) VALUES ('user@example.com', 'John') adds a new row. UPDATE users SET active = false WHERE id = 5 modifies existing rows. DELETE FROM users WHERE id = 5 removes rows. The statements are simple, but their power lies in the WHERE clause: a single statement can update or delete every row that matches a condition. That power cuts both ways. An UPDATE or DELETE without a WHERE clause affects every row in the table.
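The sketch below shows a single UPDATE and a single DELETE each affecting several rows. It uses Python's sqlite3, an in-memory database, and hypothetical rows. The cursor's rowcount reports how many rows each statement changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, "
    "active INTEGER NOT NULL DEFAULT 1)"
)

# INSERT: add new rows.
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [("john@example.com", "John"), ("ann@legacy.test", "Ann"), ("raj@legacy.test", "Raj")],
)

# UPDATE: one WHERE clause can match many rows; rowcount says how many changed.
updated = conn.execute(
    "UPDATE users SET active = 0 WHERE email LIKE ?", ("%@legacy.test",)
).rowcount

# DELETE: same idea. Without the WHERE clause, every row would be removed.
deleted = conn.execute("DELETE FROM users WHERE active = 0").rowcount

conn.commit()
remaining = conn.execute("SELECT name FROM users").fetchall()

print(updated, deleted)  # 2 2
print(remaining)         # [('John',)]
```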

Indexes: Making Queries Fast

An index is a separate data structure that makes finding rows fast. Think of a book index. Instead of reading every page to find references to "scalability," you flip to the index, get the page numbers, and jump directly there. Databases use similar structures. Without an index on the columns in the WHERE clause, the database must scan every row. With an index, it jumps directly to the matching rows.

Databases typically index a table's primary key (usually id) automatically. You create additional indexes on columns you frequently filter or sort by. On a large table, SELECT * FROM users WHERE email = 'user@example.com' is fast only if there is an index on email. Indexes are a tradeoff. Writes (INSERT, UPDATE, DELETE) must keep every affected index up to date, and indexes take up storage. Index the columns your queries actually use, not all of them.
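One way to see an index at work is SQLite's EXPLAIN QUERY PLAN, sketched below with Python's sqlite3 and 1,000 made-up users. Other databases have their own EXPLAIN commands with different output. The exact wording of SQLite's plan text also varies by version, so the comments only name the key words to look for. plan() is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def plan(conn, sql, params=()):
    """Return the 'detail' text of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


query = "SELECT * FROM users WHERE email = ?"
before = plan(conn, query, ("user42@example.com",))  # mentions SCAN: every row is read

conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(conn, query, ("user42@example.com",))   # mentions SEARCH ... idx_users_email

print(before)
print(after)
```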

Transactions: Atomic Operations

A transaction groups multiple operations into one atomic unit. Either all succeed or all fail. No halfway states. BEGIN TRANSACTION starts a transaction. COMMIT makes the changes permanent. ROLLBACK undoes them. This is critical for operations like transferring money: subtract from one account, add to another. If the second operation fails, roll back both.
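Here is a sketch of the money-transfer example, assuming Python's sqlite3 and a hypothetical accounts table. The driver is put in autocommit mode so that the BEGIN TRANSACTION, COMMIT, and ROLLBACK statements run exactly as written. transfer() and the insufficient-funds check are illustrative, not a standard API.

```python
import sqlite3

# isolation_level=None puts the driver in autocommit mode, so we issue
# BEGIN TRANSACTION / COMMIT / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])


def transfer(conn, from_id, to_id, amount):
    """Hypothetical helper: move money between two accounts atomically."""
    conn.execute("BEGIN TRANSACTION")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (from_id,)
        ).fetchone()
        if balance < 0:
            # Simulated failure between the debit and the credit.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undoes the debit too: no halfway state
        raise


transfer(conn, 1, 2, 30)  # succeeds: both updates are committed

try:
    transfer(conn, 1, 2, 500)  # fails after the debit; ROLLBACK restores it
except ValueError:
    pass

balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(balances)  # [(1, 70), (2, 80)]
```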

Many databases, including PostgreSQL and MySQL, default to autocommit, where each statement runs as its own transaction. For multi-statement operations, manage transactions explicitly. Atomic transactions are one of the most important guarantees a relational database provides.

Aggregate Functions: Analytics

COUNT, SUM, AVG, MIN, MAX compute aggregate values. SELECT COUNT(*) FROM users counts all users. SELECT AVG(price) FROM products gets the average price. SELECT SUM(quantity) FROM order_items WHERE order_id = 42 totals quantities for one order. GROUP BY groups results by a column: SELECT country, COUNT(*) FROM users GROUP BY country counts users per country.
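Here are the aggregate queries above, run with Python's sqlite3 against hypothetical sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE order_items (order_id INTEGER, quantity INTEGER);
    INSERT INTO users (country) VALUES ('US'), ('US'), ('CA'), ('DE'), ('US');
    INSERT INTO products (price) VALUES (10.0), (20.0), (30.0);
    INSERT INTO order_items (order_id, quantity) VALUES (42, 2), (42, 3), (7, 10);
""")

user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
avg_price = conn.execute("SELECT AVG(price) FROM products").fetchone()[0]
order_42_qty = conn.execute(
    "SELECT SUM(quantity) FROM order_items WHERE order_id = ?", (42,)
).fetchone()[0]

# GROUP BY: one output row per country, with COUNT(*) computed within each group.
per_country = conn.execute(
    "SELECT country, COUNT(*) FROM users GROUP BY country ORDER BY country"
).fetchall()

print(user_count)    # 5
print(avg_price)     # 20.0
print(order_42_qty)  # 5
print(per_country)   # [('CA', 1), ('DE', 1), ('US', 3)]
```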

The N+1 Query Problem

One of the most common performance problems in applications is the N+1 query problem. You fetch a list of orders, then, for each order, query the database again to get its customer. That is 1 query for the list plus N more queries, one per order, so 100 orders cost 101 queries.

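The solution is to fetch orders and customers together with a single JOIN query. Alternatively, use your ORM's "eager loading," which loads the related rows up front. ORMs typically do this either with a JOIN or with one additional batched query (for example, WHERE id IN (...)). Understanding this problem helps you write performant code even when you use an ORM.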
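The sketch below counts the queries each approach issues. It assumes Python's sqlite3, 100 hypothetical orders spread over 10 customers, and a hypothetical CountingDB wrapper that stands in for database round trips. The batched version mimics one common eager-loading strategy; real ORMs differ in the SQL they generate.

```python
import sqlite3


class CountingDB:
    """Hypothetical wrapper that counts queries, standing in for database round trips."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(i, f"Customer {i}") for i in range(1, 11)])
conn.executemany("INSERT INTO orders (id, customer_id) VALUES (?, ?)",
                 [(i, i % 10 + 1) for i in range(1, 101)])

# N+1: one query for the orders, then one more query per order.
db = CountingDB(conn)
n_plus_one = []
for order_id, customer_id in db.query("SELECT id, customer_id FROM orders"):
    (name,) = db.query("SELECT name FROM customers WHERE id = ?", (customer_id,))[0]
    n_plus_one.append((order_id, name))
n_plus_one_queries = db.count

# JOIN: a single query returns each order with its customer's name.
db = CountingDB(conn)
joined = db.query("SELECT orders.id, customers.name FROM orders "
                  "JOIN customers ON orders.customer_id = customers.id")
join_queries = db.count

# Batched loading: fetch the orders, then all needed customers in one IN query.
db = CountingDB(conn)
orders = db.query("SELECT id, customer_id FROM orders")
ids = sorted({customer_id for _, customer_id in orders})
placeholders = ", ".join("?" * len(ids))  # only "?" markers go into the SQL text
names = dict(db.query(f"SELECT id, name FROM customers WHERE id IN ({placeholders})", ids))
batched = [(order_id, names[customer_id]) for order_id, customer_id in orders]
batched_queries = db.count

print(n_plus_one_queries, join_queries, batched_queries)  # 101 1 2
```

All three approaches return the same 100 (order, customer name) pairs. Only the number of round trips differs.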

What Every Developer Needs to Know

You don't need to be a SQL expert. You need to understand basic CRUD (create, read, update, delete, which map to INSERT, SELECT, UPDATE, and DELETE), JOINs across two or three tables, why indexes matter, and when to use transactions. You also need to recognize when a query is slow and understand why.

ORMs vs Raw SQL

Most applications use an ORM (Object-Relational Mapper) to abstract SQL. Prisma, Drizzle, SQLAlchemy, and others translate method calls into SQL. This is convenient but can hide performance problems. Understanding SQL helps you inspect what an ORM generates and optimize when necessary.

Developer Insight

Always use parameterized queries when writing SQL directly. Never concatenate user input into SQL strings. SELECT * FROM users WHERE email = ? with the email supplied as a parameter is safe. SELECT * FROM users WHERE email = '{email}' is vulnerable to SQL injection. ORMs parameterize queries automatically when you use their query APIs, but their raw-SQL escape hatches need the same care.
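Here is a minimal demonstration with Python's sqlite3, whose placeholder is ?. Other drivers use different placeholder styles, such as %s or $1. The malicious input is a classic illustrative payload. The f-string query is shown only as what not to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [("ada@example.com",), ("linus@example.com",)])

malicious = "' OR '1'='1"

# UNSAFE (do not do this): the input becomes part of the SQL text,
# producing WHERE email = '' OR '1'='1', which is true for every row.
unsafe_sql = f"SELECT * FROM users WHERE email = '{malicious}'"
leaked = conn.execute(unsafe_sql).fetchall()

# SAFE: the ? placeholder sends the input as data, never as SQL.
safe = conn.execute("SELECT * FROM users WHERE email = ?", (malicious,)).fetchall()

print(len(leaked))  # 2
print(len(safe))    # 0
```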

Tip

Many developers learn SQL best by consulting the documentation while building something real. Don't try to memorize syntax. Build an application, run into performance problems, and learn the SQL you need to solve them.