
Infrastructure

Scaling Strategies

11 min read · Last reviewed: March 2026

Your application is successful. Users are multiplying. Traffic is growing. Your single server is dying. How do you handle 10x the traffic? Scaling is about handling more load without the system falling over.

Vertical Scaling

Vertical scaling means upgrading your server: more CPU, more RAM, faster disk. Your single server becomes more powerful.

The appeal: simple. You don't change your code. You don't change your architecture. You just pay AWS more money and get a bigger server.

The limitations: eventually you hit the ceiling. Even the largest cloud instances top out at some number of CPUs and terabytes of RAM, and the price climbs steeply well before that. You can't scale past the biggest box money buys. And you're vulnerable: one server failure means a complete outage.

When vertical scaling is appropriate:

  • You're not yet at the limits (plenty of scaling room)
  • Your application is monolithic (all features on one server)
  • You have simple infrastructure needs

For most growing applications, vertical scaling is temporary. You do it for 6-12 months. Eventually you hit limits and must think about horizontal scaling.

Horizontal Scaling

Horizontal scaling means adding more servers. Instead of one powerful server, you have 10 medium servers or 100 small servers. Traffic is distributed across them.

Benefits: no ceiling. You can scale to any size. You can tolerate failure: if one server dies, the others continue. And smaller servers are often cheaper per unit of capacity.

Challenges: your application must be stateless. If server 1 handles user requests and then fails, user sessions are lost. Your code must not store data on disk—use databases for persistent data. Your code must be idempotent—running the same operation twice should be safe.
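Idempotency is easiest to see in a concrete sketch. Here's an illustrative example of deduplicating an operation with a client-supplied idempotency key; `store` and `charge` are hypothetical stand-ins for a shared database and your business logic:

```javascript
// Sketch: making a charge-style operation safe to retry by keying it on a
// client-supplied idempotency key. `store` stands in for shared storage.
function makeChargeHandler(store, charge) {
  return function handle(idempotencyKey, amount) {
    // Already processed this key? Return the saved result instead of
    // running the side effect again.
    if (store.has(idempotencyKey)) return store.get(idempotencyKey);
    const result = charge(amount);
    store.set(idempotencyKey, result);
    return result;
  };
}

// Usage: a retried request does not double-charge.
let charged = 0;
const handler = makeChargeHandler(new Map(), (amount) => {
  charged += amount;
  return { ok: true, amount };
});
handler("req-42", 100);
handler("req-42", 100); // retry: charged is still 100
```

In production the store must be shared across servers (a database or Redis), not in-process memory, or the dedupe breaks the moment the retry lands on a different server.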

But once you solve these, horizontal scaling is powerful. You can handle 10x, 100x, 1000x traffic by adding servers.

Load Balancing

With multiple servers, how does traffic get distributed? A load balancer sits in front and routes requests.

Round-robin: Send request 1 to server 1, request 2 to server 2, request 3 to server 3, request 4 to server 1. Simple but naive. Doesn't account for server load.

Least connections: Send the next request to whichever server has the fewest active connections. Better—avoids sending requests to overloaded servers.

Weighted round-robin: Some servers are more powerful. Give them more traffic. Server 1 gets 40% of traffic, server 2 gets 30%, server 3 gets 30%.

Session stickiness: If a user's request goes to server 1, keep sending their requests to server 1. This is a hack for stateful applications (applications that store session data on the server). Avoid this if possible—it reduces horizontal scaling benefits.
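The first three strategies are simple enough to sketch in a few lines. This is illustrative, not a real load balancer; the backend names and connection counts are made up:

```javascript
// Sketch of two load-balancing strategies over a list of backends.

// Round-robin: cycle through the backends in order.
function makeRoundRobin(backends) {
  let next = 0;
  return () => backends[next++ % backends.length];
}

// Least connections: pick the backend with the fewest active connections.
// `connections` is a hypothetical map of backend -> active connection count.
function leastConnections(backends, connections) {
  return backends.reduce((best, b) =>
    connections[b] < connections[best] ? b : best
  );
}

const pick = makeRoundRobin(["s1", "s2", "s3"]);
// pick() returns "s1", then "s2", then "s3", then "s1" again, ...

const conns = { s1: 12, s2: 3, s3: 7 };
// leastConnections(["s1", "s2", "s3"], conns) returns "s2"
```

Weighted round-robin is the same loop with each backend repeated in proportion to its weight.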

Load balancers were traditionally hardware appliances, but today most teams use software load balancers (NGINX, HAProxy) or managed cloud offerings (AWS ELB, Google Cloud Load Balancing) that scale automatically.

Stateless Applications

Horizontal scaling requires stateless applications. Stateless means no individual server holds data that must outlive the request; anything persistent lives in shared storage that every server can reach.

Bad (stateful):

// Store user session on this server's local disk
fs.writeFileSync(`/tmp/session_${userId}.json`, JSON.stringify(sessionData));

If user is routed to a different server, their session is lost.

Good (stateless):

// Store user session in a shared cache (TTL of one hour, in seconds)
cache.set(`session_${userId}`, sessionData, 60 * 60);

Now any server can retrieve the session from the database/cache. The server doesn't matter.

Stateless applications are a prerequisite for horizontal scaling.

Database Scaling

Most scaling bottlenecks are in the database. You can add application servers almost without limit, but if the database is overwhelmed, everything slows down.

Read replicas: Copy database data to read-only servers. Direct read queries to replicas and writes to the primary. This distributes read load. The catch: replication lag means reads from a replica can be slightly stale.

Sharding: Split data across multiple database servers. User 1-1000000 on database 1, user 1000001-2000000 on database 2. Massively complex—your application must know which shard to query.
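A sketch of what "knowing which shard to query" looks like, assuming the range scheme from the text plus a hash-based alternative (shard counts are illustrative):

```javascript
// Range-based sharding, as described above: users 1..N on shard 0,
// N+1..2N on shard 1, and so on.
function rangeShard(userId, usersPerShard) {
  return Math.floor((userId - 1) / usersPerShard);
}

// Hash-based sharding spreads keys more evenly. This is a toy string hash;
// real systems typically use consistent hashing so that adding a shard
// doesn't remap every key.
function hashShard(key, shardCount) {
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % shardCount;
}

// rangeShard(1000000, 1000000) is 0; rangeShard(1000001, 1000000) is 1
```

Every query path in the application now has to route through a function like this, which is a large part of why sharding is considered a last resort.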

Connection pooling: Database connections are expensive. A pool maintains a limited number of connections and reuses them. Reduces overhead.
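A minimal, illustrative pool shows the idea; real pools in database drivers also queue waiters and handle timeouts. `createConn` is a hypothetical connection factory:

```javascript
// Sketch of a connection pool: a fixed set of connections is created once
// and checked out/in, rather than opening a fresh connection per request.
function makePool(createConn, size) {
  const idle = Array.from({ length: size }, () => createConn());
  return {
    acquire() {
      // A real pool would wait for a connection instead of throwing.
      if (idle.length === 0) throw new Error("pool exhausted");
      return idle.pop();
    },
    release(conn) {
      idle.push(conn);
    },
    idleCount: () => idle.length,
  };
}

let opened = 0; // counts how many connections were actually created
const pool = makePool(() => ({ id: ++opened }), 2);
const c = pool.acquire();
pool.release(c);
pool.acquire(); // reuses an existing connection: `opened` stays at 2
```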

Query optimization: Add indexes, optimize slow queries, eliminate N+1 queries. Often the fastest scaling is making queries faster.

Caching: Cache query results in Redis or Memcached. Dramatically reduces database load. Trade-off: stale cache.

Most teams find that good caching handles 80% of scaling needs. Database sharding is extreme and should be a last resort.

Caching as a Scaling Strategy

Caching is one of the most effective scaling strategies. Store frequently-accessed data in memory. Retrieval is orders of magnitude faster than querying the database.

Cache types:

  • In-process cache: Data stored in application memory. Fast but tied to one process.
  • Redis/Memcached: Dedicated cache service. Multiple applications can access it. Survives application restarts.
  • CDN: Cache at the edge. Serve static content from servers close to users. Dramatically faster.

Trade-off: the cache can go stale. Data changes, but the cached copy doesn't until it expires or is invalidated. Usually acceptable: data that is 30 seconds old is better than data that is slow.
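The usual pattern is cache-aside: check the cache, fall back to the database on a miss, and store the result with a TTL. A minimal sketch, using a `Map` as a stand-in for Redis and a hypothetical `loadFromDb` function:

```javascript
// Cache-aside with a TTL. Entries expire after `ttlMs` milliseconds.
function makeCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    getOrLoad(key, loadFromDb) {
      const hit = entries.get(key);
      if (hit && now() < hit.expires) return hit.value; // fresh: skip the DB
      const value = loadFromDb(key); // miss or stale: go to the DB
      entries.set(key, { value, expires: now() + ttlMs });
      return value;
    },
  };
}

let dbReads = 0; // counts how often we actually hit the "database"
const cache = makeCache(30_000); // 30-second TTL
const load = (key) => { dbReads++; return `row:${key}`; };
cache.getOrLoad("user:1", load); // miss: reads the DB
cache.getOrLoad("user:1", load); // hit: served from memory, dbReads stays 1
```

With Redis the same pattern uses `SET key value EX 30` for the write and falls back to the database when `GET` returns nothing.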

Tip
The 80/20 of scaling: most applications' bottleneck is the database, and most database problems are solved by caching. Before scaling infrastructure, cache aggressively.

Auto-Scaling in Cloud Environments

Cloud providers (AWS, GCP, Azure) offer auto-scaling: automatically add/remove servers based on demand.

Traffic spikes at 2pm. Auto-scaling sees CPU at 80%, starts new servers. By 3pm, load is distributed, CPU is at 60%, everything is fine. At 7pm, traffic drops, auto-scaling shuts down servers to save money.

This is powerful but requires:

  • Stateless application (servers can be added/removed)
  • Fast startup time (new server should be serving traffic within seconds)
  • Good health checks (system must know if a server is healthy)
  • Load balancer (distribute traffic across servers)
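The health-check piece can be sketched as a function that aggregates dependency checks; the check names and endpoint below are assumptions, not a fixed convention:

```javascript
// A health check that a load balancer or auto-scaler can poll.
// Each check is a function returning true (healthy) or false/throwing.
function checkHealth(checks) {
  const failures = Object.entries(checks)
    .filter(([, check]) => {
      try { return !check(); } catch { return true; }
    })
    .map(([name]) => name);
  return { healthy: failures.length === 0, failures };
}

// Wiring it to an HTTP endpoint with Node's built-in http module:
// http.createServer((req, res) => {
//   const { healthy } = checkHealth({ db: pingDb, cache: pingCache });
//   res.writeHead(healthy ? 200 : 503).end();
// }).listen(8080);

const status = checkHealth({ db: () => true, cache: () => false });
// status.healthy is false; status.failures is ["cache"]
```

The load balancer then stops routing to any server that returns 503, and the auto-scaler replaces it.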

Get these right and scaling becomes automatic and boring—exactly what you want.

Performance Testing Before Scaling

Don't guess about scaling. Test. Load test your application—simulate many users making requests. See where it breaks. Fix the bottleneck. Test again.

Tools: Apache JMeter, Locust, K6. Simulate realistic user behavior. Ramp up traffic gradually. See at what point the system breaks.
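For a rough first pass before reaching for those tools, a few lines of Node can fire concurrent requests and summarize latencies. The URL and request count are illustrative:

```javascript
// Return the p-th percentile of a list of latency samples (ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

// Fire `requests` concurrent GETs and report p50/p95 latency.
// Uses the global fetch available in Node 18+.
async function loadTest(url, requests) {
  const latencies = await Promise.all(
    Array.from({ length: requests }, async () => {
      const start = Date.now();
      await fetch(url);
      return Date.now() - start;
    })
  );
  return { p50: percentile(latencies, 50), p95: percentile(latencies, 95) };
}

// Usage (against your own staging server, never production):
// loadTest("http://localhost:8080/health", 100).then(console.log);
```

Dedicated tools add what this sketch lacks: gradual ramp-up, realistic user journeys, and distributed load generation.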

Typical findings: database is slow, cache would help. API call to third-party service is slow, add timeout. Frontend is slow, optimize bundle size.

Fixing measured bottlenecks beats guessing at them.

The 80/20 of Scaling

Most bottlenecks are database-related. Most database problems are solved by:

  1. Optimizing queries (add indexes, eliminate N+1 queries)
  2. Adding caching (Redis for hot data)
  3. Connection pooling (reuse database connections)

These three tactics solve 80% of scaling problems. If these don't work, then consider read replicas, sharding, or other advanced strategies.

Designing for Scale from Day One

Should you design for unlimited scale from the start? Probably not. It adds complexity, slows development, and you might not need it.

But some decisions are hard to change later:

  • Use stateless applications (easy to parallelize)
  • Use databases instead of files (easier to distribute)
  • Plan for read replicas (don't hardcode primary database)
  • Use caching from the start (easier to add than retrofit)

These don't slow development but prevent painful migration later.

Vertical vs Horizontal Scaling

| Strategy | Cost | Complexity | Failure mode |
| --- | --- | --- | --- |
| Vertical scaling | Expensive per unit, limited ceiling | Simple, no code changes needed | One failure = complete outage |
| Horizontal scaling | Cheaper per unit, no ceiling | Complex, requires stateless design | One failure = degraded service |

The Reality

Most successful companies start with vertical scaling. It's simple and gets you far. As traffic grows, they migrate to horizontal scaling. This is a one-time cost but necessary for long-term growth.

The database is usually the first bottleneck. Solve it with caching and optimization before scaling application servers.

Don't over-engineer early. Start simple. Scale when you need to. Better to refactor a successful product than to build a scalable product that nobody uses.