
Rate Limits: Working Within and Around Them

8 min read

Rate limits restrict how many API calls you can make in a given time period: per second, per minute, or per day. Services enforce them to prevent abuse and ensure fair resource distribution. Understanding and respecting rate limits is critical for reliable integrations.

Why Services Enforce Rate Limits

Protecting service stability: one misbehaving client making millions of requests could crash the service. Rate limits prevent this.

Fair resource distribution: APIs run on shared infrastructure. If one customer uses 90% of capacity, others suffer. Rate limits ensure everyone gets fair access.

Business model: many services charge based on API calls. Rate limits enforce pricing tiers.

Understanding Rate Limits

Services specify limits: 100 requests per minute, 10,000 per day, 1 per second. Limits might be per API key, per user, per endpoint, or global.

Rate limit information is usually in response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset. Use these to track your usage.
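Reading those headers into a small structure makes them easy to act on. A minimal sketch (note that header names vary between services; some use `RateLimit-*` or lowercase variants, so treat these names as an assumption):

```python
def parse_rate_limit_headers(headers):
    """Extract rate-limit info from response headers.

    Header names here follow the common X-RateLimit-* convention,
    but individual services may differ.
    """
    return {
        "limit": int(headers.get("X-RateLimit-Limit", 0)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
        # Reset is often a Unix timestamp, sometimes seconds-until-reset
        "reset": int(headers.get("X-RateLimit-Reset", 0)),
    }
```

Checking `remaining` before each call lets you slow down proactively instead of waiting for a 429.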

HTTP 429: Too Many Requests

When you exceed a rate limit, the service returns HTTP 429 Too Many Requests. Your code must detect this and handle it gracefully.

The response often includes a Retry-After header telling you how long to wait before retrying. Respect it: if Retry-After says wait 60 seconds, wait 60 seconds.
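A handler for this can be a few lines. A sketch that takes the status code and headers as plain values (Retry-After may also be an HTTP date rather than a number of seconds, which this sketch doesn't handle):

```python
import time

def wait_for_retry_after(status, headers, sleep=time.sleep):
    """If the response is a 429, wait for the Retry-After interval.

    Returns True if we waited (caller should retry), False otherwise.
    """
    if status != 429:
        return False
    # Default to 1 second if the server omits the header
    delay = int(headers.get("Retry-After", 1))
    sleep(delay)
    return True
```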

Implementing Exponential Backoff

When you get a 429, don't retry immediately. Implement exponential backoff: wait, retry, if it fails again wait longer, retry again.

Example: 1 second, 2 seconds, 4 seconds, 8 seconds, 16 seconds. This prevents hammering a struggling service. If many clients are hitting the service, they back off gradually rather than all retrying at once.
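A minimal sketch of the pattern, with a hypothetical `RateLimitError` standing in for whatever your HTTP client raises on a 429; the small random jitter is an extra refinement that keeps clients from retrying in lockstep:

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical error type raised when the API returns HTTP 429."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry request_fn on RateLimitError, doubling the wait: 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Jitter spreads many clients out so they don't all retry at once
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

In production, prefer a battle-tested library (such as Tenacity, mentioned in the Tip below) over a hand-rolled loop.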

Caching API Responses

If data doesn't change frequently, cache it. Store the response and re-use it for requests within a cache window. This drastically reduces API calls.

Example: currency exchange rates change infrequently. Cache them for 1 hour. Requests within that hour use the cache instead of hitting the API.
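A minimal in-memory sketch of such a cache; `fetch_fn` is a placeholder for your actual API call, and a real deployment might use Redis or memcached instead:

```python
import time

class TTLCache:
    """Cache a fetched value for ttl seconds before refetching."""

    def __init__(self, fetch_fn, ttl=3600, clock=time.monotonic):
        self.fetch_fn = fetch_fn
        self.ttl = ttl
        self.clock = clock  # injectable for testing
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self.clock()
        # Refetch only when the cache is empty or the window has elapsed
        if self._fetched_at is None or now - self._fetched_at >= self.ttl:
            self._value = self.fetch_fn()
            self._fetched_at = now
        return self._value
```

With `ttl=3600`, no matter how many times `get()` is called in an hour, the API is hit once.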

Batch Requests

Some APIs support batch endpoints. Instead of making 100 requests to fetch 100 items, send a single batch request for all 100. This typically counts as one request toward your limit.

GraphQL is naturally batched: a single query can fetch many items. REST needs explicit batch endpoints.
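Splitting your item IDs into batch-sized groups is the usual first step. A small sketch, assuming a hypothetical batch endpoint that accepts up to 100 IDs per call:

```python
def chunk(ids, size=100):
    """Split a list of IDs into batches of at most `size` items,
    one batch per request to a (hypothetical) batch endpoint."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

Fetching 250 items then costs 3 requests against your limit instead of 250.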

Webhooks vs. Polling

Polling (repeatedly asking "has anything changed?") generates many unnecessary API calls. Webhooks (the service notifies you when something changes) eliminate that overhead. Use webhooks where available to reduce API call volume.

Building Your Own Rate Limiting

Protect your own API from abuse by implementing rate limiting for your endpoints. The token bucket algorithm is common: each user or IP gets a quota of requests, and the quota refills over time.

Redis is well suited to rate limiting. Increment a counter per user per minute; if the counter exceeds the limit, return 429. Expire the counter so it resets every minute.
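An in-memory sketch of the fixed-window counter just described (the Redis version would use `INCR` plus `EXPIRE` on a per-user, per-minute key; this single-process stand-in shows the same logic):

```python
import time

class FixedWindowLimiter:
    """Per-key fixed-window request counter."""

    def __init__(self, limit, window=60, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable for testing
        self._counts = {}   # (key, window_index) -> request count

    def allow(self, key):
        """Record one request for `key`; False means respond with HTTP 429."""
        bucket = (key, int(self.clock() // self.window))
        self._counts[bucket] = self._counts.get(bucket, 0) + 1
        return self._counts[bucket] <= self.limit
```

One known quirk of fixed windows: a client can burst up to 2x the limit across a window boundary, which is why sliding-window and token bucket variants exist.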

Rate Limit Tiers

Most APIs offer tiered pricing. Free tier: 100 requests per day. Paid tier: 100,000 per day. Enterprise: unlimited or negotiated.

Know your tier and your limits. If you're approaching limits, upgrade before hitting them. Running into rate limits in production is embarrassing and expensive.

Monitoring Usage

Track API call counts. Monitor growth. When you're using 80% of your limit, alert the team. This gives time to optimize or upgrade before hitting limits.
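The threshold check itself is one line; a sketch (the 0.8 default mirrors the 80% rule above):

```python
def should_alert(used, limit, threshold=0.8):
    """True once usage crosses the given fraction of the rate limit."""
    return limit > 0 and used / limit >= threshold
```

Wire this into whatever alerting your team already uses; the hard part is tracking `used` reliably, not the comparison.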

Some services provide usage dashboards. Use them. Understand your usage patterns.

Negotiating Higher Limits

If you're a significant customer, talk to the service. Explain your use case. Sometimes limits can be raised or you can negotiate custom arrangements.

Warning
Hitting rate limits in production is painful. Design for them upfront. Cache aggressively. Use webhooks instead of polling. Batch requests when possible. Monitor usage. Don't let this surprise you.
Tip
Implement exponential backoff correctly. Many developers get this wrong, retrying too aggressively and making things worse. Use libraries like Tenacity (Python) or retry (Node.js) that handle this correctly.