Background Jobs and Async Processing

Some work doesn't belong in the request-response cycle. Sending an email, processing a video, generating a report, syncing with external APIs—these take seconds or minutes. A user shouldn't wait for them.

Background jobs solve this. The request handler enqueues a job and returns to the user immediately, and the job runs asynchronously. The user gets a response in milliseconds instead of waiting for seconds.

Why Some Work Needs To Happen In The Background

Speed: Users expect responses in milliseconds. Background work takes seconds or minutes.
Reliability: A web request might time out. A background job can retry.
Scalability: Some work is expensive and peaks at certain times. Processing it in the background spreads the load.
User experience: "Your order has been received, we'll send a confirmation email shortly" is better than "wait while I send the email..."

Job Queues: The Architecture

A job queue is a data structure holding work to be done. Producers add jobs. Workers consume jobs, process them, and mark them complete.

Flow:

1. Web request triggers a job (e.g., "send email to user@example.com")
2. Job is added to the queue
3. Request returns immediately to the user
4. A worker process picks up the job from the queue
5. Worker processes the job (sends email, generates report, etc.)
6. Worker marks the job complete, or schedules it for retry if it failed

This decouples producing work from processing it. The web app doesn't care when the job is processed. Workers can be scaled independently.
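The flow above can be sketched in a few lines. This is a minimal in-process illustration, assuming Python's standard `queue.Queue` as a stand-in for a real broker such as Redis; `handle_request`, `worker`, and the `sent` list are hypothetical names, not part of any library mentioned here.

```python
import queue
import threading

# In-process stand-in for a real broker; all names here are illustrative.
jobs = queue.Queue()
sent = []  # records what the worker processed, so the effect is observable


def handle_request(email):
    """Producer: enqueue the job and return without waiting for it."""
    jobs.put({"type": "send_email", "to": email})
    return "202 Accepted"


def worker():
    """Consumer: pull jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        sent.append(job["to"])  # placeholder for the real email call
        jobs.task_done()  # mark the job complete


thread = threading.Thread(target=worker)
thread.start()
response = handle_request("user@example.com")
jobs.put(None)  # tell the worker to stop
jobs.join()  # block until every queued item has been marked done
thread.join()
```

In a real deployment the worker would run in a separate process or machine, which is what allows it to be scaled independently of the web app.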

Queue Technologies

BullMQ (Node.js): Job queue built on Redis. Simple, reliable, good DX.

Celery (Python): Distributed task queue. Works with Redis or RabbitMQ as broker. Industry standard for Python.

Sidekiq (Ruby): Redis-backed job queue for Ruby, most often used with Rails. Simple and reliable.

Database-backed queues: PostgreSQL or MySQL can serve as a job queue (slightly higher latency but simpler architecture).

For most applications, Redis-backed queues are the right choice. Redis is fast and fits the use case well.

Job Idempotency: The Key Principle

Jobs may be retried if they fail. They might be processed twice if something goes wrong. For this to be safe, jobs must be idempotent: running them multiple times has the same effect as running once.

Tolerable: "send welcome email to user 123". Running it twice sends two emails, so it is not strictly idempotent, but the duplicate is only an annoyance and leaves no inconsistent state behind.

Bad: "increment user 123's credit by 5". Processing twice increments twice (inconsistency). The job must track whether it's already been processed.

Design jobs to be idempotent. Use unique identifiers (job ID) to detect retries.
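One way to make the credit example safe is to record which job IDs have already been applied. The sketch below assumes an in-memory set for brevity; `add_credit`, `credits`, and `processed_job_ids` are hypothetical names, and a real system would keep this record in durable storage.

```python
credits = {123: 0}
processed_job_ids = set()  # assumption: in production this lives in a database


def add_credit(job_id, user_id, amount):
    """Apply the credit at most once per job_id; return True if applied."""
    if job_id in processed_job_ids:
        return False  # retry or duplicate delivery: nothing to do
    credits[user_id] += amount
    processed_job_ids.add(job_id)
    return True


first = add_credit("job-1", 123, 5)
second = add_credit("job-1", 123, 5)  # simulated retry of the same job
```

With a database, the duplicate check and the balance update would typically share one transaction so that a crash between them cannot leave a half-applied job.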

Cron Jobs: Scheduled Recurring Work

Some work should happen on a schedule: send daily digest emails, clean up expired records, sync data from external sources. These are cron jobs.

Tools: cron (Unix), APScheduler (Python), node-cron (Node.js), or queue systems with scheduled jobs (BullMQ, Celery support scheduling).

For reliability, avoid OS-level cron if possible. Use your application's job queue with scheduling. That way you can monitor failures, retry, log, etc.
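As a rough illustration of scheduling inside the application, a scheduler can periodically check which recurring jobs are due and hand them to the queue. The sketch below is not how BullMQ or Celery implement scheduling; `due_jobs` and the schedule layout are assumptions made for this example.

```python
from datetime import datetime, timedelta


def due_jobs(schedule, now):
    """Return names of jobs whose next run time has arrived, and advance them."""
    due = []
    for name, entry in schedule.items():
        if entry["next_run"] <= now:
            due.append(name)  # a real scheduler would enqueue the job here
            entry["next_run"] += entry["every"]
    return due


# Hypothetical schedule: job name -> interval and next run time.
schedule = {
    "daily_digest": {
        "every": timedelta(days=1),
        "next_run": datetime(2024, 1, 1, 8, 0),
    },
    "cleanup_expired": {
        "every": timedelta(hours=1),
        "next_run": datetime(2024, 1, 1, 9, 0),
    },
}

first_tick = due_jobs(schedule, datetime(2024, 1, 1, 8, 30))
second_tick = due_jobs(schedule, datetime(2024, 1, 1, 9, 0))
```

Because each due job goes through the normal queue, it gets the same retries, logging, and monitoring as any other job.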

Job Retries and Dead Letter Queues

Jobs fail sometimes. Network issues, external service outages, bad data. Good job queues automatically retry failed jobs.

Exponential backoff: each retry waits a fixed multiple of the previous delay, for example 1 second, then 10 seconds, then 100 seconds. This avoids hammering a service that is already failing.
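The 1, 10, 100 second schedule corresponds to a base delay of 1 second multiplied by 10 on each attempt. A minimal sketch follows; the function name and the one-hour cap are assumptions added for illustration, not values from any particular queue library.

```python
def backoff_delay(attempt, base=1.0, factor=10.0, cap=3600.0):
    """Delay in seconds before retry number `attempt` (1-based).

    The cap is an assumed safety limit so delays do not grow without bound.
    """
    return min(cap, base * factor ** (attempt - 1))


delays = [backoff_delay(n) for n in (1, 2, 3)]
capped = backoff_delay(5)  # 10000 seconds uncapped, limited to the cap
```

Many systems also add a small random offset (jitter) to each delay so that jobs which failed together do not all retry at the same moment.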

Dead letter queue: jobs that fail repeatedly go here for inspection. You can investigate why they failed and retry them manually.
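The retry-then-dead-letter behavior can be sketched as a small loop. This is an illustration under assumptions: `MAX_ATTEMPTS`, `run_with_retries`, and the two handlers are hypothetical, and the waiting between attempts is omitted to keep the example short.

```python
MAX_ATTEMPTS = 3  # assumed limit; real queues make this configurable
dead_letter = []  # stand-in for a dead letter queue


def run_with_retries(job, handler):
    """Try the handler up to MAX_ATTEMPTS times, then dead-letter the job."""
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            handler(job)
            return attempt  # number of attempts it took to succeed
        except Exception as exc:
            last_error = repr(exc)  # a real worker would wait with backoff here
    dead_letter.append(
        {"job": job, "error": last_error, "attempts": MAX_ATTEMPTS}
    )
    return None


calls = {"flaky": 0}


def flaky_handler(job):
    """Fails on the first call, succeeds afterwards."""
    calls["flaky"] += 1
    if calls["flaky"] < 2:
        raise ConnectionError("temporary outage")


def broken_handler(job):
    """Always fails, as a job with bad data would."""
    raise ValueError("bad data")


ok = run_with_retries({"id": "a"}, flaky_handler)
failed = run_with_retries({"id": "b"}, broken_handler)
```

Storing the last error alongside the job is what makes the dead letter queue useful for later inspection.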

Monitoring and Alerting

A background job fails silently—the user doesn't see the error. You must monitor: are jobs being processed? Are many failing?

Track: job counts, success/failure rates, processing time. Alert if the failure rate spikes (as a rough starting threshold, more than 5% failing is worth investigating) or the dead letter queue grows.
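A basic alert check over these metrics might look like the sketch below. The 5% threshold comes from the text above; the function names, the shape of the `stats` dictionary, and the rule that any dead-lettered job raises an alert are assumptions for illustration.

```python
def failure_rate(succeeded, failed):
    """Fraction of finished jobs that failed; 0.0 when nothing has run."""
    total = succeeded + failed
    return failed / total if total else 0.0


def alerts(stats, max_failure_rate=0.05, max_dead_letters=0):
    """Return a list of human-readable problems for the given counters."""
    problems = []
    if failure_rate(stats["succeeded"], stats["failed"]) > max_failure_rate:
        problems.append("failure rate above threshold")
    if stats["dead_letters"] > max_dead_letters:
        problems.append("dead letter queue is not empty")
    return problems


healthy = alerts({"succeeded": 990, "failed": 10, "dead_letters": 0})
unhealthy = alerts({"succeeded": 900, "failed": 100, "dead_letters": 3})
```

In practice the counters would come from the queue's own metrics or a monitoring system, and the thresholds would be tuned to the workload.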

Without monitoring, you might not realize jobs aren't processing until users complain about missing emails or unprocessed data.

Warning

Invisible failures are dangerous. Always monitor background job systems. Know when jobs fail, why, and fix them quickly.

Complexity vs Benefit

Job queues add complexity: another piece of infrastructure to run, monitor, and debug. Don't add them prematurely.

Do you need a job queue? Ask: is there work that takes more than a second or so? Work that fails and needs retry? Work that shouldn't block the user? If yes, add a queue.

For simple applications, you might not need one. As the application grows, you most likely will.

The Principle

Some work doesn't fit the request-response cycle. Job queues decouple producing work from processing it, enabling better user experience, reliability, and scalability. For any non-trivial application, job queues are essential.