Backup & Recovery | App Dev Guide

Backups are non-negotiable. Hardware fails. Humans make mistakes. Ransomware encrypts data. Software has bugs that corrupt data. Data loss is a question of when, not if. What matters is whether you're prepared when it happens.

RTO and RPO: Defining Recovery Requirements

RTO (Recovery Time Objective) is how long you can afford to be down. What is the business impact if your application is down for 1 hour? For 1 day? RTO determines your recovery infrastructure.

RPO (Recovery Point Objective) is how much data you can afford to lose, measured in time. If the best you can do is restore a once-a-day backup, RPO is 24 hours. If you need to recover to the last 5 minutes, RPO is 5 minutes. RPO determines backup frequency.

Many applications can tolerate a 1-hour RTO and a 1-hour RPO. A critical payment processor might require a 15-minute RTO and a 5-minute RPO. Know your numbers and design backups accordingly.
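As a rough illustration of how RPO drives backup frequency, the sketch below checks a backup schedule against an RPO target. The function names and the model are assumptions made for this example: each backup is assumed to capture the state at the moment it starts, and a backup that is still running when the failure hits is assumed to be unusable.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta) -> timedelta:
    """Upper bound on lost data under the assumptions stated above.

    If the failure hits just before the next backup finishes, the newest
    usable backup reflects the state one full interval earlier, plus the
    time the unfinished backup had been running.
    """
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta,
              backup_duration: timedelta,
              rpo: timedelta) -> bool:
    """True if the schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


# Daily backups that take 30 minutes, measured against a 1-hour RPO.
print(meets_rpo(timedelta(hours=24), timedelta(minutes=30), timedelta(hours=1)))
# Backups every 15 minutes that take 2 minutes, against the same RPO.
print(meets_rpo(timedelta(minutes=15), timedelta(minutes=2), timedelta(hours=1)))
```

Under this model, hourly backups do not quite meet a 1-hour RPO, because the time a backup takes to complete counts against you.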

Backup Types

Full backup: a complete copy of the database. Easy to understand, restores everything, but large and slow to create. Run weekly.

Incremental backup: only changes since the last backup. Smaller, faster, but you need the full backup plus all incrementals to restore. Run daily.

Continuous WAL archiving: PostgreSQL's Write-Ahead Log (WAL) records every change. Archive WAL segments continuously and, together with a base backup, you can restore to any point in time covered by the archive (point-in-time recovery, PITR). Powerful, but it takes expertise to set up.
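The restore dependency for incremental backups can be sketched as follows. This is an illustrative model, not any particular tool's format: the Backup class and restore_chain function are hypothetical, and each incremental is assumed to hold the changes since the previous backup of either kind.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental"


def restore_chain(backups: list[Backup], target: datetime) -> list[Backup]:
    """Return the backups to apply, oldest first, to restore the latest
    state at or before `target`."""
    candidates = sorted(
        (b for b in backups if b.taken_at <= target),
        key=lambda b: b.taken_at,
    )
    # Locate the most recent full backup; everything after it is needed.
    last_full = None
    for index, backup in enumerate(candidates):
        if backup.kind == "full":
            last_full = index
    if last_full is None:
        raise ValueError("no full backup at or before the target time")
    return candidates[last_full:]


# Illustrative week: full backup on the 5th, incrementals on the next 3 days.
week = [
    Backup(datetime(2024, 5, 5, 2, 0), "full"),
    Backup(datetime(2024, 5, 6, 2, 0), "incremental"),
    Backup(datetime(2024, 5, 7, 2, 0), "incremental"),
    Backup(datetime(2024, 5, 8, 2, 0), "incremental"),
]
chain = restore_chain(week, datetime(2024, 5, 7, 18, 0))
print([b.kind for b in chain])
```

If any backup in the returned chain is missing or corrupted, everything after it is unusable, which is one reason to take full backups on a regular schedule.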

Managed Database Backup

Services like Supabase, RDS, and Neon handle backups automatically. They run regular backups, archive them securely, and enable restores through a UI. This is the right choice for most projects.

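Know your retention period. RDS keeps automated backups for 7 days by default when the instance is created in the console (1 day when created through the API or CLI) and supports up to 35 days. If you need 30 days, configure it. Know how to restore, and test the restore process before you need it.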

Testing Recovery

A backup you haven't tested is theoretical. Run recovery drills quarterly. Actually restore a backup to a test environment. Verify the data is there and correct. This catches problems early.

If you discover during a real failure that your backup is corrupted or your restore process is broken, you're in crisis mode. Testing prevents this.
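A recovery drill can be sketched in a few lines. The example below uses Python's built-in sqlite3 module purely as a stand-in for your real database so that it runs anywhere; the function names and the choice of checks (an integrity check plus row counts recorded at backup time) are assumptions for illustration. With PostgreSQL or a managed service, the restore step would be that system's own restore procedure.

```python
import sqlite3


def take_backup(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the source database into a separate in-memory 'backup'."""
    backup = sqlite3.connect(":memory:")
    source.backup(backup)
    return backup


def restore_to_scratch(backup: sqlite3.Connection) -> sqlite3.Connection:
    """Restore the backup into a fresh scratch database (the test environment)."""
    scratch = sqlite3.connect(":memory:")
    backup.backup(scratch)
    return scratch


def verify_restore(restored: sqlite3.Connection,
                   expected_counts: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the drill passed.

    Table names come from our own trusted dict, not from user input.
    """
    problems = []
    status = restored.execute("PRAGMA integrity_check").fetchone()[0]
    if status != "ok":
        problems.append(f"integrity check failed: {status}")
    for table, expected in expected_counts.items():
        try:
            query = f'SELECT COUNT(*) FROM "{table}"'
            actual = restored.execute(query).fetchone()[0]
        except sqlite3.OperationalError:
            problems.append(f"{table}: table missing from restore")
            continue
        if actual != expected:
            problems.append(f"{table}: expected {expected} rows, found {actual}")
    return problems


# Drill: build a small source database, back it up, restore, verify.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
source.executemany("INSERT INTO users (email) VALUES (?)",
                   [("a@example.com",), ("b@example.com",)])
source.commit()
restored = restore_to_scratch(take_backup(source))
print(verify_restore(restored, {"users": 2}))
```

Row counts are a coarse check. A real drill would likely also run a few application-level queries against the restored copy.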

Backup Storage

Backups should not be in the same location as your primary database. If your AWS region fails, backups in the same region are useless. Store backups in another region or even another cloud provider.

If your AWS account is compromised, backups in the same account are vulnerable. Store critical backups in a separate AWS account.
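One way to keep these two rules from drifting is to check them mechanically. The sketch below is a hypothetical inventory check, not a cloud API call: the Location class, the account IDs, and the regions are all illustrative, and you would fill them in from your own infrastructure records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    provider: str
    account: str
    region: str


def isolation_gaps(primary: Location, backups: list[Location]) -> list[str]:
    """Report which isolation rules no backup copy satisfies."""
    gaps = []
    # A copy at another provider counts as outside both region and account.
    if not any(b.provider != primary.provider or b.region != primary.region
               for b in backups):
        gaps.append("no backup copy outside the primary region")
    if not any(b.provider != primary.provider or b.account != primary.account
               for b in backups):
        gaps.append("no backup copy outside the primary account")
    return gaps


primary = Location("aws", "111111111111", "us-east-1")
same_place = [Location("aws", "111111111111", "us-east-1")]
isolated = [Location("aws", "222222222222", "us-west-2")]
print(isolation_gaps(primary, same_place))
print(isolation_gaps(primary, isolated))
```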

Backup Security

Backups contain sensitive data. They must be encrypted. Access must be restricted. Only the team needing recovery access should have it.

Most managed services can encrypt backups for you, and many do so by default. Confirm that encryption is enabled for your database and its backups rather than assuming it.

Application-Level Backups

Database backups protect the database. They don't protect files in S3, third-party integrations, or configuration. Your disaster recovery plan must include all parts of your system.

Files in S3: maintain backups in another bucket or region. Third-party integrations: export or re-sync data. Configuration: store in version control.

The Disaster Recovery Plan

Document your disaster recovery process. If the primary database is corrupted and you have 15 minutes to recover, someone needs to know exactly what to do. The plan should be written, accessible without the systems it describes, and regularly reviewed.

Who does what? Developer on-call restores the database. Ops engineer updates DNS. Product manager notifies customers. Communications manager posts status. Document this.

PITR: Point-In-Time Recovery

Point-in-time recovery means you can restore the database to any moment in the past, not just the last backup checkpoint. PostgreSQL's WAL archiving enables this.

If a bad migration corrupts data at 2:00 PM, you recover to 1:59 PM. If someone accidentally runs DELETE FROM users, you recover to before the deletion.

PITR requires continuous WAL archiving and storage. This adds operational complexity but is invaluable for recovering from accidents.
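To make the mechanics concrete, the sketch below picks the base backup to start from and renders the two PostgreSQL settings that drive a time-based recovery, restore_command and recovery_target_time. The helper names, the archive directory, and the dates are assumptions for illustration, and the cp-based restore command assumes WAL segments were archived to a plain directory. On PostgreSQL 12 and later these settings go in postgresql.conf next to a recovery.signal file; check the documentation for your version before relying on this.

```python
from datetime import datetime, timezone


def pick_base_backup(base_backup_end_times: list[datetime],
                     target: datetime) -> datetime:
    """Return the end time of the newest base backup usable for `target`.

    The recovery target has to fall after the base backup finished.
    """
    usable = [t for t in base_backup_end_times if t < target]
    if not usable:
        raise ValueError("no base backup finished before the target time")
    return max(usable)


def recovery_settings(target: datetime, archive_dir: str) -> str:
    """Render recovery settings for replaying WAL up to `target`."""
    if target.tzinfo is None:
        raise ValueError("target must be timezone-aware")
    stamp = target.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00")
    return "\n".join([
        f"restore_command = 'cp {archive_dir}/%f %p'",
        f"recovery_target_time = '{stamp}'",
    ])


# Bad migration at 2:00 PM UTC, so recover to 1:59 PM (illustrative date).
target = datetime(2024, 5, 1, 13, 59, tzinfo=timezone.utc)
nightly = [
    datetime(2024, 4, 30, 2, 0, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc),
    datetime(2024, 5, 2, 2, 0, tzinfo=timezone.utc),
]
print(pick_base_backup(nightly, target))
print(recovery_settings(target, "/mnt/server/archivedir"))
```

Every WAL segment between the chosen base backup and the target must be present in the archive, so a gap in archiving shortens how far forward you can recover.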

Automated Backup Verification

Run automated tests that restore backups periodically. Alert if restoration fails. This catches backup corruption early, not when you need the backup.
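A minimal sketch of the alerting wrapper, assuming you already have a function that performs a restore test and returns a list of problems. Both callables here are hypothetical placeholders: restore_test stands for whatever restores and checks your backup, and alert stands for your paging or chat integration. Scheduling (cron, CI, a job runner) is left out.

```python
from typing import Callable


def run_restore_check(restore_test: Callable[[], list[str]],
                      alert: Callable[[str], None]) -> bool:
    """Run one restore test. Alert and return False on any failure."""
    try:
        problems = restore_test()
    except Exception as exc:
        # A crash during restore is itself a finding, not just a test error.
        alert(f"restore test crashed: {exc!r}")
        return False
    if problems:
        alert("restore test found problems: " + "; ".join(problems))
        return False
    return True


def broken_restore() -> list[str]:
    raise RuntimeError("backup file not found")


sent: list[str] = []
print(run_restore_check(lambda: [], sent.append))
print(run_restore_check(broken_restore, sent.append))
print(sent)
```

It is also worth alerting when the check has not run at all for longer than expected, since a silently disabled job looks the same as a passing one.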

Backup Retention and Compliance

Some regulations require retaining records for years. HIPAA, for example, requires keeping certain compliance documentation for 6 years. GDPR requires handling deletion requests, which is complex when the data also lives in old backups. Know your legal requirements and configure retention accordingly.
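Once you know the retention period, applying it is mechanical. The sketch below splits a list of backup timestamps into those to keep and those past retention. The function name and the 30-day figure are illustrative assumptions; the actual period should come from your legal and compliance requirements.

```python
from datetime import datetime, timedelta


def split_by_retention(backup_times: list[datetime],
                       now: datetime,
                       retention: timedelta) -> tuple[list[datetime], list[datetime]]:
    """Return (keep, expire), each sorted oldest first."""
    cutoff = now - retention
    keep = sorted(t for t in backup_times if t >= cutoff)
    expire = sorted(t for t in backup_times if t < cutoff)
    return keep, expire


now = datetime(2024, 6, 1)
backups = [datetime(2024, 4, 15), datetime(2024, 5, 10), datetime(2024, 5, 31)]
keep, expire = split_by_retention(backups, now, timedelta(days=30))
print("keep:", keep)
print("expire:", expire)
```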

Warning

"Backup" means nothing without "tested restore." You have no backup if you can't restore from it. Test restores regularly. Automate the testing. Alert when restore fails.

Developer Insight

For new projects, use a managed database with automatic backups. You get backups, encryption, restore capability, and often geographic redundancy as a configuration option, without building infrastructure. This is the right default choice.