# Disaster Recovery
This guide covers how to recover from data loss, corruption, or accidental deletion across all Pixelflare data stores.
## D1 Database
The D1 database stores all metadata including users, images, albums, tags, analytics, API keys, webhooks, and more. Cloudflare provides automatic point-in-time recovery via Time Travel.
**Native protection:** Time Travel keeps 30 days of history (7 days on the free plan) at no additional cost.
### Restoring from Time Travel
If you need to roll back the database to a previous state:
1. Check recent activity to identify when the issue occurred:

   ```bash
   wrangler d1 execute pixflare-production-db --remote --command "SELECT created_at, action, resource_type FROM audit_logs ORDER BY created_at DESC LIMIT 20"
   ```

2. Restore to a specific timestamp (last known good state):

   ```bash
   wrangler d1 time-travel restore pixflare-production-db --timestamp="2025-01-23T10:30:00Z"
   ```

3. Verify the restoration worked:

   ```bash
   wrangler d1 execute pixflare-production-db --remote --command "SELECT COUNT(*) FROM images"
   ```
Notes:

- The restore operation is destructive and overwrites the database in place
- Before restoring, Wrangler prints a bookmark you can use to undo the restore if needed (see the example below)
- Timestamps can be given as Unix timestamps or in RFC 3339 format
- The database will be briefly unavailable during the restore
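To keep a restore reversible, capture the current bookmark first. A minimal sketch using Wrangler's `time-travel` subcommands; `<bookmark>` is a placeholder for the value printed by `info`:

```bash
# Print the current bookmark (pass --timestamp to see the bookmark at a point in time):
wrangler d1 time-travel info pixflare-production-db

# Undo a restore by rolling back to the captured bookmark:
wrangler d1 time-travel restore pixflare-production-db --bookmark=<bookmark>
```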
### Limitations
`wrangler d1 export` cannot export databases with FTS5 virtual tables (Pixelflare uses FTS5 for search). If you need to export for migration:

1. Drop the FTS5 table temporarily:

   ```bash
   wrangler d1 execute pixflare-production-db --remote --command "DROP TABLE images_fts"
   ```

2. Export the database:

   ```bash
   wrangler d1 export pixflare-production-db --remote --output=backup.sql
   ```

3. Recreate FTS5 by running the search migration:

   ```bash
   wrangler d1 migrations apply pixflare-production-db --remote
   ```
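After the migration runs, confirm the virtual table is back (assuming the search migration recreates `images_fts`):

```bash
wrangler d1 execute pixflare-production-db --remote --command "SELECT name FROM sqlite_master WHERE type='table' AND name='images_fts'"
```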
### Long-term backups
// TODO: Automated nightly SQL exports to R2 for retention beyond 30 days
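Until that automation lands, a manual long-term backup is the export workaround above plus an upload to a dedicated bucket. A sketch; the `pixflare-db-backups` bucket name is an assumption:

```bash
# Export the database (after temporarily dropping the FTS5 table, as described above):
wrangler d1 export pixflare-production-db --remote --output="backup-$(date +%F).sql"

# Push the dump to a separate R2 bucket (bucket name is an assumption):
wrangler r2 object put "pixflare-db-backups/backup-$(date +%F).sql" --file="backup-$(date +%F).sql" --remote
```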
## R2 Bucket (Image Files)

The R2 bucket stores all original images and generated variants. Cloudflare does not provide native versioning or automated backups for R2.

**Current protection:** User-facing S3 backup feature (see S3 Backups)
### User S3 backups
Users can configure their own S3-compatible storage to automatically sync images. If your R2 bucket is lost, users can restore from their individual backups.
Limitations:

- Only covers users who have enabled S3 backups
- Images uploaded before backups were enabled are not in the backup
- Users must restore to their own accounts (not centralized)
### Platform-level R2 backup
// TODO: Configure platform-level backup account that syncs all images to external S3/Backblaze storage for disaster recovery
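Until a platform-level backup account is configured, one practical option is an external `rclone` sync job. A sketch, assuming rclone remotes named `r2` (the production bucket, via R2's S3 API) and `b2` (Backblaze), plus a `pixflare-images` bucket, all of which are assumptions:

```bash
# One-way sync from R2 to the external provider. Deletions in R2 propagate,
# so pair this with provider-side versioning or snapshots for real DR:
rclone sync r2:pixflare-images b2:pixflare-images-dr --checksum --transfers 8
```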
### Recovering from R2 data loss
// TODO: Document restoration procedure from external backup to R2 bucket
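The restore is the same sync in reverse (same hypothetical remotes as above); `rclone copy` is used here because it never deletes objects already in the destination:

```bash
rclone copy b2:pixflare-images-dr r2:pixflare-images --checksum --transfers 8
```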
## KV Namespace (Cache and Secrets)

KV stores the image metadata cache, encrypted S3 credentials, and Tenant Master Keys (TMKs) for encryption.

**Native protection:** None. Cloudflare does not provide KV backup tools.
### Critical data in KV

- `tmk:*` - Tenant Master Keys for image encryption (critical: images cannot be decrypted without these)
- `s3_creds:*` - User S3 backup credentials (encrypted)
- Image metadata cache (can be regenerated, not critical)
### Backing up KV
// TODO: Automated daily export of critical KV keys to R2
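Until that is automated, the critical prefixes can be exported by hand with Wrangler (Wrangler 3.60+ uses `kv key`; older versions use `kv:key`). A sketch; `$KV_NAMESPACE_ID` is an assumption:

```bash
# Download every Tenant Master Key; repeat with --prefix="s3_creds:" for the
# encrypted S3 credentials. Store the output somewhere access-controlled.
mkdir -p kv-backup
for key in $(wrangler kv key list --namespace-id="$KV_NAMESPACE_ID" --prefix="tmk:" | jq -r '.[].name'); do
  wrangler kv key get --namespace-id="$KV_NAMESPACE_ID" "$key" > "kv-backup/${key//:/_}"
done
```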
### Recovering KV data
// TODO: Document restoration procedure for TMKs and encrypted credentials
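Restoring is the inverse: write each saved value back under its original key. A sketch under the same assumptions as the backup above; the tenant ID is a placeholder:

```bash
wrangler kv key put --namespace-id="$KV_NAMESPACE_ID" "tmk:<tenant-id>" --path="kv-backup/tmk_<tenant-id>"
```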
## Analytics Engine

Analytics data is written to Cloudflare's Analytics Engine and aggregated nightly into the D1 tables `analytics_daily` and `analytics_monthly`.

**Protection:** Covered by D1 Time Travel (see above)
Notes:
- Raw Analytics Engine data is retained for 90 days by Cloudflare
- Aggregated data in D1 can be restored using Time Travel
- If you restore D1 to a previous date, you'll lose aggregated analytics after that date
- Raw analytics for the lost period can be re-aggregated by re-running the cron job
### Re-aggregating analytics
If analytics data is lost or corrupted:
// TODO: Document manual re-aggregation procedure from Analytics Engine to D1
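For reference, raw events can be pulled from the Analytics Engine SQL API while the full procedure is written up. A sketch; the dataset name `pixelflare_analytics` and both environment variables are assumptions:

```bash
curl -s "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/analytics_engine/sql" \
  --header "Authorization: Bearer $CF_API_TOKEN" \
  --data "SELECT * FROM pixelflare_analytics WHERE timestamp > NOW() - INTERVAL '1' DAY"
```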
## Workers AI & Vectorize

AI classification results (labels, captions) are stored in the D1 `image_ai` table. Image embeddings for semantic search are stored in Vectorize.

**Protection for AI classifications:** Covered by D1 Time Travel

**Protection for Vectorize embeddings:** // TODO: Research Vectorize backup capabilities
### Recovering AI data
If AI classifications are lost, they can be regenerated by re-running Workers AI on existing images:
// TODO: Document bulk AI re-classification procedure
If Vectorize embeddings are lost:
// TODO: Document embedding regeneration procedure
## Queues

Cloudflare Queues handle background processing (variant generation, backups, webhooks, custom domains). Queue messages are transient and do not need backup.

**No backup needed:** Messages are processed and removed. Failed messages are retried automatically.
If queues are down or cleared:

- Pending variant generation can be re-triggered manually
- Backup sync will run on the next cron (3 AM UTC)
- Webhooks will be skipped (deliveries are not retried after queue loss)
## Infrastructure (Terraform State)

All Cloudflare resources are managed by Terraform. The `terraform.tfstate` file is critical for disaster recovery.
### Backing up Terraform state
If using local state (not recommended for production):

- Copy `terraform.tfstate` to a secure location after each deployment
- Consider using remote state (Terraform Cloud or an S3 backend)

If using remote state:

- Terraform Cloud: Automatic versioning and backups
- S3 backend: Enable versioning on the state bucket (see the example below)
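For the S3 backend, enabling versioning on the state bucket is a single AWS CLI call (the bucket name is an assumption):

```bash
aws s3api put-bucket-versioning \
  --bucket pixflare-terraform-state \
  --versioning-configuration Status=Enabled
```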
### Recovering infrastructure
If your Cloudflare account is lost or resources are deleted:

1. Provision a new Cloudflare account
2. Update `terraform.tfvars` with the new account ID
3. Run `terraform apply` to recreate all resources (see the commands below)
4. Restore data from backups (see the sections above)
5. Update DNS records to point to the new Workers
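Step 3 is the standard Terraform workflow; reviewing the plan before applying is the safety net when recreating everything from scratch:

```bash
terraform init
terraform plan -var-file=terraform.tfvars -out=recovery.tfplan
terraform apply recovery.tfplan
```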
## Recovery Time Objectives (RTO)
Estimated time to restore from various failure scenarios:
- Accidental deletion (within 30 days): 15 minutes (D1 Time Travel)
- Database corruption: 2-4 hours (restore D1, verify, test)
- Complete account loss: 4-8 hours (recreate infrastructure, restore all data stores)
- Migration to new provider: Days to weeks (manual migration)
## Recovery Point Objectives (RPO)
Maximum acceptable data loss for each scenario:
- D1 database: Point-in-time (within 30 days)
- R2 images: Depends on user backup frequency (usually 24 hours)
- KV data: // TODO: Will be 24 hours once automated backups are implemented
- Analytics: 24 hours (nightly aggregation)
## Testing Recovery Procedures
// TODO: Schedule quarterly disaster recovery drills to verify all procedures work