Backup & Recovery
Benchmarked against: Anthropic Zero Data Retention / Data residency
Architecture: Cloud UB (D1) + R2 backup + local mirrors
Status: Partially implemented; backup strategy defined, automation in progress
Backup and recovery defines how SuperPortia protects its data and recovers from failures. The fleet's knowledge, work orders, and agent communications are critical assets that must survive hardware failures, service outages, and operational errors.
What needs backup
| Data | Criticality | Loss impact |
|---|---|---|
| UB entries | Critical | All institutional knowledge lost |
| Work orders | Critical | Task history and audit trail lost |
| WO transitions | High | Compliance audit trail lost |
| Agent messages | Medium | Communication history lost |
| Agent registry | Low | Ephemeral; rebuilt on next heartbeat |
| Source code | Critical | Mitigated: Git (GitHub) provides backup |
| CLAUDE.md + rules | Critical | Mitigated: Git (GitHub) provides backup |
| Vector embeddings | Low | Regenerable from entry content |
Backup strategy
Tier 1: Cloud UB (D1 → R2)
The primary backup path for all Cloud UB data:
| Aspect | Detail |
|---|---|
| Source | Cloud UB D1 (all tables) |
| Destination | Cloudflare R2 bucket |
| Format | JSON export per table |
| Frequency | Daily (planned; currently manual) |
| Retention | All backups kept (R2 storage is cheap) |
| Automation | Cron trigger → Worker endpoint → R2 write |
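The planned automation could be sketched as a Worker `scheduled` handler wired to a cron trigger. The binding names (`DB`, `BACKUPS`) and the table list below are illustrative assumptions, not the real configuration:

```typescript
// Sketch of the planned daily D1 -> R2 backup Worker (cron-triggered).
// Binding names (DB, BACKUPS) and the table list are assumptions.
const TABLES = ["ub_entries", "work_orders", "wo_transitions", "agent_messages"];

// Pure helper: deterministic R2 object key for one table's daily export.
export function backupKey(table: string, date: Date): string {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  return `backups/${day}/${table}.json`;
}

// Minimal shapes of the D1 and R2 bindings this sketch relies on.
interface Env {
  DB: { prepare(sql: string): { all(): Promise<{ results: unknown[] }> } };
  BACKUPS: { put(key: string, value: string): Promise<unknown> };
}

export default {
  // Invoked by the cron trigger configured in wrangler.toml ([triggers] crons).
  async scheduled(_event: unknown, env: Env): Promise<void> {
    const now = new Date();
    for (const table of TABLES) {
      const { results } = await env.DB.prepare(`SELECT * FROM ${table}`).all();
      await env.BACKUPS.put(backupKey(table, now), JSON.stringify(results));
    }
  },
};
```

Keeping the key scheme date-prefixed means each day's export lands under its own `backups/YYYY-MM-DD/` prefix, which makes the "all backups kept" retention policy trivial to browse.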
Tier 2: Local mirror
Each ship maintains a local SQLite copy of UB data:
| Aspect | Detail |
|---|---|
| Source | Cloud UB D1 |
| Destination | Local SQLite on ship filesystem |
| Sync | Currently manual; future: automated periodic sync |
| Purpose | Fast local access + offline resilience |
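The future automated sync only needs to pull rows changed since the last sync. A minimal sketch of that staleness filter, assuming UB entries carry an `id` and an ISO 8601 `updated_at` column (an assumption about the schema):

```typescript
// Sketch of the planned periodic Cloud UB -> local sync.
// The entry shape (id, updated_at) is an assumption about the UB schema.
interface Entry {
  id: string;
  updated_at: string; // ISO 8601 timestamp
}

// Pure helper: given the cloud rows and the mirror's last sync time,
// return only the rows the local mirror is missing or holds stale copies of.
export function entriesToSync(cloud: Entry[], lastSyncedAt: string): Entry[] {
  const cutoff = Date.parse(lastSyncedAt);
  return cloud.filter((e) => Date.parse(e.updated_at) > cutoff);
}
```

The returned rows would then be upserted into the ship's local SQLite copy, and the sync timestamp advanced only after the write succeeds.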
Tier 3: Git repositories
Source code and configuration are backed up through standard Git workflow:
| Aspect | Detail |
|---|---|
| Source | Local filesystem |
| Destination | GitHub (private repositories) |
| Trigger | Per-commit push |
| Includes | Code, CLAUDE.md, rules, skills, docs, scripts |
| Excludes | UB data, secrets (.env), large binaries |
Recovery procedures
Scenario 1: Cloud UB D1 data loss
Severity: Critical. Recovery time: minutes to hours, depending on backup freshness.
1. Identify scope of loss (full DB vs specific tables)
2. Locate latest R2 backup
3. Download backup JSON files
4. Import into new D1 database (or restore to existing)
5. Rebuild Vectorize index from entry content
6. Verify with sre_status()
7. Notify all agents to reconnect
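Step 4 amounts to turning each JSON table export back into INSERT statements. A minimal sketch, assuming each backup file is an array of flat row objects (the function below is illustrative, not part of the real tooling):

```typescript
// Sketch of step 4: converting one JSON-exported row back into a
// parameterized D1 INSERT. Table and column names come from the backup itself.
type Row = Record<string, string | number | null>;

export function rowToInsert(
  table: string,
  row: Row,
): { sql: string; bindings: (string | number | null)[] } {
  const cols = Object.keys(row);
  const placeholders = cols.map(() => "?").join(", ");
  return {
    sql: `INSERT INTO ${table} (${cols.join(", ")}) VALUES (${placeholders})`,
    bindings: cols.map((c) => row[c]),
  };
}
```

In practice each pair would be executed via the D1 binding (`env.DB.prepare(sql).bind(...bindings).run()`) or batched through `wrangler d1 execute`; parameterized bindings avoid quoting bugs when entry content contains quotes or newlines.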
Scenario 2: Cloud UB Worker failure
Severity: High. Recovery time: minutes.
1. Check Cloudflare dashboard for Worker status
2. If code issue: redeploy from Git
- `npx wrangler deploy`
3. If Cloudflare outage: wait for resolution
4. Local UBI tools remain available for ship-local work
5. Monitor with cloud_ub_health()
Scenario 3: Ship hardware failure
Severity: High (for the affected ship). Recovery time: hours to days.
1. Other ships continue operating via Cloud UB
2. Replace/repair hardware
3. Clone Git repository to new machine
4. Install dependencies (Node.js, Python, etc.)
5. Configure MCP servers with ship identity
6. Run agent_heartbeat() to register
7. Local UBI rebuilds from Cloud UB sync
Scenario 4: Vectorize index corruption
Severity: Low. Recovery time: minutes.
1. Semantic search degrades to keyword-only
2. Rebuild index from D1 entry content
3. Re-embed all entries (batch process)
4. Verify with search_brain() test queries
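The batch re-embed in steps 2-3 is mostly a chunking loop. A sketch, where `embed()` and `index.upsert()` stand in for the real embedding model and Vectorize client (both hypothetical here); only the batching helper is concrete:

```typescript
// Sketch of steps 2-3: re-embedding all entries in fixed-size batches.
export function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Illustrative rebuild loop; embed() and index.upsert() are placeholders
// for the real embedding call and Vectorize client, not actual APIs.
//
// async function rebuildIndex(entries: { id: string; content: string }[]) {
//   for (const batch of chunk(entries, 100)) {
//     const vectors = await Promise.all(batch.map((e) => embed(e.content)));
//     await index.upsert(batch.map((e, i) => ({ id: e.id, values: vectors[i] })));
//   }
// }
```

Batching bounds memory use and keeps each upsert call well under typical request-size limits, which matters when the entry count grows.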
Scenario 5: Accidental data deletion
Severity: Varies. Recovery time: minutes, if caught quickly.
1. UB entries: check R2 backup for deleted entries
2. Work orders: check wo_transitions history
3. Messages: archived messages are soft-deleted, recoverable
4. Source code: git reflog / git checkout
5. If Captain approval was given for deletion: document as intentional
Disaster recovery matrix
| Scenario | RTO (Recovery Time) | RPO (Data Loss) | Automated? |
|---|---|---|---|
| D1 data loss | 1–2 hours | Up to 24 hours (last backup) | No (manual) |
| Worker failure | 5–15 minutes | Zero (stateless) | Partial (auto-restart) |
| Ship hardware failure | Hours to days | Zero (Cloud UB has the data) | No |
| Vectorize corruption | 30 minutes | Zero (regenerable) | No |
| Network outage | Wait for resolution | Zero | N/A |
| Accidental deletion | Minutes | Varies by backup frequency | No |
Current gaps
| Gap | Impact | Planned fix |
|---|---|---|
| No automated D1 → R2 backup | Up to 24h data loss risk | Cron-triggered Worker endpoint |
| No automated local sync | Local mirrors may be stale | Periodic Cloud UB → local sync |
| No backup verification | Backups might be corrupt | Post-backup integrity check |
| No point-in-time recovery | Can only restore to last backup | WAL-based incremental backup |
| No cross-region replication | Single Cloudflare region | Multi-region D1 (when available) |
These gaps are documented as inspection mirror items: known capabilities to build.
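The "No backup verification" gap could be closed with a post-backup integrity check that runs before a backup is trusted. A minimal sketch, assuming each export is an array of flat row objects; the per-table required keys are an assumption to adapt to the real schema:

```typescript
// Sketch of a post-backup integrity check (the "No backup verification" gap).
// requiredKeys per table is an assumption; adapt to the real schema.
export function verifyExport(
  rows: Record<string, unknown>[],
  expectedCount: number,
  requiredKeys: string[],
): string[] {
  const problems: string[] = [];
  if (rows.length !== expectedCount) {
    problems.push(`row count ${rows.length} != expected ${expectedCount}`);
  }
  rows.forEach((row, i) => {
    for (const key of requiredKeys) {
      if (!(key in row)) problems.push(`row ${i} missing key "${key}"`);
    }
  });
  return problems; // empty array means the export passed
}
```

The expected count would come from a `SELECT COUNT(*)` against the live table at export time; any non-empty result should fail the backup job loudly rather than silently archiving a corrupt export.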
Mutual rescue architecture
The dual-ship design provides inherent resilience:
Captain quote (2026-02-25): "If one goes down, the other can rescue."
Related pages
| Page | Relationship |
|---|---|
| Data Residency | Where data lives |
| Fleet Management | Fleet architecture |
| SRE Status | Health monitoring |
| Cloud UB MCP | Cloud UB details |
| Company Constitution | §1 Knowledge goes to UB only |