Backup & Recovery

Benchmarked against: Anthropic - Zero Data Retention / Data residency
Architecture: Cloud UB (D1) + R2 backup + local mirrors
Status: Partially implemented; backup strategy defined, automation in progress

Backup and recovery defines how SuperPortia protects its data and recovers from failures. The fleet's knowledge, work orders, and agent communications are critical assets that must survive hardware failures, service outages, and operational errors.


What needs backup

| Data | Criticality | Loss impact |
| --- | --- | --- |
| UB entries | Critical | All institutional knowledge lost |
| Work orders | Critical | Task history and audit trail lost |
| WO transitions | High | Compliance audit trail lost |
| Agent messages | Medium | Communication history lost |
| Agent registry | Low | Ephemeral; rebuilt on next heartbeat |
| Source code | Critical | Git provides backup (GitHub) |
| CLAUDE.md + rules | Critical | Git provides backup (GitHub) |
| Vector embeddings | Low | Regenerable from entry content |

Backup strategy

Tier 1: Cloud UB (D1 → R2)

The primary backup path for all Cloud UB data:

| Aspect | Detail |
| --- | --- |
| Source | Cloud UB D1 (all tables) |
| Destination | Cloudflare R2 bucket |
| Format | JSON export per table |
| Frequency | Daily (planned; currently manual) |
| Retention | All backups kept (R2 storage is cheap) |
| Automation | Cron trigger → Worker endpoint → R2 write |
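As a rough illustration of the "JSON export per table" format, the sketch below dumps every table of a database into one JSON document per table. It uses Python's built-in sqlite3 module as a local stand-in for D1 and returns the objects in a dict instead of writing them to R2. The function name `export_tables` and the `backups/<date>/<table>.json` key layout are assumptions for illustration, not the actual Worker implementation.

```python
import json
import sqlite3
from datetime import date


def export_tables(conn: sqlite3.Connection, day: date) -> dict[str, str]:
    """Return {object_key: json_body}, with one JSON document per table.

    Assumes columns hold text, numbers, or NULL (BLOBs are not JSON-serializable).
    """
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    objects: dict[str, str] = {}
    for table in tables:
        # Table names come from sqlite_master, not from user input.
        cur = conn.execute(f'SELECT * FROM "{table}"')
        columns = [col[0] for col in cur.description]
        rows = [dict(zip(columns, row)) for row in cur]
        # Hypothetical key layout: backups/<ISO date>/<table>.json
        key = f"backups/{day.isoformat()}/{table}.json"
        objects[key] = json.dumps(rows, sort_keys=True)
    return objects
```

Dating the key prefix keeps each day's export separate, which fits the "all backups kept" retention policy.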

Tier 2: Local mirror

Each ship maintains a local SQLite copy of UB data:

| Aspect | Detail |
| --- | --- |
| Source | Cloud UB D1 |
| Destination | Local SQLite on ship filesystem |
| Sync | Currently manual; future: automated periodic sync |
| Purpose | Fast local access + offline resilience |
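A periodic sync into the local mirror could be sketched as an upsert that never overwrites a newer local row with an older remote one. The `entries` schema (`id`, `content`, `updated_at`) and the function name `sync_entries` are hypothetical; the real UB schema is not shown in this document. Timestamps are assumed to be ISO 8601 strings in UTC, so string comparison orders them correctly.

```python
import sqlite3


def sync_entries(local: sqlite3.Connection, rows: list[dict]) -> int:
    """Upsert rows pulled from Cloud UB; return the mirror's row count afterwards."""
    local.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "id TEXT PRIMARY KEY, content TEXT NOT NULL, updated_at TEXT NOT NULL)"
    )
    with local:  # single transaction, so the mirror is never left half-synced
        local.executemany(
            "INSERT INTO entries (id, content, updated_at) "
            "VALUES (:id, :content, :updated_at) "
            "ON CONFLICT(id) DO UPDATE SET "
            "content = excluded.content, updated_at = excluded.updated_at "
            # Only accept the incoming row if it is newer than the local copy.
            "WHERE excluded.updated_at > entries.updated_at",
            rows,
        )
    return local.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
```

The `ON CONFLICT ... DO UPDATE` form requires SQLite 3.24 or later.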

Tier 3: Git repositories

Source code and configuration are backed up through the standard Git workflow:

| Aspect | Detail |
| --- | --- |
| Source | Local filesystem |
| Destination | GitHub (private repositories) |
| Trigger | Per-commit push |
| Includes | Code, CLAUDE.md, rules, skills, docs, scripts |
| Excludes | UB data, secrets (.env), large binaries |

Recovery procedures

Scenario 1: Cloud UB D1 data loss

Severity: Critical
Recovery time: Minutes to hours, depending on backup freshness

1. Identify scope of loss (full DB vs specific tables)
2. Locate latest R2 backup
3. Download backup JSON files
4. Import into new D1 database (or restore to existing)
5. Rebuild Vectorize index from entry content
6. Verify with sre_status()
7. Notify all agents to reconnect
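The import step can be sketched as follows, again with sqlite3 standing in for D1. The function name `restore_table` is hypothetical, and the sketch assumes the target table already exists and is empty, and that the backup is a JSON array of row objects sharing the same keys. It finishes with a row-count check so a short import is caught before agents reconnect.

```python
import json
import sqlite3


def restore_table(conn: sqlite3.Connection, table: str, backup_json: str) -> int:
    """Load one table's JSON backup into an existing, empty table; return rows restored."""
    rows = json.loads(backup_json)
    if not rows:
        return 0
    columns = list(rows[0])
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # all-or-nothing: a failed import rolls back and leaves the table untouched
        conn.executemany(
            f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
            ([row[c] for c in columns] for row in rows),
        )
    # Sanity check: the table should now hold exactly the rows from the backup.
    restored = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    if restored != len(rows):
        raise RuntimeError(f"{table}: expected {len(rows)} rows, found {restored}")
    return restored
```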

Scenario 2: Cloud UB Worker failure

Severity: High
Recovery time: Minutes

1. Check Cloudflare dashboard for Worker status
2. If code issue: redeploy from Git
   - npx wrangler deploy
3. If Cloudflare outage: wait for resolution
4. Local UBI tools remain available for ship-local work
5. Monitor with cloud_ub_health()
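Monitoring after a redeploy could be automated with a small polling loop. This is a generic sketch: `check` is any callable that returns True when the service is healthy (for example, a wrapper around cloud_ub_health(), whose return shape is not documented here), and the helper name `wait_until_healthy` is an assumption.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` with exponential backoff; return True as soon as it passes."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            # Delays double each round: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return False
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.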

Scenario 3: Ship hardware failure

Severity: High (for affected ship)
Recovery time: Hours to days

1. Other ships continue operating via Cloud UB
2. Replace/repair hardware
3. Clone Git repository to new machine
4. Install dependencies (Node.js, Python, etc.)
5. Configure MCP servers with ship identity
6. Run agent_heartbeat() to register
7. Local UBI rebuilds from Cloud UB sync

Scenario 4: Vectorize index corruption

Severity: Low
Recovery time: Minutes

1. Semantic search degrades to keyword-only
2. Rebuild index from D1 entry content
3. Re-embed all entries (batch process)
4. Verify with search_brain() test queries
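The batch re-embedding step might look like the sketch below. The embedding model and the Vectorize write are passed in as plain callables (`embed` and `upsert`), because their real signatures are outside this document; the entry fields `id` and `content` and the default batch size of 100 are likewise assumptions.

```python
from typing import Callable, Iterable, Iterator


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` items."""
    batch: list[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def rebuild_index(
    entries: Iterable[dict],
    embed: Callable[[list[str]], list[list[float]]],
    upsert: Callable[[list[tuple[str, list[float]]]], None],
    batch_size: int = 100,
) -> int:
    """Re-embed every entry in batches and write the vectors back; return entries processed."""
    total = 0
    for batch in batched(entries, batch_size):
        vectors = embed([entry["content"] for entry in batch])
        upsert([(entry["id"], vec) for entry, vec in zip(batch, vectors)])
        total += len(batch)
    return total
```

Batching bounds memory use and keeps each embedding request small enough to retry on its own if it fails.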

Scenario 5: Accidental data deletion

Severity: Varies
Recovery time: Minutes (if caught quickly)

1. UB entries: check R2 backup for deleted entries
2. Work orders: check wo_transitions history
3. Messages: archived messages are soft-deleted, recoverable
4. Source code: git reflog / git checkout
5. If Captain approval was given for deletion: document as intentional
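For the first step, checking a backup for deleted entries amounts to a set difference between the backup rows and the IDs that are still live. A minimal sketch, assuming every row carries a unique `id` field (the helper name `find_deleted` is hypothetical):

```python
def find_deleted(backup_rows: list[dict], live_ids: set[str]) -> list[dict]:
    """Return backup rows whose id no longer exists in the live database."""
    return [row for row in backup_rows if row["id"] not in live_ids]
```

The returned rows are the candidates for re-insertion. Rows deleted with Captain approval would be filtered out and documented as intentional instead.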

Disaster recovery matrix

| Scenario | RTO (recovery time) | RPO (data loss) | Automated? |
| --- | --- | --- | --- |
| D1 data loss | 1–2 hours | Up to 24 hours (last backup) | No (manual) |
| Worker failure | 5–15 minutes | Zero (stateless) | Partial (auto-restart) |
| Ship hardware failure | Hours–days | Zero (Cloud UB has data) | No |
| Vectorize corruption | 30 minutes | Zero (regenerable) | No |
| Network outage | Wait for resolution | Zero | N/A |
| Accidental deletion | Minutes | Varies by backup frequency | No |
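The 24-hour RPO for D1 data loss only holds if a backup was actually taken within the last day. While backups remain manual, the real exposure is the age of the most recent one. A small check along these lines could flag a breach; the helper names are hypothetical and the 24-hour default simply mirrors the matrix.

```python
from datetime import datetime, timedelta, timezone


def backup_age(last_backup: datetime, now: datetime) -> timedelta:
    """Worst-case window of data that would be lost if D1 failed right now."""
    return now - last_backup


def rpo_breached(
    last_backup: datetime,
    now: datetime,
    rpo: timedelta = timedelta(hours=24),
) -> bool:
    """True when the latest backup is older than the RPO target."""
    return backup_age(last_backup, now) > rpo
```

Both timestamps should be timezone-aware (for example `datetime.now(timezone.utc)`), since Python raises an error when aware and naive datetimes are subtracted.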

Current gaps

| Gap | Impact | Planned fix |
| --- | --- | --- |
| No automated D1 → R2 backup | Up to 24h data loss risk | Cron-triggered Worker endpoint |
| No automated local sync | Local mirrors may be stale | Periodic Cloud UB → local sync |
| No backup verification | Backups might be corrupt | Post-backup integrity check |
| No point-in-time recovery | Can only restore to last backup | WAL-based incremental backup |
| No cross-region replication | Single Cloudflare region | Multi-region D1 (when available) |

These gaps are documented as inspection mirror items: known capabilities to build.
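One possible shape for the post-backup integrity check is a checksum manifest written alongside each backup and verified after upload. The sketch below assumes backup objects are available as bytes keyed by object name; `build_manifest` and `verify_backup` are illustrative names, not existing SuperPortia functions.

```python
import hashlib
import json


def build_manifest(objects: dict[str, bytes]) -> dict[str, str]:
    """Map each backup object key to the SHA-256 hex digest of its body."""
    return {key: hashlib.sha256(body).hexdigest() for key, body in objects.items()}


def verify_backup(objects: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return a list of problems: missing objects, checksum mismatches, invalid JSON."""
    problems: list[str] = []
    for key, expected in manifest.items():
        body = objects.get(key)
        if body is None:
            problems.append(f"{key}: missing")
        elif hashlib.sha256(body).hexdigest() != expected:
            problems.append(f"{key}: checksum mismatch")
        else:
            try:
                json.loads(body)  # the body must still parse as JSON
            except ValueError:
                problems.append(f"{key}: not valid JSON")
    return problems
```

An empty list means every object listed in the manifest is present, unmodified, and parseable.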


Mutual rescue architecture

The dual-ship design provides inherent resilience:

Captain quote (2026-02-25): "一個掛一個還可以救" (if one goes down, the other can rescue).


| Page | Relationship |
| --- | --- |
| Data Residency | Where data lives |
| Fleet Management | Fleet architecture |
| SRE Status | Health monitoring |
| Cloud UB MCP | Cloud UB details |
| Company Constitution | §1 Knowledge goes to UB only |