Benchmarked against: Anthropic — Cost Optimization
Rule: Constitution Section 6 + cost-awareness.md
Principle: Minimum cost that gets the job done RIGHT
SuperPortia's cost optimization is not about spending less; it is about spending wisely. The Captain's directive: doing important tasks poorly with cheap tools wastes more money than doing them right with appropriate tools.
Cost hierarchy
Engine cost tiers
| Tier | Engines | Cost | Use for |
|---|---|---|---|
| Free | Groq | $0 | Random searches, trivial cleanup, unimportant tasks |
| Budget | DeepSeek, Mistral | Cents | Analysis, reasoning, translations |
| Standard | Gemini, Zhipu | Cents-$ | General tasks, search with citations, Chinese NLP |
| Premium | Claude Sonnet | $$ | Strategy analysis, code review |
| Elite | Claude Opus | $$$$ | Architecture, key decisions, code operations |
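
The tier table can be read as a lookup from task type to candidate engines. The following is a minimal sketch of that idea, not SuperPortia's actual router: the engine names come from the table, while the task-type keys and the `pick_engines` helper are hypothetical names introduced for illustration.

```python
# Engine tiers transcribed from the table above.
ENGINE_TIERS = {
    "free": ["Groq"],
    "budget": ["DeepSeek", "Mistral"],
    "standard": ["Gemini", "Zhipu"],
    "premium": ["Claude Sonnet"],
    "elite": ["Claude Opus"],
}

# Hypothetical task-type labels, mapped to tiers per the "Use for" column.
TASK_TIER = {
    "trivial_cleanup": "free",
    "analysis": "budget",
    "translation": "budget",
    "search_with_citations": "standard",
    "chinese_nlp": "standard",
    "strategy_analysis": "premium",
    "code_review": "premium",
    "architecture": "elite",
    "key_decision": "elite",
}


def pick_engines(task_type: str) -> list[str]:
    """Return the candidate engines for a task type.

    Unknown task types raise instead of falling back to the free tier,
    because under-powering an important task leads to rework.
    """
    try:
        tier = TASK_TIER[task_type]
    except KeyError:
        raise ValueError(f"unclassified task type: {task_type!r}") from None
    return ENGINE_TIERS[tier]
```

Raising on an unclassified task is a deliberate choice in this sketch: it forces someone to decide how important the task is before an engine is picked.
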
The search flow (mandatory)
Never use Claude (Opus) for research. Follow the cost-efficient search flow:
| Step | Action | Cost |
|---|---|---|
| 1 | search_brain() | Free (already indexed) |
| 2 | search_web(engine="groq") or intel_search() | Free or ~$0.014 |
| 3 | ingest_fragment() | Free (pipeline cost only) |
| 4 | Use the data | Free (from UB) |
Never call WebSearch or WebFetch directly, because both consume Opus tokens. Delegate to low-cost engines.
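
The four steps can be sketched as a single function. The real signatures of `search_brain()`, `search_web()`, and `ingest_fragment()` are not documented on this page, so the sketch takes them as injected callables with assumed shapes, and the demo uses in-memory stand-ins rather than the actual tools.

```python
from typing import Callable, Optional


def cost_efficient_search(
    query: str,
    search_brain: Callable[[str], Optional[str]],   # assumed: returns None on a miss
    search_web: Callable[[str], str],               # assumed: low-cost engine lookup
    ingest_fragment: Callable[[str, str], None],    # assumed: stores (query, result)
) -> tuple[str, str]:
    """Return (result, source), checking the brain before paying for a search."""
    cached = search_brain(query)       # Step 1: free, already indexed
    if cached is not None:
        return cached, "brain"         # Step 4: use the stored data
    found = search_web(query)          # Step 2: delegate to a low-cost engine
    ingest_fragment(query, found)      # Step 3: ingest so it is never re-searched
    return found, "web"


# Demo with in-memory stand-ins (hypothetical, for illustration only).
store: dict[str, str] = {}
web_calls: list[str] = []


def fake_web(query: str) -> str:
    web_calls.append(query)
    return f"result for {query}"


first = cost_efficient_search("groq pricing", store.get, fake_web, store.__setitem__)
second = cost_efficient_search("groq pricing", store.get, fake_web, store.__setitem__)
```

In the demo, the first call goes to the web stand-in and ingests the result; the second call is answered from the store, so `web_calls` holds a single entry.
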
Claude Max Plan billing
Understanding the billing structure prevents costly mistakes:
| Quota | What counts | Shared with |
|---|---|---|
| All Models | Claude Opus + Sonnet + Haiku | claude.ai Chat + Claude Code CLI |
| Sonnet Only | Sonnet usage only | Independent; does not consume the All Models quota |
| Extra Usage | Overflow or LiteLLM direct API calls | Separate billing |
Key insight: Claude Code uses the Max Plan "All Models" quota, which is shared with claude.ai Chat. Heavy CLI usage affects Chat availability.
Role-based cost assignment
| Role | Agent | Cost tier | What they do |
|---|---|---|---|
| Chief Engineer | Claude Code | Elite | Architecture, decisions, delegation |
| Executor | Antigravity | Free | Coding, executing WOs |
| Intel Officer | Groq/Gemini/DeepSeek | Budget | External research |
| Courier | cron + bash | Free | Scheduled checks |
| Strategist | Claude AI Chat | Standard | Strategy analysis, reviews |
Token optimization
Context window management
| Technique | Token savings |
|---|---|
| Progressive Disclosure (PD) | ~5,000 tokens/session |
| Role-based tool assignment | ~4,000 tokens/session |
| On-demand skill loading | ~2,000-4,000 tokens/session |
| Lean boot sequence (2 calls vs 5+) | ~1,500 tokens/session |
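
As a rough worked total, the figures in the table can be summed. This assumes the four savings are independent and simply additive, which the page does not state; if the techniques overlap, the real figure would be lower.

```python
# (low, high) estimated savings in tokens per session, from the table above.
SAVINGS = {
    "Progressive Disclosure (PD)": (5_000, 5_000),
    "Role-based tool assignment": (4_000, 4_000),
    "On-demand skill loading": (2_000, 4_000),
    "Lean boot sequence": (1_500, 1_500),
}

low = sum(lo for lo, _ in SAVINGS.values())
high = sum(hi for _, hi in SAVINGS.values())

print(f"Estimated combined savings: {low:,}-{high:,} tokens/session")
# prints "Estimated combined savings: 12,500-14,500 tokens/session"
```
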
Prompt efficiency
| Practice | Impact |
|---|---|
| Search UB before asking | Avoids re-generating known knowledge |
| Use tables over prose | More compact, same information |
| Reference rules, don't copy | Avoids duplication |
| Ingest results immediately | Never re-search the same thing |
Cost anti-patterns
| Anti-pattern | Why it's wasteful | Better approach |
|---|---|---|
| Using Opus for web search | Tokens burned on browsing | Delegate to Groq/Gemini search |
| Loading all skills at startup | 5,600 tokens wasted | Load on demand (PD) |
| Re-searching same topic | Duplicate API/token cost | Ingest to UB first time |
| Using free engine for important tasks | Poor quality leads to rework | Use appropriate engine |
| Long explanations from Opus | Token waste | Be concise, delegate prose |
| Reading the entire bulletin board at boot | ~2,000 tokens for information that may not be needed | Read on demand |
Emergency downshift
When quota is running low, activate /brain_lite. This switches to emergency mode:
- All research goes to Groq (free)
- All analysis goes to DeepSeek (cheap)
- Only code operations use Claude
- Skip non-essential UB reads
- Minimal output format
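
The routing rules in this list can be sketched as a small lookup. This is an illustration under stated assumptions, not the implementation behind /brain_lite: the task-kind labels and the `route_in_lite_mode` helper are hypothetical.

```python
# Routing rules transcribed from the emergency-mode list above.
# The keys are hypothetical task-kind labels chosen for this sketch.
LITE_ROUTES = {
    "research": "Groq",      # free
    "analysis": "DeepSeek",  # cheap
    "code": "Claude",        # the only work that still uses Claude
}


def route_in_lite_mode(task_kind: str) -> str:
    """Return the engine for a task kind while emergency mode is active."""
    if task_kind not in LITE_ROUTES:
        # The page does not say where other kinds of work go in emergency
        # mode, so this sketch refuses to guess.
        raise ValueError(f"no lite-mode route for {task_kind!r}")
    return LITE_ROUTES[task_kind]
```
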
Brain modes
| Mode | Skill | When |
|---|---|---|
| Lite | /brain_lite | Quota emergency, minimal operations |
| Mid | /brain_mid | Standard operations, balanced cost |
| Pro | /brain_pro | Full capabilities, quota available |
Measuring cost effectiveness
| Metric | How to track |
|---|---|
| WO completion rate | list_work_orders(include_completed=True) |
| Actual vs estimated hours | WO actual_hours field |
| Engine usage distribution | UB tags analysis |
| Rework rate | Rejected WO count |
| UB hit rate | Search-first vs delegate ratio |
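
A few of these metrics can be computed from work-order records. The return shape of `list_work_orders(include_completed=True)` is not documented here, so this sketch works on a hypothetical `WorkOrder` record; the status values and the exact ratio definitions are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkOrder:
    status: str             # assumed values: "completed", "rejected", "open"
    estimated_hours: float
    actual_hours: float


def completion_rate(orders: list[WorkOrder]) -> float:
    """Completed work orders as a share of all work orders."""
    if not orders:
        return 0.0
    return sum(o.status == "completed" for o in orders) / len(orders)


def rework_rate(orders: list[WorkOrder]) -> float:
    """Rejected work orders as a share of closed (completed or rejected) ones."""
    closed = [o for o in orders if o.status in ("completed", "rejected")]
    if not closed:
        return 0.0
    return sum(o.status == "rejected" for o in closed) / len(closed)


def hours_ratio(orders: list[WorkOrder]) -> float:
    """Actual hours divided by estimated hours, over completed work orders."""
    done = [o for o in orders if o.status == "completed"]
    estimated = sum(o.estimated_hours for o in done)
    if estimated == 0:
        return 0.0
    return sum(o.actual_hours for o in done) / estimated


# Made-up sample data, for illustration only.
sample = [
    WorkOrder("completed", estimated_hours=3.0, actual_hours=2.0),
    WorkOrder("completed", estimated_hours=4.0, actual_hours=4.0),
    WorkOrder("rejected", estimated_hours=2.0, actual_hours=1.0),
    WorkOrder("open", estimated_hours=3.0, actual_hours=0.0),
]
```

A rising rework rate alongside heavy free-tier usage would suggest that important tasks are being routed to engines that are too weak for them.
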
Related pages