
Cost Optimization Guide

Benchmarked against: Anthropic — Cost Optimization
Rule: Constitution Section 6 + cost-awareness.md
Principle: Minimum cost that gets the job done RIGHT

SuperPortia's cost optimization is not about spending less — it's about spending wisely. The Captain's directive: doing important tasks poorly with cheap tools wastes MORE money than doing them right with appropriate tools.


Cost hierarchy


Engine cost tiers

| Tier | Engines | Cost | Use for |
|---|---|---|---|
| Free | Groq | $0 | Random searches, trivial cleanup, unimportant tasks |
| Budget | DeepSeek, Mistral | Cents | Analysis, reasoning, translations |
| Standard | Gemini, Zhipu | Cents-$ | General tasks, search with citations, Chinese NLP |
| Premium | Claude Sonnet | $$ | Strategy analysis, code review |
| Elite | Claude Opus | $$$$ | Architecture, key decisions, code operations |
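
The tier table can be read as a routing rule: pick the cheapest tier that is adequate for the task's importance. A minimal sketch, assuming an illustrative 0–4 importance scale — `select_engine()`, the tier levels, and the engine identifiers are stand-ins, not part of any documented SuperPortia API:

```python
# Hypothetical sketch: choose the cheapest adequate engine tier.
# Tier names and engines mirror the table above; the importance
# scale (0 = trivial, 4 = architecture/key decisions) is assumed.

TIERS = [
    ("free", ["groq"], 0),
    ("budget", ["deepseek", "mistral"], 1),
    ("standard", ["gemini", "zhipu"], 2),
    ("premium", ["claude-sonnet"], 3),
    ("elite", ["claude-opus"], 4),
]

def select_engine(importance: int) -> str:
    """Return the first engine whose tier level covers the task's importance."""
    for _name, engines, level in TIERS:
        if level >= importance:
            return engines[0]
    return "claude-opus"  # anything beyond the scale falls to the top tier

print(select_engine(0))  # trivial cleanup -> "groq"
print(select_engine(3))  # code review    -> "claude-sonnet"
```

The point of the ratchet is one-directional: a task may always be escalated to a higher tier, but never silently downgraded below its importance level.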

The search flow (mandatory)

Never use Claude (Opus) for research. Follow the cost-efficient search flow:

| Step | Action | Cost |
|---|---|---|
| 1 | search_brain() | Free (already indexed) |
| 2 | search_web(engine="groq") or intel_search() | Free or ~$0.014 |
| 3 | ingest_fragment() | Free (pipeline cost only) |
| 4 | Use the data | Free (from UB) |

Never use WebSearch or WebFetch directly — that costs Opus tokens. Delegate to low-cost engines.
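
The four steps above can be sketched as a cache-then-delegate loop. This is illustrative only: `FakeBrain` and `FakeWeb` are stand-ins for the real `search_brain()`, `ingest_fragment()`, and `search_web()` tools, whose actual signatures are not documented here:

```python
class FakeBrain:
    """Stand-in for the UB index queried by search_brain()."""
    def __init__(self):
        self.index = {}
    def search(self, query):
        return self.index.get(query)       # Step 1: free lookup
    def ingest(self, query, fragment):
        self.index[query] = fragment       # Step 3: ingest_fragment()

class FakeWeb:
    """Stand-in for search_web(); counts external (paid) calls."""
    def __init__(self):
        self.calls = 0
    def search(self, query, engine="groq"):
        self.calls += 1                    # Step 2: low-cost engine only
        return f"[{engine}] results for {query!r}"

def research(query, brain, web):
    cached = brain.search(query)           # 1. search_brain() first
    if cached is not None:
        return cached                      # 4. use the data (from UB)
    result = web.search(query, engine="groq")  # 2. delegate, never Opus
    brain.ingest(query, result)            # 3. ingest so we never re-search
    return result

brain, web = FakeBrain(), FakeWeb()
research("llm pricing", brain, web)
research("llm pricing", brain, web)        # second call hits the UB
print(web.calls)  # -> 1: the external engine was only paid for once
```

The second identical query never reaches the external engine, which is exactly why step 3 is mandatory rather than optional.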


Claude Max Plan billing

Understanding the billing structure prevents costly mistakes:

| Quota | What counts | Shared with |
|---|---|---|
| All Models | Claude Opus + Sonnet + Haiku | claude.ai Chat + Claude Code CLI |
| Sonnet Only | Sonnet usage only | Independent — does not consume All Models |
| Extra Usage | Overflow or LiteLLM direct API calls | Separate billing |

Key insight: Claude Code uses the Max Plan "All Models" quota, which is shared with claude.ai Chat. Heavy CLI usage affects Chat availability.


Role-based cost assignment

| Role | Agent | Cost tier | What they do |
|---|---|---|---|
| Chief Engineer | Claude Code | Elite | Architecture, decisions, delegation |
| Executor | Antigravity | Free | Coding, executing WOs |
| Intel Officer | Groq/Gemini/DeepSeek | Budget | External research |
| Courier | cron + bash | Free | Scheduled checks |
| Strategist | Claude AI Chat | Standard | Strategy analysis, reviews |

Token optimization

Context window management

| Technique | Token savings |
|---|---|
| Progressive Disclosure (PD) | ~5,000 tokens/session |
| Role-based tool assignment | ~4,000 tokens/session |
| On-demand skill loading | ~2,000-4,000 tokens/session |
| Lean boot sequence (2 calls vs 5+) | ~1,500 tokens/session |
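
On-demand skill loading is essentially lazy evaluation with a cache. A minimal sketch, assuming skills are keyed text bodies — the skill names and contents here are invented for illustration:

```python
import functools

# Hypothetical skill bodies; in practice these would live on disk and
# each cost thousands of context tokens to load.
SKILL_FILES = {
    "search": "skill body (~2,000 tokens)",
    "ingest": "skill body (~3,000 tokens)",
}

@functools.lru_cache(maxsize=None)
def load_skill(name: str) -> str:
    # Loaded on first use and cached, instead of loading every skill at boot.
    return SKILL_FILES[name]

load_skill("search")
print(load_skill.cache_info().currsize)  # only the one used skill is loaded
```

Only skills a session actually touches ever enter the context, which is where the ~2,000-4,000 token/session saving in the table comes from.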

Prompt efficiency

| Practice | Impact |
|---|---|
| Search UB before asking | Avoids re-generating known knowledge |
| Use tables over prose | More compact, same information |
| Reference rules, don't copy | Avoids duplication |
| Ingest results immediately | Never re-search the same thing |

Cost anti-patterns

| Anti-pattern | Why it's wasteful | Better approach |
|---|---|---|
| Using Opus for web search | Tokens burned on browsing | Delegate to Groq/Gemini search |
| Loading all skills at startup | 5,600 tokens wasted | Load on demand (PD) |
| Re-searching same topic | Duplicate API/token cost | Ingest to UB first time |
| Using free engine for important tasks | Poor quality leads to rework | Use appropriate engine |
| Long explanations from Opus | Token waste | Be concise, delegate prose |
| Reading entire bulletin board at boot | ~2,000 tokens for maybe-needed info | Read on demand |

Emergency downshift

When quota is running low, activate /brain_lite:

/brain_lite

This switches to emergency mode:

  • All research goes to Groq (free)
  • All analysis goes to DeepSeek (cheap)
  • Only code operations use Claude
  • Skip non-essential UB reads
  • Minimal output format
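
The downshift rules above amount to a routing table keyed by task class. A minimal sketch, assuming illustrative task classes — the real /brain_lite skill's internals are not documented here:

```python
# Hypothetical sketch of /brain_lite routing: each task class goes to
# the cheapest engine named in the list above. Names are assumptions.

LITE_ROUTES = {
    "research": "groq",      # all research -> free engine
    "analysis": "deepseek",  # all analysis -> cheap engine
    "code": "claude",        # only code operations stay on Claude
}

def route(task_class: str, lite: bool = True) -> str:
    """Pick an engine for a task; lite mode forces the cheap routes."""
    if lite:
        return LITE_ROUTES.get(task_class, "groq")  # default to free
    return "claude"  # normal mode keeps Claude as the default

print(route("research"))  # -> "groq"
print(route("code"))      # -> "claude"
```

Note the failure mode the table guards against: in lite mode an unknown task class falls through to the free tier, never to Claude.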

Brain modes

| Mode | Skill | When |
|---|---|---|
| Lite | /brain_lite | Quota emergency, minimal operations |
| Mid | /brain_mid | Standard operations, balanced cost |
| Pro | /brain_pro | Full capabilities, quota available |

Measuring cost effectiveness

| Metric | How to track |
|---|---|
| WO completion rate | list_work_orders(include_completed=True) |
| Actual vs estimated hours | WO actual_hours field |
| Engine usage distribution | UB tags analysis |
| Rework rate | Rejected WO count |
| UB hit rate | Search-first vs delegate ratio |
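
The completion and rework metrics can be derived from a `list_work_orders()`-style result. A minimal sketch, assuming each work order is a record with a `status` field — the field names and status values are illustrative, not the documented WO schema:

```python
# Hypothetical sketch: compute WO completion and rework rates from a
# list of work-order records. The "status" field values are assumed.

def wo_metrics(work_orders):
    total = len(work_orders)
    done = sum(1 for wo in work_orders if wo["status"] == "completed")
    rejected = sum(1 for wo in work_orders if wo["status"] == "rejected")
    return {
        "completion_rate": done / total if total else 0.0,
        "rework_rate": rejected / total if total else 0.0,
    }

sample = [
    {"status": "completed"}, {"status": "completed"},
    {"status": "rejected"}, {"status": "open"},
]
print(wo_metrics(sample))  # -> completion 0.5, rework 0.25
```

A rising rework rate is the signal that a task class is sitting in too cheap a tier, per the "using free engine for important tasks" anti-pattern above.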

Related pages

| Page | Relationship |
|---|---|
| Choosing an Engine | Engine selection guide |
| Pricing | Engine cost structure |
| Engine Overview | Full engine catalog |
| Token Counting | Understanding token costs |