
Cost Optimization Guide

Benchmarked against: Anthropic — Cost Optimization
Rule: Constitution Section 6 + cost-awareness.md
Principle: Minimum cost that gets the job done RIGHT

SuperPortia's cost optimization is not about spending less — it's about spending wisely. The Captain's directive: doing important tasks poorly with cheap tools wastes MORE money than doing them right with appropriate tools.


Cost hierarchy


Engine cost tiers

| Tier | Engines | Cost | Use for |
|---|---|---|---|
| Free | Groq | $0 | Random searches, trivial cleanup, unimportant tasks |
| Budget | DeepSeek, Mistral | Cents | Analysis, reasoning, translations |
| Standard | Gemini, Zhipu | Cents-$ | General tasks, search with citations, Chinese NLP |
| Premium | Claude Sonnet | $$ | Strategy analysis, code review |
| Elite | Claude Opus | $$$$ | Architecture, key decisions, code operations |
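
The tier table can be read as a routing rule: pick the cheapest tier that is adequate for the task's importance. A minimal sketch, assuming an illustrative 0–4 importance scale — `select_engine()`, the tier levels, and the engine identifiers are stand-ins, not part of any documented SuperPortia API:

```python
# Hypothetical sketch: choose the cheapest adequate engine tier.
# Tier names and engines mirror the table above; the importance
# scale (0 = trivial, 4 = architecture/key decisions) is assumed.

TIERS = [
    ("free", ["groq"], 0),
    ("budget", ["deepseek", "mistral"], 1),
    ("standard", ["gemini", "zhipu"], 2),
    ("premium", ["claude-sonnet"], 3),
    ("elite", ["claude-opus"], 4),
]

def select_engine(importance: int) -> str:
    """Return the first engine whose tier level covers the task's importance."""
    for _name, engines, level in TIERS:
        if level >= importance:
            return engines[0]
    return "claude-opus"  # anything beyond the scale falls to the top tier

print(select_engine(0))  # trivial cleanup -> "groq"
print(select_engine(3))  # code review    -> "claude-sonnet"
```

The point of the ratchet is one-directional: a task may always be escalated to a higher tier, but never silently downgraded below its importance level.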

The search flow (mandatory)

Never use Claude (Opus) for research. Follow the cost-efficient search flow:

| Step | Action | Cost |
|---|---|---|
| 1 | search_brain() | Free (already indexed) |
| 2 | search_web(engine="groq") or intel_search() | Free or ~$0.014 |
| 3 | ingest_fragment() | Free (pipeline cost only) |
| 4 | Use the data | Free (from UB) |

Never use WebSearch or WebFetch directly — that costs Opus tokens. Delegate to low-cost engines.
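
The four steps above can be sketched as a cache-then-delegate loop. This is illustrative only: `FakeBrain` and `FakeWeb` are stand-ins for the real `search_brain()`, `ingest_fragment()`, and `search_web()` tools, whose actual signatures are not documented here:

```python
class FakeBrain:
    """Stand-in for the UB index queried by search_brain()."""
    def __init__(self):
        self.index = {}
    def search(self, query):
        return self.index.get(query)       # Step 1: free lookup
    def ingest(self, query, fragment):
        self.index[query] = fragment       # Step 3: ingest_fragment()

class FakeWeb:
    """Stand-in for search_web(); counts external (paid) calls."""
    def __init__(self):
        self.calls = 0
    def search(self, query, engine="groq"):
        self.calls += 1                    # Step 2: low-cost engine only
        return f"[{engine}] results for {query!r}"

def research(query, brain, web):
    cached = brain.search(query)           # 1. search_brain() first
    if cached is not None:
        return cached                      # 4. use the data (from UB)
    result = web.search(query, engine="groq")  # 2. delegate, never Opus
    brain.ingest(query, result)            # 3. ingest so we never re-search
    return result

brain, web = FakeBrain(), FakeWeb()
research("llm pricing", brain, web)
research("llm pricing", brain, web)        # second call hits the UB
print(web.calls)  # -> 1: the external engine was only paid for once
```

The second identical query never reaches the external engine, which is exactly why step 3 is mandatory rather than optional.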


Claude Max Plan billing

Understanding the billing structure prevents costly mistakes:

| Quota | What counts | Shared with |
|---|---|---|
| All Models | Claude Opus + Sonnet + Haiku | claude.ai Chat + Claude Code CLI |
| Sonnet Only | Sonnet usage only | Independent — does not consume All Models |
| Extra Usage | Overflow or LiteLLM direct API calls | Separate billing |

Key insight: Claude Code uses the Max Plan "All Models" quota, which is shared with claude.ai Chat. Heavy CLI usage affects Chat availability.


Role-based cost assignment

| Role | Agent | Cost tier | What they do |
|---|---|---|---|
| Chief Engineer | Claude Code | Elite | Architecture, decisions, delegation |
| Executor | Antigravity | Free | Coding, executing WOs |
| Intel Officer | Groq/Gemini/DeepSeek | Budget | External research |
| Courier | cron + bash | Free | Scheduled checks |
| Strategist | Claude AI Chat | Standard | Strategy analysis, reviews |

Token optimization

Context window management

| Technique | Token savings |
|---|---|
| Progressive Disclosure (PD) | ~5,000 tokens/session |
| Role-based tool assignment | ~4,000 tokens/session |
| On-demand skill loading | ~2,000-4,000 tokens/session |
| Lean boot sequence (2 calls vs 5+) | ~1,500 tokens/session |
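
On-demand skill loading is essentially lazy evaluation with a cache. A minimal sketch, assuming skills are keyed text bodies — the skill names and contents here are invented for illustration:

```python
import functools

# Hypothetical skill bodies; in practice these would live on disk and
# each cost thousands of context tokens to load.
SKILL_FILES = {
    "search": "skill body (~2,000 tokens)",
    "ingest": "skill body (~3,000 tokens)",
}

@functools.lru_cache(maxsize=None)
def load_skill(name: str) -> str:
    # Loaded on first use and cached, instead of loading every skill at boot.
    return SKILL_FILES[name]

load_skill("search")
print(load_skill.cache_info().currsize)  # only the one used skill is loaded
```

Only skills a session actually touches ever enter the context, which is where the ~2,000-4,000 token/session saving in the table comes from.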

Prompt efficiency

| Practice | Impact |
|---|---|
| Search UB before asking | Avoids re-generating known knowledge |
| Use tables over prose | More compact, same information |
| Reference rules, don't copy | Avoids duplication |
| Ingest results immediately | Never re-search the same thing |

Cost anti-patterns

| Anti-pattern | Why it's wasteful | Better approach |
|---|---|---|
| Using Opus for web search | Tokens burned on browsing | Delegate to Groq/Gemini search |
| Loading all skills at startup | 5,600 tokens wasted | Load on demand (PD) |
| Re-searching same topic | Duplicate API/token cost | Ingest to UB first time |
| Using free engine for important tasks | Poor quality leads to rework | Use appropriate engine |
| Long explanations from Opus | Token waste | Be concise, delegate prose |
| Reading entire bulletin board at boot | ~2,000 tokens for maybe-needed info | Read on demand |

Emergency downshift

When quota is running low, activate /brain_lite:

/brain_lite

This switches to emergency mode:

  • All research goes to Groq (free)
  • All analysis goes to DeepSeek (cheap)
  • Only code operations use Claude
  • Skip non-essential UB reads
  • Minimal output format
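
The downshift rules above amount to a routing table keyed by task class. A minimal sketch, assuming illustrative task classes — the real /brain_lite skill's internals are not documented here:

```python
# Hypothetical sketch of /brain_lite routing: each task class goes to
# the cheapest engine named in the list above. Names are assumptions.

LITE_ROUTES = {
    "research": "groq",      # all research -> free engine
    "analysis": "deepseek",  # all analysis -> cheap engine
    "code": "claude",        # only code operations stay on Claude
}

def route(task_class: str, lite: bool = True) -> str:
    """Pick an engine for a task; lite mode forces the cheap routes."""
    if lite:
        return LITE_ROUTES.get(task_class, "groq")  # default to free
    return "claude"  # normal mode keeps Claude as the default

print(route("research"))  # -> "groq"
print(route("code"))      # -> "claude"
```

Note the failure mode the table guards against: in lite mode an unknown task class falls through to the free tier, never to Claude.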

Brain modes

| Mode | Skill | When |
|---|---|---|
| Lite | /brain_lite | Quota emergency, minimal operations |
| Mid | /brain_mid | Standard operations, balanced cost |
| Pro | /brain_pro | Full capabilities, quota available |

Measuring cost effectiveness

| Metric | How to track |
|---|---|
| WO completion rate | list_work_orders(include_completed=True) |
| Actual vs estimated hours | WO actual_hours field |
| Engine usage distribution | UB tags analysis |
| Rework rate | Rejected WO count |
| UB hit rate | Search-first vs delegate ratio |
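
The completion and rework metrics can be derived from a `list_work_orders()`-style result. A minimal sketch, assuming each work order is a record with a `status` field — the field names and status values are illustrative, not the documented WO schema:

```python
# Hypothetical sketch: compute WO completion and rework rates from a
# list of work-order records. The "status" field values are assumed.

def wo_metrics(work_orders):
    total = len(work_orders)
    done = sum(1 for wo in work_orders if wo["status"] == "completed")
    rejected = sum(1 for wo in work_orders if wo["status"] == "rejected")
    return {
        "completion_rate": done / total if total else 0.0,
        "rework_rate": rejected / total if total else 0.0,
    }

sample = [
    {"status": "completed"}, {"status": "completed"},
    {"status": "rejected"}, {"status": "open"},
]
print(wo_metrics(sample))  # -> completion 0.5, rework 0.25
```

A rising rework rate is the signal that a task class is sitting in too cheap a tier, per the "using free engine for important tasks" anti-pattern above.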

Related pages

| Page | Relationship |
|---|---|
| Choosing an Engine | Engine selection guide |
| Pricing | Engine cost structure |
| Engine Overview | Full engine catalog |
| Token Counting | Understanding token costs |