
Token Counting & Cost

Benchmarked against: Anthropic, Token counting
Rule source: Company Constitution §6, Cost Awareness
Status: Partially implemented (baselines measured, per-session tracking planned)

Tokens are the fundamental unit of AI engine consumption. Every word, every tool description, and every UB search result consumes tokens, and tokens cost money. Understanding token usage is essential for cost management.


What is a token?

A token is roughly 4 characters of English text, or about 3/4 of a word. Chinese text uses more tokens per character due to encoding.

| Content | Approximate tokens |
| --- | --- |
| 1 English word | ~1.3 tokens |
| 1 Chinese character | ~2-3 tokens |
| 1 line of code | ~10-20 tokens |
| 1 paragraph | ~50-100 tokens |
| 1 page of text | ~300-500 tokens |
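The rules of thumb above can be turned into a quick estimator for budgeting before a request is sent. This is a minimal sketch, not a real tokenizer: the function name `estimate_tokens` is hypothetical, and the 2.5 tokens per Chinese character is simply the midpoint of the range in the table. Actual counts depend on each engine's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the rules of thumb above (not a tokenizer)."""
    if not text:
        return 0
    # Count characters in the main CJK Unified Ideographs block.
    cjk_chars = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other_chars = len(text) - cjk_chars
    # Assumption: ~4 characters per token for English-like text,
    # ~2.5 tokens per Chinese character (midpoint of the 2-3 range).
    estimate = other_chars / 4 + cjk_chars * 2.5
    return max(1, round(estimate))


print(estimate_tokens("Tokens are the fundamental unit of AI engine consumption."))
```

For billing or hard limits, prefer the provider's own token-counting endpoint over this heuristic.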

Token consumption by category

System context (per session)

| Component | Tokens | Frequency |
| --- | --- | --- |
| CLAUDE.md (L1) | ~500 | Every session |
| Rules (L2) | ~2,000 | Every session |
| MCP tool descriptions | ~4,000+ | Every session |
| Hook outputs | ~200-500 | Session start |
| Subtotal | ~6,700 | Baseline cost |

Agent operations (per action)

| Action | Input tokens | Output tokens |
| --- | --- | --- |
| Simple Q&A | ~1,000 | ~200 |
| Q&A + 1 tool call | ~20,000 | ~100 |
| UB search | ~1,000 (query) | ~2,000-10,000 (results) |
| File read | ~100 (request) | ~1,000-10,000 (content) |
| WO creation | ~500 | ~300 |
| Skill invocation | ~500-2,000 (skill prompt) | Variable |

Dispatch engine usage (per WO execution)

| Engine | Typical tokens | Cost |
| --- | --- | --- |
| Groq | ~5,000-20,000 | Free |
| Gemini | ~5,000-20,000 | ~$0.003-0.014 |
| DeepSeek | ~5,000-20,000 | Cents |
| Claude (Sonnet dispatch) | ~20,000-100,000 | ~$0.50-1.00 |
| Claude (Opus direct) | ~20,000-200,000 | ~$1.00-5.00 |

Measured baselines

From actual SS1 operations (2026-02-26):

| Measurement | Value |
| --- | --- |
| LangGraph single Q&A + 1 tool | 20,677 input + 102 output tokens (Sonnet) |
| Tool descriptions for 35 tools | ~4,103 tokens |
| Session start (boot sequence) | ~6,700 tokens |

Cost-per-token by engine

| Engine | Input cost | Output cost | Notes |
| --- | --- | --- | --- |
| Claude Opus 4.6 | Highest | Highest | Max Plan quota (shared) |
| Claude Sonnet 4.6 | Moderate | Moderate | Separate Sonnet-only quota available |
| Claude Haiku 4.5 | Lowest Claude | Lowest Claude | - |
| Groq Llama 3.3 | Free | Free | Free tier |
| Gemini 2.5 Flash | Very low | Very low | Per-request pricing |
| DeepSeek R1/V3 | Very low | Very low | Per-token pricing |

Exact per-token rates follow each provider's published pricing.
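Once a provider's published rates are known, the cost of a call is a weighted sum of input and output tokens. The sketch below shows that arithmetic using the measured baseline of 20,677 input and 102 output tokens. The `Rate` class, `call_cost` function, and the rate values are hypothetical placeholders, not real prices; substitute the figures from the provider's pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price in USD per one million tokens."""
    input_per_mtok: float
    output_per_mtok: float


def call_cost(input_tokens: int, output_tokens: int, rate: Rate) -> float:
    """Return the USD cost of one call at the given rate."""
    total = (input_tokens * rate.input_per_mtok
             + output_tokens * rate.output_per_mtok)
    return total / 1_000_000


# Placeholder rate for illustration only; NOT a real provider price.
EXAMPLE_RATE = Rate(input_per_mtok=1.00, output_per_mtok=5.00)

# Measured baseline: single Q&A + 1 tool call.
cost = call_cost(20_677, 102, EXAMPLE_RATE)
print(f"${cost:.6f}")
```

Because input tokens dominate this baseline, the input rate matters far more than the output rate for tool-heavy calls.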


Token optimization strategies

Reduce input tokens

| Strategy | Savings | How |
| --- | --- | --- |
| Progressive Disclosure | ~3,000-8,000/session | Load skills on demand |
| Concise messages | Variable | Short, direct communication |
| Targeted file reads | ~5,000-20,000/read | Read specific sections, not entire files |
| UB-first search | Avoids external API calls | Check UB before searching externally |
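As an illustration of the "targeted file reads" strategy, the sketch below returns only a requested line range instead of the whole file. The helper name `read_lines` is hypothetical and is not one of the system's tools; it only shows the idea of sending fewer lines into the context.

```python
from itertools import islice


def read_lines(path: str, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive) of a text file."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    with open(path, encoding="utf-8") as fh:
        # islice stops reading once the last requested line is consumed,
        # so the rest of the file is never loaded.
        return "".join(islice(fh, start - 1, end))
```

Reading 40 lines of a 2,000-line file this way puts a small fraction of the file's tokens into the context.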

Reduce output tokens

| Strategy | Savings | How |
| --- | --- | --- |
| Structured responses | Variable | Tables over prose |
| Skip explanations | ~100-500/response | Don't explain unless asked |
| Batch operations | Variable | Multiple edits in one response |

Reduce engine cost

| Strategy | Savings | How |
| --- | --- | --- |
| Engine matching | 10-100x | Use Groq for trivial, not Opus |
| Delegate search | 10-100x | intel_search vs Opus WebSearch |
| Ingest findings | Future savings | Next search hits UB cache, no API call |
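Engine matching can be sketched as picking the cheapest available engine that is strong enough for the task. The ladder below follows the cost ordering in the dispatch table. The 0-4 complexity scale, the engine keys, and the assumption that a more expensive engine is always more capable are illustrative choices, not the dispatcher's actual logic.

```python
# Engines ordered from cheapest to most expensive (per the dispatch table).
ENGINE_LADDER = ["groq", "gemini", "deepseek", "sonnet", "opus"]


def pick_engine(complexity: int, available: set[str]) -> str:
    """Pick the cheapest available engine at or above the required tier.

    complexity is a hypothetical 0-4 scale: 0 = trivial, 4 = hardest.
    """
    for tier, engine in enumerate(ENGINE_LADDER):
        if tier >= complexity and engine in available:
            return engine
    raise LookupError(f"no available engine for complexity {complexity}")


print(pick_engine(0, set(ENGINE_LADDER)))      # trivial task
print(pick_engine(0, {"sonnet", "opus"}))      # cheaper engines unavailable
```

When the cheap engines are unavailable, the function escalates rather than failing, which trades cost for completion.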

Tracking tools

Available now

```python
# Engine availability and configuration
list_models()

# UB volume (indicates ingestion load)
get_stats()

# WO history with engine + actual_hours
list_work_orders(include_completed=True)
```

Planned (monitoring gaps)

| Feature | What it measures | Status |
| --- | --- | --- |
| Per-session token counter | Total tokens consumed in one session | Planned |
| Per-WO cost estimation | Engine cost × tokens for each WO | Planned |
| Monthly cost dashboard | Aggregate across all agents and engines | Planned |
| Budget alerts | Warning when approaching quota limits | Planned |
| Token-per-capability | Cost to build/maintain each system capability | Planned |
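Since these features are not built yet, the following is only one possible shape for a per-session counter with a budget alert. The class name, its defaults (a 200,000-token budget and a 90% alert threshold), and its methods are all hypothetical.

```python
class SessionTokenCounter:
    """Accumulates token usage for one session (hypothetical design)."""

    def __init__(self, budget: int = 200_000, alert_ratio: float = 0.9):
        self.budget = budget
        self.alert_ratio = alert_ratio
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add the usage reported for one call."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

    def near_limit(self) -> bool:
        """True once usage reaches the alert threshold."""
        return self.total >= self.budget * self.alert_ratio


counter = SessionTokenCounter()
counter.record(20_677, 102)  # measured baseline: Q&A + 1 tool call
print(counter.total, counter.near_limit())
```

Feeding `record` from the usage figures each engine returns would cover the first and fourth rows of the table; cost estimation would additionally need per-engine rates.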

The token budget mental model

Think of your context window as a budget:

Total budget:     200,000 tokens
System overhead: -6,700 tokens (CLAUDE.md + rules + tools)
Working capacity: 193,300 tokens

Each turn costs: ~1,000-20,000 tokens (depending on tool usage)
Compaction at: ~180,000 tokens consumed
After compaction: ~20,000 tokens (summary only)

Planning rule: A typical session gets 10-50 productive turns before compaction. Plan your work to complete logical units within this range, and use Session Handoff to persist unfinished work.
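The planning rule can be checked with simple arithmetic on the budget figures above. This sketch assumes each turn adds a fixed average number of tokens to the context, which is a simplification; the function name and defaults are illustrative.

```python
def turns_before_compaction(avg_tokens_per_turn: int,
                            compaction_at: int = 180_000,
                            system_overhead: int = 6_700) -> int:
    """Estimate how many turns fit before compaction is triggered."""
    if avg_tokens_per_turn <= 0:
        raise ValueError("avg_tokens_per_turn must be positive")
    # Tokens left for conversation turns after the fixed system overhead.
    usable = compaction_at - system_overhead
    return usable // avg_tokens_per_turn


for avg in (1_000, 5_000, 10_000, 20_000):
    print(avg, turns_before_compaction(avg))
```

Under this model, per-turn averages of roughly 3,500 to 17,000 tokens land in the 10-50 turn range, while sessions dominated by large tool calls reach compaction sooner.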


| Page | Relationship |
| --- | --- |
| Context Windows | Context capacity |
| CLAUDE.md + PD | Token optimization via PD |
| Pricing | Engine cost details |
| Usage & Cost | Admin tracking |