Token Counting & Cost

Benchmarked against: Anthropic, Token counting
Rule source: Company Constitution §6, Cost Awareness
Status: Partially implemented (baselines measured, per-session tracking planned)

Tokens are the fundamental unit of AI engine consumption. Every word, every tool description, every UB search result consumes tokens — and tokens cost money. Understanding token usage is essential for cost management.

What is a token?

A token is roughly 4 characters of English text, or about 3/4 of a word. Chinese text typically needs more tokens per character, because tokenizers often encode a single Chinese character as two or three tokens.

Content	Approximate tokens
1 English word	~1.3 tokens
1 Chinese character	~2-3 tokens
1 line of code	~10-20 tokens
1 paragraph	~50-100 tokens
1 page of text	~300-500 tokens
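
The ratios in the table above can be turned into a quick estimator. The sketch below is only a heuristic. It assumes about 4 characters per token for non-Chinese text and 2.5 tokens per Chinese character (the midpoint of the ~2-3 range). `estimate_tokens` is a hypothetical helper, not part of any provider SDK, and real counts from a provider's tokenizer will differ.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate based on the rule-of-thumb ratios above.

    Assumptions (heuristics, not tokenizer output):
    - non-Chinese text: ~4 characters per token
    - Chinese characters: ~2.5 tokens each
    """
    # Count characters in the CJK Unified Ideographs block.
    cjk_chars = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other_chars = len(text) - cjk_chars
    return round(other_chars / 4 + cjk_chars * 2.5)


if __name__ == "__main__":
    print(estimate_tokens("Tokens are the fundamental unit of AI engine consumption."))
```

Use an estimate like this for planning only. For billing or quota decisions, rely on the counts the engine reports.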

Token consumption by category

System context (per session)

Component	Tokens	Frequency
CLAUDE.md (L1)	~500	Every session
Rules (L2)	~2,000	Every session
MCP tool descriptions	~4,000+	Every session
Hook outputs	~200-500	Session start
Subtotal	~6,700	Baseline cost

Agent operations (per action)

Action	Input tokens	Output tokens
Simple Q&A	~1,000	~200
Q&A + 1 tool call	~20,000	~100
UB search	~1,000 (query)	~2,000-10,000 (results)
File read	~100 (request)	~1,000-10,000 (content)
WO creation	~500	~300
Skill invocation	~500-2,000 (skill prompt)	Variable

Dispatch engine usage (per WO execution)

Engine	Typical tokens	Cost
Groq	~5,000-20,000	Free
Gemini	~5,000-20,000	~$0.003-0.014
DeepSeek	~5,000-20,000	Cents
Claude (Sonnet dispatch)	~20,000-100,000	~$0.50-1.00
Claude (Opus direct)	~20,000-200,000	~$1.00-5.00

Measured baselines

From actual SS1 operations (2026-02-26):

Measurement	Value
LangGraph single Q&A + 1 tool	20,677 input + 102 output tokens (Sonnet)
Tool descriptions for 35 tools	~4,103 tokens
Session start (boot sequence)	~6,700 tokens
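
The measurements above suggest how much of a single request is fixed overhead. The following is a small worked calculation, assuming the 35 tool descriptions were sent in full as part of the measured Q&A request (the source does not state this explicitly).

```python
# Measured values from the table above.
measured_input_tokens = 20_677
tool_description_tokens = 4_103
tool_count = 35

# Share of the measured input taken up by tool descriptions
# (assumes all 35 descriptions were part of that request).
tool_share = tool_description_tokens / measured_input_tokens

# Average size of one tool description.
tokens_per_tool = tool_description_tokens / tool_count

print(f"Tool descriptions: {tool_share:.1%} of input")
print(f"Average per tool: {tokens_per_tool:.0f} tokens")
```

Under that assumption, roughly a fifth of the input in a one-tool Q&A is tool descriptions, at about 117 tokens per tool.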

Cost-per-token by engine

Engine	Input cost	Output cost	Notes
Claude Opus 4.6	Highest	Highest	Max Plan quota (shared)
Claude Sonnet 4.6	Moderate	Moderate	Separate Sonnet-only quota available
Claude Haiku 4.5	Lowest Claude	Lowest Claude	—
Groq Llama 3.3	Free	Free	Free tier
Gemini 2.5 Flash	Very low	Very low	Per-request pricing
DeepSeek R1/V3	Very low	Very low	Per-token pricing

Exact per-token rates follow each provider's published pricing.
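
Once the published rates are known, the cost of a call is the token counts multiplied by the per-token rates. A minimal sketch follows. `Rate` and `estimate_cost` are hypothetical names, and the rate values in the example are placeholders, not any provider's actual prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price in USD per million tokens (fill in from the provider's pricing page)."""
    input_per_mtok: float
    output_per_mtok: float


def estimate_cost(input_tokens: int, output_tokens: int, rate: Rate) -> float:
    """Return the estimated cost in USD for one call."""
    total = input_tokens * rate.input_per_mtok + output_tokens * rate.output_per_mtok
    return total / 1_000_000


if __name__ == "__main__":
    # PLACEHOLDER rate for illustration only; not a real price.
    example_rate = Rate(input_per_mtok=2.0, output_per_mtok=10.0)
    # Token counts taken from the measured Q&A + 1 tool baseline.
    print(f"${estimate_cost(20_677, 102, example_rate):.4f}")
```

Keeping input and output rates separate matters because output tokens are usually priced higher than input tokens.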

Token optimization strategies

Reduce input tokens

Strategy	Savings	How
Progressive Disclosure	~3,000-8,000/session	Load skills on demand
Concise messages	Variable	Short, direct communication
Targeted file reads	~5,000-20,000/read	Read specific sections, not entire files
UB-first search	Avoids external API calls	Check UB before searching externally

Reduce output tokens

Strategy	Savings	How
Structured responses	Variable	Tables over prose
Skip explanations	~100-500/response	Don't explain unless asked
Batch operations	Variable	Multiple edits in one response

Reduce engine cost

Strategy	Savings	How
Engine matching	10-100x	Use Groq for trivial tasks instead of Opus
Delegate search	10-100x	Use intel_search instead of Opus WebSearch
Ingest findings	Future savings	Next search hits UB cache, no API call
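
Engine matching can be sketched as "pick the cheapest engine that is good enough for the task". In the example below, the engine order follows the dispatch cost table, but the capability scores and the `pick_engine` helper are hypothetical placeholders, not part of the dispatch system.

```python
# Engines listed cheapest-first, following the dispatch cost table.
# Capability scores (1 = trivial tasks, 4 = hardest tasks) are
# HYPOTHETICAL placeholders, not measured values.
ENGINES = [
    ("groq", 1),
    ("gemini", 2),
    ("deepseek", 2),
    ("sonnet", 3),
    ("opus", 4),
]


def pick_engine(required_capability: int) -> str:
    """Return the first (cheapest) engine that meets the required capability."""
    for name, capability in ENGINES:
        if capability >= required_capability:
            return name
    raise ValueError(f"no engine meets capability {required_capability}")


if __name__ == "__main__":
    print(pick_engine(1))  # trivial task goes to the free engine
    print(pick_engine(4))  # only the most capable engine qualifies
```

Because the list is ordered by cost, the first match is also the cheapest acceptable choice.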

Tracking tools

Available now

# Engine availability and configuration
list_models()

# UB volume (indicates ingestion load)
get_stats()

# WO history with engine + actual_hours
list_work_orders(include_completed=True)

Planned (monitoring gaps)

Feature	What it measures	Status
Per-session token counter	Total tokens consumed in one session	Planned
Per-WO cost estimation	Engine cost × tokens for each WO	Planned
Monthly cost dashboard	Aggregate across all agents and engines	Planned
Budget alerts	Warning when approaching quota limits	Planned
Token-per-capability	Cost to build/maintain each system capability	Planned
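
Until the planned per-session counter exists, usage can be tallied by hand from the token counts each engine reports. The class below is a hypothetical sketch of such a tally. It is not the planned implementation, and `SessionTokenCounter` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class SessionTokenCounter:
    """Accumulates reported token usage across the turns of one session."""
    input_tokens: int = 0
    output_tokens: int = 0
    turns: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add the usage reported for a single turn."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.turns += 1

    @property
    def total(self) -> int:
        """Total tokens consumed so far in this session."""
        return self.input_tokens + self.output_tokens


if __name__ == "__main__":
    counter = SessionTokenCounter()
    counter.record(20_677, 102)  # Q&A + 1 tool call (measured baseline)
    counter.record(1_000, 200)   # simple Q&A (approximate)
    print(counter.turns, counter.total)
```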

The token budget mental model

Think of your context window as a budget:

Total budget:     200,000 tokens
System overhead:   -6,700 tokens (CLAUDE.md + rules + tools)
Working capacity: 193,300 tokens

Each turn costs:  ~1,000-20,000 tokens (depending on tool usage)
Compaction at:    ~180,000 tokens consumed
After compaction: ~20,000 tokens (summary only)

Planning rule: A typical session gets 10-50 productive turns before compaction. Plan your work to complete logical units within this range, and use Session Handoff to persist unfinished work.
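
The planning rule can be checked with simple arithmetic. The sketch below assumes each turn adds a fixed number of tokens to the running total and that compaction triggers at the ~180,000 mark. `turns_before_compaction` is a hypothetical helper.

```python
# Figures from the budget model above.
SYSTEM_OVERHEAD = 6_700
COMPACTION_THRESHOLD = 180_000


def turns_before_compaction(tokens_per_turn: int, used: int = SYSTEM_OVERHEAD) -> int:
    """Number of whole turns that fit before the compaction threshold.

    Assumes a constant per-turn cost, which real sessions will not have.
    """
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    return max(0, (COMPACTION_THRESHOLD - used) // tokens_per_turn)


if __name__ == "__main__":
    for per_turn in (1_000, 5_000, 10_000, 20_000):
        print(per_turn, turns_before_compaction(per_turn))
```

Under these assumptions, the 10-50 turn range corresponds to an average of roughly 3,500-17,000 tokens per turn. Sessions dominated by heavy tool calls (around 20,000 tokens per turn) reach compaction in fewer than 10 turns.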

Related pages

Page	Relationship
Context Windows	Context capacity
CLAUDE.md + PD	Token optimization via PD
Pricing	Engine cost details
Usage & Cost	Admin tracking
