# Token Counting & Cost

- Benchmarked against: Anthropic — Token counting
- Rule source: Company Constitution §6, Cost Awareness
- Status: Partially implemented — baselines measured, per-session tracking planned

Tokens are the fundamental unit of AI engine consumption. Every word, every tool description, every UB search result consumes tokens — and tokens cost money. Understanding token usage is essential for cost management.

## What is a token?

A token is roughly 4 characters of English text, or about 3/4 of a word. Chinese text uses more tokens per character because tokenizers encode non-Latin scripts less efficiently.
| Content | Approximate tokens |
|---|---|
| 1 English word | ~1.3 tokens |
| 1 Chinese character | ~2-3 tokens |
| 1 line of code | ~10-20 tokens |
| 1 paragraph | ~50-100 tokens |
| 1 page of text | ~300-500 tokens |
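The rule of thumb above can be turned into a quick estimator. A minimal sketch using the 4-characters-per-token heuristic from this page (for exact counts, use the provider's tokenizer or token-counting API):

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters of English per token."""
    return max(1, round(len(text) / 4))

# One page of text (~1,600 chars) lands in the ~300-500 token band.
page = "word " * 320          # ~1,600 characters
print(estimate_tokens(page))  # 400
```

This is only good for ballpark budgeting; real tokenizers split on subwords, so code and non-English text deviate from the 4:1 ratio.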
## Token consumption by category

### System context (per session)
| Component | Tokens | Frequency |
|---|---|---|
| CLAUDE.md (L1) | ~500 | Every session |
| Rules (L2) | ~2,000 | Every session |
| MCP tool descriptions | ~4,000+ | Every session |
| Hook outputs | ~200-500 | Session start |
| Subtotal | ~6,700 | Baseline cost |
### Agent operations (per action)
| Action | Input tokens | Output tokens |
|---|---|---|
| Simple Q&A | ~1,000 | ~200 |
| Q&A + 1 tool call | ~20,000 | ~100 |
| UB search | ~1,000 (query) | ~2,000-10,000 (results) |
| File read | ~100 (request) | ~1,000-10,000 (content) |
| WO creation | ~500 | ~300 |
| Skill invocation | ~500-2,000 (skill prompt) | Variable |
### Dispatch engine usage (per WO execution)
| Engine | Typical tokens | Cost |
|---|---|---|
| Groq | ~5,000-20,000 | Free |
| Gemini | ~5,000-20,000 | ~$0.003-0.014 |
| DeepSeek | ~5,000-20,000 | Cents |
| Claude (Sonnet dispatch) | ~20,000-100,000 | ~$0.50-1.00 |
| Claude (Opus direct) | ~20,000-200,000 | ~$1.00-5.00 |
## Measured baselines
From actual SS1 operations (2026-02-26):
| Measurement | Value |
|---|---|
| LangGraph single Q&A + 1 tool | 20,677 input + 102 output tokens (Sonnet) |
| Tool descriptions for 35 tools | ~4,103 tokens |
| Session start (boot sequence) | ~6,700 tokens |
## Cost-per-token by engine
| Engine | Input cost | Output cost | Notes |
|---|---|---|---|
| Claude Opus 4.6 | Highest | Highest | Max Plan quota (shared) |
| Claude Sonnet 4.6 | Moderate | Moderate | Separate Sonnet-only quota available |
| Claude Haiku 4.5 | Lowest Claude | Lowest Claude | — |
| Groq Llama 3.3 | Free | Free | Free tier |
| Gemini 2.5 Flash | Very low | Very low | Per-request pricing |
| DeepSeek R1/V3 | Very low | Very low | Per-token pricing |
Exact per-token rates follow each provider's published pricing.
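A small helper keeps the comparison concrete. A sketch with placeholder per-million-token rates — the numbers below are illustrative assumptions, not published prices; substitute each provider's current rates:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Cost in USD; rates are $ per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Illustrative rates only -- check each provider's pricing page.
RATES = {
    "sonnet-class": (3.00, 15.00),
    "flash-class":  (0.10, 0.40),
}

for engine, (i, o) in RATES.items():
    print(engine, round(request_cost(20_000, 1_000, i, o), 4))
```

Even with made-up rates, the shape of the result matches the table above: the same 20k-token request differs in cost by more than an order of magnitude across engine tiers.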
## Token optimization strategies

### Reduce input tokens
| Strategy | Savings | How |
|---|---|---|
| Progressive Disclosure | ~3,000-8,000/session | Load skills on demand |
| Concise messages | Variable | Short, direct communication |
| Targeted file reads | ~5,000-20,000/read | Read specific sections, not entire files |
| UB-first search | Avoids external API calls | Check UB before searching externally |
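The targeted-read strategy can be as simple as slicing a file instead of loading all of it. A minimal sketch (the path and line range in the commented call are hypothetical):

```python
from itertools import islice

def read_lines(path: str, start: int, end: int) -> str:
    """Return only lines start..end (1-indexed, inclusive)."""
    with open(path, encoding="utf-8") as f:
        return "".join(islice(f, start - 1, end))

# Reading 40 lines (~400-800 tokens) instead of a 2,000-line file
# (~20,000-40,000 tokens) keeps the saving in the table's range.
# snippet = read_lines("src/engine.py", 120, 160)
```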
### Reduce output tokens
| Strategy | Savings | How |
|---|---|---|
| Structured responses | Variable | Tables over prose |
| Skip explanations | ~100-500/response | Don't explain unless asked |
| Batch operations | Variable | Multiple edits in one response |
### Reduce engine cost
| Strategy | Savings | How |
|---|---|---|
| Engine matching | 10-100x | Use Groq for trivial, not Opus |
| Delegate search | 10-100x | intel_search vs Opus WebSearch |
| Ingest findings | Future savings | Next search hits UB cache, no API call |
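Engine matching, the largest lever in the table, amounts to routing each task to the cheapest engine that can handle it. A sketch of the idea — the tier names and the mapping are illustrative assumptions, not the system's actual routing table:

```python
# Route tasks to the cheapest capable engine (illustrative mapping).
ROUTES = {
    "trivial":  "groq",      # free tier
    "routine":  "gemini",    # fractions of a cent
    "standard": "deepseek",  # cents
    "complex":  "sonnet",    # ~$0.50-1.00 per WO
    "critical": "opus",      # ~$1.00-5.00 per WO
}

def pick_engine(complexity: str) -> str:
    return ROUTES.get(complexity, "sonnet")  # default to mid tier

print(pick_engine("trivial"))   # groq
print(pick_engine("critical"))  # opus
```

The 10-100x saving in the table comes from how often tasks fall in the top rows: if most work is trivial or routine, nearly all of it runs on free or cent-level engines.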
## Tracking tools

### Available now

```python
# Engine availability and configuration
list_models()

# UB volume (indicates ingestion load)
get_stats()

# WO history with engine + actual_hours
list_work_orders(include_completed=True)
```
### Planned (monitoring gaps)
| Feature | What it measures | Status |
|---|---|---|
| Per-session token counter | Total tokens consumed in one session | Planned |
| Per-WO cost estimation | Engine cost × tokens for each WO | Planned |
| Monthly cost dashboard | Aggregate across all agents and engines | Planned |
| Budget alerts | Warning when approaching quota limits | Planned |
| Token-per-capability | Cost to build/maintain each system capability | Planned |
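One way the planned per-session counter could work is a running accumulator fed by each request's usage numbers. A sketch under that assumption (the class and its interface are hypothetical, not an existing API):

```python
class SessionTokenCounter:
    """Sketch of the planned per-session token counter (illustrative)."""

    def __init__(self) -> None:
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Accumulate usage reported by one request."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

c = SessionTokenCounter()
c.record(20_677, 102)   # the measured LangGraph baseline above
print(c.total)          # 20779
```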
## The token budget mental model
Think of your context window as a budget:

```text
Total budget:       200,000 tokens
System overhead:     -6,700 tokens (CLAUDE.md + rules + tools)
Working capacity:   193,300 tokens

Each turn costs:    ~1,000-20,000 tokens (depending on tool usage)
Compaction at:      ~180,000 tokens consumed
After compaction:   ~20,000 tokens (summary only)
```
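The budget arithmetic above can be checked mechanically, using the numbers from this page:

```python
TOTAL = 200_000       # context window
OVERHEAD = 6_700      # CLAUDE.md + rules + tools
COMPACT_AT = 180_000  # compaction threshold

def turns_before_compaction(avg_turn_tokens: int) -> int:
    """Whole turns that fit between session start and compaction."""
    return (COMPACT_AT - OVERHEAD) // avg_turn_tokens

print(turns_before_compaction(20_000))  # heavy tool use: 8 turns
print(turns_before_compaction(3_500))   # light turns: 49 turns
```

The two extremes bracket the 10-50 productive turns the planning rule below assumes.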
Planning rule: A typical session gets 10-50 productive turns before compaction. Plan your work to complete logical units within this range, and use Session Handoff to persist unfinished work.
## Related pages
| Page | Relationship |
|---|---|
| Context Windows | Context capacity |
| CLAUDE.md + PD | Token optimization via PD |
| Pricing | Engine cost details |
| Usage & Cost | Admin tracking |