Benchmarked against: Anthropic — Context windows
Scope: How agents manage conversation context and what happens at the limits
Key concept: Context is finite — plan for it
Every agent operates within a context window — the maximum amount of text (tokens) the model can see at once. Understanding context limits is essential for effective agent operation, especially during long sessions.
## Context window sizes

| Model | Context window | Effective capacity |
|---|---|---|
| Claude Opus 4.6 | 200K tokens | ~150K usable (system prompt + tools consume space) |
| Claude Sonnet 4.6 | 200K tokens | ~150K usable |
| Claude Haiku 4.5 | 200K tokens | ~150K usable |
"Effective capacity" accounts for the space consumed by:
- System prompt (CLAUDE.md, rules, hooks output)
- MCP tool descriptions (~4,000+ tokens for 35 tools)
- Conversation history
- Tool call results
## What consumes context

| Consumer | Approximate tokens | Notes |
|---|---|---|
| CLAUDE.md | ~500 | Lean boot config |
| Rules (L2 — always loaded) | ~2,000 | 8 rule files |
| Skills (L3 — on demand) | ~500-2,000 each | Only loaded when invoked |
| MCP tool descriptions | ~4,000+ | All connected MCP servers |
| Hook outputs | ~200-500 | SessionStart, compaction hooks |
| Each user message | Variable | Depends on length |
| Each tool result | Variable | UB searches, file reads can be large |
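For the variable entries in the table above, a common rule of thumb is roughly 4 characters per token for English text. A minimal sketch of that heuristic (an approximation, not a real tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4-characters-per-token
    heuristic for English text. Real tokenizers vary, so use the model
    provider's token counter when precision matters."""
    return max(1, len(text) // 4)

# A 40 KB file read lands around 10K tokens under this heuristic.
print(estimate_tokens("x" * 40_000))
```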
## Progressive Disclosure (PD) saves context

SuperPortia uses a 3-layer loading system to minimize context consumption:

| Layer | What | When loaded | Size |
|---|---|---|---|
| L1 | CLAUDE.md | Always | ~500 tokens |
| L2 | Rules | Always | ~2,000 tokens |
| L3 | Skills | On demand (when invoked) | ~500-2,000 each |
This means a fresh session starts with ~6,500 tokens of system context, leaving roughly 143,000 tokens for actual work.
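The arithmetic behind that budget can be sketched as follows (the figures are the approximations from the tables above, not exact measurements):

```python
EFFECTIVE_CAPACITY = 150_000  # ~200K window minus reserved overhead, per the table above

# Approximate always-loaded boot costs (L1 + L2 + MCP tool descriptions)
boot_cost = {
    "L1 CLAUDE.md": 500,
    "L2 rules": 2_000,
    "MCP tool descriptions": 4_000,
}

working_budget = EFFECTIVE_CAPACITY - sum(boot_cost.values())
print(f"~{working_budget:,} tokens free for actual work")  # ~143,500
```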
## Context lifecycle

### Phase 1: Fresh session (~6,500 tokens used)
- System prompt loaded
- Rules loaded
- Hook outputs injected
- Maximum working capacity
### Phase 2: Active work (growing)
- Each conversation turn adds tokens
- Tool results (especially file reads, UB searches) can add thousands of tokens
- Skills loaded on demand add to context
### Phase 3: Approaching limits (~180K tokens)
- Claude Code automatically compresses earlier messages
- This is called context compaction
- A summary replaces detailed conversation history
- Context drops back to ~20K, freeing space
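The threshold behavior above can be modeled as a simple check. A sketch using the figures from this lifecycle (`summarize` is a hypothetical callback, not a real API):

```python
COMPACTION_THRESHOLD = 180_000  # tokens; compaction kicks in around here
POST_COMPACTION_USAGE = 20_000  # usage after a summary replaces the history

def maybe_compact(used_tokens: int, summarize) -> int:
    """Return context usage after an automatic compaction check."""
    if used_tokens >= COMPACTION_THRESHOLD:
        summarize()  # produce the summary that replaces earlier messages
        return POST_COMPACTION_USAGE
    return used_tokens
```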
### Phase 4: Post-compaction

- A summary now stands in for the detailed conversation history
- The agent reads the summary, self-confirms its state, and resumes work
- Findings persisted to UB remain available
## Managing context effectively

| Practice | Why |
|---|---|
| Keep messages concise | Less context consumed per turn |
| Use PD skill loading | Skills only when needed |
| Ingest findings to UB | Don't rely on context to remember — persist to UB |
| Plan for compaction | Long tasks will hit the limit |
| Session handoff before ending | Persist todos to Cloud UB |
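The last two practices combine naturally into a small handoff payload persisted before a session ends. A sketch (the `build_handoff` name and payload shape are illustrative, not the actual Cloud UB schema):

```python
import json

def build_handoff(open_todos: list[str], summary: str) -> str:
    """Serialize session state for persistence (e.g. to Cloud UB) so it
    survives compaction or session end."""
    return json.dumps({"summary": summary, "open_todos": open_todos})

payload = build_handoff(["finish ingest", "review PR"], "Refactored context docs")
```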
## Don't

| Anti-pattern | Cost |
|---|---|
| Read entire large files | One file read can consume 10K+ tokens |
| Request verbose explanations | Wastes context on text that could be shorter |
| Load all skills at startup | Each skill = 500-2,000 tokens wasted |
| Ignore compaction warnings | Leads to disoriented agents |
| Store state only in conversation | Lost at compaction or session end |
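To avoid the first anti-pattern, read a bounded slice of a large file instead of the whole thing. A minimal sketch (8K characters is roughly 2K tokens under the 4-characters-per-token rule of thumb):

```python
def read_head(path: str, max_chars: int = 8_000) -> str:
    """Read at most max_chars from a file, capping the context cost of
    the result instead of paying for the entire file."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read(max_chars)
```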
## Context-aware agent behaviors

| Behavior | Trigger | Action |
|---|---|---|
| Progressive Disclosure | Session start | Load L1+L2 only, L3 on demand |
| UB persistence | Important finding | Ingest to UB — survives compaction |
| Session handoff | Session ending | Todo list → Cloud UB |
| Compaction recovery | After compaction | Read summary, self-confirm, resume |
| Lean boot | Every session | Only 2 calls: heartbeat + mailbox |
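The trigger-to-action mapping above amounts to a small dispatch table. A sketch (the trigger names are illustrative, not actual event identifiers):

```python
BEHAVIORS = {
    "session_start": "load L1+L2 only; defer L3 skills until invoked",
    "important_finding": "ingest to UB so it survives compaction",
    "session_ending": "persist todo list to Cloud UB",
    "post_compaction": "read summary, self-confirm, resume",
}

def on_event(trigger: str) -> str:
    """Look up the context-aware action for an event, defaulting to a no-op."""
    return BEHAVIORS.get(trigger, "no context-specific action")
```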
## Related pages