
Agent Context Windows

  • Benchmarked against: Anthropic — Context windows
  • Scope: How agents manage conversation context and what happens at the limits
  • Key concept: Context is finite — plan for it

Every agent operates within a context window — the maximum amount of text (tokens) the model can see at once. Understanding context limits is essential for effective agent operation, especially during long sessions.


Context window sizes

| Model | Context window | Effective capacity |
|---|---|---|
| Claude Opus 4.6 | 200K tokens | ~150K usable (system prompt + tools consume space) |
| Claude Sonnet 4.6 | 200K tokens | ~150K usable |
| Claude Haiku 4.5 | 200K tokens | ~150K usable |

"Effective capacity" accounts for the space consumed by:

  • System prompt (CLAUDE.md, rules, hooks output)
  • MCP tool descriptions (~4,000+ tokens for 35 tools)
  • Conversation history
  • Tool call results

What consumes context

| Consumer | Approximate tokens | Notes |
|---|---|---|
| CLAUDE.md | ~500 | Lean boot config |
| Rules (L2 — always loaded) | ~2,000 | 8 rule files |
| Skills (L3 — on demand) | ~500-2,000 each | Only loaded when invoked |
| MCP tool descriptions | ~4,000+ | All connected MCP servers |
| Hook outputs | ~200-500 | SessionStart, compaction hooks |
| Each user message | Variable | Depends on length |
| Each tool result | Variable | UB searches, file reads can be large |
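The variable rows in the table above accumulate turn by turn, so usage can be approximated with a simple accumulator. A minimal sketch, using the common rough heuristic of ~4 characters per token rather than a real tokenizer:

```python
# Minimal context-usage tracker. The ~4 characters per token ratio is a
# rough heuristic for English text, not an exact tokenizer.
CHARS_PER_TOKEN = 4

class ContextTracker:
    def __init__(self, base_tokens=6_500, limit=200_000):
        self.used = base_tokens   # system prompt, rules, hooks, MCP tools
        self.limit = limit

    def add_text(self, text):
        """Account for one message or tool result entering the window."""
        self.used += len(text) // CHARS_PER_TOKEN
        return self.used

    def remaining(self):
        return self.limit - self.used

tracker = ContextTracker()
tracker.add_text("x" * 40_000)            # e.g. one large file read (~10K tokens)
print(tracker.used, tracker.remaining())  # 16500 183500
```

A real deployment would use the model's own token counts, but even this crude running total is enough to anticipate when compaction is near.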

Progressive Disclosure (PD) saves context

SuperPortia uses a 3-layer loading system to minimize context consumption:

| Layer | What | When loaded | Size |
|---|---|---|---|
| L1 | CLAUDE.md | Always | ~500 tokens |
| L2 | Rules | Always | ~2,000 tokens |
| L3 | Skills | On demand (when invoked) | ~500-2,000 each |

This means a fresh session starts with ~6,500 tokens of system context, leaving roughly 143,500 of the ~150K effective capacity for actual work.
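As a sanity check, the boot figure is the sum of the always-loaded layers plus the MCP tool descriptions (approximate numbers from the tables above; hook output is omitted here for simplicity):

```python
# Fresh-session boot budget under Progressive Disclosure.
# Numbers are the approximate figures from the tables above.
L1_CLAUDE_MD = 500       # lean boot config
L2_RULES = 2_000         # always-loaded rule files
MCP_TOOLS = 4_000        # tool descriptions for connected MCP servers

boot_context = L1_CLAUDE_MD + L2_RULES + MCP_TOOLS
usable_window = 150_000  # ~150K effective capacity

print(boot_context)                  # 6500
print(usable_window - boot_context)  # 143500
```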


Context lifecycle

Phase 1: Fresh session (~6,500 tokens used)

  • System prompt loaded
  • Rules loaded
  • Hook outputs injected
  • Maximum working capacity

Phase 2: Active work (growing)

  • Each conversation turn adds tokens
  • Tool results (especially file reads, UB searches) can add thousands of tokens
  • Skills loaded on demand add to context

Phase 3: Approaching limits (~180K tokens)

  • Claude Code automatically compresses earlier messages
  • This is called context compaction
  • A summary replaces detailed conversation history
  • Context drops back to ~20K, freeing space

Phase 4: Post-compaction (~20K tokens used)

  • Detailed conversation history has been replaced by a summary
  • The agent re-reads the summary, self-confirms its state, and resumes
  • Anything not persisted outside the conversation (e.g. to UB) is gone
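The four phases can be sketched as a simple threshold check: once usage crosses the compaction point, detailed history is replaced by a much smaller summary. Figures are the approximate ones from this page; the summarization step itself is elided:

```python
COMPACTION_THRESHOLD = 180_000   # phase 3 trigger (approximate)
POST_COMPACTION_SIZE = 20_000    # summary + system context after compaction

def add_turn(used_tokens, turn_tokens):
    """Add one conversation turn; compact if the threshold is crossed."""
    used_tokens += turn_tokens
    compacted = used_tokens >= COMPACTION_THRESHOLD
    if compacted:
        # Earlier messages are replaced by a summary. Detail is lost,
        # which is why important findings should be persisted elsewhere.
        used_tokens = POST_COMPACTION_SIZE
    return used_tokens, compacted

used = 6_500                       # phase 1: fresh session
used, c = add_turn(used, 175_000)  # phase 2 -> 3: heavy tool results
print(used, c)                     # 20000 True
```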


Managing context effectively

Do

| Practice | Why |
|---|---|
| Keep messages concise | Less context consumed per turn |
| Use PD skill loading | Skills only when needed |
| Ingest findings to UB | Don't rely on context to remember — persist to UB |
| Plan for compaction | Long tasks will hit the limit |
| Session handoff before ending | Persist todos to Cloud UB |

Don't

| Anti-pattern | Cost |
|---|---|
| Read entire large files | One file read can consume 10K+ tokens |
| Request verbose explanations | Wastes context on text that could be shorter |
| Load all skills at startup | Each skill = 500-2,000 tokens wasted |
| Ignore compaction warnings | Leads to disoriented agents |
| Store state only in conversation | Lost at compaction or session end |
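One way to avoid the "read entire large files" anti-pattern is to cap how many characters of any file enter the context. A minimal sketch; the 2,000-character default is an arbitrary illustration, not a setting from this page:

```python
def cap_text(text, max_chars=2_000):
    """Truncate text destined for the context window, marking the cut."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n[truncated: request a specific range if needed]"

def read_capped(path, max_chars=2_000):
    """Read at most ~max_chars of a file instead of the whole thing."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return cap_text(f.read(max_chars + 1), max_chars)
```

The explicit truncation marker matters: it tells the agent the content is incomplete, so it can ask for a specific line range rather than silently reasoning over a partial file.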

Context-aware agent behaviors

| Behavior | Trigger | Action |
|---|---|---|
| Progressive Disclosure | Session start | Load L1+L2 only, L3 on demand |
| UB persistence | Important finding | Ingest to UB — survives compaction |
| Session handoff | Session ending | Todo list → Cloud UB |
| Compaction recovery | After compaction | Read summary, self-confirm, resume |
| Lean boot | Every session | Only 2 calls: heartbeat + mailbox |
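The session-handoff row can be sketched as persisting open todos to durable storage before the session ends. Here a plain JSON file stands in for Cloud UB, whose actual API is not described on this page:

```python
import json

def handoff(todos, path="session_handoff.json"):
    """Persist open todos so the next session can resume them.
    A local JSON file is a hypothetical stand-in for Cloud UB."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"todos": todos}, f, indent=2)

def resume(path="session_handoff.json"):
    """Next session: reload the todo list instead of relying on context."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["todos"]
```

Because the store lives outside the conversation, the todo list survives both compaction and session end — the two events that erase in-context state.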

Related pages

| Page | Relationship |
|---|---|
| Compaction Recovery | What to do after compaction |
| Session Handoff | Persisting work across sessions |
| CLAUDE.md + PD | Progressive Disclosure system |
| Token Counting | Measuring usage |