Benchmarked against: Anthropic — Context windows
Scope: How agents manage conversation context and what happens at the limits
Key concept: Context is finite — plan for it
Every agent operates within a context window — the maximum amount of text (tokens) the model can see at once. Understanding context limits is essential for effective agent operation, especially during long sessions.
## Context window sizes

| Model | Context window | Effective capacity |
|---|---|---|
| Claude Opus 4.6 | 200K tokens | ~150K usable (system prompt + tools consume space) |
| Claude Sonnet 4.6 | 200K tokens | ~150K usable |
| Claude Haiku 4.5 | 200K tokens | ~150K usable |
"Effective capacity" accounts for the space consumed by:
- System prompt (CLAUDE.md, rules, hooks output)
- MCP tool descriptions (~4,000+ tokens for 35 tools)
- Conversation history
- Tool call results
## What consumes context

| Consumer | Approximate tokens | Notes |
|---|---|---|
| CLAUDE.md | ~500 | Lean boot config |
| Rules (L2 — always loaded) | ~2,000 | 8 rule files |
| Skills (L3 — on demand) | ~500-2,000 each | Only loaded when invoked |
| MCP tool descriptions | ~4,000+ | All connected MCP servers |
| Hook outputs | ~200-500 | SessionStart, compaction hooks |
| Each user message | Variable | Depends on length |
| Each tool result | Variable | UB searches, file reads can be large |
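For the variable entries in the table above, a common rule of thumb is roughly 4 characters per token for English text. A minimal sketch of that heuristic (an approximation, not a real tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4-characters-per-token
    heuristic for English text. Real tokenizers vary, so use the model
    provider's token counter when precision matters."""
    return max(1, len(text) // 4)

# A 40 KB file read lands around 10K tokens under this heuristic.
print(estimate_tokens("x" * 40_000))
```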
## Progressive Disclosure (PD) saves context

SuperPortia uses a 3-layer loading system to minimize context consumption:

| Layer | What | When loaded | Size |
|---|---|---|---|
| L1 | CLAUDE.md | Always | ~500 tokens |
| L2 | Rules | Always | ~2,000 tokens |
| L3 | Skills | On demand (when invoked) | ~500-2,000 each |
This means a fresh session starts with ~6,500 tokens of system context, leaving roughly 143,000 tokens for actual work.
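The arithmetic behind that budget can be sketched as follows (the figures are the approximations from the tables above, not exact measurements):

```python
EFFECTIVE_CAPACITY = 150_000  # ~200K window minus reserved overhead, per the table above

# Approximate always-loaded boot costs (L1 + L2 + MCP tool descriptions)
boot_cost = {
    "L1 CLAUDE.md": 500,
    "L2 rules": 2_000,
    "MCP tool descriptions": 4_000,
}

working_budget = EFFECTIVE_CAPACITY - sum(boot_cost.values())
print(f"~{working_budget:,} tokens free for actual work")  # ~143,500
```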
## Context lifecycle

### Phase 1: Fresh session (~6,500 tokens used)
- System prompt loaded
- Rules loaded
- Hook outputs injected
- Maximum working capacity
### Phase 2: Active work (growing)
- Each conversation turn adds tokens
- Tool results (especially file reads, UB searches) can add thousands of tokens
- Skills loaded on demand add to context
### Phase 3: Approaching limits (~180K tokens)
- Claude Code automatically compresses earlier messages
- This is called context compaction
- A summary replaces detailed conversation history
- Context drops back to ~20K, freeing space
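The threshold behavior above can be modeled as a simple check. A sketch using the figures from this lifecycle (`summarize` is a hypothetical callback, not a real API):

```python
COMPACTION_THRESHOLD = 180_000  # tokens; compaction kicks in around here
POST_COMPACTION_USAGE = 20_000  # usage after a summary replaces the history

def maybe_compact(used_tokens: int, summarize) -> int:
    """Return context usage after an automatic compaction check."""
    if used_tokens >= COMPACTION_THRESHOLD:
        summarize()  # produce the summary that replaces earlier messages
        return POST_COMPACTION_USAGE
    return used_tokens
```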
### Phase 4: Post-compaction

- A summary now stands in for the detailed conversation history
- The agent reads the summary, self-confirms its state, and resumes work
- Findings persisted to UB remain available
## Managing context effectively

| Practice | Why |
|---|---|
| Keep messages concise | Less context consumed per turn |
| Use PD skill loading | Skills only when needed |
| Ingest findings to UB | Don't rely on context to remember — persist to UB |
| Plan for compaction | Long tasks will hit the limit |
| Session handoff before ending | Persist todos to Cloud UB |
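The last two practices combine naturally into a small handoff payload persisted before a session ends. A sketch (the `build_handoff` name and payload shape are illustrative, not the actual Cloud UB schema):

```python
import json

def build_handoff(open_todos: list[str], summary: str) -> str:
    """Serialize session state for persistence (e.g. to Cloud UB) so it
    survives compaction or session end."""
    return json.dumps({"summary": summary, "open_todos": open_todos})

payload = build_handoff(["finish ingest", "review PR"], "Refactored context docs")
```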
## Don't

| Anti-pattern | Cost |
|---|---|
| Read entire large files | One file read can consume 10K+ tokens |
| Request verbose explanations | Wastes context on text that could be shorter |
| Load all skills at startup | Each skill = 500-2,000 tokens wasted |
| Ignore compaction warnings | Leads to disoriented agents |
| Store state only in conversation | Lost at compaction or session end |
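To avoid the first anti-pattern, read a bounded slice of a large file instead of the whole thing. A minimal sketch (8K characters is roughly 2K tokens under the 4-characters-per-token rule of thumb):

```python
def read_head(path: str, max_chars: int = 8_000) -> str:
    """Read at most max_chars from a file, capping the context cost of
    the result instead of paying for the entire file."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read(max_chars)
```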
## Context-aware agent behaviors

| Behavior | Trigger | Action |
|---|---|---|
| Progressive Disclosure | Session start | Load L1+L2 only, L3 on demand |
| UB persistence | Important finding | Ingest to UB — survives compaction |
| Session handoff | Session ending | Todo list → Cloud UB |
| Compaction recovery | After compaction | Read summary, self-confirm, resume |
| Lean boot | Every session | Only 2 calls: heartbeat + mailbox |
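The trigger-to-action mapping above amounts to a small dispatch table. A sketch (the trigger names are illustrative, not actual event identifiers):

```python
BEHAVIORS = {
    "session_start": "load L1+L2 only; defer L3 skills until invoked",
    "important_finding": "ingest to UB so it survives compaction",
    "session_ending": "persist todo list to Cloud UB",
    "post_compaction": "read summary, self-confirm, resume",
}

def on_event(trigger: str) -> str:
    """Look up the context-aware action for an event, defaulting to a no-op."""
    return BEHAVIORS.get(trigger, "no context-specific action")
```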
## Related pages