
The Attention Budget

Update (2026-02-11)

As of v0.4.0, ctx consolidated sessions into the journal mechanism. References to .context/sessions/ in this post reflect the architecture at the time of writing. Session history is now accessed via ctx recall and stored in .context/journal/.


Why Your AI Forgets What You Just Told It

Jose Alekhinne / 2026-02-03

Ever wondered why AI gets worse the longer you talk?

You paste a 2000-line file, explain the bug in detail, provide three examples...

...and the AI still suggests a fix that ignores half of what you said.

This isn't a bug. It is physics.

Understanding that single fact shaped every design decision behind ctx.

The Finite Resource Nobody Talks About

Here's something that took me too long to internalize: context is not free.

Every token you send to an AI model consumes a finite resource I call the attention budget.

The model doesn't just read tokens; it forms relationships between them. For n tokens, that's roughly n^2 pairwise relationships. Double the context, and the computation quadruples.
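
To put rough numbers on it (a back-of-the-envelope illustration, not a measurement of any particular model):

# Back-of-the-envelope: pairwise relationships grow quadratically with context size.
for n in (2_000, 4_000, 8_000):
    print(f"{n:>5} tokens -> ~{n * n:>12,} pairwise relationships")
# 2,000 tokens give ~4,000,000 pairs; doubling to 4,000 quadruples that to ~16,000,000.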

But the more important constraint isn't cost: It's attention density.

Attention Density

Attention density is how much focus each token receives relative to all other tokens in the context window.

As context grows, attention density drops: Each token gets a smaller slice of the model's focus. Nothing is ignored, but everything becomes blurrier.
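
As a toy model (real attention weights are learned and uneven, but the averaging effect is the same):

# Toy model: attention over the context sums to 1.0,
# so the average share each token can receive is 1 / n.
for n in (2_000, 4_000, 8_000):
    print(f"{n:>5} tokens -> average attention per token: {1 / n:.5%}")
# Doubling the context halves the average share.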

Think of it like a flashlight: In a small room, it illuminates everything clearly. In a warehouse, it becomes a dim glow that barely reaches the corners.

This is why ctx agent has an explicit --budget flag:

ctx agent --budget 4000 # Force prioritization
ctx agent --budget 8000 # More context, lower attention density

The budget isn't just about cost. It's about preserving signal.

The Middle Gets Lost

This one surprised me.

Research shows that transformer-based models tend to attend more strongly to the beginning and end of a context window than to its middle (a phenomenon often called "lost in the middle").

Positional anchors matter, and the middle has fewer of them.

In practice, this means that information placed "somewhere in the middle" is statistically less salient, even if it's important.

ctx orders context files by logical progression—what the agent needs to know before it can understand the next thing:

  1. CONSTITUTION.md: Constraints before action
  2. TASKS.md: Focus before patterns
  3. CONVENTIONS.md: How to write before where to write
  4. ARCHITECTURE.md: Structure before history
  5. DECISIONS.md: Past choices before gotchas
  6. LEARNINGS.md: Lessons before terminology
  7. GLOSSARY.md: Reference material
  8. AGENT_PLAYBOOK.md: Meta instructions last

This ordering is about logical dependencies, not attention engineering. But it happens to be attention-friendly too:

The files that matter most—CONSTITUTION, TASKS, CONVENTIONS—land at the beginning of the context window, where attention is strongest.

Reference material like GLOSSARY sits in the middle, where lower salience is acceptable.

And AGENT_PLAYBOOK—the operating manual for the context system itself—sits at the end, also outside the "lost in the middle" zone. The agent reads what to work with before learning how the system works.

This is ctx's first primitive: hierarchical importance. Not all context is equal.
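
If you wanted to reproduce that ordering outside of ctx, a minimal sketch could look like this (the file names come from the list above; the concatenation itself is illustrative, not ctx's actual implementation):

from pathlib import Path

# Logical-dependency order from the list above: binding constraints first,
# reference material near the end, meta instructions last.
LOAD_ORDER = [
    "CONSTITUTION.md", "TASKS.md", "CONVENTIONS.md", "ARCHITECTURE.md",
    "DECISIONS.md", "LEARNINGS.md", "GLOSSARY.md", "AGENT_PLAYBOOK.md",
]

def assemble_context(root: Path = Path(".context")) -> str:
    """Concatenate context files in priority order, skipping any that are missing."""
    parts = [f"## {name}\n\n{(root / name).read_text()}"
             for name in LOAD_ORDER if (root / name).exists()]
    return "\n\n".join(parts)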

ctx Primitives

ctx is built on four primitives that directly address the attention budget problem.

Primitive 1: Separation of Concerns

Instead of a single mega-document, ctx uses separate files for separate purposes:

| File              | Purpose               | Load When                 |
|-------------------|-----------------------|---------------------------|
| CONSTITUTION.md   | Inviolable rules      | Always                    |
| TASKS.md          | Current work          | Session start             |
| CONVENTIONS.md    | How to write code     | Before coding             |
| ARCHITECTURE.md   | System structure      | Before making changes     |
| DECISIONS.md      | Architectural choices | When questioning approach |
| LEARNINGS.md      | Gotchas               | When stuck                |
| GLOSSARY.md       | Domain terminology    | When clarifying terms     |
| AGENT_PLAYBOOK.md | Operating manual      | Session start             |
| sessions/         | Deep history          | On demand                 |
| journal/          | Session journal       | On demand                 |

This isn't just "organization": It is progressive disclosure.

Load only what's relevant to the task at hand. Preserve attention density.
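
As a sketch, progressive disclosure is just a conditional load plan. The task labels below are hypothetical; the "Load When" column above is the real guide:

# Hypothetical mapping from the kind of work to the files worth loading.
# CONSTITUTION.md is always present; everything else has to earn its place.
LOAD_PLAN = {
    "coding":    ["CONSTITUTION.md", "TASKS.md", "CONVENTIONS.md"],
    "designing": ["CONSTITUTION.md", "TASKS.md", "ARCHITECTURE.md", "DECISIONS.md"],
    "debugging": ["CONSTITUTION.md", "TASKS.md", "LEARNINGS.md"],
}

def files_for(task_kind: str) -> list[str]:
    return LOAD_PLAN.get(task_kind, ["CONSTITUTION.md", "TASKS.md"])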

Primitive 2: Explicit Budgets

The --budget flag forces a choice:

ctx agent --budget 4000

Here is a sample allocation:

Constitution: ~200 tokens (never truncated)
Tasks: ~500 tokens (current phase)
Conventions: ~800 tokens (key patterns)
Recent decisions: ~400 tokens (last 3)
…budget exhausted, stop loading

The constraint is the feature: It enforces ruthless prioritization.
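
A minimal sketch of that prioritization, assuming a crude word-count token estimate (the real ctx budgeting may work differently):

from pathlib import Path

def load_within_budget(files: list[str], budget: int, root: Path = Path(".context")) -> str:
    """Load files in priority order until the token budget runs out."""
    parts, spent = [], 0
    for name in files:
        text = (root / name).read_text()
        cost = len(text.split())        # crude token estimate
        if parts and spent + cost > budget:
            break                       # budget exhausted: stop loading
        parts.append(text)              # the first file (the constitution) always makes it in
        spent += cost
    return "\n\n".join(parts)

context = load_within_budget(
    ["CONSTITUTION.md", "TASKS.md", "CONVENTIONS.md", "DECISIONS.md"],
    budget=4000,
)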

Primitive 3: Indexes Over Full Content

DECISIONS.md and LEARNINGS.md both include index sections:

<!-- INDEX:START -->
| Date       | Decision                            |
|------------|-------------------------------------|
| 2026-01-15 | Use PostgreSQL for primary database |
| 2026-01-20 | Adopt Cobra for CLI framework       |
<!-- INDEX:END -->

An AI agent can scan ~50 tokens of index and decide which 200-token entries are worth loading.

This is just-in-time context.

References are cheaper than full text.
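
A sketch of how an agent (or a plain script) might use that index, assuming the INDEX:START/INDEX:END markers and two-column table shown above:

import re
from pathlib import Path

def scan_index(path: Path) -> list[tuple[str, str]]:
    """Return (date, title) pairs from the INDEX block, without loading full entries."""
    block = re.search(r"<!-- INDEX:START -->(.*?)<!-- INDEX:END -->",
                      path.read_text(), re.S)
    rows = []
    for line in (block.group(1).splitlines() if block else []):
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0][:4].isdigit():   # skip header and divider rows
            rows.append((cells[0], cells[1]))
    return rows

# Scan ~50 tokens of index, then decide which full entries are worth reading.
for date, title in scan_index(Path(".context/DECISIONS.md")):
    print(date, "-", title)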

Primitive 4: Filesystem as Navigation

ctx uses the filesystem itself as a context structure:

.context/
├── CONSTITUTION.md
├── TASKS.md
├── sessions/
│   ├── 2026-01-15-*.md
│   └── 2026-01-20-*.md
└── archive/
    └── tasks-2026-01.md

The AI doesn't need every session loaded; it needs to know where to look.

ls .context/sessions/
cat .context/sessions/2026-01-20-auth-discussion.md

File names, timestamps, and directories encode relevance.

Navigation is cheaper than loading.

Progressive Disclosure in Practice

The naive approach to context is dumping everything upfront:

"Here's my entire codebase, all my documentation, every decision I've ever made—now help me fix this typo."

This is an antipattern.

Antipattern: Context Hoarding

Dumping everything "just in case" will silently destroy attention density.

ctx takes the opposite approach:

ctx status                      # Quick overview (~100 tokens)
ctx agent --budget 4000         # Typical session
cat .context/sessions/...       # Deep dive when needed

| Command                 | Tokens | Use Case      |
|-------------------------|--------|---------------|
| ctx status              | ~100   | Human glance  |
| ctx agent --budget 4000 | 4000   | Normal work   |
| ctx agent --budget 8000 | 8000   | Complex tasks |
| Full session read       | 10000+ | Investigation |

Summaries first. Details on demand.

Quality Over Quantity

Here's the counterintuitive part: more context can make AI worse.

Extra tokens add noise, not clarity:

  • Hallucinated connections increase.
  • Signal per token drops.

The goal isn't maximum context. It's maximum signal per token.

This principle drives several ctx features:

| Design Choice    | Rationale                  |
|------------------|----------------------------|
| Separate files   | Load only what's relevant  |
| Explicit budgets | Enforce prioritization     |
| Index sections   | Cheap scanning             |
| Task archiving   | Keep active context clean  |
| ctx compact      | Periodic noise reduction   |

Completed work isn't deleted: It is moved somewhere cold.
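
"Somewhere cold" maps onto the archive/ directory from the tree above. A toy version of that move (assuming checklist-style tasks in TASKS.md; ctx's own archiving and ctx compact may behave differently):

from datetime import date
from pathlib import Path

def archive_completed(tasks_file: Path = Path(".context/TASKS.md")) -> None:
    """Move completed checklist items out of the active task list into archive/."""
    lines = tasks_file.read_text().splitlines(keepends=True)
    done = [l for l in lines if l.lstrip().startswith("- [x]")]
    if not done:
        return
    archive = tasks_file.parent / "archive" / f"tasks-{date.today():%Y-%m}.md"
    archive.parent.mkdir(exist_ok=True)
    with archive.open("a") as f:
        f.writelines(done)                     # cold storage: appended, not deleted
    kept = [l for l in lines if not l.lstrip().startswith("- [x]")]
    tasks_file.write_text("".join(kept))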

Designing for Degradation

Here is the uncomfortable truth:

Context will degrade.

Long sessions stretch attention thin. Important details fade.

The real question isn't how to prevent degradation, but how to design for it.

ctx's answer is persistence:

Persist early. Persist often.

The AGENT_PLAYBOOK asks:

"If this session ended right now, would the next one know what happened?"

Capture learnings as they occur:

ctx add learning "JWT tokens require explicit cache invalidation" \
  --context "Debugging auth failures" \
  --lesson "Token refresh doesn't clear old tokens" \
  --application "Always invalidate cache on refresh"

Structure beats prose: Bullet points survive compression.

Headings remain scannable. Tables pack density.

And above all: single source of truth.

Reference decisions; don't duplicate them.

The ctx Philosophy

Context as Infrastructure

ctx is not a prompt: It is infrastructure.

ctx creates versioned files that persist across time and sessions.

The attention budget is fixed. You can't expand it. But you can spend it wisely:

  1. Hierarchical importance
  2. Progressive disclosure
  3. Explicit budgets
  4. Indexes over full content
  5. Filesystem as structure

This is why ctx exists: not to cram more context into AI sessions, but to curate the right context for each moment.

The Mental Model

I now approach every AI interaction with one question:

"Given a fixed attention budget, what's the highest-signal thing I can load?"

Not "how do I explain everything," but "what's the minimum that matters."

That shift (from abundance to curation) is the difference between frustrating sessions and productive ones.


Spend your tokens wisely.

Your AI will thank you.