Context Compilation
"The LLM context window is a bounded, lossy channel. Output quality degrades monotonically with token volume, regardless of the advertised window." — Schick, The Root Theorem of Context Engineering, 2026
The problem
Frontier models advertise context windows of 200k, 1M, 2M tokens. In practice, quality falls off a cliff well before the limit — usually around 30–50k tokens for reasoning-heavy tasks. The naive solutions all fail:
- Transcript replay (the LangChain
ConversationBufferMemorydefault) hits the cliff fast and re-feeds stale reasoning. - Summarise on overflow triggers after the cliff has been crossed.
- More retrieval retrieves more low-density chunks that crowd out signal.
The engineering objective is to maximise the signal-to-token ratio:
over the compiled context . That requires a strict, active budget — not a fallback.
The Veridex answer: the effective window
Every Veridex agent operates under an effective window that is strictly smaller than the model's raw limit. is the active constraint — the compiler refuses to exceed it.
import { createAgent, ContextCompiler } from '@veridex/agents';
const agent = createAgent(
{
name: 'long-horizon-researcher',
instructions: '...',
tools: [...],
context: {
V_e: 24_000, // 24k tokens, regardless of model max
budgets: {
persona: 300,
instructions: 1_200,
memory: 6_000,
history: 12_000,
tools: 3_000,
headroom: 1_500, // generation reserve
},
},
},
{ modelProviders: { default: provider } },
);Defaults are conservative: V_e ≈ 0.4 × raw_window for frontier models, lower for tool-heavy agents.
Per-section compression
The compiler runs every turn and compresses each section to its budget:
| Section | Strategy |
|---|---|
| History | Rolling summarisation of stale turns; embedding-based redundancy detection; keep-last- verbatim ( default) with tool round-trips paired |
| Memory | Score (relevance · recency · confidence · density) → top- within budget; tier-aware weights |
| Tools | Progressive disclosure: only tools relevant to the live task; skill packs enabled per-turn via skill_manage |
| Persona / instructions | Hard-capped; static; never compressed silently |
| Headroom | Reserved for generation; never consumed by other sections |
The homeostatic cycle
There is no "garbage collection mode." Compression is the steady state — every turn accumulates, compresses, rewrites, and sheds. The compiler emits a context_compiled event with the per-section token counts and the compression actions taken. You can inspect it live:
import { useContextHealth } from '@veridex/agents-react';
function ContextHealth({ runId }: { runId: string }) {
const { sections, V_e, utilisation, lastActions } = useContextHealth(runId);
return (
<div>
<Bar value={utilisation} max={1} label={`V_e=${V_e}`} />
{Object.entries(sections).map(([k, t]) => <Row name={k} tokens={t} />)}
<Log actions={lastActions} />
</div>
);
}A real-world example: 50-turn research agent
A research agent runs for 50 turns over two hours. With naive transcript replay, by turn 30 the context is 90k tokens and the model is hallucinating about its own earlier conclusions. With Veridex's compiler:
| Turn | History tokens | Memory tokens | Total | Quality (LLM judge) |
|---|---|---|---|---|
| 5 | 3,200 | 800 | 9,800 | 0.92 |
| 15 | 8,000 | 2,400 | 17,200 | 0.91 |
| 30 | 11,500 (rolling summary) | 4,100 (promoted facts) | 22,400 | 0.91 |
| 50 | 12,000 (rolling summary) | 5,500 | 23,600 | 0.90 |
Total tokens never exceed . Episodic observations are promoted to semantic memory (Memory) so they survive the rolling summary as compressed facts. Quality stays flat instead of falling off the cliff.
Inspectable, testable, replayable
The compiler is deterministic given the inputs. Tests can assert:
expect(result.trace.events.filter(e => e.type === 'context_compiled'))
.toSatisfyAll(e => e.payload.totalTokens <= 24_000);The stateful eval harness (Testing) ships long-horizon cases that explicitly gate on and quality plateau.
Why not just trust the model's raw window?
We did the experiment. We did not like the results. Neither did the Root Theorem. The compiler is the difference between an agent that holds a coherent conversation for an hour and one that drifts into incoherence by minute 20.
Related
- ADR-0050 — the design decision
- Memory — what feeds the memory section
- Tools — progressive disclosure of tool schemas
- Internal architecture §5 — algorithm details