
Context Engineering: What Your Agent Needs to Know

A capable model with poor context produces poor results. Context engineering is the discipline of designing what the agent needs to know.

The context problem

Every AI agent operates inside a context window. Whatever is not in that window is effectively invisible to the agent. This creates a fundamental engineering challenge: you must design what the agent knows, not only what it can do.

Context engineering is the practice of designing, structuring, and maintaining the information that flows into an agent's context window — across time, across sessions, and across different stages of a task.

What goes in the context?

Context for a coding agent has several distinct layers:

| Layer | Content | Persistence |
| --- | --- | --- |
| Static context | Architecture rules, coding standards, repo conventions | Always present |
| Task context | Feature description, acceptance criteria, scope boundaries | Per-task |
| State context | Current progress, what was completed, what failed | Per-session |
| Code context | Relevant files, interfaces, test structures | Dynamically loaded |

The AGENTS.md pattern

An increasingly common pattern for static context is a single file, AGENTS.md, committed to the repository root. It contains the stable instructions an agent needs to operate in that codebase: how to run tests, which directories are off-limits, naming conventions, and architectural constraints.

markdown
# AGENTS.md

## Build & test
- Run tests: `npm test`
- Lint: `npm run lint`
- Build: `npm run build`

## Scope rules
- Never modify files in `legacy/`
- Never edit `package-lock.json` directly

## Architecture
- All API calls go through `src/api/client.ts`
- Use `zod` for all data validation

Visual recap: Context Engineering at a glance

The full picture is broader than context-window management. Context engineering starts with what the agent sees at runtime, but it matures into a lifecycle for designing, testing, governing, and continuously improving the context that shapes agent behavior.

The context window budget

Context window space is finite. Filling it with irrelevant information can be as harmful as providing too little context. A disciplined approach allocates the budget explicitly:

  • ~20% — static context (AGENTS.md, architecture docs)
  • ~30% — task context (current feature, acceptance criteria)
  • ~20% — state context (progress file)
  • ~30% — dynamic code context (files relevant to the current subtask)
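
The split above can be turned into concrete token budgets. The following is a minimal sketch, not a prescribed implementation: the layer names, the `allocate_budget` and `over_budget` helpers, and the 100,000-token window in the usage note are assumptions made for illustration, and token counts are expected to come from whatever tokenizer your agent stack uses.

```python
# Percent shares taken from the list above; layer names are illustrative.
BUDGET_PERCENT = {
    "static": 20,  # AGENTS.md, architecture docs
    "task": 30,    # current feature, acceptance criteria
    "state": 20,   # progress file
    "code": 30,    # files relevant to the current subtask
}


def allocate_budget(window_tokens: int, percent: dict[str, int] = BUDGET_PERCENT) -> dict[str, int]:
    """Split a context window into per-layer token budgets."""
    if sum(percent.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    # Integer arithmetic avoids floating-point rounding surprises.
    return {layer: window_tokens * share // 100 for layer, share in percent.items()}


def over_budget(usage: dict[str, int], budget: dict[str, int]) -> list[str]:
    """Return the layers whose measured token usage exceeds their budget."""
    return [layer for layer, used in usage.items() if used > budget.get(layer, 0)]
```

With an assumed 100,000-token window, `allocate_budget(100_000)` gives 20,000 tokens each to the static and state layers and 30,000 each to the task and code layers. A harness could call `over_budget` before each turn and trim or summarize the offending layer.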

Advanced: The Context Development Lifecycle

Context engineering often starts with managing what enters the agent's context window. That is only the first level.

In professional agentic engineering, context is no longer just a temporary input to a chat session. It becomes part of the software production system.

Specifications, AGENTS.md, CLAUDE.md, architecture maps, domain vocabulary, reusable skills, library documentation, MCP context, tickets, logs, and review feedback are no longer disposable prompts. They are software artifacts that directly shape agent behavior.

This creates a new engineering question:

If context changes agent behavior, how do we design, test, distribute, observe, and improve that context?

That is the role of the Context Development Lifecycle.

| Phase | Core question |
| --- | --- |
| Generate | Have we clarified intent, vocabulary, constraints, and architecture before execution? |
| Evaluate | Does this context reliably shape the agent's behavior? |
| Distribute | Can this context be packaged, versioned, reused, and governed across teams? |
| Observe | What do PR reviews, agent logs, and production incidents reveal about missing or misleading context? |
| Improve | How do we feed those signals back into the context stack? |

A mature workflow is not simply:

text
prompt → code

It becomes:

text
shared intent → evaluated context → generated code → telemetry → improved context

This is the bridge from vibe coding to professional agentic engineering.

The practical consequence is simple: context should be managed with the same seriousness as code.


Good practices: treating context as a software artifact

Once context shapes agent behavior, it needs engineering discipline.

1. Make context explicit

Do not rely on hidden conversation history for durable project knowledge.

Move durable instructions into versioned artifacts:

  • AGENTS.md
  • architecture maps
  • domain vocabulary
  • task specifications
  • test instructions
  • reusable skills
  • project-specific examples

A useful rule:

If the agent should remember it tomorrow, it probably does not belong only in today's chat.

2. Separate context layers

Do not collapse everything into a single giant instruction file.

A mature context stack usually has several layers:

| Layer | Examples |
| --- | --- |
| Global context | Company conventions, security rules, domain vocabulary |
| System context | Architecture maps, repository documentation, module boundaries |
| Agent context | AGENTS.md, CLAUDE.md, reusable skills, tool instructions |
| Task context | Specs, tickets, MCP context, current logs, review comments |

This separation matters because each layer changes at a different speed. Company conventions are relatively stable. Task context changes constantly. Mixing them creates noise, increases drift, and makes context harder to review.
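
One way to keep the layers separate in practice is to store each as its own artifact and combine them only at prompt-assembly time. The sketch below is illustrative: the `ContextLayer` type, the `assemble_context` function, and the bracketed label format are assumptions, not part of any agent tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextLayer:
    name: str    # one of LAYER_ORDER
    source: str  # where the text came from, kept for review and debugging
    text: str


# Slowest-changing layers first, so stable rules frame the volatile ones.
LAYER_ORDER = ("global", "system", "agent", "task")


def assemble_context(layers: list[ContextLayer]) -> str:
    """Join layers in a fixed order, labelling each block with its layer and source."""
    unknown = [layer.name for layer in layers if layer.name not in LAYER_ORDER]
    if unknown:
        raise ValueError(f"unknown layer(s): {unknown}")
    # sorted() is stable, so several entries in one layer keep their given order.
    ordered = sorted(layers, key=lambda layer: LAYER_ORDER.index(layer.name))
    return "\n\n".join(
        f"[{layer.name}: {layer.source}]\n{layer.text.strip()}" for layer in ordered
    )
```

Because every block carries its layer and source, a reviewer reading an agent log can tell which artifact an instruction came from and update that artifact rather than the assembled prompt.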

3. Test context, not only code

A change to AGENTS.md, a reusable skill, or a project specification can change many future agent runs.

That means context needs evaluation.

A context eval asks:

Did this piece of context reliably shape the agent's behavior in the intended way?

Useful levels of context testing include:

| Level | Purpose |
| --- | --- |
| Context linting | Validate structure, syntax, and required fields |
| Clarity checks | Check whether the instruction is explicit and complete enough for an LLM |
| Behavioral evals | Test whether the agent follows a project rule |
| Agentic E2E | Let an agent run the system and verify real behavior |
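
The first level, context linting, is the cheapest to automate. As a rough sketch, the check below verifies that an AGENTS.md file contains a set of required, non-empty sections. The section names mirror the AGENTS.md example earlier on this page; treating them as required, and the `lint_agents_md` helper itself, are assumptions for illustration.

```python
import re

# Assumed required sections; adjust to whatever your team agrees on.
REQUIRED_SECTIONS = ("Build & test", "Scope rules", "Architecture")


def lint_agents_md(text: str, required: tuple[str, ...] = REQUIRED_SECTIONS) -> list[str]:
    """Return a list of lint problems; an empty list means the file passed."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)  # level-2 headings only
        if heading:
            current = heading.group(1)
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)

    problems = []
    for name in required:
        if name not in sections:
            problems.append(f"missing section: {name}")
        elif not sections[name]:
            problems.append(f"empty section: {name}")
    return problems
```

A check like this can run in CI on every change to the file. It says nothing about whether the instructions are clear or effective, which is what the higher levels are for.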

Example of a behavioral eval:

text
Context rule:
All API endpoints must start with /awesome.

Eval:
Ask the agent to add a new user endpoint.
Check whether the generated route follows the required prefix.

The point is not only to test the code that was generated. The point is to test whether the context caused the right behavior.
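
The checking half of such an eval can be a small script run against the agent's output. The sketch below assumes the generated code registers routes in an Express-like style (`router.post("/path", ...)`); that framework choice, the regular expression, and the function names are illustrative assumptions. Invoking the agent is deliberately left out.

```python
import re

REQUIRED_PREFIX = "/awesome"

# Matches calls such as router.get("/path", ...) or app.post('/path', ...).
ROUTE_PATTERN = re.compile(
    r"\b(?:app|router)\.(?:get|post|put|patch|delete)\(\s*[\"']([^\"']+)[\"']"
)


def extract_routes(source: str) -> list[str]:
    """Pull route path literals out of generated source code."""
    return ROUTE_PATTERN.findall(source)


def check_prefix_rule(source: str, prefix: str = REQUIRED_PREFIX) -> dict:
    """Pass only if at least one route was generated and all routes use the prefix."""
    routes = extract_routes(source)
    violations = [
        route for route in routes
        if route != prefix and not route.startswith(prefix + "/")
    ]
    return {
        "passed": bool(routes) and not violations,
        "routes": routes,
        "violations": violations,
    }
```

Requiring at least one route guards against a run that "passes" because the agent generated no endpoint at all. Comparing against `prefix + "/"` keeps a path like `/awesomeness/users` from slipping through.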

4. Think probabilistically

Traditional tests are deterministic: pass or fail.

Context evals are probabilistic. The same context may work with one model, fail with another, or pass only four times out of five.

So a passing context eval might mean:

text
This context causes the desired behavior 95% of the time
across 5 runs, 3 models, and 2 agent configurations.

For agentic systems, quality increasingly means reliability under variation.
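
A pass rate like this can be computed by repeating one eval over every model and configuration combination. The following is a minimal sketch: `run_eval` stands in for whatever function invokes your agent and returns whether the check passed, and the 0.95 default threshold is taken from the example above, not from any standard.

```python
from itertools import product
from typing import Callable


def pass_rate(
    run_eval: Callable[[str, str], bool],
    models: list[str],
    configs: list[str],
    runs: int,
) -> float:
    """Run the eval `runs` times per (model, config) pair and return the pass fraction."""
    results = [
        run_eval(model, config)
        for model, config in product(models, configs)
        for _ in range(runs)
    ]
    if not results:
        raise ValueError("need at least one model, one config, and one run")
    return sum(results) / len(results)


def context_eval_passes(rate: float, threshold: float = 0.95) -> bool:
    """A context eval passes when the observed rate meets the threshold."""
    return rate >= threshold
```

Note the granularity this implies: 5 runs across 3 models and 2 configurations is 30 trials, so a 95% threshold tolerates one failure (29/30 ≈ 96.7%) but not two (28/30 ≈ 93.3%). Small trial counts make the measured rate itself noisy, so it is worth recording the number of trials alongside the percentage.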

5. Treat PR feedback as context telemetry

In a traditional workflow, PR feedback is used to fix code.

In an agentic workflow, PR feedback should also update the context stack.

When a reviewer finds a problem, ask: What context was missing, weak, or misleading?

Possible follow-up actions:

  • update AGENTS.md
  • improve the task spec
  • add a missing architecture rule
  • update the domain vocabulary
  • create a new context eval
  • improve a reusable skill
  • add a regression test

The goal is not only to fix the current PR. It is to prevent the same failure from recurring across future agent runs.

6. Govern shared context

For one developer, context is a productivity tool.

For an organization, shared context becomes platform infrastructure.

Reusable context should therefore be:

  • versioned
  • owned
  • reviewed
  • tested
  • documented
  • scanned for security issues
  • distributed through curated registries when reused across teams

Once context becomes executable through agents, it becomes part of the software supply chain.

Context rot

Context rot occurs when the information in the context window drifts out of sync with the actual state of the codebase. It is one of the most insidious failure modes in long-running agentic projects.

Context rot is not only a session problem. At enterprise scale, it also becomes a governance problem.

A stale instruction in AGENTS.md, an outdated architecture map, or an obsolete reusable skill can silently influence many future agent runs. In that sense, changing context without evaluation is close to changing production behavior without tests.
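
Some drift can be detected mechanically. One example is a context file that mentions a path which no longer exists in the repository. The sketch below is a heuristic, not a complete staleness check: it only looks at backtick-quoted tokens that contain a slash, and the `find_stale_paths` helper is an assumption for illustration.

```python
import re
from pathlib import Path

# Backtick-quoted tokens that look like repo paths: contain a slash, no whitespace.
PATH_PATTERN = re.compile(r"`([^`\s]+/[^`\s]*)`")


def find_stale_paths(context_text: str, repo_root: Path) -> list[str]:
    """Return referenced paths that do not exist under repo_root."""
    # dict.fromkeys de-duplicates while keeping first-seen order.
    referenced = dict.fromkeys(PATH_PATTERN.findall(context_text))
    return [path for path in referenced if not (repo_root / path).exists()]
```

Run against the AGENTS.md example earlier on this page, it would check `legacy/` and `src/api/client.ts` and skip commands such as `npm test`. File names without a slash, such as `package-lock.json`, are also skipped, so a real check would need a broader rule.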

Prevention strategies:

  • Keep AGENTS.md versioned and reviewed on every PR
  • Add context evals for important rules and reusable skills
  • Write state files atomically — never partial updates
  • Use structured formats such as JSON or YAML for state, not free-form prose
  • Keep architecture maps close to the code they describe
  • Remove obsolete context aggressively
  • Treat PR review comments and production incidents as signals to improve context
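
For the two state-file items, a common technique is to write the new state to a temporary file and then rename it over the old one. The sketch below shows that technique with JSON. It is an illustration under assumptions, not the harness from Lecture 04: the function names and the `completed`/`failed` keys in the usage note are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state_atomically(path: Path, state: dict) -> None:
    """Write state as JSON so readers see the old file or the new one, never a partial write."""
    path = Path(path)
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # the rename is atomic on POSIX systems
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def read_state(path: Path) -> dict:
    """Load the state file; raises if it is missing or not valid JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A call such as `write_state_atomically(Path("progress.json"), {"completed": ["auth"], "failed": []})` either replaces the whole file or leaves the previous version untouched. If serialization fails midway, the temporary file is deleted and the existing state remains readable.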

TIP

The harness design (Lecture 04) specifies exactly how state files are written, read, and verified. Never leave state management to the agent's discretion.



Released under the MIT License.