Context Engineering: What Your Agent Needs to Know

A capable model with poor context produces poor results. Context engineering is the discipline of designing what the agent needs to know.

The context problem

Every AI agent operates inside a context window. Whatever is not in that window is effectively invisible to the agent. This creates a fundamental engineering challenge: you must design what the agent knows, not only what it can do.

Context engineering is the practice of designing, structuring, and maintaining the information that flows into an agent's context window — across time, across sessions, and across different stages of a task.

What goes in the context?

Context for a coding agent has several distinct layers:

| Layer | Content | Persistence |
| --- | --- | --- |
| Static context | Architecture rules, coding standards, repo conventions | Always present |
| Task context | Feature description, acceptance criteria, scope boundaries | Per-task |
| State context | Current progress, what was completed, what failed | Per-session |
| Code context | Relevant files, interfaces, test structures | Dynamically loaded |
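
One way to make these layers concrete is to model each one as a labelled piece of text and assemble them in a fixed order. The sketch below is illustrative only: the `ContextLayer` and `assemble_context` names, the header format, and the sample task are assumptions, not part of any particular agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextLayer:
    name: str         # "static", "task", "state" or "code"
    persistence: str  # how long the layer stays valid
    content: str      # the text that will be placed in the window


def assemble_context(layers: list[ContextLayer]) -> str:
    """Join non-empty layers in the given order, each under a labelled header."""
    parts = [
        f"### {layer.name} context ({layer.persistence})\n{layer.content.strip()}"
        for layer in layers
        if layer.content.strip()
    ]
    return "\n\n".join(parts)


# Hypothetical sample content for each layer.
layers = [
    ContextLayer("static", "always present", "Use zod for all data validation."),
    ContextLayer("task", "per-task", "Add a password reset endpoint."),
    ContextLayer("state", "per-session", ""),  # nothing recorded yet, so it is skipped
    ContextLayer("code", "dynamically loaded", "Contents of src/api/client.ts"),
]
prompt = assemble_context(layers)
print(prompt)
```

Keeping the layers as separate values, rather than one concatenated string, makes it possible to refresh or drop a single layer without touching the others.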

The AGENTS.md pattern

An increasingly common pattern for static context is a single file, AGENTS.md, committed to the repository root. It contains the stable instructions an agent needs to operate in that codebase: how to run tests, which directories are off-limits, naming conventions, and architectural constraints.

markdown

# AGENTS.md

## Build & test
- Run tests: `npm test`
- Lint: `npm run lint`
- Build: `npm run build`

## Scope rules
- Never modify files in `legacy/`
- Never edit `package-lock.json` directly

## Architecture
- All API calls go through `src/api/client.ts`
- Use `zod` for all data validation
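
Because the file has a predictable shape, a script can check that it still contains the sections a team expects. A minimal sketch, assuming the three headings from the example above are the required ones; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names.

```python
import re

# Assumption: these headings mirror the example AGENTS.md above.
REQUIRED_SECTIONS = {"Build & test", "Scope rules", "Architecture"}


def missing_sections(agents_md: str) -> set[str]:
    """Return the required level-2 headings that are absent from the file text."""
    found = {
        match.group(1).strip()
        for match in re.finditer(r"^## +(.+)$", agents_md, re.MULTILINE)
    }
    return REQUIRED_SECTIONS - found


sample = """# AGENTS.md

## Build & test
- Run tests: `npm test`

## Architecture
- All API calls go through `src/api/client.ts`
"""
print(missing_sections(sample))  # {'Scope rules'}
```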

Visual recap: Context Engineering at a glance

The full picture is broader than context-window management. Context engineering starts with what the agent sees at runtime, but it matures into a lifecycle for designing, testing, governing, and continuously improving the context that shapes agent behavior.

Figure: Infographic recap of Lecture 03 covering runtime context layers, the Context Development Lifecycle, good practices, and context rot prevention.

The context window budget

Context window space is finite. Filling it with irrelevant information can be as harmful as providing too little context. A disciplined approach allocates the budget explicitly:

~20% — static context (AGENTS.md, architecture docs)
~30% — task context (current feature, acceptance criteria)
~20% — state context (progress file)
~30% — dynamic code context (files relevant to the current subtask)
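
The split above can be turned into per-layer token limits with simple arithmetic. The sketch below assumes the approximate percentages listed above and a caller that already knows how many tokens each layer uses; `allocate_budget` and `over_budget` are hypothetical helpers.

```python
# Assumption: percentages taken from the rough split listed above.
BUDGET_PERCENT = {"static": 20, "task": 30, "state": 20, "code": 30}


def allocate_budget(window_tokens: int) -> dict[str, int]:
    """Split a context window into per-layer token budgets."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return {layer: window_tokens * pct // 100 for layer, pct in BUDGET_PERCENT.items()}


def over_budget(used: dict[str, int], budget: dict[str, int]) -> list[str]:
    """Return the layers whose measured token count exceeds their allocation."""
    return [layer for layer, tokens in used.items() if tokens > budget.get(layer, 0)]


budget = allocate_budget(100_000)
print(budget)  # {'static': 20000, 'task': 30000, 'state': 20000, 'code': 30000}
print(over_budget({"static": 25_000, "code": 10_000}, budget))  # ['static']
```

A layer that exceeds its allocation is a signal to trim or summarize that layer before the window is assembled.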

Advanced: The Context Development Lifecycle

Context engineering often starts with managing what enters the agent's context window. That is only the first level.

In professional agentic engineering, context is no longer just a temporary input to a chat session. It becomes part of the software production system.

Specifications, AGENTS.md, CLAUDE.md, architecture maps, domain vocabulary, reusable skills, library documentation, MCP context, tickets, logs, and review feedback are no longer disposable prompts. They are software artifacts that directly shape agent behavior.

This creates a new engineering question:

If context changes agent behavior, how do we design, test, distribute, observe, and improve that context?

That is the role of the Context Development Lifecycle.

| Phase | Core question |
| --- | --- |
| Generate | Have we clarified intent, vocabulary, constraints, and architecture before execution? |
| Evaluate | Does this context reliably shape the agent's behavior? |
| Distribute | Can this context be packaged, versioned, reused, and governed across teams? |
| Observe | What do PR reviews, agent logs, and production incidents reveal about missing or misleading context? |
| Improve | How do we feed those signals back into the context stack? |

A mature workflow is not simply:

text

prompt → code

It becomes:

text

shared intent → evaluated context → generated code → telemetry → improved context

This is the bridge from vibe coding to professional agentic engineering.

The practical consequence is simple: context should be managed with the same seriousness as code.

Good practices: treating context as a software artifact

Once context shapes agent behavior, it needs engineering discipline.

1. Make context explicit

Do not rely on hidden conversation history for durable project knowledge.

Move durable instructions into versioned artifacts:

AGENTS.md
architecture maps
domain vocabulary
task specifications
test instructions
reusable skills
project-specific examples

A useful rule:

If the agent should remember it tomorrow, it probably does not belong only in today's chat.

2. Separate context layers

Do not collapse everything into a single giant instruction file.

A mature context stack usually has several layers:

| Layer | Examples |
| --- | --- |
| Global context | Company conventions, security rules, domain vocabulary |
| System context | Architecture maps, repository documentation, module boundaries |
| Agent context | `AGENTS.md`, `CLAUDE.md`, reusable skills, tool instructions |
| Task context | Specs, tickets, MCP context, current logs, review comments |

This separation matters because each layer changes at a different speed. Company conventions are relatively stable. Task context changes constantly. Mixing them creates noise, increases drift, and makes context harder to review.

3. Test context, not only code

A change to AGENTS.md, a reusable skill, or a project specification can change many future agent runs.

That means context needs evaluation.

A context eval asks:

Did this piece of context reliably shape the agent's behavior in the intended way?

Useful levels of context testing include:

| Level | Purpose |
| --- | --- |
| Context linting | Validate structure, syntax, and required fields |
| Clarity checks | Check whether the instruction is explicit and complete enough for an LLM |
| Behavioral evals | Test whether the agent follows a project rule |
| Agentic E2E | Let an agent run the system and verify real behavior |

Example:

text

Context rule:
All API endpoints must start with /awesome.

Eval:
Ask the agent to add a new user endpoint.
Check whether the generated route follows the required prefix.

The point is not only to test the code that was generated. The point is to test whether the context caused the right behavior.
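
The checking half of such an eval can be automated. The sketch below assumes the agent's output is available as source text and that routes are declared in an Express-like style such as `router.post("/awesome/users", ...)`; the pattern, the function name, and the sample output are all illustrative assumptions.

```python
import re

REQUIRED_PREFIX = "/awesome"

# Assumption: routes are declared Express-style, e.g. router.post("/awesome/users", ...).
ROUTE_PATTERN = re.compile(r"""\.(?:get|post|put|patch|delete)\(\s*["']([^"']+)["']""")


def routes_violating_prefix(generated_code: str) -> list[str]:
    """Return every declared route path that is not under the required prefix."""
    paths = ROUTE_PATTERN.findall(generated_code)
    return [
        path
        for path in paths
        if path != REQUIRED_PREFIX and not path.startswith(REQUIRED_PREFIX + "/")
    ]


# Hypothetical agent output containing one compliant and one non-compliant route.
agent_output = '''
router.get("/awesome/users", listUsers);
router.post("/users", createUser);
'''
print(routes_violating_prefix(agent_output))  # ['/users']
```

An empty result means this run followed the rule. It says nothing yet about how often the rule is followed across runs.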

4. Think probabilistically

Traditional tests are deterministic: pass or fail.

Context evals are probabilistic. The same context may work with one model, fail with another, or pass only four times out of five.

So a passing context eval might mean:

text

This context causes the desired behavior at least 95% of the time
across 5 runs on each of 3 models and 2 agent configurations.

For agentic systems, quality increasingly means reliability under variation.
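
A pass rate of this kind can be computed by repeating one eval over every model and configuration pair. The sketch below is a minimal illustration: `run_eval` stands in for a real agent run, and the model names, configuration names, and threshold are assumptions.

```python
from itertools import product
from typing import Callable


def pass_rate(
    run_eval: Callable[[str, str], bool],
    models: list[str],
    configs: list[str],
    runs_per_combo: int,
) -> float:
    """Run the eval repeatedly for every model/config pair and return the pass fraction."""
    results = [
        run_eval(model, config)
        for model, config in product(models, configs)
        for _ in range(runs_per_combo)
    ]
    return sum(results) / len(results)


def context_eval_passes(rate: float, threshold: float = 0.95) -> bool:
    """A context eval passes when the observed rate meets the threshold."""
    return rate >= threshold


# Stand-in for a real agent run: fails exactly once so the example is reproducible.
calls = {"n": 0}


def fake_eval(model: str, config: str) -> bool:
    calls["n"] += 1
    return calls["n"] != 7


rate = pass_rate(fake_eval, ["model-a", "model-b", "model-c"], ["default", "strict"], runs_per_combo=5)
print(f"{rate:.3f}", context_eval_passes(rate))  # 0.967 True
```

With 30 runs in total, a single failure still clears a 95% threshold, while two failures (about 93%) would not.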

5. Treat PR feedback as context telemetry

In a traditional workflow, PR feedback is used to fix code.

In an agentic workflow, PR feedback should also update the context stack.

When a reviewer finds a problem, ask: What context was missing, weak, or misleading?

Possible follow-up actions:

update AGENTS.md
improve the task spec
add a missing architecture rule
update the domain vocabulary
create a new context eval
improve a reusable skill
add a regression test

The goal is not only to fix the current PR. It is to prevent the same failure from recurring across future agent runs.

6. Govern shared context

For one developer, context is a productivity tool.

For an organization, shared context becomes platform infrastructure.

Reusable context should therefore be:

versioned
owned
reviewed
tested
documented
scanned for security issues
distributed through curated registries when reused across teams

Once context becomes executable through agents, it becomes part of the software supply chain.
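
Some of these properties can be checked before a shared artifact is distributed. A minimal sketch, assuming each artifact carries a version, an owner, and a digest recorded at review time; the `ContextArtifact` fields and the sample values are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextArtifact:
    name: str
    version: str  # bumped on every reviewed change
    owner: str    # team accountable for the content
    sha256: str   # digest recorded at review time
    body: str     # the context text handed to agents


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def governance_problems(artifact: ContextArtifact) -> list[str]:
    """List the reasons an artifact should not be distributed as-is."""
    problems = []
    if not artifact.owner:
        problems.append("no owner")
    if not artifact.version:
        problems.append("no version")
    if digest(artifact.body) != artifact.sha256:
        problems.append("body differs from reviewed digest")
    return problems


rule = "All API calls go through src/api/client.ts"
reviewed = ContextArtifact("api-rules", "1.0.0", "platform-team", digest(rule), rule)
tampered = ContextArtifact("api-rules", "1.0.0", "platform-team", digest(rule), rule + " (unless in a hurry)")
print(governance_problems(reviewed))  # []
print(governance_problems(tampered))  # ['body differs from reviewed digest']
```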

Context rot

Context rot occurs when the information in the context window drifts out of sync with the actual state of the codebase. It is one of the most insidious failure modes in long-running agentic projects.

It is not only a session problem. At enterprise scale, it also becomes a governance problem.

A stale instruction in AGENTS.md, an outdated architecture map, or an obsolete reusable skill can silently influence many future agent runs. In that sense, changing context without evaluation is close to changing production behavior without tests.
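
One kind of drift that can be detected mechanically is a context file that still mentions paths the repository no longer contains. The sketch below assumes paths are written in backticks and contain a slash; `PATH_PATTERN` and `stale_paths` are hypothetical names, and the temporary directory only simulates a repository.

```python
import re
import tempfile
from pathlib import Path

# Assumption: paths appear in backticks and contain a slash, e.g. `src/api/client.ts`.
PATH_PATTERN = re.compile(r"`([\w./-]+/[\w./-]*)`")


def stale_paths(context_text: str, repo_root: Path) -> list[str]:
    """Return backtick-quoted paths in the text that do not exist under repo_root."""
    return [
        path
        for path in PATH_PATTERN.findall(context_text)
        if not (repo_root / path).exists()
    ]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "api").mkdir(parents=True)
    (root / "src" / "api" / "client.ts").write_text("// client\n")
    text = "All API calls go through `src/api/client.ts`. Never modify files in `legacy/`."
    print(stale_paths(text, root))  # ['legacy/']
```

A check like this catches only one symptom of rot. A rule can reference real paths and still describe behavior the code no longer has.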

Prevention strategies:

Keep AGENTS.md versioned and reviewed on every PR
Add context evals for important rules and reusable skills
Write state files atomically — never partial updates
Use structured formats such as JSON or YAML for state, not free-form prose
Keep architecture maps close to the code they describe
Remove obsolete context aggressively
Treat PR review comments and production incidents as signals to improve context
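
The advice to write state files atomically and in a structured format can be sketched as a write to a temporary file followed by a rename. This is an illustration under stated assumptions, not the harness's actual mechanism: the function name, the file name, and the state fields below are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state_atomically(path: Path, state: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the rename
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_name, path)
    except Exception:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    state_file = Path(tmp_dir) / "progress.json"
    # Hypothetical state fields, mirroring "what was completed, what failed".
    write_state_atomically(state_file, {"completed": ["parse-input"], "failed": [], "current": "add-tests"})
    print(json.loads(state_file.read_text(encoding="utf-8"))["current"])  # add-tests
```

Creating the temporary file in the target's own directory keeps the rename on a single filesystem, which is what allows it to replace the old file in one step.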

TIP

The harness design (Lecture 04) specifies exactly how state files are written, read, and verified. Never leave state management to the agent's discretion.

Next: Harness Design →
