Context Engineering: What Your Agent Needs to Know
A capable model with poor context produces poor results. Context engineering is the discipline of designing what the agent needs to know.
The context problem
Every AI agent operates inside a context window. Whatever is not in that window is effectively invisible to the agent. This creates a fundamental engineering challenge: you must design what the agent knows, not only what it can do.
Context engineering is the practice of designing, structuring, and maintaining the information that flows into an agent's context window — across time, across sessions, and across different stages of a task.
What goes in the context?
Context for a coding agent has several distinct layers:
| Layer | Content | Persistence |
|---|---|---|
| Static context | Architecture rules, coding standards, repo conventions | Always present |
| Task context | Feature description, acceptance criteria, scope boundaries | Per-task |
| State context | Current progress, what was completed, what failed | Per-session |
| Code context | Relevant files, interfaces, test structures | Dynamically loaded |
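The four layers in the table can be modeled as plain data and joined into one prompt. The sketch below is illustrative only: `ContextLayer` and `assemble_context` are hypothetical names, and real harnesses will differ in how they label and order layers.

```python
from dataclasses import dataclass


@dataclass
class ContextLayer:
    name: str          # "static", "task", "state", or "code"
    persistence: str   # how long the layer lives, as in the table above
    content: str = ""


def assemble_context(layers: list[ContextLayer]) -> str:
    """Join non-empty layers into one prompt, each under a labeled header."""
    parts = []
    for layer in layers:
        body = layer.content.strip()
        if body:  # skip layers that have nothing to contribute yet
            parts.append(f"## {layer.name} context ({layer.persistence})\n{body}")
    return "\n\n".join(parts)


layers = [
    ContextLayer("static", "always present", "Run tests with `npm test`."),
    ContextLayer("task", "per-task", "Add a user endpoint."),
    ContextLayer("state", "per-session", ""),  # nothing recorded yet
    ContextLayer("code", "dynamically loaded", "src/api/client.ts"),
]
prompt = assemble_context(layers)
```

Keeping each layer as a separate object makes it possible to refresh the task or code layer without touching the static one.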
The AGENTS.md pattern
A common emerging pattern for static context is a single file — AGENTS.md — committed to the repository root. It contains the stable instructions an agent needs to operate in that codebase: how to run tests, which directories are off-limits, naming conventions, and architectural constraints.
# AGENTS.md
## Build & test
- Run tests: `npm test`
- Lint: `npm run lint`
- Build: `npm run build`
## Scope rules
- Never modify files in `legacy/`
- Never edit `package-lock.json` directly
## Architecture
- All API calls go through `src/api/client.ts`
- Use `zod` for all data validation
Visual recap: Context Engineering at a glance
The full picture is broader than context-window management. Context engineering starts with what the agent sees at runtime, but it matures into a lifecycle for designing, testing, governing, and continuously improving the context that shapes agent behavior.
The context window budget
Context window space is finite. Filling it with irrelevant information can be as harmful as providing too little context. A disciplined approach allocates the budget explicitly:
- ~20% — static context (AGENTS.md, architecture docs)
- ~30% — task context (current feature, acceptance criteria)
- ~20% — state context (progress file)
- ~30% — dynamic code context (files relevant to the current subtask)
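The split above can be turned into per-layer token limits. This is a minimal sketch that takes the percentages as given; the 100,000-token window and the function names are assumptions chosen for illustration.

```python
# Shares taken from the list above; treat them as a starting point, not a rule.
BUDGET_SHARES = {"static": 0.20, "task": 0.30, "state": 0.20, "code": 0.30}


def allocate_budget(window_tokens: int, shares: dict[str, float] = BUDGET_SHARES) -> dict[str, int]:
    """Convert fractional shares into a token limit per layer."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1.0")
    return {layer: round(window_tokens * share) for layer, share in shares.items()}


def over_budget(usage: dict[str, int], budget: dict[str, int]) -> list[str]:
    """Return the layers whose measured token usage exceeds their limit."""
    return [layer for layer, used in usage.items() if used > budget.get(layer, 0)]


budget = allocate_budget(100_000)  # hypothetical window size
offenders = over_budget({"static": 25_000, "task": 1_000}, budget)
```

A harness could run a check like `over_budget` before each call and trim or summarize the offending layer instead of silently truncating the prompt.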
Advanced: The Context Development Lifecycle
Context engineering often starts with managing what enters the agent's context window. That is only the first level.
In professional agentic engineering, context is no longer just a temporary input to a chat session. It becomes part of the software production system.
Specifications, AGENTS.md, CLAUDE.md, architecture maps, domain vocabulary, reusable skills, library documentation, MCP context, tickets, logs, and review feedback are no longer disposable prompts. They are software artifacts that directly shape agent behavior.
This creates a new engineering question:
If context changes agent behavior, how do we design, test, distribute, observe, and improve that context?
That is the role of the Context Development Lifecycle.
| Phase | Core question |
|---|---|
| Generate | Have we clarified intent, vocabulary, constraints, and architecture before execution? |
| Evaluate | Does this context reliably shape the agent's behavior? |
| Distribute | Can this context be packaged, versioned, reused, and governed across teams? |
| Observe | What do PR reviews, agent logs, and production incidents reveal about missing or misleading context? |
| Improve | How do we feed those signals back into the context stack? |
A mature workflow is not simply:
prompt → code
It becomes:
shared intent → evaluated context → generated code → telemetry → improved context
This is the bridge from vibe coding to professional agentic engineering.
The practical consequence is simple: context should be managed with the same seriousness as code.
Good practices: treating context as a software artifact
Once context shapes agent behavior, it needs engineering discipline.
1. Make context explicit
Do not rely on hidden conversation history for durable project knowledge.
Move durable instructions into versioned artifacts:
- AGENTS.md
- architecture maps
- domain vocabulary
- task specifications
- test instructions
- reusable skills
- project-specific examples
A useful rule:
If the agent should remember it tomorrow, it probably does not belong only in today's chat.
2. Separate context layers
Do not collapse everything into a single giant instruction file.
A mature context stack usually has several layers:
| Layer | Examples |
|---|---|
| Global context | Company conventions, security rules, domain vocabulary |
| System context | Architecture maps, repository documentation, module boundaries |
| Agent context | AGENTS.md, CLAUDE.md, reusable skills, tool instructions |
| Task context | Specs, tickets, MCP context, current logs, review comments |
This separation matters because each layer changes at a different speed. Company conventions are relatively stable. Task context changes constantly. Mixing them creates noise, increases drift, and makes context harder to review.
3. Test context, not only code
A change to AGENTS.md, a reusable skill, or a project specification can change many future agent runs.
That means context needs evaluation.
A context eval asks:
Did this piece of context reliably shape the agent's behavior in the intended way?
Useful levels of context testing include:
| Level | Purpose |
|---|---|
| Context linting | Validate structure, syntax, and required fields |
| Clarity checks | Check whether the instruction is explicit and complete enough for an LLM |
| Behavioral evals | Test whether the agent follows a project rule |
| Agentic E2E | Let an agent run the system and verify real behavior |
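The first level in the table, context linting, can be as small as a script that runs in CI. The sketch below assumes a house rule that AGENTS.md must contain three named sections, each with at least one bullet; the section names come from the earlier sample file, and `lint_agents_md` is a hypothetical helper.

```python
import re

# Assumed house rule: these sections must exist and must not be empty.
REQUIRED_SECTIONS = ["Build & test", "Scope rules", "Architecture"]


def lint_agents_md(text: str, required: list[str] = REQUIRED_SECTIONS) -> list[str]:
    """Return a list of structural problems found in an AGENTS.md file."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)
        if heading:
            current = heading.group(1)
            sections[current] = []
        elif current is not None and line.strip().startswith("- "):
            sections[current].append(line.strip())

    problems = []
    for name in required:
        if name not in sections:
            problems.append(f"missing section: {name}")
        elif not sections[name]:
            problems.append(f"empty section: {name}")
    return problems


sample = """# AGENTS.md
## Build & test
- Run tests: `npm test`
## Scope rules
"""
problems = lint_agents_md(sample)
```

Linting only proves the file is well-formed. Whether the agent actually follows the rules is a question for the behavioral levels further down the table.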
Example:
Context rule:
All API endpoints must start with /awesome.
Eval:
Ask the agent to add a new user endpoint.
Check whether the generated route follows the required prefix.
The point is not only to test the code that was generated. The point is to test whether the context caused the right behavior.
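The check in this eval can be automated once the agent's output is captured as text. The sketch below assumes the generated code declares routes in an Express-like style such as `router.post("/awesome/users", ...)`; that style, the regular expression, and the function names are assumptions, not part of the rule itself.

```python
import re

# Assumption: routes appear as .get("..."), .post('...'), and so on.
ROUTE_PATTERN = re.compile(r"""\.(?:get|post|put|patch|delete)\(\s*["']([^"']+)["']""")


def routes_violating_prefix(generated_code: str, prefix: str = "/awesome") -> list[str]:
    """Return declared routes that do not sit under the required prefix."""
    routes = ROUTE_PATTERN.findall(generated_code)
    # Interpreting the rule by path segment, so "/awesomeness" would not count.
    return [r for r in routes if not (r == prefix or r.startswith(prefix + "/"))]


def eval_passes(generated_code: str, prefix: str = "/awesome") -> bool:
    """Pass only if at least one route was generated and all follow the rule."""
    routes = ROUTE_PATTERN.findall(generated_code)
    return bool(routes) and not routes_violating_prefix(generated_code, prefix)


good_output = 'router.post("/awesome/users", createUser);'
bad_output = "router.post('/users', createUser);\nrouter.get('/awesome/users/:id', getUser);"

good_result = eval_passes(good_output)
bad_routes = routes_violating_prefix(bad_output)
```

An output with no routes at all is treated as a failure here, since the agent did not complete the task the eval asked for.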
4. Think probabilistically
Traditional tests are deterministic: pass or fail.
Context evals are probabilistic. The same context may work with one model, fail with another, or pass only four times out of five.
So a passing context eval might mean:
This context causes the desired behavior 95% of the time
across 5 runs, 3 models, and 2 agent configurations.
For agentic systems, quality increasingly means reliability under variation.
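A pass rate over a run matrix can be computed with a few lines. In the sketch below, `fake_run` is a deterministic stand-in for a real agent invocation, and the model and configuration names are placeholders.

```python
from itertools import product
from typing import Callable


def run_matrix(
    run_once: Callable[[str, str, int], bool],
    models: list[str],
    configs: list[str],
    repeats: int,
) -> list[bool]:
    """Run the eval for every model/config pair, `repeats` times each."""
    return [
        run_once(model, config, attempt)
        for model, config in product(models, configs)
        for attempt in range(repeats)
    ]


def pass_rate(results: list[bool]) -> float:
    """Fraction of runs in which the agent followed the context rule."""
    if not results:
        raise ValueError("no eval runs recorded")
    return sum(results) / len(results)


def fake_run(model: str, config: str, attempt: int) -> bool:
    # Stand-in for a real agent run: "model-c" fails its first attempt.
    return not (model == "model-c" and attempt == 0)


results = run_matrix(fake_run, ["model-a", "model-b", "model-c"], ["default", "strict"], repeats=5)
rate = pass_rate(results)
passed = rate >= 0.95  # threshold taken from the example above
```

With 3 models, 2 configurations, and 5 repeats there are 30 runs, so a 95% bar tolerates at most one failure. The stubbed matrix above fails twice (28 of 30) and therefore does not pass.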
5. Treat PR feedback as context telemetry
In a traditional workflow, PR feedback is used to fix code.
In an agentic workflow, PR feedback should also update the context stack.
When a reviewer finds a problem, ask: What context was missing, weak, or misleading?
Possible follow-up actions:
- update AGENTS.md
- improve the task spec
- add a missing architecture rule
- update the domain vocabulary
- create a new context eval
- improve a reusable skill
- add a regression test
The goal is not only to fix the current PR. It is to prevent the same failure from recurring across future agent runs.
6. Govern shared context
For one developer, context is a productivity tool.
For an organization, shared context becomes platform infrastructure.
Reusable context should therefore be:
- versioned
- owned
- reviewed
- tested
- documented
- scanned for security issues
- distributed through curated registries when reused across teams
Once context becomes executable through agents, it becomes part of the software supply chain.
Context rot
Context rot occurs when the information in the context window drifts out of sync with the actual state of the codebase. It is one of the most insidious failure modes in long-running agentic projects because nothing fails loudly: the agent simply keeps acting on outdated information.
Context rot is not only a session problem. At enterprise scale, it also becomes a governance problem.
A stale instruction in AGENTS.md, an outdated architecture map, or an obsolete reusable skill can silently influence many future agent runs. In that sense, changing context without evaluation is close to changing production behavior without tests.
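One cheap way to catch this kind of drift is to verify that the paths a context file mentions still exist. The sketch below treats any backticked token containing a slash as a repository path; that heuristic and the `stale_paths` helper are assumptions, and the check catches only one narrow form of rot.

```python
import re
import tempfile
from pathlib import Path

# Heuristic: backticked tokens that contain a slash are treated as repo paths.
PATH_PATTERN = re.compile(r"`([\w./-]+/[\w./-]*)`")


def stale_paths(context_text: str, repo_root: Path) -> list[str]:
    """Return referenced paths that no longer exist under repo_root."""
    referenced = PATH_PATTERN.findall(context_text)
    return [path for path in referenced if not (repo_root / path).exists()]


agents_md = """
- Never modify files in `legacy/`
- All API calls go through `src/api/client.ts`
- Run tests: `npm test`
"""

# Throwaway directory standing in for a repository checkout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "legacy").mkdir()               # this path exists
    missing = stale_paths(agents_md, root)  # src/api/client.ts was never created
```

Run on every PR, a check like this flags instructions that point at files which have been moved or deleted.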
Prevention strategies:
- Keep AGENTS.md versioned and reviewed on every PR
- Add context evals for important rules and reusable skills
- Write state files atomically — never partial updates
- Use structured formats such as JSON or YAML for state, not free-form prose
- Keep architecture maps close to the code they describe
- Remove obsolete context aggressively
- Treat PR review comments and production incidents as signals to improve context
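The advice to write state files atomically and in a structured format can be sketched with the standard library: write to a temporary file in the same directory, then swap it into place with `os.replace`. The file name `progress.json` and the state fields are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state_atomically(path: Path, state: dict) -> None:
    """Write JSON state so readers see either the old file or the new one."""
    path = Path(path)
    # The temp file must live in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_name, path)  # replaces the target in a single step
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)     # do not leave partial files behind
        raise


with tempfile.TemporaryDirectory() as workdir:
    state_file = Path(workdir) / "progress.json"
    write_state_atomically(state_file, {"completed": ["setup"], "failed": []})
    write_state_atomically(state_file, {"completed": ["setup", "api"], "failed": []})
    loaded = json.loads(state_file.read_text(encoding="utf-8"))
    leftovers = sorted(p.name for p in Path(workdir).iterdir())
```

If the process dies midway, the previous state file is still intact, which is the property a partial in-place update cannot offer.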
TIP
The harness design (Lecture 04) specifies exactly how state files are written, read, and verified. Never leave state management to the agent's discretion.
