Observability: Making Agent Behavior Debuggable
If you can't see what the agent did, you can't fix what went wrong.
Why observability is different for agents
In traditional software, observability means logs, metrics, and traces of system behavior. For agentic systems, you need all of that — plus traces of reasoning behavior: what did the agent plan, what did it decide, what did it change its mind about?
The three observability layers
Layer 1: Code-level observability
Standard: logs, error traces, test results. These are produced by the verification pipeline and should be written to the state file after every loop.
Layer 2: Session-level observability
What happened in this agent session? A structured session log should capture:
- Which files were read
- Which files were modified
- Which tests were run and their results
- The agent's self-assessment at session end
Layer 3: Harness-level observability
Over time: is the harness working? Track metrics like:
- Task completion rate per session
- Verification pipeline pass rate
- Scope violations (out-of-bounds file edits)
- Session restart rate (proxy for context loss)
Practical instrumentation
The minimum viable observability setup:
bash
# At session end, the agent writes:
echo '{
"session_id": "2026-04-17-001",
"tasks_completed": ["feat/auth-login"],
"tasks_failed": [],
"files_modified": ["src/auth/login.ts", "tests/auth.test.ts"],
"verification": "passed",
"next_task": "feat/auth-logout"
}' > .agent/session-log.jsonThis file is committed to git, giving you a full audit trail of every agent session.