How do you build an AI agent system that can crash, restart, scale, and evolve without losing work or leaking secrets? Anthropic's engineering team published their answer: Managed Agents, an architecture that splits the "brain" (Claude's reasoning loop) from the "hands" (code execution sandboxes) and the "session" (durable event log). The result: p50 latency dropped 60%, p95 dropped over 90%, and every component became replaceable. This is the systems design behind Claude's agent infrastructure.
| *Source: Anthropic Engineering Blog (Apr 8, 2026) | Managed Agents API Docs | Claude Agent SDK* |
The Core Problem: "Pet" Containers
The old architecture put everything (Claude's reasoning, tool execution, conversation state) in a single container connected via WebSocket. Each container was a "pet": if it died, the session was lost. If it was slow, you nursed it back to health. Debugging was impossible because container failures, network drops, and harness bugs all looked the same through the WebSocket event stream.
The Architecture: Brain / Hands / Session
```
┌───────────────────────────────────────────────────────┐
│                    MANAGED AGENTS                     │
│                                                       │
│  ┌────────────┐     ┌───────────────┐  ┌────────────┐ │
│  │   BRAIN    │     │     HANDS     │  │  SESSION   │ │
│  │            │     │               │  │            │ │
│  │ Claude +   │────>│ Sandboxes     │  │ Append-only│ │
│  │ Harness    │     │ MCP servers   │  │ event log  │ │
│  │ (stateless)│<────│ Custom tools  │  │ (durable)  │ │
│  └────────────┘     └───────────────┘  └────────────┘ │
│        │                    │                ▲        │
│        │  execute()         │  emitEvent()   │        │
│        └────────────────────┴────────────────┘        │
└───────────────────────────────────────────────────────┘
```
| Component | What It Is | Key Property |
|---|---|---|
| Brain | Claude + harness loop | Stateless: can crash and be replaced |
| Hands | Sandboxes, containers, MCP servers, tools | Interchangeable: execute(name, input) → string |
| Session | Append-only event log | Durable: survives any crash, queryable |
The execute() Interface
The genius is in the simplicity:
execute(name, input) → string
A name and an input go in, a string comes out. The brain doesn't care whether it's talking to a Docker container, an MCP server, or a custom API, just as the OS read() syscall doesn't care whether it's reading from a 1970s disk pack or a modern SSD.
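In Python terms, the contract might look like this (the class and backend names here are illustrative, not from the SDK): any backend that satisfies a single-method protocol is interchangeable from the brain's point of view.

```python
from typing import Protocol


class Executor(Protocol):
    """Anything the brain can call: execute(name, input) -> str."""
    def execute(self, name: str, input: str) -> str: ...


class ShellSandbox:
    """Hypothetical sandbox backend: runs a named command, returns stdout."""
    def execute(self, name: str, input: str) -> str:
        return f"[sandbox] ran {name} with {input!r}"


class MCPServer:
    """Hypothetical MCP backend: forwards the call to a tool server."""
    def execute(self, name: str, input: str) -> str:
        return f"[mcp] {name} -> result for {input!r}"


def brain_step(hands: Executor, tool: str, payload: str) -> str:
    # The brain never knows which backend it is talking to.
    return hands.execute(tool, payload)


print(brain_step(ShellSandbox(), "run_tests", "pytest -q"))
print(brain_step(MCPServer(), "search_docs", "execute interface"))
```

Swapping backends requires no change to the brain's loop, which is exactly the property the table above calls "interchangeable."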
Crash Recovery
```
Container crashes
        ↓
New harness boots → wake(sessionId) → getSession(id) → resume from last event
        ↓
No work lost. No user impact.
```
The session log is the single source of truth. When anything crashes, a new instance boots, retrieves the log, and continues. All components are "cattle, not pets": replaceable, restartable, disposable.
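A toy version of the pattern, assuming a JSON-lines log (all details here are invented for illustration): events are appended as they happen, and a freshly booted harness replays the log to find where to resume.

```python
import json
import os
import tempfile


class SessionLog:
    """Minimal append-only event log: one JSON event per line."""

    def __init__(self, path: str):
        self.path = path

    def emit(self, event: dict) -> None:
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")

    def replay(self) -> list:
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f]


def wake(session_path: str) -> int:
    """A new harness boots, replays the log, and returns the next step index."""
    events = SessionLog(session_path).replay()
    return len(events)


path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
log = SessionLog(path)
log.emit({"step": 0, "type": "tool_call", "name": "run_tests"})
log.emit({"step": 1, "type": "tool_result", "output": "12 passed"})
# "Crash": the harness process disappears; only the file survives.
print(wake(path))  # → 2, i.e. resume at step 2
```

Because the log is the only state, the harness object itself can be discarded at any point; a replacement reconstructs everything it needs from disk.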
Security Model
Credentials never enter the sandbox:
| Resource | How It Works |
|---|---|
| Git repos | Tokens are used to clone repos during init and are wired into local remotes. The agent pushes/pulls without ever seeing them. |
| External APIs | Tokens live in an encrypted vault. Claude calls MCP tools via a proxy → the proxy fetches credentials → makes the external call. |
The code Claude generates and executes can never access real secrets. Even if the sandbox is compromised, there's nothing to steal.
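A minimal sketch of the proxy pattern, with every name hypothetical: agent-generated code names a tool; the proxy, running outside the sandbox, fetches the credential from the vault and makes the call, so the token never appears in anything the agent can read.

```python
# Lives OUTSIDE the sandbox; the agent has no handle to it.
SECRET_VAULT = {"github": "ghp_real_token"}


def proxy_call(tool: str, payload: dict) -> str:
    """Fetches the credential and performs the external call on the agent's behalf."""
    token = SECRET_VAULT[tool.split(".")[0]]
    # ... perform the real API call with `token` here ...
    return f"called {tool} (credential of length {len(token)} used, never returned)"


def sandboxed_agent_code() -> str:
    # Agent-generated code sees only the tool name and payload, never the token.
    return proxy_call("github.create_pr", {"title": "Fix flaky test"})


print(sandboxed_agent_code())
```

The key invariant is that the return value (and anything else observable from the sandbox) contains no credential material, only the result of the call.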
Performance Gains
| Metric | Before | After | Improvement |
|---|---|---|---|
| p50 latency (TTFT) | Baseline | ~40% of baseline | 60% reduction |
| p95 latency (TTFT) | Baseline | ~10% of baseline | >90% reduction |
The improvement comes from lazy container provisioning: inference starts immediately from the session log, and containers spin up in the background while Claude is already thinking.
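The overlap can be sketched with asyncio (a toy model with made-up timings, not the production implementation): provisioning is kicked off as a background task, and inference starts from the session log without waiting on it.

```python
import asyncio


async def provision_container() -> str:
    await asyncio.sleep(0.2)  # stand-in for slow container boot
    return "container-ready"


async def start_inference_from_log() -> str:
    await asyncio.sleep(0.05)  # model starts thinking from the session log
    return "first-token"


async def lazy_session() -> tuple:
    # Start provisioning but don't await it; inference begins immediately.
    container = asyncio.create_task(provision_container())
    first = await start_inference_from_log()  # fast path: no container boot here
    ready = await container                   # hands are ready by tool-call time
    return first, ready


print(asyncio.run(lazy_session()))  # → ('first-token', 'container-ready')
```

Time to first token is bounded by inference alone; the container boot only matters if the first tool call arrives before provisioning finishes.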
The API
```python
import anthropic

client = anthropic.Anthropic()

# Create an agent (once)
agent = client.beta.agents.create(
    name="Code Review Agent",
    model="claude-opus-4-6",
    system="You are an expert code reviewer...",
    tool_choice={"type": "agent_toolset", "version": "20260401"}
)

# Create a session (per task); assumes an environment `env` was created earlier
session = client.beta.sessions.create(
    agent_id=agent.id,
    environment_id=env.id
)

# Stream results
with client.beta.sessions.stream(session.id) as stream:
    for event in stream:
        print(event.delta.text)
```
Pricing: $0.08/session-hour (active execution only) + standard Claude token rates. A 30-minute task costs ~$0.04 in runtime.
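A quick check of that arithmetic (a trivial helper, not part of any SDK):

```python
def runtime_cost(minutes: float, rate_per_hour: float = 0.08) -> float:
    """Session runtime cost in dollars, excluding token charges."""
    return round(minutes / 60 * rate_per_hour, 4)


print(runtime_cost(30))   # → 0.04, the ~$0.04 figure for a 30-minute task
print(runtime_cost(120))  # → 0.16 for a 2-hour task
```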
Managed Agents vs. Agent SDK
| | Managed Agents (API) | Agent SDK (Library) |
|---|---|---|
| You manage | Nothing: Anthropic handles the infra | Your own deployment |
| Best for | Async tasks, no-DevOps teams, long-running work | Custom agent loops, self-hosted, fine-grained control |
| Crash recovery | Built-in | You implement it |
| Sandbox security | Built-in | You implement it |
| Cost | $0.08/hr + tokens | Just tokens (your compute) |
How LearnAI Team Could Use This
For Teaching
- Automated assignment review: create a "Code Review Agent" that students submit code to. The agent runs in a managed sandbox, executes tests, reviews style, and returns feedback, all without managing any infrastructure.
- Interactive lab environments: each student gets a session with a coding agent that can execute code safely in a sandbox. No server setup, no container management.
- Office hours bot: a managed agent with the course syllabus, textbook, and past lecture notes as context. Students ask questions anytime; the agent responds with source-grounded answers.
For Research
- Long-running analysis agents: research tasks that take hours (literature review, data processing) run as managed sessions. If anything crashes, work resumes automatically.
- Multi-agent research workflows: use the orchestration preview to coordinate agents, one searching papers, one extracting data, one synthesizing findings.
- Reproducible experiments: every agent session is a durable event log. Share the session ID and anyone can replay the exact sequence of reasoning and actions.
For the LAI Project
- Wiki content agents: agents that monitor new papers and tools, draft wiki entries, and queue them for review. Run on a schedule, crash-resilient.
- Student project scaffolding: an agent that takes a project spec and generates starter code, tests, and documentation in a managed sandbox, so students get a working starting point.
Real-World Use Cases
- CI/CD code review: companies run managed agents on every PR. The agent checks out the code in a sandbox, runs tests, reviews for security issues, and posts comments, all without access to production credentials.
- Customer support automation: agents that access internal knowledge bases via MCP, resolve tickets, and escalate to humans when confidence is low. Session logs provide a full audit trail.
- Data pipeline debugging: when a data pipeline fails, a managed agent gets the logs, spins up a sandbox with the pipeline code, reproduces the issue, and suggests fixes.
- Research replication: academic teams use managed agents to replicate paper results: the agent reads the paper, writes code, executes experiments in a sandbox, and compares results.
The OS Analogy
The blog post draws a deliberate parallel to operating system design. Just as Unix's read() abstracts away storage hardware, execute() abstracts away tool implementation. Just as processes are isolated from each other, brain and hands are isolated. And just as the filesystem persists beyond any process lifetime, the session log persists beyond any container lifetime.
This isn't just agent infrastructure; it's the beginning of an operating system for AI agents: a "meta-harness" that doesn't assume what future harnesses will need, only that the brain needs to operate, the hands need to execute, and the session needs to survive.