Seeing Like an Agent — How Anthropic Designs Tools for Claude Code

Thariq Shihipar (Anthropic) revealed the internal design process behind Claude Code’s tools: the three attempts it took to build AskUserQuestion, the replacement of TodoWrite with the Task tool, and the abandonment of RAG in favor of agent-driven search. The core principle: design tools shaped to the model’s abilities, not your assumptions about them. The only way to know what the model needs is to read its outputs, experiment, and learn to “see like an agent.”

Source: Seeing like an agent: how we design tools in Claude Code (Anthropic, April 2026)

The Core Framework

Imagine you’re solving a hard math problem. What tools do you want?

Your skill level   Best tool        Tradeoff
Basic              Paper            Limited by manual calculation
Intermediate       Calculator       Need to know the advanced buttons
Expert             Computer + code  Most powerful, highest skill floor

The same applies to AI agents. The tool should match the model’s current capabilities — too simple constrains it, too complex confuses it. And as models improve, the right tool changes.

Case Study 1: AskUserQuestion — Three Attempts

The goal: improve Claude’s ability to ask users structured questions (elicitation).

Attempt 1: Bolt it onto ExitPlanTool

Added a questions parameter to the existing ExitPlanTool. Failed because Claude got confused — is it outputting a plan or asking questions? What if answers contradict the plan?

Attempt 2: Custom markdown format

Told Claude to output questions in a special markdown format (bullet points with bracketed options). Claude could usually produce it, but not reliably — it would append extra sentences, drop options, or abandon the structure.
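A toy sketch of why attempt 2 was fragile. The exact format Anthropic used isn’t published; here I assume a line like `- Question? [Option A | Option B]` and parse it with a strict regex. One appended phrase from the model and the line no longer matches:

```python
import re

# Hypothetical parser for a bracketed-option markdown format (attempt 2).
# Assumed line shape: "- Question text [Option A | Option B]".
OPTION_LINE = re.compile(r"^- (?P<q>.+?) \[(?P<opts>[^\]]+)\]$")

def parse_questions(text: str) -> list[dict]:
    """Extract (question, options) pairs; silently skips malformed lines."""
    out = []
    for line in text.splitlines():
        m = OPTION_LINE.match(line.strip())
        if m:
            out.append({
                "question": m.group("q"),
                "options": [o.strip() for o in m.group("opts").split("|")],
            })
    return out

clean = "- Use TypeScript or Python? [TypeScript | Python]"
noisy = clean + " (either works for me!)"  # model appended extra prose

print(len(parse_questions(clean)))  # 1: the well-formed line parses
print(len(parse_questions(noisy)))  # 0: one trailing phrase breaks the match
```

This is the failure mode the section describes: the model can *usually* produce the format, but any deviation silently drops the question.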

Attempt 3: Dedicated tool (winner)

Created a standalone AskUserQuestion tool that:

  • Claude can call at any point (especially during plan mode)
  • Shows a modal with structured options
  • Blocks the agent loop until the user answers
  • Works in the Agent SDK and with skills

Why it worked: Claude “liked calling this tool” and produced good outputs. The lesson: even the best-designed tool doesn’t work if Claude doesn’t understand how to call it.

Attempt 1: Overload existing tool     → Model confused
Attempt 2: Custom output format       → Unreliable formatting
Attempt 3: Dedicated tool + modal UI  → Model uses it naturally ✓
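A minimal sketch of what a dedicated tool like this might look like. The schema shape follows the standard Anthropic tool-definition pattern (name, description, JSON Schema input), but the field names and handler are illustrative assumptions, not Claude Code’s actual definition:

```python
# Hypothetical AskUserQuestion tool definition (attempt 3).
# Field names and handler are assumptions for illustration.
ASK_USER_QUESTION = {
    "name": "AskUserQuestion",
    "description": (
        "Ask the user one or more structured questions and block until "
        "they answer. Use during planning when a decision needs user input."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "questions": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "question": {"type": "string"},
                        "options": {"type": "array",
                                    "items": {"type": "string"}},
                    },
                    "required": ["question", "options"],
                },
            }
        },
        "required": ["questions"],
    },
}

def handle_ask_user_question(tool_input: dict, prompt_user) -> list[str]:
    """Render one modal per question; block the agent loop until answered."""
    return [prompt_user(q["question"], q["options"])
            for q in tool_input["questions"]]
```

The design point: a dedicated tool gives the model one unambiguous action (call the tool) instead of two entangled jobs (plan vs. ask), and the harness controls the blocking modal UI.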

Case Study 2: TodoWrite → Task Tool

The evolution as models got smarter:

Early Claude Code:
  TodoWrite + system reminders every 5 turns
  → Model forgot what to do without reminders
  → Reminders made it stick rigidly to the list

Opus 4.5+:
  Task tool (replaced TodoWrite)
  → Tasks support dependencies
  → Subagents can coordinate on shared tasks
  → Model can alter and delete tasks (not just check them off)

The key insight: As model capabilities increase, tools that once helped start constraining. The TodoWrite reminders that kept early Claude on track made later Claude too rigid — it wouldn’t adapt when the plan needed to change. The Task tool gives structure without rigidity.
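The TodoWrite-to-Task shift can be sketched as a data-structure change: tasks gain dependencies, and the model can add, alter, or delete them rather than only check them off. This toy board is an assumption about the shape of such a tool, not Claude Code’s implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    description: str
    depends_on: list[str] = field(default_factory=list)
    done: bool = False

class TaskBoard:
    """Toy task store: structure (dependencies) without rigidity (mutable)."""

    def __init__(self):
        self.tasks: dict[str, Task] = {}

    def add(self, task: Task):
        self.tasks[task.id] = task

    def alter(self, task_id: str, **changes):
        # The plan can change: mutate a task instead of sticking rigidly to it.
        for key, value in changes.items():
            setattr(self.tasks[task_id], key, value)

    def delete(self, task_id: str):
        del self.tasks[task_id]

    def ready(self) -> list[Task]:
        # A task is actionable once all of its known dependencies are done.
        return [t for t in self.tasks.values()
                if not t.done
                and all(self.tasks[d].done
                        for d in t.depends_on if d in self.tasks)]
```

Contrast with TodoWrite: a flat checklist plus periodic reminders enforces the original plan; a dependency graph the model can edit lets it reorder work when reality diverges from the plan.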

Case Study 3: RAG → Agent-Driven Search → Progressive Disclosure

The biggest shift in how Claude Code finds context:

  • v1, RAG: a vector DB pre-indexed the codebase and the harness retrieved snippets. Fragile to set up, and Claude was handed context passively.
  • v2, Grep tool: Claude searches the codebase itself. Better: Claude builds its own context.
  • v3, Agent Skills: progressive disclosure. Claude reads skill files that reference other files recursively, discovering context in nested layers.

“Over the course of a year, Claude went from not really being able to build its own context to being able to do nested search across several layers of files to find the exact context it needed.”
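The v2 step is simple to picture: instead of a vector DB pushing snippets at the model, the agent pulls context by running grep itself. A minimal sketch, assuming a POSIX `grep` on the PATH (the real Claude Code search tooling is more elaborate):

```python
import subprocess

def grep_tool(pattern: str, path: str = ".") -> list[str]:
    """Run `grep -rn` and return matching lines for the agent to read.

    The agent decides what to search for, reads the hits, and issues
    follow-up searches: it builds its own context instead of receiving it.
    """
    proc = subprocess.run(
        ["grep", "-rn", "--include=*.py", pattern, path],
        capture_output=True, text=True)
    return proc.stdout.splitlines()
```

Each hit comes back as `path:line:text`, which gives the model both the evidence and a pointer to where to read next.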

Progressive Disclosure in Practice

Instead of frontloading all information in the system prompt (context rot), give Claude a pointer to where information lives and let it discover what it needs:

Bad:  System prompt with 50K tokens of documentation
      → Most tokens are irrelevant to the current task
      → Context rot degrades quality

Good: System prompt says "docs are in .claude/skills/"
      → Claude reads the index when needed
      → Follows references to specific files
      → Only loads what's relevant
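The lazy-loading pattern above can be sketched as a loader that reads one index file and follows references on demand. The `See: path` convention here is an assumption invented for this sketch; real skill files use their own cross-reference format:

```python
import re
from pathlib import Path

# Assumed convention: a loaded file may reference further files
# with a line like "See: relative/path.md".
REF = re.compile(r"^See:\s*(\S+)", re.MULTILINE)

def load_skill(root: Path, name: str, max_depth: int = 3) -> list[str]:
    """Read one file, then follow its references up to max_depth hops.

    Only files the task actually reaches get loaded, so the context
    stays proportional to the task rather than to the doc set.
    """
    loaded, queue, seen = [], [(name, 0)], set()
    while queue:
        rel, depth = queue.pop(0)
        if rel in seen or depth > max_depth:
            continue
        seen.add(rel)
        text = (root / rel).read_text()
        loaded.append(text)
        queue += [(ref, depth + 1) for ref in REF.findall(text)]
    return loaded
```

The system prompt then only needs the pointer (“docs are in .claude/skills/”); everything else is discovered at read time.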

Case Study 4: The Claude Code Guide Subagent

Problem: Claude didn’t know how to explain its own features (MCP setup, slash commands, etc.).

Approach                              Result
Put all docs in the system prompt     Context rot; interfered with coding, the main job
Link to docs, let Claude read them    Worked, but Claude pulled huge doc chunks into context
Dedicated doc-search subagent         Searches in its own context, returns only the answer

The Guide subagent is progressive disclosure + context isolation. The main agent’s context stays clean; the doc-searching noise stays in the child.
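Context isolation is easy to show in miniature: the doc search happens inside a child scope, and only the final answer string crosses back to the parent. `search_docs` and `run_model` are stand-ins for real retrieval and model calls:

```python
def guide_subagent(question: str, search_docs, run_model) -> str:
    """Doc search in an isolated child context; only the answer escapes."""
    child_context = []                   # never leaves this function
    for chunk in search_docs(question):  # big doc chunks stay in the child
        child_context.append(chunk)
    return run_model(question, child_context)

def main_agent(question: str, search_docs, run_model) -> dict:
    """Parent context stays clean: prompt, question, answer. No doc noise."""
    parent_context = ["<system prompt>", question]
    answer = guide_subagent(question, search_docs, run_model)
    parent_context.append(answer)
    return {"context_len": len(parent_context), "answer": answer}
```

However many doc chunks the child reads, the parent’s context grows by exactly one item, so doc lookups can’t degrade the main coding job.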

Design Principles Summary

Principle                 What it means
Shape tools to the model  Match the tool’s complexity to what the model can reliably use
Read your outputs         Watch what the model actually does with the tool; don’t assume
Revisit old tools         As models improve, yesterday’s help becomes today’s constraint
Progressive disclosure    Don’t frontload; let the model discover context as needed
Minimize tool count       Claude Code has ~20 tools; the bar to add a new one is high
Prefer skills over tools  Add functionality through discoverable files, not new tool definitions

The Math Problem Analogy (Applied)

Building an agent? Ask:

"If I were the model, and I had to solve this task,
 what tools would I want — given MY skill set?"

Not your skill set. The model's.
That's what "seeing like an agent" means.

How LearnAI Team Could Use This

  • Agent SDK projects: When students build agents in CS305, this framework teaches them to iterate on tool design — start simple, read outputs, evolve. The three AskUserQuestion attempts are a perfect teaching case.
  • CLAUDE.md engineering: Progressive disclosure is exactly what Q’s CLAUDE.md setup does — skills reference files that reference other files, building context on demand.
  • Harness engineering research: This post is primary-source evidence for the harness engineering thesis — the model is only as good as its tools, and tool design is the bottleneck.
  • Plugin/skill development: When building new skills for Claude Code, follow the “minimize tool count, maximize progressive disclosure” principle.

Real-World Use Cases

Scenario                                 Principle applied
Building a custom Claude Code skill      Shape to the model: test whether Claude actually calls it reliably
Agent keeps ignoring your instructions   Read outputs: is the tool confusing the model? Simplify
Upgrading from Opus 4.6 to 4.7           Revisit old tools: 4.7 reasons more, needs fewer rigid guides
Agent pulls too much context             Progressive disclosure: give pointers, not content
Adding a 21st tool to your agent         Minimize count: can you use a skill file instead?