Thariq Shihipar (Anthropic) revealed the internal design process behind Claude Code’s tools — from three failed attempts at building AskUserQuestion, to replacing TodoWrite with the Task tool, to abandoning RAG in favor of agent-driven search. The core principle: design tools shaped to the model’s abilities, not your assumptions about them. The only way to know what the model needs is to read its outputs, experiment, and learn to “see like an agent.”
Source: Seeing like an agent: how we design tools in Claude Code (Anthropic, April 2026)
The Core Framework
Imagine you’re solving a hard math problem. What tools do you want?
| Your skill level | Best tool | Tradeoff |
|---|---|---|
| Basic | Paper | Limited by manual calculations |
| Intermediate | Calculator | Need to know the advanced buttons |
| Expert | Computer + code | Most powerful, highest skill floor |
The same applies to AI agents. The tool should match the model’s current capabilities — too simple constrains it, too complex confuses it. And as models improve, the right tool changes.
Case Study 1: AskUserQuestion — Three Attempts
The goal: improve Claude’s ability to ask users structured questions (elicitation).
Attempt 1: Bolt it onto ExitPlanTool
Added a questions parameter to the existing ExitPlanTool. Failed because Claude got confused — is it outputting a plan or asking questions? What if answers contradict the plan?
Attempt 2: Custom markdown format
Told Claude to output questions in a special markdown format (bullet points with bracketed options). Claude could usually produce it, but not reliably — it would append extra sentences, drop options, or abandon the structure.
Attempt 3: Dedicated tool (winner)
Created a standalone AskUserQuestion tool that:
- Claude can call at any point (especially during plan mode)
- Shows a modal with structured options
- Blocks the agent loop until the user answers
- Works in the Agent SDK and with skills
Why it worked: Claude “liked calling this tool” and produced good outputs. The lesson: even the best-designed tool doesn’t work if Claude doesn’t understand how to call it.
- Attempt 1: Overload existing tool → Model confused
- Attempt 2: Custom output format → Unreliable formatting
- Attempt 3: Dedicated tool + modal UI → Model uses it naturally ✓
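The winning shape can be sketched as an ordinary tool-calling definition plus a blocking handler. This is a minimal sketch, not Anthropic’s actual implementation — the field names (`options`, `allow_free_text`) and the `ask` UI hook are illustrative assumptions:

```python
# Hypothetical sketch of a dedicated question tool, in the JSON-schema
# style most tool-calling APIs use. Field names are assumptions.
ASK_USER_QUESTION = {
    "name": "AskUserQuestion",
    "description": (
        "Ask the user a structured question with a fixed set of options. "
        "Call this whenever a decision is ambiguous, especially in plan mode."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "question": {"type": "string"},
            "options": {"type": "array", "items": {"type": "string"}, "minItems": 2},
            "allow_free_text": {"type": "boolean", "default": False},
        },
        "required": ["question", "options"],
    },
}

def handle_tool_call(name: str, args: dict, ask) -> str:
    """Block the agent loop until the user answers, then return the answer.
    `ask(question, options) -> int` is the UI hook — a modal in Claude Code,
    stdin in a plain terminal."""
    if name != "AskUserQuestion":
        raise ValueError(f"unknown tool: {name}")
    choice = ask(args["question"], args["options"])
    return args["options"][choice]
```

The point of the dedicated shape: the schema carries the structure (required options, minimum count), so the model no longer has to hold a fragile markdown format in its head.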
Case Study 2: TodoWrite → Task Tool
The evolution as models got smarter:
Early Claude Code:
TodoWrite + system reminders every 5 turns
→ Model forgot what to do without reminders
→ Reminders made it stick rigidly to the list
Opus 4.5+:
Task tool (replaced TodoWrite)
→ Tasks support dependencies
→ Subagents can coordinate on shared tasks
→ Model can alter and delete tasks (not just check them off)
The key insight: As model capabilities increase, tools that once helped start constraining. The TodoWrite reminders that kept early Claude on track made later Claude too rigid — it wouldn’t adapt when the plan needed to change. The Task tool gives structure without rigidity.
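The difference between a checklist and a task graph can be sketched with a small data structure. The real Task tool’s fields aren’t public; `depends_on` and the `ready()` scheduling rule below are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    description: str
    depends_on: list = field(default_factory=list)  # hypothetical dependency field
    status: str = "pending"  # pending | in_progress | done

class TaskList:
    def __init__(self):
        self.tasks: dict = {}

    def add(self, task: Task):
        self.tasks[task.id] = task

    def delete(self, task_id: str):
        # Unlike a checklist, tasks can be dropped, not just checked off.
        self.tasks.pop(task_id, None)

    def ready(self) -> list:
        """Tasks whose dependencies are all done — what a subagent
        coordinating on the shared list could pick up next."""
        return [
            t for t in self.tasks.values()
            if t.status == "pending"
            and all(
                self.tasks[d].status == "done"
                for d in t.depends_on
                if d in self.tasks
            )
        ]
```

A flat TodoWrite list is the degenerate case of this structure (no dependencies, no deletion), which is why it turned into a straitjacket once the model could plan around changing circumstances.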
Case Study 3: RAG → Agent-Driven Search → Progressive Disclosure
The biggest shift in how Claude Code finds context:
| Phase | How context was built | Problem |
|---|---|---|
| v1: RAG | Vector DB pre-indexed the codebase; harness retrieved snippets | Fragile setup, Claude was given context passively |
| v2: Grep tool | Claude searches the codebase itself | Better — Claude builds its own context |
| v3: Agent Skills | Progressive disclosure — Claude reads skill files that reference other files recursively | Nested, layered context discovery |
“Over the course of a year, Claude went from not really being able to build its own context to being able to do nested search across several layers of files to find the exact context it needed.”
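The v2 shift is simple to sketch: instead of a harness-side vector index, expose a plain search primitive and let the model drive it. A minimal sketch — the real Grep tool’s parameters and output format are not shown here:

```python
import re
from pathlib import Path

def grep_tool(pattern: str, root: str, max_results: int = 20) -> list:
    """Agent-driven search: no pre-built index. The model picks the
    pattern and directory; the harness just runs the search and
    returns file:line hits for the model to follow up on."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        for lineno, line in enumerate(
            path.read_text(errors="ignore").splitlines(), start=1
        ):
            if re.search(pattern, line):
                hits.append(f"{path}:{lineno}: {line.strip()}")
                if len(hits) >= max_results:
                    return hits
    return hits
```

Nothing is retrieved until the model asks for it, so the context it builds reflects its own hypotheses about the codebase rather than a similarity score.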
Progressive Disclosure in Practice
Instead of frontloading all information in the system prompt (context rot), give Claude a pointer to where information lives and let it discover what it needs:
Bad: System prompt with 50K tokens of documentation
→ Most tokens are irrelevant to the current task
→ Context rot degrades quality
Good: System prompt says "docs are in .claude/skills/"
→ Claude reads the index when needed
→ Follows references to specific files
→ Only loads what's relevant
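The “pointer, not content” pattern can be sketched as a lazy, recursive file walk. This sketch assumes a hypothetical `see: <file>` reference convention inside skill files — real skills use their own link and frontmatter formats:

```python
from pathlib import Path

def discover(index: Path, keyword: str, max_depth: int = 3) -> list:
    """Follow references outward from an index file, loading only the
    files relevant to `keyword` — on-demand context instead of a
    frontloaded 50K-token system prompt."""
    relevant, seen = [], set()

    def walk(path: Path, depth: int):
        if depth > max_depth or path in seen or not path.exists():
            return
        seen.add(path)
        text = path.read_text(errors="ignore")
        if keyword.lower() in text.lower():
            relevant.append(str(path))
        # Hypothetical convention: lines like "see: other_file.md"
        # reference further files, which may reference others in turn.
        for line in text.splitlines():
            if line.startswith("see: "):
                walk(path.parent / line.removeprefix("see: ").strip(), depth + 1)

    walk(index, 0)
    return relevant
```

Only the files on the path to the answer ever enter context; everything else stays on disk.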
Case Study 4: The Claude Code Guide Subagent
Problem: Claude didn’t know how to explain its own features (MCP setup, slash commands, etc.).
| Approach | Result |
|---|---|
| Put all docs in system prompt | Context rot — interfered with coding, which is the main job |
| Link to docs, let Claude read them | Worked, but Claude pulled huge doc chunks into context |
| Subagent — dedicated doc-search agent | Searches in its own context, returns only the answer |
The Guide subagent is progressive disclosure + context isolation. The main agent’s context stays clean; the doc-searching noise stays in the child.
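The isolation property can be sketched in a few lines. This is a toy model of the pattern, not the Guide subagent itself — the sentence-matching “search” stands in for whatever retrieval the child actually does:

```python
def doc_search_subagent(query: str, docs: dict) -> str:
    """Runs with its own context: scans the full documents, but
    returns only the single matching sentence — never the raw docs."""
    q = query.lower()
    for title, body in docs.items():
        for sentence in body.split(". "):
            if q in sentence.lower():
                return f"{title}: {sentence.strip().rstrip('.')}."
    return "No answer found in the docs."

def main_agent(user_question: str, docs: dict) -> str:
    """The parent's context grows by one sentence, not the whole corpus.
    All the doc-searching noise stays inside the child call."""
    return doc_search_subagent(user_question, docs)
```

The contract is the key design choice: the child may read anything, but its return value is the only thing that crosses back into the parent’s context.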
Design Principles Summary
| Principle | What it means |
|---|---|
| Shape tools to the model | Match the tool’s complexity to what the model can reliably use |
| Read your outputs | Watch what the model actually does with the tool — don’t assume |
| Revisit old tools | As models improve, yesterday’s help becomes today’s constraint |
| Progressive disclosure | Don’t frontload — let the model discover context as needed |
| Minimize tool count | Claude Code has ~20 tools. The bar to add a new one is high |
| Prefer skills over tools | Add functionality through discoverable files, not new tool definitions |
The Math Problem Analogy (Applied)
Building an agent? Ask:
"If I were the model, and I had to solve this task,
what tools would I want — given MY skill set?"
Not your skill set. The model's.
That's what "seeing like an agent" means.
How LearnAI Team Could Use This
- Agent SDK projects: When students build agents in CS305, this framework teaches them to iterate on tool design — start simple, read outputs, evolve. The three AskUserQuestion attempts are a perfect teaching case.
- CLAUDE.md engineering: Progressive disclosure is exactly what Q’s CLAUDE.md setup does — skills reference files that reference other files, building context on demand.
- Harness engineering research: This post is primary-source evidence for the harness engineering thesis — the model is only as good as its tools, and tool design is the bottleneck.
- Plugin/skill development: When building new skills for Claude Code, follow the “minimize tool count, maximize progressive disclosure” principle.
Real-World Use Cases
| Scenario | Principle applied |
|---|---|
| Building a custom Claude Code skill | Shape to the model — test if Claude actually calls it reliably |
| Agent keeps ignoring your instructions | Read outputs — is the tool confusing the model? Simplify |
| Upgrading from Opus 4.6 to 4.7 | Revisit old tools — 4.7 reasons more, needs fewer rigid guides |
| Agent pulls too much context | Progressive disclosure — give pointers, not content |
| Adding a 21st tool to your agent | Minimize count — can you use a skill file instead? |