Anthropic shipped /goal in Claude Code 2.1.139 (May 11, 2026 per the official changelog; widely covered May 12): you state a completion condition, and the agent keeps working across turns until a separate fast model (Haiku by default) reads the transcript so far and decides whether the goal is met. It's the closest Claude Code has come to a "set it and walk away" mode. But the agent's iterations all share one conversation context window. Once that window fills, accuracy collapses, the agent can hallucinate, and the transcript evaluator (which sees only that same transcript) can plausibly mark the goal complete when it isn't. This entry teaches what /goal does, why it isn't sufficient for hour-scale or day-scale runs, and the Orchestrator + Headless pattern (popularized in a May 15, 2026 walkthrough by Eric Tech) that puts the autonomy where the context wall can't reach it.
What /goal is, in one minute
You type /goal inside Claude Code's terminal and provide a completion condition, e.g.:
/goal Migrate all legacy Auth components to the new design system,
and ensure tests pass.
Claude then:
- Plans how to satisfy the condition
- Executes edits, tool calls, test runs
- Evaluates whether the condition is met: a separate fast model (Haiku by default) reads the transcript so far and votes "done" or "keep going." It does not run tools, inspect files, or verify the repo state; it only judges the conversation it can see.
- Loops: if the condition is not met, planning starts again
It tracks elapsed time, turns, and tokens; when the evaluator agrees the goal is satisfied, the goal clears and you get your terminal back. Available in interactive mode, programmatic mode (-p), and Remote Control.
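The plan, execute, evaluate, loop cycle above can be modelled in a few lines. This is a toy sketch, not Anthropic's implementation: `GoalRun`, `run_goal`, the two demo callbacks, and the `max_turns` cap are all hypothetical names introduced for illustration. The one property it is meant to show is that the evaluator receives nothing but the transcript.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GoalRun:
    """Toy stand-in for one /goal run: a single transcript shared by every turn."""
    condition: str
    transcript: List[str] = field(default_factory=list)
    turns: int = 0


def run_goal(
    run: GoalRun,
    work_turn: Callable[[GoalRun], str],           # hypothetical: plan + execute one turn
    evaluator: Callable[[str, List[str]], bool],   # hypothetical: sees ONLY the transcript
    max_turns: int = 20,                           # assumed safety cap, not a /goal option
) -> bool:
    """Loop until the evaluator votes "done" or the turn cap is reached."""
    while run.turns < max_turns:
        run.turns += 1
        run.transcript.append(work_turn(run))      # every turn grows the same transcript
        if evaluator(run.condition, run.transcript):
            return True                            # goal clears
    return False                                   # cap reached without a "done" vote


def demo_work_turn(run: GoalRun) -> str:
    # Pretend the third turn reports passing tests.
    return "tests pass" if run.turns >= 3 else f"turn {run.turns}: still failing"


def demo_evaluator(condition: str, transcript: List[str]) -> bool:
    # Judges from text alone; it never inspects files or runs tools.
    return condition in transcript[-1]


if __name__ == "__main__":
    run = GoalRun(condition="tests pass")
    print(run_goal(run, demo_work_turn, demo_evaluator), run.turns)
```

Note that `demo_evaluator` would vote "done" just as readily if a turn merely claimed that tests pass, which is the weakness the next section describes.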
The fundamental problem: the context wall
"The slash-goal here typically stays in the same active conversation context window, meaning it will absolutely hit the context wall as the conversation progresses." (Eric Tech)
Each plan → execute → evaluate cycle of /goal adds to the same context window. The longer the run, the more the LLM's effective accuracy drops. At some point, and you can't predict exactly when, the agent:
- Mis-plans a step because earlier decisions have drifted out of focused attention
- Hallucinates a tool output or a file's contents
- Worst case: during the evaluation step, hallucinates that the condition is met when it isn't
This is the failure mode /goal cannot solve from inside its own conversation: the critical "are we done?" decision rests on the same transcript whose reliability is being eroded.
┌─ One growing context window ─────────────────────────────┐
│                                                          │
│  plan ─▶ execute ─▶ evaluate ─▶ plan ─▶ execute ─▶ ...   │
│                                                          │
│  ──── context fills, accuracy drops ────▶                │
│                                                          │
│  ! false "condition met" possible near the wall          │
└──────────────────────────────────────────────────────────┘
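A back-of-the-envelope calculation makes the growth concrete. The sketch below assumes a fixed window size, a fixed baseline (system prompt plus project instructions), and a constant token cost per cycle. All three figures are made-up illustration values, not measurements of Claude Code, and the model ignores any compaction the tool might perform. Since accuracy degrades before the window is literally full, treat the result as an upper bound on useful cycles.

```python
def cycles_until_full(window_tokens: int, baseline_tokens: int, tokens_per_cycle: int) -> int:
    """Number of complete plan/execute/evaluate cycles that fit in the window."""
    if tokens_per_cycle <= 0:
        raise ValueError("tokens_per_cycle must be positive")
    free = window_tokens - baseline_tokens
    return max(free // tokens_per_cycle, 0)


def usage_after(cycles: int, window_tokens: int, baseline_tokens: int, tokens_per_cycle: int) -> float:
    """Fraction of the window consumed after `cycles` cycles, capped at 1.0."""
    used = baseline_tokens + cycles * tokens_per_cycle
    return min(used / window_tokens, 1.0)


if __name__ == "__main__":
    WINDOW = 200_000     # assumed window size, for illustration only
    BASELINE = 20_000    # assumed fixed overhead before any work starts
    PER_CYCLE = 6_000    # assumed average cost of one cycle

    print(cycles_until_full(WINDOW, BASELINE, PER_CYCLE))   # 30
    print(usage_after(10, WINDOW, BASELINE, PER_CYCLE))     # 0.4
```

Under these assumptions, ten cycles already consume 40% of the window, and nothing fits after thirty.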
The pattern: Orchestrator + Headless
The fix is mundane: stop doing the work inside the orchestrator's own context window. Split the system into two roles:
┌─────────────────────────┐
│  Orchestrator session   │  ← stays small, low context %
│  (single Claude Code    │  ← decides "what's next"
│   conversation; you     │  ← reads state; dispatches
│   keep this clean)      │
└────────────┬────────────┘
             │  for each iteration:
             ▼
┌─────────────────────────┐
│  Headless worker        │  ← fresh context each time
│  (claude -p ...)        │  ← does the real work
│                         │  ← can spawn its OWN subagents
│  Reports terse result.  │  ← *terminates* when done
└─────────────────────────┘
             │
             ▼
   state file / GitHub project
       (the actual memory)
Why this works:
- The orchestrator only sees terse iteration results: small, structured handoffs (e.g., "QA found 3 bugs; bug IDs in GitHub project"). Its context stays well below the wall.
- Each headless worker is born with a clean context, does its iteration, writes state externally, and dies. There's no accumulating drift across iterations.
- The "memory of the project" lives in a state file or GitHub project, not in any LLM's conversation.
Why not just use subagents? A subagent reports its findings back to the parent, so whatever it returns is consumed in the parent's context. Eric's argument: for hour- or day-scale autonomous work, you want the orchestrator to stay so clean that subagents are a luxury it can't afford. The headless worker can spawn its own subagents without polluting the orchestrator.
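The split described above can be sketched as a dispatch function that launches one worker process per iteration and keeps only a one-line summary. This is a minimal sketch under stated assumptions: `claude -p` is the programmatic mode mentioned earlier, but the prompt wording, the `RESULT:` line convention, and every function name here are hypothetical. The runner is injectable so the logic can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def build_worker_command(skill: str, ticket_id: str) -> List[str]:
    """Build the argv for one headless worker. The prompt wording is hypothetical."""
    prompt = (
        f"Run the {skill} skill on ticket {ticket_id}. "
        "Write all details to the project state. "
        "Reply with a single line: RESULT: <short summary>."
    )
    return ["claude", "-p", prompt]


def subprocess_runner(argv: List[str]) -> str:
    """Run the worker as a separate process; its context ends when it exits.

    Assumes the `claude` CLI is installed and on PATH.
    """
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout


def terse_result(worker_output: str, max_chars: int = 200) -> str:
    """Keep only the last RESULT: line so the orchestrator's context stays small."""
    lines = [
        line.strip()
        for line in worker_output.splitlines()
        if line.strip().startswith("RESULT:")
    ]
    summary = lines[-1] if lines else "RESULT: (worker returned no result line)"
    return summary[:max_chars]


def dispatch(skill: str, ticket_id: str, runner: Runner = subprocess_runner) -> str:
    """Run one iteration in a fresh process and return its terse handoff."""
    return terse_result(runner(build_worker_command(skill, ticket_id)))


if __name__ == "__main__":
    # A fake runner stands in for the real CLI so the sketch runs anywhere.
    def fake(argv: List[str]) -> str:
        return "verbose worker log...\nRESULT: QA found 3 bugs; IDs in GitHub project\n"

    print(dispatch("super-QA", "#12", runner=fake))
```

Whatever the worker printed beyond its result line never reaches the orchestrator, which is the property that distinguishes this from a subagent call.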
State: where the project actually lives
Eric's recommended substrate: GitHub Projects columns. Anything else works (a .md file, a SQLite DB), but GitHub Projects has four practical wins:
| Why GitHub Projects | Detail |
|---|---|
| Free | No new infrastructure |
| gh CLI built into Claude Code | The agent can read/write tickets without an MCP server |
| Visible to humans | You can watch the run unfold in a browser, intervene, re-prioritize |
| Per-ticket history | Each ticket carries its own audit trail |
Eric uses six columns:
| Column | Meaning |
|---|---|
| queue | Items pending; the orchestrator pops from here |
| testing | Item currently in flight |
| done | Spec passing |
| bug | Test failed; needs the build worker |
| flaky | Only works on retry |
| skip | Out of scope |
The orchestrator's job each iteration is essentially: "pop one from queue, dispatch the right headless skill, move the result to the right column." That's it. The interesting work is in the skills.
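The column moves can be sketched with an in-memory board. This is an illustrative model only: the `Board` class and its methods are hypothetical, and a real run would read and write GitHub Projects (or a state file) instead of Python deques. The six column names are the ones listed in the table above.

```python
from collections import deque
from typing import Deque, Dict, Optional

COLUMNS = ("queue", "testing", "done", "bug", "flaky", "skip")
OUTCOMES = ("done", "bug", "flaky", "skip")


class Board:
    """In-memory stand-in for the six-column project board."""

    def __init__(self) -> None:
        self.columns: Dict[str, Deque[str]] = {name: deque() for name in COLUMNS}

    def add(self, ticket: str, column: str = "queue") -> None:
        self.columns[column].append(ticket)

    def pop_queue(self) -> Optional[str]:
        """Move the oldest queued ticket into 'testing' and return it."""
        if not self.columns["queue"]:
            return None
        ticket = self.columns["queue"].popleft()
        self.columns["testing"].append(ticket)
        return ticket

    def resolve(self, ticket: str, outcome: str) -> None:
        """File an in-flight ticket under the column matching its outcome."""
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        self.columns["testing"].remove(ticket)   # raises ValueError if not in flight
        self.columns[outcome].append(ticket)


if __name__ == "__main__":
    board = Board()
    board.add("login page")
    board.add("settings page")

    ticket = board.pop_queue()        # "login page" is now in 'testing'
    board.resolve(ticket, "bug")      # the worker reported a failing test
    print({name: list(items) for name, items in board.columns.items()})
```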
Worked example: Super-QA + Super-Build cycling on an app
Eric's walkthrough demonstrates the pattern with two headless skills orchestrated by a third super-orchestrator skill:
super-orchestrator ─▶ super-QA ─▶ finds bugs ─▶ GitHub bug column
        ▲                                               │
        │                                               ▼
        └──── super-build ◀── fixes bugs ◀── GitHub bug column
super-QA (find bugs)
- Traverses the app's pages using breadth-first search
- Visited set keeps it from re-testing pages it already covered
- Writes Playwright end-to-end tests for each page
- If a test fails → opens a ticket in the bug column
- If a page has child pages → adds them to queue
- Terminates the headless session, returns "found N bugs"
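The traversal in the list above is a standard breadth-first search with a visited set. The sketch below is not Eric's skill: `crawl_and_test` and its two callbacks are hypothetical names, and the page test is a plain function standing in for a generated Playwright test.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple


def crawl_and_test(
    start: str,
    children_of: Callable[[str], Iterable[str]],   # hypothetical: links found on a page
    run_page_test: Callable[[str], bool],          # hypothetical: True if the page's test passes
) -> Tuple[List[str], List[str]]:
    """Visit pages breadth-first; return (visit order, pages whose test failed)."""
    queue = deque([start])
    visited: Set[str] = {start}
    order: List[str] = []
    bugs: List[str] = []

    while queue:
        page = queue.popleft()
        order.append(page)
        if not run_page_test(page):
            bugs.append(page)                      # would become a ticket in the bug column
        for child in children_of(page):
            if child not in visited:               # the visited set prevents re-testing
                visited.add(child)
                queue.append(child)
    return order, bugs


if __name__ == "__main__":
    # Made-up site map with a cycle back to the home page.
    site: Dict[str, List[str]] = {
        "/": ["/login", "/settings"],
        "/login": ["/"],
        "/settings": ["/settings/billing", "/"],
        "/settings/billing": [],
    }
    failing = {"/settings/billing"}

    order, bugs = crawl_and_test("/", lambda p: site[p], lambda p: p not in failing)
    print(order)
    print(f"found {len(bugs)} bugs")
```

Marking a page as visited when it is enqueued, rather than when it is dequeued, keeps a page linked from several parents from entering the queue twice.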
super-build (fix bugs)
- Reads from the bug column
- Uses the Superpowers TDD framework (obra/superpowers): write failing test → implement → refactor → verify
- For non-obvious design decisions, invokes Gstack, Garry Tan's auto-decision skill. (Note: Eric describes /autoplan as voting across CEO / engineer / security / designer / QA roles; the upstream Gstack /autoplan actually runs a CEO → Design → Eng → DX chain with six auto-decision principles and dual Claude/Codex voices. Treat Eric's framing as a useful mental model, but read the actual Gstack source before adopting.)
- Terminates with "fixed M bugs"
The loop terminates when
There are no items left in queue AND no items left in bug. That's the completion condition the orchestrator monitors, and it is external to any single LLM conversation.
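Because the completion condition is a property of the state rather than of a transcript, it can be checked with ordinary code. The sketch below assumes a dictionary of column lists and two injected worker callbacks; `run_cycle`, `fake_qa`, `fake_build`, and the `max_iterations` cap are hypothetical, and the fakes only imitate what super-QA and super-build would do through headless sessions. Re-queuing a fixed ticket for another QA pass is an assumption of this sketch, not something the walkthrough specifies.

```python
from typing import Callable, Dict, List

State = Dict[str, List[str]]


def is_complete(state: State) -> bool:
    """The completion condition: nothing queued and no open bugs."""
    return not state["queue"] and not state["bug"]


def run_cycle(
    state: State,
    run_qa: Callable[[State], None],      # hypothetical: one super-QA iteration
    run_build: Callable[[State], None],   # hypothetical: one super-build iteration
    max_iterations: int = 50,             # assumed hard cap at the orchestrator layer
) -> int:
    """Alternate QA and build until complete; return the number of iterations used."""
    for iteration in range(max_iterations):
        if is_complete(state):
            return iteration
        if state["queue"]:
            run_qa(state)
        else:
            run_build(state)
    if is_complete(state):
        return max_iterations
    raise RuntimeError("iteration cap reached before queue and bug were both empty")


def fake_qa(state: State) -> None:
    ticket = state["queue"].pop(0)
    target = "bug" if ticket.startswith("broken") else "done"
    state[target].append(ticket)


def fake_build(state: State) -> None:
    ticket = state["bug"].pop(0)
    # Pretend the fix landed, then re-queue the ticket so QA re-tests it.
    state["queue"].append(ticket.replace("broken", "fixed", 1))


if __name__ == "__main__":
    state: State = {"queue": ["home", "broken-login"], "bug": [], "done": []}
    print(run_cycle(state, fake_qa, fake_build), state["done"])
```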
Why this is "Agentic Engineering" rather than "prompt engineering"
The Orchestrator + Headless pattern is a small case study in the Agentic Engineering primer's five-layer framing:
| Layer | What this pattern does |
|---|---|
| Prompt | Iteration prompts are short and structured ("here's ticket #N, do super-QA on it") |
| Agent | Two roles: orchestrator (long-lived, clean) + worker (short-lived, fresh). Memory lives in state, not context. |
| LLM | Same model; the win is how it's invoked, not which model |
| MCP | Tools are normal: gh CLI, Playwright, file edits |
| Tools | Workers spawn freely; no parent-context pollution |
The diagnostic question shifts from "is my prompt good?" to "is my orchestrator's context window staying small?"
Teaching Mode: for CS-310 students
A two-week classroom unit, paired with the What is Agentic Engineering? primer. Plan for heavy scaffolding: Claude Code setup, account/auth/token caps, a pre-wired GitHub project, and a mock target repo should all be provided. Six contact hours is tight if students are configuring Claude Code from scratch; the lab times below assume the starter materials are ready on Day 1.
Week 1: /goal in isolation (~2 hr lab)
| Activity | Output |
|---|---|
| Read this entry; watch the Eric Tech video | Students can articulate the context-wall problem in their own words |
| In pairs, run a short /goal task on a provided sample repo and observe the context-usage indicator as it runs | A note describing what they saw, typically a steady climb in context % as iterations stack up |
Week 2: Refactor into Orchestrator + Headless (~4 hr lab)
| Activity | Output |
|---|---|
| Provided starter: an orchestrator skill that calls claude -p for each iteration, plus a pre-configured GitHub project with the six columns and 5-10 seed tickets | Students wire up a single iteration end-to-end |
| Replace the in-context loop from Week 1 with the orchestrator pattern; re-run on the same sample repo | The same task, now with a flatter orchestrator context line because the work happens in fresh claude -p sessions |
| 1-page reflection: when is plain /goal actually fine? | Forces the student to name the threshold (short, single-pass, low-stakes tasks); not every problem needs the orchestrator pattern |
Assessment
- Practical: provide a buggy app + a target spec; the student must build an orchestrator that drives it to green
- Conceptual: given a transcript with a "false complete," identify which iteration drifted and explain how the context wall caused it
How LearnAI Team Could Use This
- Production-style autonomous-agent demonstrations: the orchestrator + headless pattern is a useful way to show "AI building software overnight" without students drawing the wrong lesson (that /goal alone is sufficient).
- Onboarding senior students to long-running agent workflows: the orchestrator-vs-worker split is a common abstraction in modern AI-engineering practice and worth surfacing before students hit it in industry.
- Security teaching (CS-336): the false-completion failure mode has a security flavor, because an evaluator that only reads the transcript can be misled if the transcript itself has been corrupted by context drift (intentional or not). Worth at least a 1-hour discussion.
- Companion to existing entries: pair with Gstack (decision-voting), Harness Engineering (why the runtime layer matters), and Autoresearch (an earlier autonomous-loop pattern).
Real-World Use Cases
| Scenario | How to use the pattern |
|---|---|
| Overnight bug-fix sweep on a legacy module | Orchestrator + super-QA + super-build; goal: "queue is empty AND bug column is empty" |
| Migrating a UI component library | Orchestrator drives one component per iteration; state in GitHub project; each iteration handled headlessly |
| Mass API documentation backfill | Orchestrator iterates over endpoints from a state file; worker writes + verifies docs per endpoint |
| Long-form research synthesis | Orchestrator iterates over a reading-list state file; worker reads + summarizes one paper at a time |
| Course-grading automation (LearnAI use case) | Orchestrator iterates over student submissions; worker runs the rubric + writes a feedback artifact per student |
Important things to know
- /goal alone is fine for short, single-pass tasks. The point isn't that /goal is broken; it's that the context-wall failure mode is invisible until it bites you. Use /goal for short, bounded jobs where the entire run fits comfortably inside a fresh context window; reach for the orchestrator pattern when the task plausibly runs hour-scale or longer.
- The evaluator is not a safety net. Anthropic's Haiku evaluator helps, but it reads the transcript only: no tool calls, no file inspection. Don't trust "condition met" as ground truth on long runs; check the actual repo state.
- Subagents are not a substitute. Subagent results flow back into the parent's context, defeating the point. The headless claude -p invocation is what keeps the orchestrator clean.
- State must live outside any LLM conversation. A .md file, a SQLite DB, or GitHub Projects: pick one and commit. The orchestrator should be able to crash and restart without losing progress.
- Cost is real. Hour- and day-scale runs incur hour- and day-scale token bills. Set hard limits at the orchestrator layer (max iterations, max tokens), not inside /goal.
- The orchestrator skill is where most of the engineering effort lands. Queue management, retry policy, the "is this iteration good enough to commit?" check, escalation to humans when stuck: all of these live here, not in the worker skills.
- Companion deep-dives in this wiki:
  - What is Agentic Engineering? A Teaching Primer: the 5-layer framework that contextualizes this pattern
  - Harness Engineering – The Real Bottleneck Isn't the Model: orchestrator design as a discipline
  - Claude Code · CLAUDE.md Practices: how to manage what does live in context
  - Agents Need Control Flow: argument for code over prompts in the orchestrator layer
  - Gstack – Garry Tan's AI Software Factory: the decision-voting layer Eric's super-build uses
  - Autoresearch – Autonomous ML Experiments Overnight: an earlier instance of the same idea, applied to ML training loops
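Two of the points above, hard limits at the orchestrator layer and state that survives a crash, lend themselves to a short sketch. Everything here is an assumption made for illustration: the `Budget` class, the JSON state file, and the idea that each worker reports how many tokens it used. The atomic write relies on `os.replace`, so an interrupted save leaves the previous file intact.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class Budget:
    """Hard limits enforced by the orchestrator, not by any worker."""
    max_iterations: int
    max_tokens: int
    iterations: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Record one finished iteration and the tokens it reported (assumed input)."""
        self.iterations += 1
        self.tokens += tokens_used

    def exhausted(self) -> bool:
        return self.iterations >= self.max_iterations or self.tokens >= self.max_tokens


def save_state(path: str, state: dict) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_state(path: str, default: dict) -> dict:
    """Resume from the last saved state, or start from `default` on a first run."""
    if not os.path.exists(path):
        return default
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    budget = Budget(max_iterations=3, max_tokens=10_000)
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "state.json")
        state = load_state(path, {"queue": ["a", "b"], "bug": []})
        while state["queue"] and not budget.exhausted():
            state["queue"].pop(0)
            budget.charge(4_000)                       # made-up per-iteration cost
            save_state(path, {**state, "budget": asdict(budget)})
        print(load_state(path, {}))
```

Saving after every iteration means a restarted orchestrator picks up from the last completed ticket instead of from its own memory of the run.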