Beyond /goal β€” The Orchestrator + Headless Pattern for Long-Running Claude Sessions

Beyond /goal β€” The Orchestrator + Headless Pattern for Long-Running Claude Sessions

Anthropic shipped /goal in Claude Code 2.1.139 (May 11, 2026 per official changelog; widely covered May 12): you state a completion condition, and the agent keeps working across turns until a separate fast model (Haiku by default) reads the transcript so far and decides whether the goal is met. It’s the closest Claude Code has come to a β€œset it and walk away” mode. But the agent’s iterations all share one conversation context window β€” and once that window fills, accuracy collapses, the agent can hallucinate, and the transcript evaluator (which itself only sees the same transcript) can plausibly mark the goal complete when it isn’t. This entry teaches what /goal does, why it isn’t sufficient for hour-scale or day-scale runs, and the Orchestrator + Headless pattern (popularized in a May 15, 2026 walkthrough by Eric Tech) that puts the autonomy where the context wall can’t reach it.

*Source: Eric Tech β€” β€œStop Using Claude’s /goal Feature | Here’s What Works” (YouTube, May 15, 2026) Claude Code 2.1.139 release notes summary (explainx.ai) Joe Njenga, β€œI Tested (New) Claude Code /goal Command” (Medium, May 2026)*

What /goal is, in one minute

You type /goal inside Claude Code’s terminal and provide a completion condition, e.g.:

/goal Migrate all legacy Auth components to the new design system,
      and ensure tests pass.

Claude then:

  1. Plans how to satisfy the condition
  2. Executes edits, tool calls, test runs
  3. Evaluates whether the condition is met β€” a separate fast model (Haiku by default) reads the transcript so far and votes β€œdone” or β€œkeep going.” It does not run tools, inspect files, or verify the repo state; it only judges the conversation it can see.
  4. Loops β€” if not met, planning starts again

It tracks elapsed time, turns, and tokens; when the evaluator agrees the goal is satisfied, the goal clears and you get your terminal back. Available in interactive mode, programmatic mode (-p), and Remote Control.

The fundamental problem β€” the context wall

β€œThe slash-goal here typically stays in the same active conversation context window β€” meaning it will absolutely hit the context wall as the conversation progresses.” β€” Eric Tech

Each plan β†’ execute β†’ evaluate cycle of /goal adds to the same context window. The longer the run, the more the LLM’s effective accuracy drops. At some point β€” and you can’t predict exactly when β€” the agent:

  • Mis-plans a step because earlier decisions have drifted out of focused attention
  • Hallucinates a tool output or a file’s contents
  • Worst case: during the evaluation step, hallucinates that the condition is met when it isn’t

This is the failure mode /goal cannot solve from inside its own conversation: the system that’s about to make a critical β€œare we done?” decision is precisely the system whose attention is being eroded.

β”Œβ”€ One growing context window ─────────────────────────────┐
β”‚                                                          β”‚
β”‚  plan ─▢ execute ─▢ evaluate ─▢ plan ─▢ execute ─▢ ...   β”‚
β”‚                                                          β”‚
β”‚              ◀─── context fills, accuracy drops ───▢     β”‚
β”‚                                                          β”‚
β”‚  ⚠ false "condition met" possible near the wall          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The pattern: Orchestrator + Headless

The fix is mundane: stop doing the work inside the orchestrator’s own context window. Split the system into two roles:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Orchestrator session   β”‚   ← stays small, low context %
β”‚  (single Claude Code    β”‚   ← decides "what's next"
β”‚   conversation; you     β”‚   ← reads state; dispatches
β”‚   keep this clean)      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚  for each iteration:
             β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Headless worker        β”‚   ← fresh context each time
β”‚  (claude -p ...)        β”‚   ← does the real work
β”‚                         β”‚   ← can spawn its OWN subagents
β”‚  Reports terse result.  β”‚   ← *terminates* when done
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
             β–Ό
   state file / GitHub project
   (the actual memory)

Why this works:

  • The orchestrator only sees terse iteration results β€” small, structured handoffs (e.g., β€œQA found 3 bugs; bug IDs in GitHub project”). Its context stays well below the wall.
  • Each headless worker is born with a clean context, does its iteration, writes state externally, and dies. There’s no accumulating drift across iterations.
  • The β€œmemory of the project” lives in a state file or GitHub project, not in any LLM’s conversation.

Why not just use subagents? A subagent reports its findings back to the parent β€” that means everything it produced ends up consumed in the parent’s context. Eric’s argument: for hour- or day-scale autonomous work, you want the orchestrator to stay so clean that subagents are a luxury it can’t afford. The headless worker can spawn its own subagents without polluting the orchestrator.

State: where the project actually lives

Eric’s recommended substrate: GitHub Projects columns. Anything else works β€” a .md file, a SQLite DB β€” but GitHub Projects has two practical wins:

Why GitHub Projects Detail
Free No new infrastructure
gh CLI built into Claude Code The agent can read/write tickets without an MCP server
Visible to humans You can watch the run unfold in a browser, intervene, re-prioritize
Per-ticket history Each ticket carries its own audit trail

Eric uses six columns:

Column Meaning
queue Items pending β€” the orchestrator pops from here
testing Item currently in flight
done Spec passing
bug Test failed; needs the build worker
flaky Only works on retry
skip Out of scope

The orchestrator’s job each iteration is essentially: β€œpop one from queue, dispatch the right headless skill, move the result to the right column.” That’s it. The interesting work is in the skills.

Worked example: Super-QA + Super-Build cycling on an app

Eric’s walkthrough demonstrates the pattern with two headless skills orchestrated by a third super-orchestrator skill:

super-orchestrator  ─▢  super-QA      ─▢  finds bugs ─▢  GitHub bug column
        β–²                                                       β”‚
        β”‚                                                       β–Ό
        └─◀──  super-build  ◀──  fixes bugs  ◀──  GitHub bug column

super-QA (find bugs)

  • Traverses the app’s pages using breadth-first search
  • Visited set keeps it from re-testing pages it already covered
  • Writes Playwright end-to-end tests for each page
  • If a test fails β†’ opens a ticket in the bug column
  • If a page has child pages β†’ adds them to queue
  • Terminates the headless session, returns β€œfound N bugs”

super-build (fix bugs)

  • Reads from the bug column
  • Uses the Superpowers TDD framework (obra/superpowers): write failing test β†’ implement β†’ refactor β†’ verify
  • For non-obvious design decisions, invokes Gstack β€” Garry Tan’s auto-decision skill. (Note: Eric describes /autoplan as voting across CEO / engineer / security / designer / QA roles; the upstream Gstack /autoplan actually runs a CEO β†’ Design β†’ Eng β†’ DX chain with six auto-decision principles and dual Claude/Codex voices. Treat Eric’s framing as a useful mental model, but read the actual Gstack source before adopting.)
  • Terminates with β€œfixed M bugs”

The loop terminates when

There are no items left in queue AND no items left in bug. That’s the completion condition the orchestrator monitors β€” external to any single LLM conversation.

Why this is β€œAgentic Engineering” rather than β€œprompt engineering”

The Orchestrator + Headless pattern is a small case study in the Agentic Engineering primer’s five-layer framing:

Layer What this pattern does
Prompt Iteration prompts are short and structured (β€œhere’s ticket #N, do super-QA on it”)
Agent Two roles β€” orchestrator (long-lived, clean) + worker (short-lived, fresh). Memory lives in state, not context.
LLM Same model; the win is how it’s invoked, not which model
MCP Tools are normal β€” gh CLI, Playwright, file edits
Tools Workers spawn freely; no parent-context pollution

The diagnostic question shifts from β€œis my prompt good?” to β€œis my orchestrator’s context window staying small?”

Teaching Mode β€” for CS-310 students

A two-week classroom unit, paired with the What is Agentic Engineering? primer. Plan for heavy scaffolding β€” Claude Code setup, account/auth/token caps, a pre-wired GitHub project, and a mock target repo should all be provided. Six contact hours is tight if students are configuring Claude Code from scratch; the lab times below assume the starter materials are ready on Day 1.

Week 1 β€” /goal in isolation (~2 hr lab)

Activity Output
Read this entry; watch the Eric Tech video Students can articulate the context-wall problem in their own words
In pairs, run a short /goal task on a provided sample repo and observe the context-usage indicator as it runs A note describing what they saw β€” typically a steady climb in context % as iterations stack up

Week 2 β€” Refactor into Orchestrator + Headless (~4 hr lab)

Activity Output
Provided starter: an orchestrator skill that calls claude -p for each iteration, plus a pre-configured GitHub project with the six columns and 5-10 seed tickets Students wire up a single iteration end-to-end
Replace the in-context loop from Week 1 with the orchestrator pattern; re-run on the same sample repo The same task, now with a flatter orchestrator context line because the work happens in fresh claude -p sessions
1-page reflection: when is plain /goal actually fine? Forces the student to name the threshold (short, single-pass, low-stakes tasks) β€” not every problem needs the orchestrator pattern

Assessment

  • Practical: provide a buggy app + a target spec; the student must build an orchestrator that drives it to green
  • Conceptual: given a transcript with a β€œfalse complete,” identify which iteration drifted and why the context-wall caused it

How LearnAI Team Could Use This

  • Production-style autonomous-agent demonstrations β€” the orchestrator + headless pattern is a useful way to show β€œAI building software overnight” without students drawing the wrong lesson (that /goal alone is sufficient).
  • Onboarding senior students to long-running agent workflows β€” the orchestrator-vs-worker split is a common abstraction in modern AI-engineering practice and worth surfacing before students hit it in industry.
  • Security teaching (CS-336) β€” the false-completion failure mode has a security flavor: an evaluator that only reads the transcript can be misled if the transcript itself has been corrupted by context drift (intentional or not). Worth at least a 1-hour discussion.
  • Companion to existing entries β€” pair with Gstack (decision-voting), Harness Engineering (why the runtime layer matters), and Autoresearch (an earlier autonomous-loop pattern).

Real-World Use Cases

Scenario How to use the pattern
Overnight bug-fix sweep on a legacy module Orchestrator + super-QA + super-build; goal: β€œqueue is empty AND bug column is empty”
Migrating a UI component library Orchestrator drives one component per iteration; state in GitHub project; each iteration handled headlessly
Mass API documentation backfill Orchestrator iterates over endpoints from a state file; worker writes + verifies docs per endpoint
Long-form research synthesis Orchestrator iterates over a reading-list state file; worker reads + summarizes one paper at a time
Course-grading automation (LearnAI use case) Orchestrator iterates over student submissions; worker runs the rubric + writes a feedback artifact per student

Important things to know

  • /goal alone is fine for short, single-pass tasks. The point isn’t that /goal is broken β€” it’s that the context-wall failure mode is invisible until it bites you. Use /goal for short, bounded jobs where the entire run fits comfortably inside a fresh context window; reach for the orchestrator pattern when the task plausibly runs hour-scale or longer.
  • The evaluator is not a safety net. Anthropic’s Haiku evaluator helps, but it reads the transcript only β€” no tool calls, no file inspection. Don’t trust β€œcondition met” as ground truth on long runs; check the actual repo state.
  • Subagents are not a substitute. Subagent results flow back into the parent’s context, defeating the point. The headless claude -p invocation is what keeps the orchestrator clean.
  • State must live outside any LLM conversation. A .md file, a SQLite DB, or GitHub Projects β€” pick one and commit. The orchestrator should be able to crash and restart without losing progress.
  • Cost is real. Hour- and day-scale runs incur hour- and day-scale token bills. Set hard limits at the orchestrator layer (max iterations, max tokens), not inside /goal.
  • The orchestrator skill is where most of the engineering effort lands. Queue management, retry policy, the β€œis this iteration good enough to commit?” check, escalation to humans when stuck β€” all of these live here, not in the worker skills.
  • Companion deep-dives in this wiki: