What is Agentic Engineering? A Teaching Primer

Agentic engineering is an emerging applied subarea at the intersection of software engineering, AI/ML engineering, and reliability practice for LLM-mediated systems — software systems where a large language model is one component of a larger goal-directed loop with planning, memory, tool use, and verification. The discipline organizes around five interacting layers — Prompt (提示词) · Agent (智能体) · LLM (大模型) · MCP · Tools (工具) — and the question every practitioner keeps asking is: “when something fails, which layer is responsible?” This entry is a teaching primer for faculty preparing CS-310 / CS-336 / AI-literacy lectures: it defines the discipline, names the five components, and lays out a 4-week course module.

Companion entry (same 5-component framework, with a vocabulary lens): AI Agent Primer — The Vocabulary Ladder and 18-Step Workflow

Live interactive animation (lecture-ready, single page): agentic-engineering-flow

This entry is the “engineering” lens. Vocabulary definitions live in the companion primer above. This entry assumes the vocabulary and asks the next question: how do you actually engineer, instrument, and debug one of these systems?

Why “Agentic Engineering” deserves a separate name

Three older labels almost cover the work, but each was scoped before LLM-mediated systems became routine:

Older label	Where it overlaps	Where agentic engineering pushes further
Prompt engineering	Crafting the instruction sent to the LLM	Adds planning, memory, tool selection, recovery — across many model calls per task
Software engineering	Code, testing, deployment, observability	Adds first-class handling of non-deterministic model behavior inside the system
ML engineering / MLOps	Training, serving, evaluation, monitoring	Treats the trained model as fixed and engineers the runtime orchestration around it

Agentic engineering sits where all three meet: you accept the LLM as a fixed-but-stochastic component and engineer the surrounding system — prompts, planning, memory, tool selection, verification, error recovery — so the whole thing behaves reliably enough to ship.

The five-component framework (one-page reference)

This is a compact decomposition used widely in popular Chinese-language AI explainers, and it is broadly consistent with how production agent stacks (Claude Code, Codex, LangChain, LlamaIndex) organize their code. The five components, English-first with the original Chinese terms preserved:

#	Component	One-line role	Key engineering question
①	Prompt (提示词)	The instruction layer — role, task framing, output shape	Are your prompts versioned, reproducible, and traceable to behavior changes?
②	Agent (智能体)	Decision/dispatch layer — planning, memory, tool selection, verification	Is the plan inspectable? When the agent picks the wrong tool, can you tell why?
③	LLM (大模型)	Cognition layer — reasoning, classification, generation	When you swap the LLM (Claude → GPT, etc.), what breaks that shouldn’t be model-specific?
④	MCP	Connection layer — a protocol boundary between agent/runtime and external tools/data	Are tool boundaries explicit and least-privilege, or did you grant broad access “just to get it working”?
⑤	Tools (工具)	Action layer — web search, code execution, DB query, API calls	When a tool call fails, does the agent recover, retry intelligently, or hand the raw error to the user?

One-line summary, paraphrased from the popular explainer this framework comes from: Prompts spark intent; the LLM provides cognition; the Agent decides and dispatches; MCP standardizes the connection; tools execute in reality — the five together form the loop that lets AI both think and act.

For per-component definitions, examples, and deeper history, see the Vocabulary Primer. The rest of this entry assumes the words and pivots to engineering practice.

A more accurate picture of MCP than most explainers give

The compact framework above flattens MCP to a single “protocol layer” — useful for teaching, but worth refining once students are past the first lesson. Official MCP architecture has three roles (host / client / server) communicating over JSON-RPC; servers expose tools, resources, and prompts, and clients discover and invoke them. A few caveats most explainers skip:

MCP makes exposed operations explicit, but it does not by itself guarantee least privilege. The MCP specification recommends keeping a human in the loop who can deny tool invocations, and it tells clients to treat tool annotations as untrusted unless they come from a trusted server. Treat MCP as a transport / discovery convention, not a sandbox.
“Any MCP-compatible agent can use any MCP-compatible tool” is aspirational, not absolute. Compatibility depends on protocol version, transport, auth, declared capabilities, schemas, and permissions. Production deployments still bind specific clients to specific servers.
Other ecosystems fill the same role differently: OpenAI’s function calling, custom RPC, LangChain tool interfaces. The layer is universal; MCP is one implementation of it, currently dominant in the Anthropic/Claude orbit.
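To make the "transport / discovery convention, not a sandbox" point concrete, here is a minimal sketch of the JSON-RPC 2.0 message shape MCP uses to invoke a tool (`tools/call` with a `name` and `arguments`), wrapped in a host-side allowlist check. The `ALLOWED_TOOLS` set, the tool names, and the `build_tool_call` helper are hypothetical and exist only for illustration; a real client would normally use an MCP SDK rather than hand-building messages.

```python
import json

# Hypothetical allowlist owned by the host application, not by MCP itself.
ALLOWED_TOOLS = {"web_search", "read_file"}


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request of the shape MCP uses for tool invocation.

    The permission check lives here, in host code: the protocol describes
    what a server exposes, but it does not decide what is allowed.
    """
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name!r} is not on the host allowlist")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)
```

Calling `build_tool_call(1, "web_search", {"query": "MCP transports"})` returns a JSON string ready to send over whichever transport the client uses, while a call naming a tool outside the allowlist raises `PermissionError` before anything leaves the host.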

Layer-attribution — the diagnostic question

Once students have the framework, the highest-leverage skill is layer-attribution: when something goes wrong, name which layer (or layer boundary) is responsible. A worked diagnostic table:

Observed symptom	Likely failing layer(s)	What to check
Agent confidently asserts false facts	LLM + Prompt (+ Agent, if a search step was skipped)	Did the prompt require grounding via a tool? Did the agent skip the search step?
Agent picked the wrong tool	Agent / LLM / tool-schema boundary	Inspect the plan; check the tool descriptions / schemas the LLM was given
Tool returned an error and agent ignored it	Agent runtime (no retry / verification path)	Is there a fallback? Does the agent observe the tool’s return?
Same tool call hangs forever	Agent runtime (timeout enforcement) + MCP / Tool	Did the agent set/enforce a timeout? Is the MCP server healthy?
Agent forgets earlier turns	Agent memory / context management	Is the context window full? Is memory written/read deliberately or by accident?
Cost is 10× what you expected	Agent runtime (loops/retries) + Prompt + LLM routing	Look for retry loops, full-context memory injection, retrieval over-fetching, and unnecessary high-tier model calls
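Two students get totally different outputs from the same prompt	LLM non-determinism	Pin temperature (and a seed where the API offers one); cache responses for repeated prompts; check for randomized tool ordering. Low temperature reduces variation but does not guarantee identical outputs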

This table is the centerpiece of an “Agentic Engineering 101” lab — students reproduce each symptom, diagnose the layer, fix it, and write up what they learned.
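One row of the table ("tool returned an error and agent ignored it") can be turned into a small, testable rule over a trace. The sketch below is an illustration under assumptions, not a general diagnostic engine: the `TraceEvent` shape, the layer names, and the `retry` / `fallback` action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceEvent:
    layer: str        # "prompt", "agent", "llm", "mcp", or "tool"
    action: str       # free-form label, e.g. "call", "retry", "fallback"
    ok: bool = True


def attribute_failure(trace: List[TraceEvent]) -> Optional[str]:
    """Return the layer blamed for the first failure in the trace, or None.

    A tool error that the agent never reacts to is attributed to the agent
    runtime, mirroring the diagnostic table; otherwise the failing event's
    own layer is blamed.
    """
    for index, event in enumerate(trace):
        if event.ok:
            continue
        later = trace[index + 1:]
        handled = any(
            e.layer == "agent" and e.action in {"retry", "fallback"}
            for e in later
        )
        if event.layer == "tool" and not handled:
            return "agent"
        return event.layer
    return None
```

In a lab, students can feed their own logged traces into a function like this and argue about where the rule is too crude, which is itself a useful layer-attribution exercise.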

Teaching Mode — A 4-Week Module for CS-310

This is what makes this entry different from a vocabulary glossary: a concrete syllabus block you can paste into an existing course. It is aimed at upper-level undergrads who can program but haven’t worked with LLMs. Plan for 6-10 instructor hours of prep the first time you teach it, and provide a starter repo for Weeks 2-4 so students aren’t building MCP servers from scratch.

Week 1 — Vocabulary + first agent demo (1-2 hr)

Activity	Output
Read this entry + the Vocabulary Primer	Students can name the five layers and one example tool each
Watch one agent run live (Claude Code on a small repo) and pause at every layer transition	Class-produced annotated transcript marking each layer crossing

Week 2 — Build a minimal agent (~4 hr lab; provide a starter repo)

Activity	Output
Using a provided starter (Python; OpenAI/Anthropic SDK + a mock “web-search” tool so students aren’t fighting API keys) implement: (a) prompt, (b) LLM call, (c) one tool call, (d) one synthesis step	A working ~100-line minimal agent that exercises layers ①②③⑤
Annotate which lines implement which of the five layers	Annotated diagram of student’s own code

(MCP is optional in Week 2 — the starter exposes the tool through a simple Python interface. Students who want to plug into a real MCP server can do that as extension work.)
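As a rough picture of what the Week 2 starter could look like, here is a minimal sketch that exercises layers ①②③⑤ without any API key. Everything in it is an assumption made for illustration: the prompt templates, the `stub_llm` function (a stand-in for a real OpenAI/Anthropic SDK call), the tiny `mock_web_search` corpus, and the `run_agent` name.

```python
from typing import Callable, List

# Layer 1 (Prompt): hypothetical templates for the two model calls.
PLAN_PROMPT = "PLAN: write a web search query for this question.\nQuestion: {question}"
SYNTH_PROMPT = (
    "SYNTHESIZE: answer the question using only the results.\n"
    "Question: {question}\nResults: {results}"
)


# Layer 5 (Tool): a mock search tool backed by a tiny in-memory corpus.
def mock_web_search(query: str) -> List[str]:
    corpus = {
        "mcp": "MCP is a protocol connecting agents to tools.",
        "agent": "An agent plans, calls tools, and verifies results.",
    }
    return [text for key, text in corpus.items() if key in query.lower()]


# Layer 3 (LLM): a deterministic stub standing in for a real SDK call.
def stub_llm(prompt: str) -> str:
    if prompt.startswith("PLAN"):
        # Pretend the model chose the question itself as the search query.
        return prompt.split("Question: ", 1)[1].strip()
    return "Answer (grounded in search results): " + prompt.split("Results: ", 1)[1]


# Layer 2 (Agent): plan, call one tool, then synthesize.
def run_agent(
    question: str,
    llm: Callable[[str], str] = stub_llm,
    search: Callable[[str], List[str]] = mock_web_search,
) -> str:
    query = llm(PLAN_PROMPT.format(question=question))      # (a) prompt + (b) LLM call
    results = search(query)                                  # (c) one tool call
    if not results:
        return "No grounding found for that question."
    prompt = SYNTH_PROMPT.format(question=question, results=" | ".join(results))
    return llm(prompt)                                       # (d) synthesis step
```

Passing the LLM and the tool in as parameters keeps the layer boundaries visible in the code, which makes the "annotate which lines implement which layer" activity straightforward and lets students swap the stub for a real SDK call later.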

Week 3 — Instrument it (debugging) (~3 hr lab + 1 hr lecture)

Activity	Output
Add logging to every layer crossing the student’s code actually has (typically: prompt → LLM, LLM → tool, tool → result, result → LLM-synthesize)	Students see their agent’s internal trace, not just its final output
Reproduce three failure modes from the diagnostic table; diagnose each	Three short writeups: symptom → suspected layer → evidence → fix
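The logging activity in the Week 3 table can be sketched with a small decorator that records one structured entry per layer crossing. This is one possible approach, not a prescribed one: the `traced` decorator, the in-memory `TRACE` list, the crossing labels, and the example `search` tool are all hypothetical names.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("agent.trace")
TRACE = []  # in-memory copy of the trace, handy for lab write-ups


def traced(crossing: str):
    """Record success/failure and duration for one layer crossing."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record = {"crossing": crossing, "function": func.__name__}
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                record["ok"] = True
                return result
            except Exception as exc:
                record["ok"] = False
                record["error"] = repr(exc)
                raise  # never swallow the error; only observe it
            finally:
                record["ms"] = round((time.perf_counter() - start) * 1000, 2)
                TRACE.append(record)
                logger.info(json.dumps(record))
        return wrapper
    return decorator


@traced("llm->tool")
def search(query: str):
    if not query:
        raise ValueError("empty query")
    return ["result for " + query]
```

Because the decorator re-raises after recording, instrumenting a function does not change its behavior; the trace simply shows which crossing failed and how long it took, which is the evidence the three failure-mode writeups ask for.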

Week 4 — Harden it (~3 hr lab)

Activity	Output
Add: retry on tool failure, agent-side timeout/cancellation, verification step before returning to user, cost cap on LLM calls	An agent that survives realistic failure conditions (flaky tools, slow calls, runaway spend)
Final reflection: which engineering discipline does this remind you of most — distributed systems? embedded? front-end?	A 1-page essay placing agentic engineering in the existing engineering landscape
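Three of the Week 4 hardening items (retry on tool failure, an agent-side timeout, and a cost cap) can be sketched with the standard library alone. The names `CostCap`, `BudgetExceeded`, `call_with_timeout`, and `call_with_retry` are hypothetical, the dollar figures are placeholders, and the verification step is left out. Note one deliberate limitation: the thread-based timeout stops waiting for a slow call but cannot forcibly kill it.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the cap."""


class CostCap:
    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, usd: float) -> None:
        if self.spent_usd + usd > self.max_usd:
            raise BudgetExceeded(f"cap of ${self.max_usd:.2f} would be exceeded")
        self.spent_usd += usd


def call_with_timeout(func, timeout_s: float, *args):
    """Run func in a worker thread and stop waiting after timeout_s seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"tool call exceeded {timeout_s}s") from None
    finally:
        # Do not block on the worker; a hung call keeps running in the background.
        executor.shutdown(wait=False)


def call_with_retry(func, *args, attempts: int = 3, timeout_s: float = 2.0,
                    base_delay_s: float = 0.1):
    """Retry a tool call with exponential backoff between attempts."""
    last_error = None
    for attempt in range(attempts):
        try:
            return call_with_timeout(func, timeout_s, *args)
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError(f"tool failed after {attempts} attempts") from last_error
```

A natural lab question is which errors should not be retried at all (a `BudgetExceeded`, for example), since blind retries are one of the causes of the "cost is 10×" symptom in the diagnostic table.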

Assessment ideas

Pop quiz (10 min): given a transcript of an agent run, mark which step belongs to which layer
Lab grading rubric: did the student instrument every layer crossing their code actually has? Can they explain why a specific log line implies a specific layer is at fault?
Term project: pick a domain (academic-paper search, code review, …) and engineer an agent for it. Submit: the agent, the trace logs of three different queries, and a layer-by-layer reflection.

How LearnAI Team Could Use This

Primary fit: CS-310 (Advanced OO Design) — the 4-week module slots into CS-310 cleanly as a system-design case study. The five components map to “separation of concerns” patterns students already know.
CS-336 (Program Analysis for Security) only with a security framing. This module fits CS-336 if you frame the lab around agent-and-tool security: prompt injection, taint / information-flow through tool boundaries, tool sandboxing, MCP permission models, and what happens when an LLM is convinced to misuse a tool. Don’t drop the generic version into CS-336 — re-shape it around security first.
Faculty-development workshop — run the 4-week module in compressed form (one 3-hour workshop) for colleagues who want to understand what their students are now using.
Diagnostic framework for student help requests: when a student says “my AI thing isn’t working,” use the diagnostic table as the triage flow before going deeper. This can shorten office-hours conversations, because the first question (which layer?) is already structured.
Cross-discipline outreach — the five-component framework is language-light enough that non-CS faculty (bio, finance, design) can use it to explain why their colleagues are hiring agents instead of interns.
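For the CS-336 security framing mentioned in the list above, a toy taint check at the tool boundary gives students something concrete to attack. This is a deliberately naive sketch: the `Tainted` marker class, the `SENSITIVE_TOOLS` set, and `guard_tool_call` are hypothetical, and the approach is easy to bypass, which is part of the exercise.

```python
class Tainted(str):
    """A string that came from an untrusted source, such as a fetched web page."""


# Hypothetical set of tools that must never receive untrusted input directly.
SENSITIVE_TOOLS = {"send_email", "run_shell"}


def guard_tool_call(tool_name: str, arguments: dict) -> None:
    """Refuse to pass tainted values into sensitive tools.

    Limitation: ordinary str operations return plain str, so
    Tainted("a") + "b" silently loses its taint. Real information-flow
    tracking needs far more than a marker subclass.
    """
    if tool_name not in SENSITIVE_TOOLS:
        return
    tainted_keys = [key for key, value in arguments.items() if isinstance(value, Tainted)]
    if tainted_keys:
        raise PermissionError(
            f"untrusted input in {tainted_keys} blocked for tool {tool_name!r}"
        )
```

A short lab prompt that follows naturally: find two ways to smuggle untrusted text past `guard_tool_call`, then propose what the agent runtime would need to track to close each gap.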

Real-World Use Cases

Scenario	Description
Onboarding new engineers	First reading before they touch the production agent stack — establishes shared vocabulary
Failure-mode review	When a deployed agent misbehaves, classify the failure by layer before assigning a fix owner
Tool evaluation	When a new “agent platform” launches, decompose its pitch into the five layers — what’s actually novel vs. rebranded
Vendor selection	When deciding between LangChain vs. Claude Code vs. an in-house stack, evaluate per-layer support
Curriculum design	Use the 5-layer × 4-week module as the agentic-systems unit in an upper-level CS course
Stakeholder briefing	When asked “what is an agent?” by a dean or industry partner, the framework covers most of the answer in 10 minutes

Important things to know

The five-component framework is a teaching artifact, not a formal architecture. Real agent frameworks (LangChain, LlamaIndex, Claude Code, Codex) implement these roles differently and sometimes collapse layers; Claude Code, for instance, blurs Agent and LLM behavior in places. The five components are a mental map useful for teaching and diagnosis; for any production system, treat its own code and documentation as the authoritative source.
The Agent runtime carries more responsibility than most diagrams suggest. When you see “Tool hung” or “cost 10×,” the failure is often at the agent runtime / orchestrator layer (timeouts, retry loops, context management), not at the Tool or LLM. Teach the runtime as a first-class layer.
“MCP” is one specific protocol (Anthropic’s Model Context Protocol). Other agent ecosystems use other connection layers. The role it plays — standardized agent/tool bus — exists in every system; the implementation varies.
Skills are not one of the five components — they’re a popular implementation pattern within the Prompt + Agent layers (small, file-based, agent-loadable procedures). Treat Skills as a companion concept, not a sixth layer.
Companion deep-dives in this wiki:
- AI Agent Primer — Vocabulary Ladder and 18-Step Workflow — the vocabulary counterpart to this engineering primer
- Harness Engineering — The Real Bottleneck Isn’t the Model — the agent-runtime / orchestrator layer in depth
- Agents Need Control Flow — argument that the Agent layer should be code, not just LLM-prompted decisions
- Agentic AI Engineer Roadmap 2026 — Eight Pillars — career-development view on the same discipline
- How Anthropic Uses Skills — Thariq’s Framework — Skills as a Prompt-layer implementation pattern
- grill-me — When AI Interviews You Before Writing Code — a Prompt-layer pattern that improves Agent-layer planning