SpecOps β€” Spec-Driven Development with AI Coding Agents

AI coding agents are fast. They're also reckless. Hand one a vague prompt and it will generate 500 lines of code built on silent assumptions, skip edge cases, and produce something that looks right but doesn't match what you actually needed. SpecOps fixes this by enforcing a spec-first workflow: understand the codebase, write detailed specifications, implement against those specs, then verify with adversarial evaluation. Every spec is Git-tracked markdown — no cloud accounts, no vendor lock-in, persistent across sessions and tools.

*Sources: GitHub — sanmak/specops (MIT, v1.8.0); DeepLearning.AI — Spec-Driven Development with Coding Agents; 爱可可-爱生活 (Weibo post)*

Why Spec-First Beats Code-First

The core problem: when you tell an AI agent “add user authentication,” it makes dozens of invisible decisions — OAuth vs. JWT, session storage, token refresh strategy, error handling — without telling you. You discover the mismatch after 400 lines are already written.

Spec-first flips this. The agent writes a requirements doc before touching any code. You review the spec (5 minutes), catch misalignment early, and the implementation follows a verified plan. This is the difference between “prompt and pray” and disciplined engineering.

| Approach | What Happens | Failure Mode |
| --- | --- | --- |
| Vibe coding | Prompt → Agent codes → You review output | Silent assumptions, scope creep, rework loops |
| Spec-driven | Prompt → Agent writes spec → You review spec → Agent codes → Adversarial verify | Misalignment caught at spec stage, not after implementation |

The DeepLearning.AI course (taught by Paul Everitt of JetBrains, 1h20m, free) frames it well: “Vibe coding is fast, but it often produces code that doesn’t match what you asked for.”

The 4-Phase Workflow

┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  UNDERSTAND  │───>│    SPEC      │───>│  IMPLEMENT   │───>│  COMPLETE    │
│              │    │              │    │              │    │              │
│ Analyze code │    │ requirements │    │ Code per     │    │ Adversarial  │
│ Map domain   │    │ design.md    │    │ task list    │    │ evaluation   │
│ Load context │    │ tasks.md     │    │ Follow spec  │    │ Drift check  │
│              │    │ EARS format  │    │ Dep. govern. │    │ Verify pass  │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
      Phase 1            Phase 2            Phase 3            Phase 4

Phase 1: Understand

The agent scans your codebase, identifies architecture patterns, maps domain context, and loads any previous specs or production learnings. No code is written yet.

Phase 2: Spec

Three Git-tracked markdown files are generated:

  • requirements.md β€” Uses EARS notation (WHEN [event] THE SYSTEM SHALL [behavior]) for precise, testable criteria
  • design.md β€” Architecture decisions, component interactions, constraints
  • tasks.md β€” Ordered implementation steps with acceptance criteria

An adversarial evaluator scores the spec against hard quality thresholds before proceeding. The evaluator is structurally separated from the spec author — the agent can’t self-validate.
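
The gating idea can be sketched in a few lines: an evaluator component, separate from whatever wrote the spec, scores it on fixed criteria and rejects it below hard floors. The criteria names and thresholds below are assumptions for illustration, not SpecOps' actual rubric.

```python
# Illustrative hard-threshold gate. Scores are produced by an independent
# evaluation pass, never by the model that wrote the spec.
THRESHOLDS = {"completeness": 0.8, "testability": 0.9, "consistency": 0.85}

def gate_spec(scores: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (passed, failures); a missing criterion counts as a score of 0."""
    failures = [
        f"{criterion}: {scores.get(criterion, 0.0):.2f} < {minimum:.2f}"
        for criterion, minimum in THRESHOLDS.items()
        if scores.get(criterion, 0.0) < minimum
    ]
    return (not failures, failures)

passed, failures = gate_spec(
    {"completeness": 0.9, "testability": 0.7, "consistency": 0.9}
)
# testability 0.70 is below the 0.90 floor, so the spec is rejected
```

The point of the hard floor is that the agent cannot argue its way past a failed criterion; the only path forward is revising the spec and re-scoring.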

Phase 3: Implement

Code follows the task list. Every new dependency must pass a 5-criteria governance check:

  1. Scope match (does this dependency solve the actual need?)
  2. Maintenance health (active maintainers, recent commits?)
  3. Size proportionality (not pulling in 50MB for one function?)
  4. Security surface (known CVEs, attack vectors?)
  5. License compatibility

No bypass β€” always enforced.
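
One way to picture the governance gate: each criterion becomes a predicate over facts gathered about the candidate dependency, and any failure blocks the install. The field names and cutoff values here are illustrative assumptions, not SpecOps' actual rules.

```python
from dataclasses import dataclass

@dataclass
class DependencyReport:
    solves_actual_need: bool      # 1. scope match
    days_since_last_commit: int   # 2. maintenance health
    install_size_mb: float        # 3. size proportionality
    known_cves: int               # 4. security surface
    license_ok: bool              # 5. license compatibility

def governance_check(dep: DependencyReport) -> list[str]:
    """Return the failed criteria; an empty list means the dependency passes."""
    failures = []
    if not dep.solves_actual_need:
        failures.append("scope match")
    if dep.days_since_last_commit > 365:      # cutoff is an assumption
        failures.append("maintenance health")
    if dep.install_size_mb > 50:              # cutoff is an assumption
        failures.append("size proportionality")
    if dep.known_cves > 0:
        failures.append("security surface")
    if not dep.license_ok:
        failures.append("license compatibility")
    return failures
```

Returning the full failure list, rather than stopping at the first hit, is what lets the agent report every reason a dependency was rejected in one pass.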

Phase 4: Complete

Adversarial evaluation again, this time scoring implementation against the spec. Five automated drift-detection checks ensure the code matches what was specified. If drift is found, reconciliation is triggered before completion.
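
As a minimal sketch of what one such check could look like, consider diffing the requirement IDs the spec declares against the IDs the implementation's tests claim to cover (SpecOps' five checks are presumably richer than this):

```python
# Drift in both directions matters: requirements the code never implemented,
# and behavior the code added that no requirement asked for (scope creep).
def find_drift(
    spec_requirements: set[str], covered_by_tests: set[str]
) -> dict[str, set[str]]:
    return {
        "unimplemented": spec_requirements - covered_by_tests,
        "unspecified": covered_by_tests - spec_requirements,
    }

drift = find_drift({"R1", "R2", "R3"}, {"R1", "R3", "R4"})
# R2 is unimplemented; R4 is unspecified behavior needing reconciliation
```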

Installation

One-line install (any platform):

bash <(curl -fsSL https://raw.githubusercontent.com/sanmak/specops/main/scripts/remote-install.sh)

From source:

git clone https://github.com/sanmak/specops.git && cd specops && bash setup.sh

Claude Code marketplace:

/plugin marketplace add sanmak/specops
/plugin install specops@specops-marketplace
/reload-plugins

Usage

Create a spec for a new feature:

/specops Add user authentication with OAuth

Capture production learnings:

/specops learn batch-processing
→ Learning: "Concurrent writes above 500 connections degrade P99"
→ Prevention: "Design docs must include concurrency limits"

Multi-spec initiative (large features):

/specops initiative oauth-payments
→ Detects 2 bounded contexts (auth, payments)
→ Creates Spec 1: oauth-authentication (wave 1)
→ Creates Spec 2: payment-processing (wave 2, depends on spec 1)
→ Creates Initiative: oauth-payments (orchestrates both)

Platform Support

| Platform | Trigger |
| --- | --- |
| Claude Code | /specops [description] |
| Cursor | Use specops to [description] |
| GitHub Copilot | Use specops to [description] |
| OpenAI Codex | Use specops to [description] |

One spec, portable across all your AI coding tools. The .specops/ directory is the single source of truth.
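
A plausible shape for that directory, assuming one subfolder per spec (the layout beyond the three spec files is a guess, not documented structure):

```
.specops/
└── add-user-auth/
    ├── requirements.md   # EARS-format acceptance criteria
    ├── design.md         # architecture decisions, constraints
    └── tasks.md          # ordered implementation steps
```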

Configuration

Create .specops.json in your project root:

{
  "specsDir": ".specops",
  "vertical": "backend",
  "team": {
    "conventions": ["Use TypeScript", "Write tests for business logic"],
    "reviewRequired": true
  }
}
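
For tooling that consumes this file, a small loader with fallback defaults is straightforward. This helper is a hypothetical sketch, not part of SpecOps' API; the default values simply mirror the example above.

```python
import json
from pathlib import Path

# Defaults used when .specops.json is absent or omits a key (assumed values).
DEFAULTS = {"specsDir": ".specops", "vertical": "backend", "team": {}}

def load_config(project_root: str = ".") -> dict:
    """Read .specops.json from the project root, falling back to DEFAULTS."""
    path = Path(project_root) / ".specops.json"
    config = dict(DEFAULTS)
    if path.exists():
        config.update(json.loads(path.read_text()))
    return config
```

Keys present in the file shadow the defaults; anything omitted keeps its default, so a minimal config file stays minimal.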

Domain Verticals

SpecOps ships seven templates tuned to different project types:

| Vertical | What It Adds to Specs |
| --- | --- |
| Backend | API contracts, data models, error handling |
| Frontend | Component hierarchy, state management, accessibility |
| Infrastructure | Rollback steps, resource definitions, cost estimates |
| Data Pipeline | Data contracts, backfill strategy, schema evolution |
| Library/SDK | Public API surface, versioning, backward compatibility |
| Fullstack | End-to-end flows, API-to-UI mapping |
| Builder | Plugin architecture, extensibility points |

Key Differentiators

Production Learning Loop

Most tools stop at implementation. SpecOps closes the feedback loop: after deployment, you capture what actually happened in production, and future specs touching the same files automatically load those learnings.

Deploy → Discover issue → /specops learn → Stored as prevention rule
                                              ↓
                              Future specs auto-load this context

This is the only tool in the spec-driven ecosystem that does this.

Adversarial Evaluation

The evaluator is structurally separated from the spec writer. This matters because LLMs are notoriously bad at evaluating their own output — they’ll rate their work highly regardless. SpecOps uses a separate evaluation pass with independent scoring criteria and hard fail thresholds.

Cross-Session Persistence

Specs persist in Git. When you start a new session tomorrow, the agent recovers full context from .specops/ — no re-explaining what you’re building, no context loss, no “let me re-read the codebase.”

Spec-Driven Development Ecosystem (2026)

SpecOps isn’t the only player. The broader ecosystem is maturing fast:

| Tool | Approach | Best For |
| --- | --- | --- |
| SpecOps | Living specs + production feedback loop | Teams wanting end-to-end spec lifecycle |
| GitHub SpecKit | GitHub-native spec workflow | Teams already deep in GitHub ecosystem |
| Kiro (AWS) | Agent-driven spec generation | AWS-centric teams |
| BMAD-METHOD | Multi-agent spec decomposition | Complex multi-team projects |
| Cursor + .cursorrules | Static spec via rules files | Solo developers, quick setup |

How LearnAI Team Could Use This

  1. Course project scaffolding — Students describe a feature in plain English. SpecOps generates the requirements, design, and task list. Students learn to read and critique specs before writing code — a skill more valuable than coding itself.

  2. Research prototype discipline — When building research tools (type checkers, program analyzers), spec-first prevents the classic “I built something but can’t explain what it does” problem. The spec is the explanation.

  3. Assignment design — Use SpecOps to generate spec templates for programming assignments. Students implement against the spec and are graded against the acceptance criteria. Consistent, fair, automated.

  4. AI coding literacy — Teach students the difference between vibe coding and spec-driven development. This is the meta-skill: knowing how to direct AI agents, not just that they can code.

  5. Cross-session lab continuity — Students working on multi-week projects lose context between lab sessions. Git-tracked specs solve this — the agent picks up exactly where the student left off.

Real-World Use Cases

  • Startup MVP development β€” Founder describes the product. SpecOps generates specs for each feature. Non-technical founder can review requirements in plain English before any code is written. Prevents the β€œI paid $50k and got the wrong thing” scenario.

  • Legacy codebase modernization β€” The DeepLearning.AI course specifically covers this: use the Understand phase to map an existing codebase, generate specs for modernization work, then implement incrementally with verification at each step.

  • Multi-team feature coordination β€” The multi-spec initiative feature decomposes a large feature into bounded-context specs with dependency tracking. Team A builds auth (wave 1), Team B builds payments (wave 2, depends on wave 1). The initiative orchestrates both.

  • Compliance-driven development β€” Industries requiring audit trails (finance, healthcare) get Git-tracked requirements β†’ design β†’ implementation β†’ verification chains for free. Every decision is documented, every spec is versioned.

  • Open source contribution onboarding β€” New contributors run /specops to understand the codebase, then generate specs for their proposed changes. Maintainers review the spec (fast) before the contributor writes code (slow). Catches misalignment early.

The Deeper Point

Spec-driven development isn’t about slowing AI down — it’s about making the 5 minutes you spend reviewing a spec save the 2 hours you’d spend debugging wrong assumptions in code. The agents are fast enough. The bottleneck is alignment between what you want and what the agent builds. SpecOps attacks that bottleneck directly.

Resources