Quality gate for coding agents

Your agent will ship broken code.

✗ Blocked

spec-agent is the contract that won't let it. It gives you a deterministic gate, project rules your agent actually applies, and durable learning for Claude Code, Copilot, Cursor & Codex. It doesn't replace your agent. It tells it no.


$ npx @marcusbarcelos/spec-agent init --id my-project --agents claude

watch it block a real bug

Deterministic gate · 30-second runnable demo · Open, tamper-isolated benchmark · MIT · runs inside your agent

01 — Proof

The agent says done. The gate says not yet.

A deterministic checkpoint between your agent finishing and the code landing — not a prompt. Watch it block a real bug, hand it back, and pass only once it's fixed.

Your project's rule — each whitespace becomes one dash

- const slug = name.trim().replace(/\s+/g, "-")   // "a  b" → "a-b"  ✗ collapses runs

The gate runs before the agent can finish

SPEC-AGENT VERDICT: BLOCKED — slug whitespace invariant · finish blocked

The agent re-runs and corrects it

+ const slug = name.trim().replace(/\s/g, "-")    // "a  b" → "a--b"  ✓

SPEC-AGENT VERDICT: PASSED — done means verified

Without spec-agent, that first diff ships — the agent thought it was done. Run it yourself in 30s: examples/idempotency-demo.
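The rule in the diff above is easy to pin down as an executable check. What follows is a minimal sketch, not the demo's actual code: `slugCollapsing`, `slugPerChar`, and `holdsWhitespaceInvariant` are hypothetical names introduced here, and the invariant is read as "the slug has exactly one dash per whitespace character in the trimmed input", which only makes sense for inputs that contain no dashes of their own.

```typescript
// Hypothetical names; a sketch of the slug rule shown in the diff above.

// First attempt: /\s+/g matches a whole run of whitespace, so "a  b" becomes "a-b".
const slugCollapsing = (name: string): string => name.trim().replace(/\s+/g, "-");

// Corrected version: /\s/g matches one whitespace character at a time, so "a  b" becomes "a--b".
const slugPerChar = (name: string): string => name.trim().replace(/\s/g, "-");

// Invariant (an assumed reading of the rule): the number of dashes in the slug
// equals the number of whitespace characters in the trimmed input.
function holdsWhitespaceInvariant(slugify: (s: string) => string, input: string): boolean {
  const whitespaceCount = (input.trim().match(/\s/g) ?? []).length;
  const dashCount = (slugify(input).match(/-/g) ?? []).length;
  return whitespaceCount === dashCount;
}

console.log(slugCollapsing("a  b"));                           // a-b
console.log(slugPerChar("a  b"));                              // a--b
console.log(holdsWhitespaceInvariant(slugCollapsing, "a  b")); // false
console.log(holdsWhitespaceInvariant(slugPerChar, "a  b"));    // true
```

A check of this shape is something a gate can run mechanically: it either holds or it does not, regardless of how confident the agent is.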

Why a gate beats a better prompt →

The thesis

AI agents are fast. They are also confidently wrong.

An agent without a gate is an overconfident junior.

When the base model is already capable, prompting alone barely moves the needle. The leverage is governance — block work that's objectively wrong, and teach the model your invariants so it applies them instead of re-deriving them wrong.

spec-agent is the quality contract between your repository and your coding agent. The automatic tech lead that says "No. This doesn't pass."

02 — Inside

A few parts. Each for a distinct failure.

Verification gate

A deterministic Stop hook that blocks "done" while lint, typecheck, or tests fail on the files the agent touched. It catches the errors the model can't see in its own work.

Core · clearest ROI

Project learning & skill-forge

Turns a recurring mistake into an imperative, reusable rule the model actually applies. Without that rule, the model knows the convention and breaks it anyway.

The differentiator

Context-economy

Token discipline in prompt, tool input and output. A code graph instead of re-reading files.

Core

Agent-council

Multi-perspective review reserved for high-risk, ambiguous calls — DB migrations, contract breaks, security, architecture. Not for everyday turns.

Advanced

See it run in your CI →

03 — CI / PR

Make done enforceable, not a vibe.

spec-agent verify runs your gate and prints a verdict that reads like a code review — green in CI, blocking on a PR, non-zero exit when blocked.

# .github/workflows/spec-agent.yml
- run: npx @marcusbarcelos/spec-agent verify

SPEC-AGENT VERDICT: BLOCKED

  ✗ domain contract  node --test
      idempotency invariant failed: duplicate ledger entry for sale-1

Blocked: 1 check(s) failed. Fix and re-run.
Run summary: 1 check · 1 blocked · 0 passed · 127ms

The checks live in .spec/manifest.yaml — your tests, lint, typecheck, any command. This is where spec-agent stops being an assistant and becomes the guardian of the repo.
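To make "non-zero exit when blocked" concrete, here is a minimal sketch of how a list of check results could fold into a verdict. It is an illustration under stated assumptions, not spec-agent's implementation: the `CheckResult` and `Verdict` types and the `summarize` function are hypothetical, and the only behavior taken from the text above is that any failing check blocks the run and yields a non-zero exit code.

```typescript
// Hypothetical types; not spec-agent's actual API or manifest schema.
interface CheckResult {
  name: string;     // e.g. "domain contract"
  command: string;  // e.g. "node --test"
  exitCode: number; // exit code of the command; 0 means the check passed
}

interface Verdict {
  status: "PASSED" | "BLOCKED";
  failed: string[]; // names of the checks that failed
  exitCode: number; // what the gate process itself would exit with
}

// Deterministic by construction: the verdict depends only on exit codes,
// never on what the agent says about its own work.
function summarize(results: CheckResult[]): Verdict {
  const failed = results.filter((r) => r.exitCode !== 0).map((r) => r.name);
  const blocked = failed.length > 0;
  return {
    status: blocked ? "BLOCKED" : "PASSED",
    failed,
    exitCode: blocked ? 1 : 0,
  };
}

const verdict = summarize([
  { name: "domain contract", command: "node --test", exitCode: 1 },
]);
console.log(`SPEC-AGENT VERDICT: ${verdict.status}`); // SPEC-AGENT VERDICT: BLOCKED
```

Keeping the verdict a pure function of exit codes is what lets the same checks behave identically in an agent hook, a pre-commit hook, and CI.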

Runnable demo: examples/idempotency-demo — a commission ledger that must be idempotent by sale_id.
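For readers unfamiliar with the invariant, here is a minimal sketch of what "idempotent by sale_id" means. It is not the demo's code: the `CommissionLedger` class, its method names, and the entry shape are assumptions made for illustration.

```typescript
// Hypothetical sketch; the real demo's ledger may be structured differently.
interface LedgerEntry {
  saleId: string;     // corresponds to sale_id in the text
  commission: number; // commission amount for the sale
}

class CommissionLedger {
  // Keyed by sale id, so a second record for the same sale cannot add an entry.
  private readonly entries = new Map<string, LedgerEntry>();

  // Returns true if a new entry was written, false if the sale was already recorded.
  record(entry: LedgerEntry): boolean {
    if (this.entries.has(entry.saleId)) {
      return false; // replayed event: keep the original entry, write nothing
    }
    this.entries.set(entry.saleId, entry);
    return true;
  }

  count(): number {
    return this.entries.size;
  }

  total(): number {
    let sum = 0;
    this.entries.forEach((e) => {
      sum += e.commission;
    });
    return sum;
  }
}

const ledger = new CommissionLedger();
ledger.record({ saleId: "sale-1", commission: 50 });
ledger.record({ saleId: "sale-1", commission: 50 }); // retry of the same sale
console.log(ledger.count(), ledger.total()); // 1 50
```

A ledger that skipped the `has` check would write two entries for `sale-1`, which is the kind of failure the blocked verdict above reports as "duplicate ledger entry for sale-1".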

04 — Evidence

A reproducible method, and a first honest signal.

A small, tamper-isolated benchmark — the agent never sees the checkers. A method plus a first signal, not proof. Small N, stated openly.

finding	signal
The gate is the win	Recovered an objective failure the model shipped — targeted tasks went 80% → 100% via the fix-loop. Prompt rules alone moved ~0 on a capable model.
Durable knowledge changes behavior	A learned project rule flipped a wrong answer to right — the model knew the rule and violated it without the skill.
Council's niche is calibration	On ambiguous-but-sound trade-offs it never false-blocked (0/4), where a single pass did. Real, but narrow.

Full method & caveats: RESULTS · SKILLFORGE · COUNCIL. Small N — indicative, not statistical proof.

Will it work with my agent? →

05 — Reach

Works with your agent. An honest loss-model.

Full harness on Claude Code; on other agents it runs degraded-but-functional, with the gaps written down in the manifest's loss_report.

capability	Claude Code	other agents
verification gate	Stop hook	git pre-commit / CI
durable learning	native skills + memory	`.spec/learning/`
multi-agent (council)	native subagents	single-thread simulation
code graph	graphify CLI	graphify CLI

Agent-specific enhancers (claude-mem, superpowers, rtk, graphify) are optional — never dependencies.

06 — Start

Three commands.

# scaffold .spec/ + adapters into the current repo
npx @marcusbarcelos/spec-agent init --id my-project --agents claude,agents-md

# re-project adapters when the engine evolves (never touches your durable state)
npx @marcusbarcelos/spec-agent sync

# run the gate yourself (CI / PR) — verdict + non-zero exit if blocked
npx @marcusbarcelos/spec-agent verify

Requires Node ≥ 20. The harness runs inside the coding agent you already use.

07 — Author

Marcus Vinicius Barcelos

Senior Software Engineer

Degree in Systems Analysis & Development, postgraduate in Applied AI Engineering. spec-agent was distilled and sanitized from a governance harness built for a real production codebase, not a whiteboard.

GitHub ↗ · npm ↗