AgentOps

Autonomous code validation for coding agents

Coding agents can produce plausible code that is still wrong. AgentOps helps answer the two questions that decide whether you can trust the work: is the code right, and is the agent output proven enough to grant more autonomy? It sits on top of the agent you already use (Claude Code, Codex, Cursor, OpenCode) and adds the validation membrane, evidence trail, and repo-local corpus that make that judgment repeatable.

See it work

Demo video (MP4, live sessions): /discovery → bead graph · /crank → sub-agents in waves · /validate --mixed → real Claude + Codex verdict.

AgentOps breaks intent into bounded slices, gives each a failing test and a write scope, and makes every phase boundary a gate that records evidence. The agent starts loaded with prior decisions and learnings instead of cold:

> /council --mixed validate this PR

[council] evidence sealed → 6 judges across Claude Code + Codex CLI
[claude/judge-1] WARN  rate limiting missing on /login
[codex/judge-1]  WARN  token bucket lacks jitter under burst
[claude/judge-2] PASS  redis integration follows pattern
Consensus: WARN, fix /login limit + refill jitter before shipping
Recorded → .agents/council/<run-id>/verdict.md
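One way to read the transcript above is that the council folds per-judge verdicts into a single result by letting the most severe verdict win. The sketch below illustrates that reading only. The worst-wins rule, the `FAIL` level, and the `Verdict` and `consensus` names are assumptions made for illustration, not the documented behavior of `/council`.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """Severity-ordered verdicts; a higher value is more severe.

    PASS and WARN appear in the transcript; FAIL is an assumed third level.
    """
    PASS = 0
    WARN = 1
    FAIL = 2


def consensus(verdicts: list[Verdict]) -> Verdict:
    """Fold judge verdicts into one: the most severe verdict wins (assumed rule)."""
    if not verdicts:
        raise ValueError("a council needs at least one judge verdict")
    return max(verdicts)


# The three judge lines shown in the transcript.
judges = {
    "claude/judge-1": Verdict.WARN,
    "codex/judge-1": Verdict.WARN,
    "claude/judge-2": Verdict.PASS,
}
result = consensus(list(judges.values()))
print(f"Consensus: {result.name}")  # prints "Consensus: WARN"
```

A majority vote would give the same answer for this transcript, so the example cannot tell the two rules apart. Worst-wins is simply the more conservative choice for a shipping gate.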

What you get

The center is validation: prove the agent output, keep the proof, and use that record to decide how much autonomy the next run earns. The supporting layers all stay local in .agents/ (no telemetry, no hosted control plane):

Layer	The problem	What AgentOps adds
Validation membrane	agent output can look correct while being wrong	tests, local gates, `/pre-mortem`, `/vibe`, `/council`, and pawl verdicts prove or reject the work
Evidence trail	"looks good" does not survive handoff	`.agents/` captures runs, decisions, findings, citations, verdicts, retros, and closeout proof
Context compiler	validators and implementers start cold	`ao context assemble` builds phase-scoped packets; `ao lookup` retrieves decay-ranked knowledge
Knowledge ratchet	lessons vanish between sessions	`/forge` mines learnings, `/evolve` reconciles, and durable lessons become constraints before more autonomy is granted

The corpus is an LLM wiki of markdown. Agents read it natively and write to it as they work, so it maintains itself instead of becoming another doc you keep up by hand. Why that beats Notion or Confluence: docs/wiki-for-agents.md. The full theory (context as the lifecycle, the CDLC): docs/cdlc.md.
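To make "decay-ranked" retrieval concrete, here is a minimal sketch of one common approach: multiply each learning's relevance by an exponential decay on its age, then sort. The half-life, the `Learning` fields, and the function names are all hypothetical. The README does not say which formula `ao lookup` uses.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Learning:
    title: str
    relevance: float  # hypothetical match score in [0, 1]
    age_days: float   # hypothetical: days since the learning was last confirmed


def decayed_score(item: Learning, half_life_days: float = 30.0) -> float:
    """Relevance halves every half_life_days (assumed exponential decay)."""
    return item.relevance * math.pow(0.5, item.age_days / half_life_days)


def rank(items: list[Learning], half_life_days: float = 30.0) -> list[Learning]:
    """Return learnings ordered from highest to lowest decayed score."""
    return sorted(items, key=lambda it: decayed_score(it, half_life_days), reverse=True)


corpus = [
    Learning("rate-limit every auth endpoint", relevance=0.9, age_days=90),
    Learning("token bucket needs refill jitter", relevance=0.6, age_days=0),
    Learning("redis client pooling pattern", relevance=0.8, age_days=30),
]
for item in rank(corpus):
    print(f"{decayed_score(item):.3f}  {item.title}")
```

With these made-up numbers the fresh, moderately relevant learning outranks the older, more relevant ones. That is the trade-off a decay term introduces.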

Install

Pick your runtime, then type /quickstart in the agent.

# Claude Code
claude plugin marketplace add boshu2/agentops
claude plugin install agentops@agentops-marketplace

# Codex CLI (macOS/Linux/WSL).  OpenCode: install-opencode.sh
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.sh | bash
# Codex CLI (Windows):
irm https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.ps1 | iex

# Gemini / Antigravity
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-agy.sh | bash

# Other skills-compatible agents
npx skills@latest add boshu2/agentops --cursor -g

The ao CLI is optional but recommended (bookkeeping, retrieval, health, the loops):

brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops   # macOS
# Windows: irm https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-ao.ps1 | iex
# Or release binaries / build from source (cli/README.md).

Installs hookless: skills and the ao CLI guide the workflow, and the local cockpit gate is the release authority. GitHub Actions are an optional/manual backstop, not the routine shipping path. The only hard requirements are an agent runtime and git; everything else degrades gracefully. Full dependency matrix: docs/dependencies.md. Day-2 install, update, backup, permission, recovery, and escalation paths are in docs/install-day2-ops.md.

Quick start

You want to…	Run	Done when
set up a repo	`ao quick-start`, then `/quickstart`	AgentOps reports readiness and a next action
ship one validated change	`/rpi "a small goal"`	discovery, build, validation, and learnings all leave evidence in `.agents/`
review something now	`/council validate this PR` · `/vibe recent`	a consolidated verdict and a record before you ship

Already installed? Ask your agent: /quickstart. Or run ao doctor and ao demo. First-session walkthrough: docs/first-value-path.md.

Skills

Every skill works alone; flows compose them. Full catalog: docs/SKILLS.md. Unsure where to start? See the Skill Router.

Skill	Use it when
`/quickstart`	you want the fastest setup check and next action
`/research`	you need codebase context and prior learnings before changing code
`/pre-mortem`	you want to pressure-test a plan before building
`/rpi`	you want discovery, build, validation, and bookkeeping in one flow
`/council`	you want independent judges (optionally Claude and Codex) to return one verdict
`/validate`	you want a code-quality and risk review before shipping
`/evolve`	you want a goal-driven improvement loop that compounds knowledge without mutating source

The `ao` CLI

Repo-native control plane behind the skills. Full reference: CLI commands.

ao quick-start            # set up AgentOps in a repo
ao search "query"         # search history and local knowledge
ao lookup --query "topic" # retrieve curated learnings
ao context assemble       # build a task briefing
ao rpi phased "fix X"     # run the phased loop from the terminal
ao compile                # rebuild the corpus
ao metrics health         # flywheel health

In session vs. out of session. The whole loop runs in a plain session: no daemon, no scheduler, no cloud (the sovereignty floor). For always-on work, the same loop opts into a swappable substrate (an NTM tmux swarm, MCP via ao mcp serve, or managed-agents) that dispatches a whole ao rpi per ready bead. Details: docs/3.0.md; component routing: docs/architecture/component-map.md.
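The idea of dispatching one run per ready bead, in waves, can be sketched as a layered walk over a dependency graph. The sketch assumes a bead is a unit of work with named prerequisites. The `plan_waves` function and the example bead names are hypothetical and do not reflect the actual scheduler.

```python
def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group beads into waves; each wave depends only on beads in earlier waves."""
    remaining = {bead: set(needs) for bead, needs in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A bead is ready once every prerequisite has completed.
        ready = sorted(b for b, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError(f"cycle or missing dependency among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for bead in ready:
            del remaining[bead]
    return waves


# Hypothetical bead graph: each key lists the beads it must wait for.
graph = {
    "schema": set(),
    "limiter": {"schema"},
    "login-route": {"schema"},
    "integration-test": {"limiter", "login-route"},
}
print(plan_waves(graph))
# prints [['schema'], ['limiter', 'login-route'], ['integration-test']]
```

Beads within a wave have no dependencies on each other, so in this model each could be handed to its own sub-agent in parallel.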

Honest limitations

It doesn't write code. It wraps Claude Code / Codex / Cursor / OpenCode with bookkeeping, gates, and a corpus; the harness still writes it.
No hosted control plane or telemetry. Everything lives in your repo; there's no cross-team dashboard unless you commit .agents/.
Multi-model councils cost tokens. Six judges per PR isn't free; running them on a substrate makes the cost predictable, not zero.
The corpus needs hygiene. ao defrag and ao maturity keep it healthy; neglected, it rots like any markdown vault.
There are many skills. /quickstart and the Skill Router exist so you don't have to learn them all up front; current inventory is generated from skills/**/SKILL.md.

What if the labs ship this natively? They will. The durable value is the .agents/ corpus you build, not the tool that builds it. As plain markdown in your repo, it carries forward to whatever ships next, stays forkable, and is Apache-2.0 with no lock-in.

Docs & contributing

What 3.1 adds · What 3.0 is · component map · docs index · newcomer guide · architecture · FAQ · built on the 12-factor doctrine.

Contributing: docs/CONTRIBUTING.md (agents: read AGENTS.md, track work with br). License: Apache-2.0.

