Context Development Life Cycle

TL;DR: AgentOps is an SDLC control plane for agentic software development. Its internal mechanism is the Context Development Life Cycle (CDLC): every phase of software delivery has a context counterpart, and every high-value context token follows the Context Density Rule: carry intent, boundary, evidence, decision, constraint, or next action.

Software engineering took 50 years to build the discipline that lets nondeterministic teams ship reliable software. AgentOps keeps the public category language people already understand (SDLC, DevOps, CI/CD, tests, review, release gates) and applies that same shape to context.

Packets, briefings, skills, verdicts, and learnings are artifacts. The deeper product is the practice layer: BDD/Gherkin, DDD, hexagonal architecture, TDD, CI/CD, SRE, ADRs, wikis, Agile/XP, and pragmatic engineering encoded into the runtime structure agents work inside.

The translation is direct. Each piece of the software-engineering stack has a coding-agent counterpart:

| Software Engineering | Coding-Agent World |
|---|---|
| Source code | Context (corpus, planning rules, learnings) |
| SDLC | Context Development Life Cycle |
| Libraries (Maven, npm, crates.io) | Context libraries (the .agents/ corpus) |
| Compilers | Context compilers (ao compile → wiki) |
| Code review | Multi-model councils |
| CI/CD | Validation gates (/vibe, /pre-mortem) |
| Postmortems | Automated postmortems (/post-mortem → learnings) |
| Runbooks | Skills + planning rules |
| Software factories | The in-session loop (ao rpi, /evolve) run out of session on an orchestration substrate (reference: NTM + MCP + managed-agents) |
| Markdown / Git / Linux (open primitives) | LLM Wiki of Markdown |
| Open-source corpus | Your private corpus (.agents/ in your repo) |

We call the internal lifecycle the Context Development Life Cycle (CDLC). You do not need to know that name to understand the product: AgentOps is the SDLC control plane, and CDLC is how it compiles context for agent work.

Companion docs

  • Operating loop — the operational discipline that runs inside these phases: BDD intent → vertical slices → conflict-free wave → bead acceptance → evidence. The CDLC describes the seven phases of context engineering; the operating loop describes how an agent actually executes work through them.
  • Wiki for agents — what .agents/ actually is and why agents can read it natively
  • Trust factory — how the validation gates and councils make agent output trustworthy

The Parallel

In 2009, DevOps asked: what if ops looked more like dev? The answer was CI/CD, infrastructure as code, and the SDLC infinity loop — Plan, Code, Build, Test, Release, Deploy, Operate, Monitor.

Inside an agentic SDLC, the CDLC asks the same question about context: what if the instructions, knowledge, and constraints we feed to coding agents were engineered with the same rigor as the code they produce?

The answer is the same shape. Different substrate.

     SDLC (code)                    CDLC (context)
    ┌──────────┐                   ┌──────────┐
    │   Plan   │                   │ Generate │
    │   Code   │                   │ Compile  │
    │  Build   │                   │   Test   │
    │   Test   │                   │Distribute│
    │ Release  │                   │ Deliver  │
    │  Deploy  │                   │ Observe  │
    │ Operate  │                   │  Adapt   │
    │ Monitor  │                   │          │
    └──────────┘                   └──────────┘
         ↕                              ↕
    infinity loop                  infinity loop

The SDLC produces deployable artifacts. The CDLC produces injectable context for the agents doing that work. Both compound through feedback loops. Both degrade without discipline.


The Narrow Waist

The CDLC has a narrow waist because LLM agents do not have infinite context:

historical software-engineering practice
        ↓
agent-context-limited constraint
        ↓
small verifiable slices
        ↓
dense intent + executable evidence
        ↓
less rediscovery, less drift, less hallucinated done

Four practices carry the highest density:

| Practice | CDLC role |
|---|---|
| BDD / Gherkin | States what behavior matters in observable terms |
| DDD | Gives humans and agents shared names, aggregates, and bounded contexts |
| Hexagonal architecture | Keeps tools, model runtimes, and vendor adapters outside the core loop |
| TDD | Gives the agent an executable local done condition |

Everything else plugs into that waist. CI/CD runs the proof repeatedly. SRE/DORA measures health. ADRs and provenance explain why decisions happened. Wikis and ratchets keep knowledge durable. Agile/XP keeps work in small vertical increments. Pragmatic engineering keeps the slice evidence-bearing and reversible.

The density invariant has a domain name: Context Density Rule. The domain entry lives at skills/domain/references/context-density-rule.md.

That is why waterfall is the wrong shape here. It spends context on large speculative artifacts before proof exists. CDLC prefers atomic process: one behavior, one bounded context, one first failing test, one write scope, one acceptance proof, and one learning only when it changes future behavior.
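
The Context Density Rule can be read as a filter over candidate context. The sketch below is illustrative only: the `ContextItem` type, its `role` tag, and the `dense_only` helper are hypothetical names, and the rule itself lives in the domain entry referenced above.

```python
from dataclasses import dataclass
from typing import Optional

# The six roles named by the Context Density Rule.
DENSITY_ROLES = {"intent", "boundary", "evidence", "decision", "constraint", "next_action"}


@dataclass(frozen=True)
class ContextItem:
    """One candidate line of context. The role tag is a hypothetical annotation."""
    text: str
    role: Optional[str] = None  # None means nobody could name a role for this line


def dense_only(items: list[ContextItem]) -> list[ContextItem]:
    """Keep only items whose role is one of the six density roles."""
    return [item for item in items if item.role in DENSITY_ROLES]


packet = dense_only([
    ContextItem("Users can reset a password via an emailed link", "intent"),
    ContextItem("Only touch files under auth/", "boundary"),
    ContextItem("We had a long discussion about this last sprint"),
    ContextItem("test_reset_link_expires fails on main", "evidence"),
])
print([item.role for item in packet])  # ['intent', 'boundary', 'evidence']
```

The untagged line is dropped because it carries none of the six roles, which is the same test the atomic process applies to a speculative artifact.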


The Seven Phases

1. Generate

Create the context that agents will consume. Prompts, skills, instructions, specifications.

| | |
|---|---|
| SDLC parallel | Plan + Code |
| What it means | Author skills, write agent.md instructions, pull documentation, create specs |
| Why it matters | Context that isn't created doesn't exist. Agents start from zero without it. |

AgentOps implementation:

  • /research — investigate before writing context
  • /plan — decompose goals into structured implementation specs
  • SKILL.md authoring — reusable context packages with triggers, steps, and output contracts
  • ao context assemble — request skill- or phase-scoped context explicitly
  • MCP integrations — pull context from GitLab, GitHub, Slack, tickets

The generation phase is where most teams stop. They write a Claude.md, maybe a few rules, and call it done. CDLC says generation is one-seventh of the work.
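
As a rough illustration of "triggers, steps, and output contracts", a skill can be modeled as a small record. This is a hypothetical in-memory shape, not the real SKILL.md schema; the field names and the `matches` and `missing_outputs` helpers are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical in-memory view of a skill; not the real SKILL.md schema."""
    name: str
    triggers: list[str]   # phrases that should activate the skill
    steps: list[str]      # ordered instructions for the agent
    output_contract: list[str] = field(default_factory=list)  # artifacts the skill must produce

    def matches(self, request: str) -> bool:
        """True if any trigger phrase appears in the request (case-insensitive)."""
        lowered = request.lower()
        return any(trigger.lower() in lowered for trigger in self.triggers)

    def missing_outputs(self, produced: set[str]) -> list[str]:
        """Contract entries that have not been produced yet."""
        return [name for name in self.output_contract if name not in produced]


research = Skill(
    name="research",
    triggers=["investigate", "research"],
    steps=["Read existing learnings", "Survey the code", "Write findings"],
    output_contract=["findings.md"],
)
print(research.matches("Please investigate the flaky login test"))  # True
print(research.missing_outputs(set()))  # ['findings.md']
```

Treating the output contract as data is what lets later phases check it mechanically instead of trusting the agent's claim that it is done.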

2. Compile

Assemble raw context into phase-appropriate, role-scoped, freshness-weighted packets.

| | |
|---|---|
| SDLC parallel | Build |
| What it means | Select, rank, trim, and package context for the current task |
| Why it matters | Raw context is too large, too stale, or too broad. Compilation makes it precise. |

AgentOps implementation:

  • ao context assemble — build phase-scoped context packets
  • ao lookup — retrieve decay-ranked learnings on demand
  • ao inject — deprecated compatibility adapter for legacy retrieval paths
  • ao compile — rebuild the derived knowledge wiki (Mine → Grow → Defrag → Lint)
  • ao maturity --expire/--evict — remove stale context before it pollutes the window
  • Finding compiler — distill raw findings into prevention rules

This is the phase that separates a context compiler from a prompt builder. A prompt builder concatenates. A compiler selects, ranks, trims, and delivers the minimum viable context for the current phase.
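
The difference can be sketched in a few lines. The example below selects by phase, ranks by a freshness-decayed score, and trims to a token budget. The `Learning` fields, the 30-day half-life, and the greedy packing are assumptions for illustration, not the behavior of `ao context assemble`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Learning:
    text: str
    phases: frozenset[str]  # phases this learning applies to
    utility: float          # 0..1, how useful it has been so far
    age_days: float         # days since it was last confirmed
    tokens: int             # estimated size in the context window


def score(item: Learning, half_life_days: float = 30.0) -> float:
    """Utility discounted by exponential freshness decay (assumed half-life)."""
    return item.utility * 0.5 ** (item.age_days / half_life_days)


def compile_packet(corpus: list[Learning], phase: str, budget: int) -> list[Learning]:
    """Select by phase, rank by decayed score, then trim to the token budget."""
    selected = [item for item in corpus if phase in item.phases]
    ranked = sorted(selected, key=score, reverse=True)
    packet, used = [], 0
    for item in ranked:
        if used + item.tokens <= budget:
            packet.append(item)
            used += item.tokens
    return packet


corpus = [
    Learning("Run migrations before tests", frozenset({"implement"}), 0.9, 0, 40),
    Learning("Old note about the retired v1 API", frozenset({"implement"}), 0.9, 90, 40),
    Learning("Interview users before planning", frozenset({"research"}), 0.8, 0, 30),
    Learning("Prefer table-driven tests", frozenset({"implement", "plan"}), 0.6, 30, 50),
]
packet = compile_packet(corpus, phase="implement", budget=100)
print([item.text for item in packet])
# ['Run migrations before tests', 'Prefer table-driven tests']
```

A prompt builder would have concatenated all four entries; here the research-only note is never selected and the stale note loses its place to a fresher one once the budget runs out.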

3. Test

Validate that context produces the intended agent behavior.

| | |
|---|---|
| SDLC parallel | Test |
| What it means | Run evals on context: does SKILL.md X produce behavior Y? |
| Why it matters | You change two lines in your Claude.md. Do you know the impact? |

AgentOps implementation:

  • /pre-mortem — validate plans before implementation (LLM-as-judge)
  • /vibe — validate code after implementation (multi-model consensus)
  • /council — multi-judge adversarial review
  • ao eval run — deterministic eval suites with scoring dimensions
  • context_comprehension dimension — structural quality assessment of SKILL.md files
  • Baseline A/B — skill-on vs skill-off delta measurement

Testing context is fundamentally different from testing code. Evals are non-deterministic. You run them five times and measure pass rate. Error budgets replace pass/fail. This is the hardest phase to get right, and the one most teams skip entirely.
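
A minimal sketch of that measurement, assuming an eval is any callable that returns pass or fail. The `scripted` helper stands in for a real agent run, and the function names are hypothetical rather than part of `ao eval run`.

```python
from typing import Callable, Iterable


def pass_rate(eval_once: Callable[[], bool], runs: int = 5) -> float:
    """Run one eval several times and return the fraction of runs that passed."""
    passes = sum(1 for _ in range(runs) if eval_once())
    return passes / runs


def within_budget(rate: float, error_budget: float) -> bool:
    """An eval is healthy while its failure rate stays inside the error budget."""
    return (1.0 - rate) <= error_budget


def skill_delta(with_skill: Callable[[], bool], without_skill: Callable[[], bool], runs: int = 5) -> float:
    """Baseline A/B: pass-rate lift from loading the skill."""
    return pass_rate(with_skill, runs) - pass_rate(without_skill, runs)


def scripted(outcomes: Iterable[bool]) -> Callable[[], bool]:
    """Stand-in for a real agent run: replays a fixed list of pass/fail results."""
    results = iter(outcomes)
    return lambda: next(results)


rate = pass_rate(scripted([True, True, False, True, True]))
print(rate)                                     # 0.8
print(within_budget(rate, error_budget=0.25))   # True
print(within_budget(rate, error_budget=0.10))   # False
```

The same eval is healthy under a 25% budget and failing under a 10% budget, which is the sense in which an error budget replaces a single pass/fail verdict.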

4. Distribute

Package and share context across projects, teams, and runtimes.

| | |
|---|---|
| SDLC parallel | Release |
| What it means | Version context, resolve dependencies, publish to registries |
| Why it matters | Context that lives in one person's head (or one repo's Claude.md) doesn't scale. |

AgentOps implementation:

  • Skills registry — 170+ skills as distributable context packages
  • /converter — export skills to Cursor rules, Codex format, OpenCode config
  • ao compile — package the knowledge wiki for distribution
  • Cross-runtime compatibility — same skills target Claude Code, Codex CLI, Cursor, and OpenCode
  • install.sh — one-line installation of the full context package

Distribution is where context becomes an organizational asset. One team fixes a testing pattern, packages it as a skill, and every other team gets the fix on next install.
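
Versioning and dependency resolution for context can borrow directly from package managers. The sketch below uses a hypothetical manifest format and made-up dependency edges between skills. It shows only the ordering step, using the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical manifest: skill name -> (version, names of skills it depends on).
MANIFEST = {
    "vibe": ("2.1.0", ["council"]),
    "council": ("1.4.0", []),
    "post-mortem": ("1.0.2", ["vibe", "council"]),
}


def install_order(manifest: dict[str, tuple[str, list[str]]]) -> list[str]:
    """Return skills ordered so every dependency precedes its dependents."""
    graph = {name: set(deps) for name, (_version, deps) in manifest.items()}
    # static_order() raises graphlib.CycleError if the manifest has a cycle.
    return list(TopologicalSorter(graph).static_order())


order = install_order(MANIFEST)
print(order)  # ['council', 'vibe', 'post-mortem']
```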

5. Deliver

Inject the right context into the right session at the right time.

| | |
|---|---|
| SDLC parallel | Deploy |
| What it means | Load context into the agent's window at session start |
| Why it matters | A compiled context packet is worthless if it doesn't reach the agent. |

AgentOps implementation:

  • Explicit context packets — deliver the assembled phase context to the agent
  • Optional SessionStart hooks — runtime adapter profile, not the default path
  • ao lookup — on-demand knowledge search during a session
  • SkillLoadEvent — track which skills were loaded (citation pipeline)
  • Phase-scoped delivery — /research gets different context than /implement

Delivery is the moment where compilation meets the session. Right context, right window, right time. Phase-specific. Role-scoped. Freshness-weighted.
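
Phase-scoped delivery reduces to a lookup plus an injection step. In this sketch the `Packet` type, the packet contents, and the rendered header format are all hypothetical. In practice the packets would come out of the compile phase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    phase: str
    items: tuple[str, ...]


# Hypothetical compiled packets, one per phase.
PACKETS = {
    "research": Packet("research", ("Prior findings on auth flows", "Open questions")),
    "implement": Packet("implement", ("First failing test", "Write scope: auth/")),
}


def deliver(phase: str, prompt: str) -> str:
    """Prepend the packet for this phase to the session's opening prompt."""
    packet = PACKETS.get(phase)
    if packet is None:
        raise KeyError(f"no compiled packet for phase {phase!r}")
    body = "\n".join(f"- {item}" for item in packet.items)
    return f"## Context for {packet.phase}\n{body}\n\n{prompt}"


print(deliver("implement", "Make the reset-link test pass."))
```

Failing loudly on a missing packet is a deliberate choice in this sketch: a session that starts without its context should be visible, not silent.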

6. Observe

Monitor whether delivered context produces good outcomes.

| | |
|---|---|
| SDLC parallel | Operate + Monitor |
| What it means | Track agent behavior, capture correction signals, measure session outcomes |
| Why it matters | Without observation, context quality is a guess. |

AgentOps implementation:

  • quality-signals.sh — detect user corrections and repeated prompts in real time
  • SkillLoadEvent + session-outcome — link "what was loaded" to "how it went"
  • Citation tracking — .agents/ao/citations.jsonl records every artifact retrieval
  • Context monitor — track context window usage and budget
  • ao session-outcome — compute session reward signal from transcript patterns

Observation is the phase that closes the gap between "we shipped context" and "the context worked." Every PR rejection is feedback on context. Every user correction is a signal. Many production failures in generated code trace back to missing context.
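
Linking "what was loaded" to "how it went" is a join between load events and session outcomes. The record shape below is invented for the example. The real `citations.jsonl` schema and the reward computed by `ao session-outcome` may look quite different.

```python
import json
from collections import defaultdict

# Hypothetical JSONL load events; the real citations.jsonl schema may differ.
LOAD_EVENTS = """\
{"session": "s1", "skill": "vibe"}
{"session": "s1", "skill": "plan"}
{"session": "s2", "skill": "vibe"}
{"session": "s3", "skill": "plan"}
"""
OUTCOMES = {"s1": 1.0, "s2": 0.0, "s3": 1.0}  # session -> reward in [0, 1]


def mean_reward_by_skill(jsonl: str, outcomes: dict[str, float]) -> dict[str, float]:
    """Average the session reward over every session in which a skill was loaded."""
    rewards = defaultdict(list)
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event["session"] in outcomes:
            rewards[event["skill"]].append(outcomes[event["session"]])
    return {skill: sum(values) / len(values) for skill, values in rewards.items()}


print(mean_reward_by_skill(LOAD_EVENTS, OUTCOMES))  # {'vibe': 0.5, 'plan': 1.0}
```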

7. Adapt

Feed observations back into context improvement. Close the loop.

| | |
|---|---|
| SDLC parallel | Feedback → Plan (restart) |
| What it means | Use session outcomes to improve context for next session |
| Why it matters | Without adaptation, the same context produces the same mistakes forever. |

AgentOps implementation:

  • MemRL feedback — cited artifacts receive session reward, updating utility scores
  • Quality-signal → flywheel wiring — user corrections reduce skill utility
  • ao forge transcript — extract learnings from completed sessions
  • ao flywheel close-loop — score, promote, and curate extracted knowledge
  • /evolve — autonomous reconciliation loop that fixes the worst fitness gap
  • /dream — overnight compounding that runs the full adapt cycle unattended

Adaptation is where the CDLC becomes a flywheel. Each session's outcomes improve the next session's context. Knowledge that works gets promoted. Knowledge that fails gets demoted. The system compounds.
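
Promotion and demotion can be illustrated with a simple moving-average update on utility scores. This is a stand-in chosen for brevity, not the MemRL algorithm. The learning rate, the artifact names, and both function names are assumptions.

```python
def update_utility(utility: float, reward: float, learning_rate: float = 0.2) -> float:
    """Move a utility score a fraction of the way toward the observed session reward."""
    return utility + learning_rate * (reward - utility)


def apply_session(utilities: dict[str, float], cited: list[str], reward: float) -> dict[str, float]:
    """Return new scores; only artifacts cited in the session are updated."""
    return {
        name: update_utility(score, reward) if name in cited else score
        for name, score in utilities.items()
    }


scores = {"learning-a": 0.5, "learning-b": 0.5}
scores = apply_session(scores, cited=["learning-a"], reward=1.0)  # a session that went well
scores = apply_session(scores, cited=["learning-b"], reward=0.0)  # the user corrected the agent
print(scores)
```

After these two sessions `learning-a` rises to roughly 0.6 and `learning-b` falls to roughly 0.4, so a ranking step that sorts by utility would now prefer the artifact that was cited in the good session.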


SDLC → CDLC Mapping Table

| SDLC Phase | CDLC Phase | Key Question | AgentOps Surface |
|---|---|---|---|
| Plan | Generate | What context should exist? | /research, /plan, SKILL.md |
| Code + Build | Compile | How is context assembled for this task? | ao context assemble, ao lookup, ao compile |
| Test | Test | Does this context produce the right behavior? | /pre-mortem, /vibe, ao eval run |
| Release | Distribute | How do others get this context? | Skills registry, /converter, install.sh |
| Deploy | Deliver | Did the right context reach the agent? | Explicit phase packets, optional SessionStart hooks, SkillLoadEvent |
| Operate | Observe | Is the context working in practice? | quality-signals.sh, citation tracking, session-outcome |
| Monitor → Plan | Adapt | What should change for next time? | MemRL feedback, /forge, /evolve, /dream |

Operating loop within the phases

The seven phases describe what context engineering is. The operating loop describes how an agent executes work through them. They are not the same artifact.

A single turn of the operating loop touches every CDLC phase:

BDD-shaped intent issue            ← Generate (the intent is the spec; phase 1)
  → vertical slices                ← Compile (one slice per Given/When/Then; phase 2)
  → TDD per slice                  ← Test (first failing test before code; phase 3)
  → conflict-free parallel wave    ← Distribute + Deliver (workers receive scoped context; phases 4–5)
  → integrated bead completion     ← Observe (acceptance examples must pass; phase 6)
  → evidence + learning capture    ← Adapt (ratcheted promotion into the next loop turn; phase 7)

The loop is the unit of work that compounds. The phases are the layers it travels through. Every process skill in this repo (/discovery, /plan, /implement, /crank, /validation, /council, /pre-mortem, /vibe, /post-mortem, /forge, /retro) is one move in that loop, with the upstream artifact contracts and downstream evidence requirements pinned to the loop position — not to a free-floating phase number.

Canonical reference: Operating loop. Doctrine source: .agents/research/2026-05-15-cdlc-dojo-doctrine.md. Fitness gate: GOALS.md Directive #12.

The Leverage Hierarchy

Not all phases are equal. Donella Meadows ranked twelve places to intervene in a system, from weakest (#12: tweak a number) to strongest (#1: transcend paradigms). The CDLC phases climb that ladder.

| Leverage | Meadows Point | CDLC Phase | What It Means |
|---|---|---|---|
| Low | #12–#10: Parameters, buffers, structure | Generate | Writing a better prompt helps, but it's the lowest-leverage thing you can do. Most teams stop here. |
| Medium | #9–#8: Delays, balancing feedback | Compile, Test | Assembling the right context and validating it before delivery. Feedback loops that catch errors. |
| Threshold | #6: Information flows | Distribute, Deliver | Making context available where it's needed. The point where individual effort becomes organizational capability. |
| High | #5: Rules | Observe | Measuring what actually happens. Rules that govern what gets promoted, demoted, or discarded. |
| Highest | #4–#3: Self-organization, goals | Adapt | The system improves itself. Learnings promote automatically. Goals reconcile. The flywheel compounds without human intervention. |

The pattern: the phases most teams skip are the ones Meadows says matter most. Writing a prompt is #12. Building a system that improves its own context based on what it observes is #4. That's an 8-level leverage gap.

Full leverage-point mapping: docs/leverage-points.md. Convergence map tying each CDLC phase to all five theoretical pillars: docs/the-science.md.


How the 12 Factors Build the Flywheel

The 12-factor doctrine is a build order — four tiers that construct the compounding product loop in sequence. The flywheel emerges when bookkeeping, context compilation, validation gates, and learning loops are running together.

| Tier | Factors | Product Layer | What It Builds | Theory |
|---|---|---|---|---|
| Foundation (I–IV) | Context Is Everything, Track in Git, One Agent One Job, Research First | Context Compiler | The substrate: context exists, is versioned, is scoped, is researched | Cognitive science (40% load, lost-in-middle). Meadows #12–#6. |
| Flow (V–VI) | Validate Externally, Lock Progress Forward | Validation Gates | The filter: bad context gets caught, good context can't regress | Brownian Ratchet (chaos + filter + one-way gate). Meadows #8–#7. |
| Knowledge (VII–IX) | Extract Learnings, Compound Knowledge, Measure What Matters | Knowledge Flywheel | The engine: learnings extract, score, promote, inject. The loop closes. | MemRL (Zhang 2025). Self-organization (Meadows #4). Escape velocity: σ×ρ > δ. |
| Scale (X–XII) | Isolate Workers, Supervise Hierarchically, Harvest Failures | Infrastructure | The multiplier: all three layers across parallel agents. Failure becomes fuel. | Control theory (K8s reconciliation). SRE (SLOs + error budgets). |

The flywheel doesn't exist until the Knowledge tier kicks in — but it can't function without the layers beneath it. Factor VIII (Compound Knowledge) is the climax: the moment the loop closes and starts compounding. Everything before it is setup. Everything after it is scale.

The theoretical threads

Each tier draws from a different body of theory:

  • Cognitive science (Sweller 1988, Liu 2023) constrains the Foundation: the 40% load rule, lost-in-middle attention mechanics, buffer-sizing. Without these constraints, you could dump everything into the window. You can't.
  • The Brownian Ratchet operates in the Flow tier: agents produce noisy output. Validation gates are the filter. The ratchet (Factor VI) is the one-way gate. Chaos + filter + gate = net forward progress.
  • MemRL (Zhang 2025) drives the Knowledge tier: reinforcement learning on episodic memory. Citation events become training signals. Utility scores update. The flywheel has its own learning algorithm.
  • Control theory enables the Scale tier: declared state (GOALS.md) + reconcile loop (/evolve) + error budgets (fitness gates). The system continuously reconciles actual state to desired state.
  • Systems dynamics (Meadows 2008) provides the leverage hierarchy: Foundation is necessary infrastructure (#12–#10), Flow adds feedback (#8–#7), Knowledge reaches self-organization (#4–#3). The highest-leverage phases are the ones most teams never build.
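
The "chaos + filter + gate" thread can be shown as a toy loop. In this sketch the state is a single number and the proposals are seeded random noise. `noisy_step` and `in_bounds` are hypothetical stand-ins for an agent's output and a validation gate.

```python
import random
from typing import Callable


def ratchet(
    propose: Callable[[float, random.Random], float],
    validate: Callable[[float], bool],
    state: float,
    steps: int,
    seed: int = 0,
) -> float:
    """Keep a proposal only if it passes validation and improves on the current state."""
    rng = random.Random(seed)
    for _ in range(steps):
        candidate = propose(state, rng)  # chaos: noisy agent output
        if not validate(candidate):      # filter: the validation gate
            continue
        if candidate > state:            # one-way gate: progress only locks forward
            state = candidate
    return state


def noisy_step(state: float, rng: random.Random) -> float:
    return state + rng.uniform(-1.0, 1.0)


def in_bounds(candidate: float) -> bool:
    return candidate <= 10.0  # stand-in for a real check such as "tests pass"


final = ratchet(noisy_step, in_bounds, state=0.0, steps=200)
print(0.0 <= final <= 10.0)  # True: state never regresses and never passes the gate's limit
```

Roughly half the proposals move backward, yet the state can only rise, because rejected candidates are discarded and never committed.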

Full convergence map tying each CDLC phase to all five threads: The Science — Part 6.


Why This Matters

LLMs are engines. Context is fuel. You can't tune the engine — that's the model vendor's job. But you can engineer the fuel. AgentOps is the SDLC control plane; the CDLC is how it engineers the fuel.

DevOps proved that disciplined systems around nondeterministic workers (humans) produce reliable output. SRE proved it again with SLOs and error budgets. Kubernetes proved it for infrastructure with control loops.

CDLC is the same proof for coding agents. The model stays the same. The context compounds. The system gets better with each use.


See Also