Skip to content

feat: close the governed capability ladder from plan to activate#1098

Merged
chumyin merged 16 commits into
devfrom
work/refstudy-delivery-20260408
Apr 8, 2026
Merged

feat: close the governed capability ladder from plan to activate#1098
chumyin merged 16 commits into
devfrom
work/refstudy-delivery-20260408

Conversation

@chumyin
Copy link
Copy Markdown
Collaborator

@chumyin chumyin commented Apr 8, 2026

Summary

  • Problem:
    • LoongClaw already had governed runtime-capability propose/review/index/plan surfaces, but the ladder still stopped short of operator-usable draft artifacts, explicit activation, and rollback-grade activation records.
  • Why it matters:
    • Maintainers need a bounded, reviewable path from accepted capability families to concrete managed artifacts and explicit activation outcomes without fake-success paths or environment-dependent verification.
  • What changed:
    • Added governed runtime-capability apply support for managed_skill, programmatic_flow, and profile_note_addendum families.
    • Extended runtime-capability plan so operators can inspect structured draft payloads before materializing artifacts.
    • Made managed-skill / programmatic-flow / profile-note drafts self-contained and deterministic.
    • Added governed runtime-capability activate support for managed-skill and profile-note targets, while keeping programmatic-flow activation fail-closed until a real runtime surface exists.
    • Persisted activation records with rollback-oriented evidence so apply/activate outcomes stay auditable.
    • Fixed false-positive activation reporting and hardened local verification so rollback/fail-closed tests do not depend on permission-bit behavior.
    • Kept public repo wording LoongClaw-centric in AGENTS/CLAUDE guidance.
  • What did not change (scope boundary):
    • No live runtime mutation path for programmatic flows.
    • No replay executor or trajectory export work in this PR; that surface is already tracked separately on current dev.

Linked Issues

Change Type

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Security hardening
  • CI / workflow / release

Touched Areas

  • Kernel / policy / approvals
  • Contracts / protocol / spec
  • Daemon / CLI / install
  • Providers / routing
  • Tools
  • Browser automation
  • Channels / integrations
  • ACP / conversation / session runtime
  • Memory / context assembly
  • Docs / contributor workflow
  • CI / release / workflows

Risk Track

  • Track A (routine / low-risk)
  • Track B (higher-risk / policy-impacting)

Validation

  • cargo fmt --all -- --check
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo test --workspace
  • cargo test --workspace --all-features
  • Relevant architecture / dep-graph / docs checks for touched areas
  • Additional scenario, benchmark, or manual checks when behavior changed
  • If tests mutate process-global env: document how state is restored or serialized

Commands and evidence:

cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo test --workspace
cargo test --workspace --all-features
./scripts/check_architecture_boundaries.sh
./scripts/check_dep_graph.sh
cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1
cargo test -p loongclaw-daemon runtime_trajectory --test integration -- --test-threads=1

User-visible / Operator-visible Changes

  • loongclaw runtime-capability apply --root <path> --family-id <id> [--json]
  • loongclaw runtime-capability activate --artifact <path> [--config <path>] [--apply] [--replace] [--json]
  • runtime-capability plan now surfaces structured draft payload details that match the applied artifact content.
  • Activation records now persist rollback-oriented evidence instead of relying on transient console output alone.

Failure Recovery

  • Fast rollback or disable path:
    • Revert this PR; applied capability artifacts and activation records are bounded filesystem artifacts and do not require schema migration.
  • Observable failure symptoms reviewers should watch for:
    • capability families still materializing only memory_stage_profile payloads instead of the new target kinds
    • managed-skill activation reporting success without installed artifacts actually matching the applied payload
    • profile-note activation failing to record rollback-grade activation evidence
    • config / external-skill / restore tests regressing only on specific filesystem permission models

Reviewer Focus

  • crates/daemon/src/runtime_capability_cli.rs
    • target-kind transition, structured payloads, activation flow, and activation record persistence.
  • crates/daemon/tests/integration/runtime_capability_cli.rs
    • coverage for apply / activate / rollback and fail-closed paths.
  • docs/product-specs/runtime-capability.md, docs/ROADMAP.md
    • product contract updates for the deeper capability ladder.
  • crates/app/src/config/runtime.rs, crates/app/src/tools/external_skills.rs, crates/daemon/tests/integration/*
    • deterministic failure injection and rollback verification hardening.

@github-actions github-actions Bot added documentation Improvements or additions to documentation. dependencies Pull requests that update dependency files. spec Architecture boundaries, product specs, and design docs. daemon Daemon binary, CLI entrypoints, and install flow. providers Provider routing, selection, and transport behavior. tools Tool runtime, policy adapters, and tool catalog behavior. channels Channel adapters and external integration surfaces. memory Memory system, context assembly, and persistence flow. conversation Conversation runtime, session flow, and prompt assembly. acp ACP manager, binding, routing, and control plane surfaces. size: XL Very large pull request: more than 1000 changed lines. labels Apr 8, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Added workspace tracing deps and observability helpers; instrumented runtime turn/provider/tool flows with structured logs; introduced a read-only runtime-trajectory export/show CLI, artifact schema, session trajectory exporter, session-repo accessors, daemon CLI wiring, tests, and docs. Also added runtime-capability apply CLI and related artifacts.

Changes

Cohort / File(s) Summary
Docs & Content Guidance
AGENTS.md, CLAUDE.md, docs/product-specs/index.md, docs/product-specs/runtime-trajectory.md, docs/ROADMAP.md
Added LoongClaw-centric wording rule; added runtime-trajectory product spec and runtime-capability apply roadmap/spec updates.
Workspace deps
Cargo.toml, crates/app/Cargo.toml, crates/daemon/Cargo.toml
Added tracing and tracing-subscriber to workspace deps and enabled tracing.workspace = true for crates.
Observability helpers
crates/app/src/observability.rs, crates/daemon/src/observability.rs, crates/app/src/lib.rs
New app/daemon observability modules: JSON/key/error summarizers and daemon tracing init; wired app observability module.
Tracing instrumentation
crates/app/src/acp/manager.rs, crates/app/src/channel/mod.rs, crates/app/src/tools/mod.rs, crates/app/src/provider/..., crates/app/src/provider/runtime_binding.rs
Added tracing/timing logs around ACP turns, inbound channel processing, tool execution, and provider failover; added ProviderRuntimeBinding::as_str.
Memory / SQLite accessors
crates/app/src/memory/sqlite.rs, crates/app/src/memory/mod.rs
Added PersistedConversationTurnRecord, SQL to fetch full turn rows, session_turn_records_direct, and re-exported types (feature-gated).
Session repo & trajectory gating
crates/app/src/session/repository.rs, crates/app/src/session/mod.rs
Added list_all_events, upsert_session_terminal_outcome, ALL_EVENTS_PAGE_LIMIT, and feature-gated trajectory module declaration.
Runtime Trajectory feature
crates/app/src/session/trajectory.rs
New large module: artifact schema, export types, export mode, export_runtime_trajectory, mapping helpers, and aggregated statistics.
Daemon CLI wiring & runtime-trajectory CLI
crates/daemon/src/lib.rs, crates/daemon/src/main.rs, crates/daemon/src/runtime_trajectory_cli.rs
Init tracing on startup, added runtime-trajectory CLI with export/show, handlers, file I/O, text rendering, and re-exports.
Daemon observability init
crates/daemon/src/observability.rs
Tracing initialization: log format parsing, env-var precedence, env-filter resolution, and subscriber setup with tests.
CLI tests & integration
crates/daemon/tests/integration/*, crates/daemon/tests/integration/runtime_trajectory_cli.rs
Added integration tests for runtime-trajectory export/show, round-trip, metric/rollup assertions; updated CLI parsing tests and small env isolation fixes.
Runtime-capability apply
crates/daemon/src/runtime_capability_cli.rs, crates/daemon/tests/integration/runtime_capability_cli.rs
Added apply subcommand, apply/persistence models, idempotent materialization behavior, schema constants, and integration tests.
Other small edits
crates/app/src/channel/registry.rs, crates/app/src/memory/...
Test env isolation changes and minor re-exports; mostly additive logging/timing instrumentation.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as "Daemon CLI (runtime-trajectory)"
    participant Traj as "session::trajectory::export_runtime_trajectory"
    participant Repo as "SessionRepository"
    participant DB as "SQLite DB"

    CLI->>Traj: export_runtime_trajectory(session_id, mode, config, exported_at)
    Traj->>Repo: load_session_summary_with_legacy_fallback(session_id)
    Repo->>DB: SELECT session summary
    DB-->>Repo: session_summary
    Repo-->>Traj: session_summary

    Traj->>Repo: resolve lineage root & collect session IDs (mode)
    Repo->>DB: SELECT sessions (lineage)
    DB-->>Repo: session_list
    Repo-->>Traj: session_list

    loop per session
        Traj->>Repo: session_turn_records_direct(session_id)
        Repo->>DB: SELECT turn rows for session
        DB-->>Repo: PersistedConversationTurnRecord[]
        Repo-->>Traj: turn_records

        Traj->>Repo: list_all_events(session_id)
        Repo->>DB: SELECT events/pages
        DB-->>Repo: SessionEventRecord[]
        Repo-->>Traj: session_events

        Traj->>Repo: load_terminal_outcome(session_id)
        Repo->>DB: SELECT terminal_outcomes
        DB-->>Repo: terminal_outcome?
        Repo-->>Traj: terminal_outcome

        Traj->>DB: SELECT approval_requests for session turns
        DB-->>Traj: approval_requests[]
    end

    Traj->>Traj: aggregate statistics & build artifact
    Traj-->>CLI: RuntimeTrajectoryArtifactDocument (JSON)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related issues

Possibly related PRs

Suggested labels

docs

Suggested reviewers

  • gh-xj

Poem

🐇 I hop through logs with tiny feet,

I gather turns and rows complete,
From root to leaf I stitch the trail,
Export the tale beyond the vale,
A carrot cheer for trajectory neat!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.35% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed The PR successfully implements all coding requirements from #1089 (runtime trajectory export CLI, stable JSON schema, lineage support, round-trip show command) and #1092 (capability apply command, planned payloads, idempotent apply, target restriction to managed_skill/programmatic_flow/profile_note_addendum).
Out of Scope Changes check ✅ Passed All changes are directly aligned with stated objectives: trajectory export surface, capability apply implementation, supporting observability/tracing additions, and related CLI/test/docs work. No unrelated refactoring or out-of-scope modifications detected.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: close the governed capability ladder from plan to activate' accurately captures one of the two major features (capability apply), but omits the equally significant runtime trajectory export feature.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch work/refstudy-delivery-20260408

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
crates/app/src/tools/mod.rs (1)

651-653: Only materialize payload metadata when this target will emit.

json_value_kind and top_level_json_keys now run on every core-tool call, even when loongclaw.tools logging is disabled. Since this sits on the hot execution path, it’s worth gating that work behind the active log level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/app/src/tools/mod.rs` around lines 651 - 653, Only compute payload
metadata eagerly when loongclaw.tools logging is active: guard the calls to
crate::observability::json_value_kind(&request.payload) and
crate::observability::top_level_json_keys(&request.payload) with a check for the
loongclaw.tools log level (e.g. a
crate::observability::is_enabled("loongclaw.tools") or equivalent helper). If
the check is false, avoid calling those functions and set
payload_kind/payload_keys to None/empty (or the existing default) so the hot
path that handles request.payload (referenced as request.payload and
requested_tool_name) does not perform unnecessary work.
crates/app/src/session/repository.rs (1)

1508-1544: Consider consolidating with existing upsert_terminal_outcome to reduce duplication.

This new method is nearly identical to the existing upsert_terminal_outcome (lines 1249-1282), differing only in the session existence check: this method uses load_session_summary_with_legacy_fallback while the existing one uses load_session.

The distinction makes sense for trajectory export (supporting legacy sessions inferred from turn history), but the duplication is notable. Consider extracting a shared private helper that accepts a session-existence-check closure or boolean flag, or document why the split is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/app/src/session/repository.rs` around lines 1508 - 1544, The two
functions upsert_session_terminal_outcome and upsert_terminal_outcome are nearly
identical except for how they validate session existence
(load_session_summary_with_legacy_fallback vs load_session); refactor by
extracting a private helper (e.g., fn upsert_terminal_outcome_inner(...,
session_exists_check: impl Fn(&str) -> Result<bool, String>) or a flag) that
performs normalization, payload encoding, DB upsert, and record construction,
and call that helper from both upsert_session_terminal_outcome and
upsert_terminal_outcome passing the appropriate existence-check closure (or
boolean) to remove duplication or alternatively add a short comment explaining
the intentional split if you choose not to refactor.
crates/app/src/session/trajectory.rs (1)

221-247: Performance note: Lineage depth computed per-session via DB traversal.

sort_runtime_trajectory_sessions calls repo.session_lineage_depth() for each session, where each call traverses the parent chain via multiple DB queries. For a lineage with N sessions and max depth D, this results in O(N × D) queries.

This is acceptable for typical session lineages but could become slow for deeply nested or very large lineage trees. If this becomes a bottleneck in practice, consider computing depths in a single pass by building an in-memory parent map from the already-loaded session summaries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/app/src/session/trajectory.rs` around lines 221 - 247,
sort_runtime_trajectory_sessions currently calls
SessionRepository::session_lineage_depth for each SessionSummaryRecord which
causes O(N×D) DB traversals; instead, build an in-memory parent map from the
provided sessions (use session.session_id -> session.parent_session_id from the
SessionSummaryRecord list), then compute lineage depths with a single-pass
memoized traversal (or iterative topological/DFS with memoization) into
depth_by_session_id and use that map in the existing sort; keep the sort logic
unchanged and only replace the per-session repo calls with the in-memory depth
computation inside sort_runtime_trajectory_sessions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/app/src/memory/sqlite.rs`:
- Around line 1923-1925: The export reader currently calls row.get(1) into
session_turn_index as a non-null i64 which will error on NULL DB rows; change
the decode to read an Option<i64> (i.e., row.get::<_, Option<i64>>(1)) and
handle None defensively inside the export path (either treat missing
session_turn_index as a default/sentinel, skip legacy rows, or propagate a more
specific recoverable warning) in the function that performs export reads so
legacy/inconsistent rows do not cause a hard failure; update any subsequent uses
of session_turn_index to account for Option and convert/handle the value before
using it.

In `@crates/app/src/tools/mod.rs`:
- Around line 679-698: The logs for tool invocations always show
canonical_tool_name = "tool.invoke", which hides the actual inner tool (e.g.,
file.read, web.fetch); update both the success (tracing::debug!) and error
(tracing::warn!) blocks around the invoke handling to compute and include the
resolved inner tool name (derive from the canonical_name/payload or the value
used to dispatch the inner tool, e.g., call it resolved_tool_name or
inner_tool_name) and add it as a structured field (inner_tool_name =
%resolved_tool_name) alongside requested_tool_name and canonical_tool_name so
each log entry shows the real invoked tool (references: variables
canonical_name, requested_tool_name, payload_keys, outcome, error).

In `@crates/daemon/src/main.rs`:
- Line 42: The debug log currently serializes and logs the full parsed CLI
payload via tracing::debug!(..., command = ?cli.command), which may include
sensitive user-provided text; update the log to avoid emitting raw payloads by
logging a safe marker instead — for example log only the command variant/name or
a sanitized summary (e.g., cli.command.variant() or cli.command.name()) or
explicitly redact sensitive fields before logging; change the tracing::debug!
invocation to emit that sanitized value (or a fixed string like "CLI command
parsed") rather than ?cli.command.

---

Nitpick comments:
In `@crates/app/src/session/repository.rs`:
- Around line 1508-1544: The two functions upsert_session_terminal_outcome and
upsert_terminal_outcome are nearly identical except for how they validate
session existence (load_session_summary_with_legacy_fallback vs load_session);
refactor by extracting a private helper (e.g., fn
upsert_terminal_outcome_inner(..., session_exists_check: impl Fn(&str) ->
Result<bool, String>) or a flag) that performs normalization, payload encoding,
DB upsert, and record construction, and call that helper from both
upsert_session_terminal_outcome and upsert_terminal_outcome passing the
appropriate existence-check closure (or boolean) to remove duplication or
alternatively add a short comment explaining the intentional split if you choose
not to refactor.

In `@crates/app/src/session/trajectory.rs`:
- Around line 221-247: sort_runtime_trajectory_sessions currently calls
SessionRepository::session_lineage_depth for each SessionSummaryRecord which
causes O(N×D) DB traversals; instead, build an in-memory parent map from the
provided sessions (use session.session_id -> session.parent_session_id from the
SessionSummaryRecord list), then compute lineage depths with a single-pass
memoized traversal (or iterative topological/DFS with memoization) into
depth_by_session_id and use that map in the existing sort; keep the sort logic
unchanged and only replace the per-session repo calls with the in-memory depth
computation inside sort_runtime_trajectory_sessions.

In `@crates/app/src/tools/mod.rs`:
- Around line 651-653: Only compute payload metadata eagerly when
loongclaw.tools logging is active: guard the calls to
crate::observability::json_value_kind(&request.payload) and
crate::observability::top_level_json_keys(&request.payload) with a check for the
loongclaw.tools log level (e.g. a
crate::observability::is_enabled("loongclaw.tools") or equivalent helper). If
the check is false, avoid calling those functions and set
payload_kind/payload_keys to None/empty (or the existing default) so the hot
path that handles request.payload (referenced as request.payload and
requested_tool_name) does not perform unnecessary work.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4b329228-511f-429c-927e-07b7c802669b

📥 Commits

Reviewing files that changed from the base of the PR and between 90527de and 8d54cfe.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (29)
  • AGENTS.md
  • CLAUDE.md
  • Cargo.toml
  • crates/app/Cargo.toml
  • crates/app/src/acp/manager.rs
  • crates/app/src/channel/mod.rs
  • crates/app/src/channel/registry.rs
  • crates/app/src/lib.rs
  • crates/app/src/memory/mod.rs
  • crates/app/src/memory/sqlite.rs
  • crates/app/src/observability.rs
  • crates/app/src/provider/request_failover_runtime.rs
  • crates/app/src/provider/request_session_runtime.rs
  • crates/app/src/provider/runtime_binding.rs
  • crates/app/src/session/mod.rs
  • crates/app/src/session/repository.rs
  • crates/app/src/session/trajectory.rs
  • crates/app/src/tools/mod.rs
  • crates/daemon/Cargo.toml
  • crates/daemon/src/lib.rs
  • crates/daemon/src/main.rs
  • crates/daemon/src/observability.rs
  • crates/daemon/src/runtime_trajectory_cli.rs
  • crates/daemon/tests/integration/cli_tests.rs
  • crates/daemon/tests/integration/migrate_cli.rs
  • crates/daemon/tests/integration/mod.rs
  • crates/daemon/tests/integration/runtime_trajectory_cli.rs
  • docs/product-specs/index.md
  • docs/product-specs/runtime-trajectory.md

Comment thread crates/app/src/memory/sqlite.rs Outdated
Comment thread crates/app/src/tools/mod.rs Outdated
Comment thread crates/daemon/src/main.rs Outdated
@chumyin chumyin changed the title feat: add governed runtime trajectory export feat: add governed trajectory export and capability apply Apr 8, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/daemon/src/runtime_capability_cli.rs`:
- Around line 1379-1439:
persist_runtime_capability_apply_artifact/write_pretty_json_file_create_new
currently opens the final path with create_new(true) which can make the file
visible before the JSON is fully written, causing races; instead, write the
pretty-serialized bytes to a temporary file in the same directory (e.g., path
with a random/suffixed name), fsync the temp file (and optionally its parent
dir), then atomically rename (fs::rename) the temp file to the final path and
map an AlreadyExists error from rename to the same "already exists" string so
persist_runtime_capability_apply_artifact can still load and compare the
existing artifact; update write_pretty_json_file_create_new to perform these
steps and keep the same error messages and return types so callers like
persist_runtime_capability_apply_artifact continue to work unchanged.

In `@crates/daemon/tests/integration/runtime_capability_cli.rs`:
- Around line 2360-2366: The assertion compares report.output_path using
OS-native separators, which fails on Windows; normalize the path before checking
the suffix by passing report.output_path through normalized_path_text(...) (or
otherwise canonicalizing separators) and then assert that
normalized_path_text(&report.output_path).ends_with(&format!("managed_skills/{}.json",
report.applied_artifact.artifact_id)) so the check is platform independent;
update the assertion that references report.output_path and keep the same
message text.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8fd3e86e-0c0f-49ce-8b21-f46cc4be47eb

📥 Commits

Reviewing files that changed from the base of the PR and between 8d54cfe and d4d32f6.

📒 Files selected for processing (5)
  • crates/daemon/src/runtime_capability_cli.rs
  • crates/daemon/tests/integration/cli_tests.rs
  • crates/daemon/tests/integration/runtime_capability_cli.rs
  • docs/ROADMAP.md
  • docs/product-specs/runtime-capability.md
✅ Files skipped from review due to trivial changes (2)
  • docs/ROADMAP.md
  • docs/product-specs/runtime-capability.md

Comment thread crates/daemon/src/runtime_capability_cli.rs
Comment thread crates/daemon/tests/integration/runtime_capability_cli.rs
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
crates/daemon/src/runtime_capability_cli.rs (1)

1417-1478: ⚠️ Potential issue | 🔴 Critical

Publish the apply artifact atomically to prevent idempotency races.

write_pretty_json_file_create_new exposes the destination path immediately after create_new(true), before JSON is fully written. A concurrent apply can hit “already exists”, then fail parsing partially written content and incorrectly fail idempotent behavior.

💡 Suggested fix direction
- open final path with create_new(true)
- write JSON directly to final path
+ write full JSON to a temp file in the same directory
+ flush + sync temp file
+ publish to final path using an atomic no-clobber step
+ map "already exists" from publish step to current idempotency branch
+ best-effort cleanup of temp file on failure
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/daemon/src/runtime_capability_cli.rs` around lines 1417 - 1478,
persist_runtime_capability_apply_artifact can race because
write_pretty_json_file_create_new creates the target path before the JSON is
fully written; fix by making write_pretty_json_file_create_new write to a
temporary file in the same directory, flush and sync the temp file
(file.sync_all()), then attempt an atomic fs::rename(temp_path, path); if rename
fails because destination already exists, load the existing artifact (use
load_runtime_capability_apply_artifact) and compare to decide AlreadyApplied vs
error; ensure temp file is removed on error and create parent dirs as currently
done.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/daemon/src/runtime_capability_cli.rs`:
- Around line 1417-1478: persist_runtime_capability_apply_artifact can race
because write_pretty_json_file_create_new creates the target path before the
JSON is fully written; fix by making write_pretty_json_file_create_new write to
a temporary file in the same directory, flush and sync the temp file
(file.sync_all()), then attempt an atomic fs::rename(temp_path, path); if rename
fails because destination already exists, load the existing artifact (use
load_runtime_capability_apply_artifact) and compare to decide AlreadyApplied vs
error; ensure temp file is removed on error and create parent dirs as currently
done.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 89207569-a044-46ba-b1cc-87b0ae67ffcc

📥 Commits

Reviewing files that changed from the base of the PR and between c320728 and c70246f.

📒 Files selected for processing (2)
  • crates/daemon/src/runtime_capability_cli.rs
  • crates/daemon/tests/integration/runtime_capability_cli.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/daemon/tests/integration/runtime_capability_cli.rs

@chumyin chumyin self-assigned this Apr 8, 2026
@github-actions github-actions Bot added the migration Onboarding, legacy import, and migration flow. label Apr 8, 2026
@chumyin chumyin changed the title feat: add governed trajectory export and capability apply feat: complete the governed trajectory and capability ladder Apr 8, 2026
@github-actions github-actions Bot added the config Runtime config parsing, schema, and defaults. label Apr 8, 2026
@chumyin chumyin force-pushed the work/refstudy-delivery-20260408 branch from 0ca8a7f to 3bace47 Compare April 8, 2026 11:57
@chumyin chumyin changed the title feat: complete the governed trajectory and capability ladder feat: close the governed capability ladder from plan to activate Apr 8, 2026
@github-actions github-actions Bot removed dependencies Pull requests that update dependency files. providers Provider routing, selection, and transport behavior. channels Channel adapters and external integration surfaces. memory Memory system, context assembly, and persistence flow. conversation Conversation runtime, session flow, and prompt assembly. acp ACP manager, binding, routing, and control plane surfaces. labels Apr 8, 2026
chumyin added 2 commits April 8, 2026 05:02
The public agent instructions now make the repository rule explicit: public issues, PRs, and public docs should stay LoongClaw-centric, while deeper cross-project comparisons belong in the internal knowledge base.

This captures the requested boundary in the repo-local operator map so future delivery work does not repeat the same public wording mistake.

Constraint: AGENTS.md and CLAUDE.md must stay mirrored in the same change
Rejected: Add the rule to internal knowledge-base docs only | the mistake happened in public-repo delivery, so the repo-local instructions need the reminder
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Mention external projects in public-repo artifacts only when that reference is strictly necessary for user-facing understanding
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features
Not-tested: Fix for pre-existing test failure channel::registry::tests::discord_status_splits_config_backed_send_and_stub_serve
LoongClaw already let operators propose, review, index, and plan runtime capabilities, but it still stopped one step short of a reusable incubator workflow. This change adds a read-only apply stage that materializes draft artifacts under the planned delivery surface for managed skills, programmatic flows, and profile-note addenda without mutating live runtime state.

The implementation reuses the existing planner output, keeps writes idempotent when the artifact already matches, and extends integration coverage across all three supported target kinds plus the non-promotable gate.

Constraint: Capability incubation must stay read-only and must not activate live runtime behavior during apply
Rejected: Reuse the older memory-stage-profile-only apply path | it would diverge from the currently supported public target kinds and the governed incubator issue scope
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep future capability-apply payloads aligned with the target kinds exposed in the public runtime-capability surface unless the product spec changes first
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1
Not-tested: Updating the already-open source-repo PR body after adding the apply stage
chumyin added 10 commits April 8, 2026 05:02
The capability incubator now exposes a structured planned payload in the runtime-capability plan output, so operators can inspect the exact draft artifact shape before materializing it. The apply path reuses that payload when building managed-skill, programmatic-flow, and profile-note draft artifacts, which keeps the incubator ladder internally consistent instead of duplicating target-specific decisions in separate code paths.

The integration suite now checks payload emission for ready, not-ready, and blocked families across the supported target kinds.

Constraint: The incubator remains read-only; planning and apply must not activate live runtime behavior
Rejected: Keep apply as metadata-only output without a draft payload contract | that would leave the plan/apply ladder underspecified and make later activation work harder to review
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: If a future target kind is added, extend both the planned payload and apply coverage together so the incubator ladder stays aligned
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1; cargo test --workspace; cargo test --workspace --all-features
Not-tested: Updating the source-repo PR description to mention the planned payload deepening
The product spec and roadmap now describe the deeper incubator contract that landed in the runtime-capability planner and apply flow. This keeps the public implementation and the documented ladder aligned, so operators can rely on the plan command surfacing a structured draft payload preview before the apply command materializes a governed artifact.

The wording stays LoongClaw-centric and scoped to the current runtime-capability surface.

Constraint: Public docs must describe only the capability surfaces that actually ship in the source repo
Rejected: Leave the planner payload shape undocumented | would make the new incubator step harder to understand and review from product docs alone
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When the incubator ladder gains a new visible stage, update both the roadmap and the relevant product spec in the same change
Tested: cargo test --workspace; cargo test --workspace --all-features
Not-tested: Separate docs-only validation beyond the passing workspace test suites
The capability incubator now carries target-specific draft payloads through both the plan and apply stages. Managed-skill families emit a generated SKILL.md draft, programmatic-flow families emit a deterministic flow.json draft, and profile-note families emit an advisory addendum draft.

This keeps the incubator ladder coherent: operators can inspect the exact payload in the runtime-capability plan output, then materialize the same payload with the runtime-capability apply command without hidden target-specific reconstruction. That is the smallest correct step before adding any future activation executor.

Constraint: Capability incubation must remain read-only and must not assume a live activation surface already exists
Rejected: Add activation first and leave draft payloads metadata-only | that would hide the artifact shape until after mutation and would couple activation logic to undocumented reconstruction rules
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: If a target kind can be applied, its draft payload should be inspectable in the planner output before any future activation path is added
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1; cargo test --workspace; cargo test --workspace --all-features
Not-tested: Updating the open PR description to mention self-contained draft payload contents
The incubator ladder now continues past apply. Operators can activate
managed-skill and profile-note draft artifacts through an explicit
runtime-capability command, while programmatic-flow activation stays
fail-closed until a governed runtime surface exists.

The same change makes draft payloads self-contained and testable across
plan, apply, and activate, which keeps the ladder consistent and gives
reviewers concrete artifacts to inspect before and after activation.

Constraint: Activation must stay explicit, target-aware, and read-only by default
Rejected: Pretend programmatic flows have a live activation surface | would create a fake-success path and hide the missing runtime seam
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Do not add a new activatable target kind unless the planner payload, apply artifact, and activate executor all land together
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1; cargo test --workspace; cargo test --workspace --all-features
Not-tested: Updating the open PR description after adding activate
Activation already had explicit apply and idempotence lanes, but it still trusted
successful tool/config mutations more than observed runtime state. This change
makes managed-skill and profile-note activation verify the live target after
mutation, surfaces verification evidence in the activation report, and carries
rollback hints forward from the applied draft artifact.

Constraint: programmatic_flow still has no governed activation surface and must remain fail-closed
Rejected: trust tool success alone | leaves room for false-success activation reports
Confidence: high
Scope-risk: narrow
Directive: keep each activation target paired with explicit post-apply verification before widening supported surfaces
Tested: cargo fmt --all -- --check
Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings
Tested: cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1
Tested: cargo test --workspace
Tested: cargo test --workspace --all-features
Not-tested: manual CLI invocation outside integration fixtures
The activation lane could verify success, but operators still had no governed way
to return a draft activation back to its recorded pre-activation state. This
change persists activation records, adds an explicit rollback command, and makes
supported targets prove both activation and rollback state instead of relying on
best-effort manual cleanup.

Constraint: programmatic_flow still lacks a governed activation host and must remain fail-closed for both activate and rollback
Rejected: delete activation records after rollback | destroys the audit trail operators need for review
Confidence: high
Scope-risk: moderate
Directive: any new activation target must ship with a persisted rollback payload and post-rollback verification before it is considered supported
Tested: cargo fmt --all -- --check
Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings
Tested: cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1
Tested: cargo test --workspace
Tested: cargo test --workspace --all-features
Not-tested: manual CLI invocation of rollback against operator-managed real installations
The review follow-ups now fix the concrete observability and artifact
robustness issues on the branch, and the surrounding regression tests stop
relying on permission-bit behavior that can vary across environments.

The result is a cleaner trajectory/apply surface plus deterministic config,
external-skill, chat, doctor, restore, and browser-preview failure tests that
still validate the intended rollback and fail-closed behavior.

Constraint: Verification must stay truthful on contributor machines even when filesystem permission semantics differ
Rejected: Leave permission-based tests as-is | they produced environment-dependent false positives and false negatives
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Prefer deterministic failure injection or structural invalid fixtures over permission-bit assumptions when testing rollback paths
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features; ./scripts/check_architecture_boundaries.sh; ./scripts/check_dep_graph.sh
Not-tested: Fresh CI run on GitHub Actions after pushing this branch
The capability ladder cherry-picks landed on top of a newer daemon and test surface than the original branch targeted.

This follow-up swaps the digest formatting onto the current hash output type, removes one stale unix-only import, and realigns the trajectory CLI parse expectation with the command shape that already ships on dev.

Constraint: The rewritten PR branch must compile and test cleanly against current dev without reintroducing stale interface assumptions
Rejected: Keep the old formatting and parser expectations | the rebased branch would fail before verification could even start
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When replaying older branch commits onto a moving CLI surface, prefer tiny compatibility shims over broad rewrites
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features; ./scripts/check_architecture_boundaries.sh; ./scripts/check_dep_graph.sh
Not-tested: GitHub Actions run after force-updating the PR branch
The governance job compares the tracked April 2026 drift report against the current tree.

Rebasing the capability ladder onto current dev changed the tools module metrics, so the tracked report needed one fresh regeneration to keep governance truthful.

Constraint: Governance requires the checked-in monthly drift report to match the current tree exactly
Rejected: Ignore the stale report and rely on CI output alone | governance is designed to fail closed on drift report freshness
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When rebasing large feature branches onto dev, regenerate the monthly architecture drift report before re-running governance
Tested: bash scripts/generate_architecture_drift_report.sh docs/releases/architecture-drift-2026-04.md; bash scripts/check_architecture_drift_freshness.sh docs/releases/architecture-drift-2026-04.md
Not-tested: Full CI rerun after pushing the refreshed report
A few daemon integration expectations still reflected older CLI help text and an earlier symlink failure surface, and one work-unit CLI round-trip needed the shared daemon environment lock to avoid cross-test interference.

This keeps the rebased branch aligned with current dev behavior while preserving the intended fail-closed and rollback guarantees.

Constraint: The rebased PR branch must pass the current dev integration suite without depending on stale help text or cross-test environment races
Rejected: Leave the flaky expectations in place | the branch would stay red even though the underlying capability path was correct
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When rebasing onto a newer CLI surface, refresh help-text assertions and add shared daemon env locking where subprocess tests depend on process-global state
Tested: cargo test --workspace; cargo test --workspace --all-features
Not-tested: Fresh remote CI run after force-updating the PR branch
@chumyin chumyin force-pushed the work/refstudy-delivery-20260408 branch from 3bace47 to 8945bf2 Compare April 8, 2026 12:33
chumyin added 4 commits April 8, 2026 05:37
The governance docs check still treated the roadmap as if  were the live enforcement seam.

Current dev routes tool approval through policy extensions and execution-layer dispatch, so the roadmap now describes the active architecture instead of the deprecated hook.

Constraint: Docs governance fails closed when roadmap wording drifts from the live architecture
Rejected: Leave the deprecated hook name in roadmap prose | it would keep governance red and misdescribe the current policy path
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep roadmap references aligned with the active enforcement seam, and mention deprecated hooks only as compatibility notes
Tested: scripts/bootstrap_release_local_artifacts.sh; LOONGCLAW_RELEASE_DOCS_STRICT=1 scripts/check-docs.sh
Not-tested: Follow-up remote governance rerun after pushing this doc correction
Managed-skill activation succeeded locally but failed on Windows because canonicalized file roots and candidate paths used different prefix forms during the safe-path containment check.

Switching those canonicalization points to  keeps the path semantics the same while removing the verbatim-prefix mismatch that made in-root staging paths look like escapes.

Constraint: Managed-skill activation must remain fail-closed while still working on Windows runners with verbatim path prefixes
Rejected: Special-case only the activation staging path string | that would leave the underlying safe-path comparison inconsistent for other file-root callers
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When a path comparison is security-sensitive and cross-platform, normalize both sides through the same canonicalization strategy before checking containment
Tested: cargo test --test integration integration::runtime_capability_cli::runtime_capability_activate_managed_skill_apply_installs_skill_and_is_idempotent -- --exact --nocapture; cargo test --test integration integration::runtime_capability_cli::runtime_capability_rollback_managed_skill_restores_pre_activation_state_and_is_idempotent -- --exact --nocapture
Not-tested: Fresh Windows CI rerun after pushing this path normalization fix
…on paths

The Windows CI failures showed two related issues in the safe-path gate: verbatim-prefixed canonical paths could compare unequal to non-prefixed roots, and the external-skill migration path went through a sibling containment helper with the same assumption.

This change normalizes both file and migration path checks through the same simplified canonical form so in-root paths stay allowed while real escapes still fail closed.

Constraint: Path sandboxing must stay fail-closed while remaining portable across Windows canonical path encodings
Rejected: Patch only the managed-skill activation call site | the same mismatch also affected migration-driven external-skill installs
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Reuse one cross-platform normalization strategy for every file-root containment check instead of fixing individual callers ad hoc
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --test integration integration::migrate_cli::run_migrate_cli_apply_selected_mode_can_apply_external_skill_plan -- --exact --nocapture; cargo test --test integration integration::runtime_capability_cli::runtime_capability_activate_managed_skill_apply_installs_skill_and_is_idempotent -- --exact --nocapture; cargo test --test integration integration::runtime_capability_cli::runtime_capability_rollback_managed_skill_restores_pre_activation_state_and_is_idempotent -- --exact --nocapture; cargo test --test integration integration::work_unit_cli::work_unit_cli_create_claim_complete_and_archive_round_trip -- --exact --nocapture; cargo test --test integration integration::work_unit_cli::work_unit_cli_update_text_output_uses_snake_case_status_labels -- --exact --nocapture
Not-tested: Fresh Windows GitHub Actions rerun after pushing this normalization fix
The Windows runners now get simplified canonical paths out of the file-root safety layer, which fixes false escape rejections but also changes the exact path text that some tests compare and some workspace-memory lookups key on.

This follow-up normalizes the workspace-memory keying path and updates the affected tests to compare against the same normalized canonical form.

Constraint: Cross-platform path-safety fixes must not leave Windows-only assertion or lookup mismatches behind
Rejected: Revert the simplified path output just to satisfy old tests | that would reintroduce the original Windows containment bug
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When a path helper intentionally changes canonical form, update every lookup key and assertion that depends on that textual representation in the same change
Tested: cargo test -p loongclaw-app tools::memory_tools::tests::memory_get_tool_strips_frontmatter_and_surfaces_workspace_metadata --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::memory_get_tool_returns_bounded_line_window_from_memory_file --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::memory_get_tool_uses_selected_memory_system_id_in_provenance --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::memory_get_tool_reads_requested_window_without_loading_invalid_tail --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::config_import_apply_mode_writes_target_config --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::feishu_messages_resource_get_tool_downloads_message_resource_to_safe_file_root --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::workspace_root_tests::file_read_uses_workspace_root_from_trusted_internal_payload --lib -- --exact --nocapture
Not-tested: Full Windows CI rerun after pushing the normalized expectation updates
@chumyin chumyin merged commit f97a769 into dev Apr 8, 2026
18 checks passed
@chumyin chumyin deleted the work/refstudy-delivery-20260408 branch April 8, 2026 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

config Runtime config parsing, schema, and defaults. daemon Daemon binary, CLI entrypoints, and install flow. documentation Improvements or additions to documentation. migration Onboarding, legacy import, and migration flow. size: XL Very large pull request: more than 1000 changed lines. spec Architecture boundaries, product specs, and design docs. tools Tool runtime, policy adapters, and tool catalog behavior.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Establish a governed capability incubator for bounded self-improvement

1 participant