feat: close the governed capability ladder from plan to activate by chumyin · Pull Request #1098 · eastreams/loong

chumyin · 2026-04-08T05:28:27Z

Summary

Problem:
- LoongClaw already had governed runtime-capability propose/review/index/plan surfaces, but the ladder still stopped short of operator-usable draft artifacts, explicit activation, and rollback-grade activation records.
Why it matters:
- Maintainers need a bounded, reviewable path from accepted capability families to concrete managed artifacts and explicit activation outcomes without fake-success paths or environment-dependent verification.
What changed:
- Added governed runtime-capability apply support for managed_skill, programmatic_flow, and profile_note_addendum families.
- Extended runtime-capability plan so operators can inspect structured draft payloads before materializing artifacts.
- Made managed-skill / programmatic-flow / profile-note drafts self-contained and deterministic.
- Added governed runtime-capability activate support for managed-skill and profile-note targets, while keeping programmatic-flow activation fail-closed until a real runtime surface exists.
- Persisted activation records with rollback-oriented evidence so apply/activate outcomes stay auditable.
- Fixed false-positive activation reporting and hardened local verification so rollback/fail-closed tests do not depend on permission-bit behavior.
- Kept public repo wording LoongClaw-centric in AGENTS/CLAUDE guidance.
What did not change (scope boundary):
- No live runtime mutation path for programmatic flows.
- No replay executor or trajectory export work in this PR; that surface is already tracked separately on current dev.

Linked Issues

Change Type

Touched Areas

Risk Track

Track A (routine / low-risk)
Track B (higher-risk / policy-impacting)

Validation

cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo test --workspace
cargo test --workspace --all-features
Relevant architecture / dep-graph / docs checks for touched areas
Additional scenario, benchmark, or manual checks when behavior changed
If tests mutate process-global env: document how state is restored or serialized

Commands and evidence:

cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo test --workspace
cargo test --workspace --all-features
./scripts/check_architecture_boundaries.sh
./scripts/check_dep_graph.sh
cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1
cargo test -p loongclaw-daemon runtime_trajectory --test integration -- --test-threads=1

User-visible / Operator-visible Changes

loongclaw runtime-capability apply --root <path> --family-id <id> [--json]
loongclaw runtime-capability activate --artifact <path> [--config <path>] [--apply] [--replace] [--json]
runtime-capability plan now surfaces structured draft payload details that match the applied artifact content.
Activation records now persist rollback-oriented evidence instead of relying on transient console output alone.

Failure Recovery

Fast rollback or disable path:
- Revert this PR; applied capability artifacts and activation records are bounded filesystem artifacts and do not require schema migration.
Observable failure symptoms reviewers should watch for:
- capability families still materializing only memory_stage_profile payloads instead of the new target kinds
- managed-skill activation reporting success without installed artifacts actually matching the applied payload
- profile-note activation failing to record rollback-grade activation evidence
- config / external-skill / restore tests regressing only on specific filesystem permission models

Reviewer Focus

crates/daemon/src/runtime_capability_cli.rs
- target-kind transition, structured payloads, activation flow, and activation record persistence.
crates/daemon/tests/integration/runtime_capability_cli.rs
- coverage for apply / activate / rollback and fail-closed paths.
docs/product-specs/runtime-capability.md, docs/ROADMAP.md
- product contract updates for the deeper capability ladder.
crates/app/src/config/runtime.rs, crates/app/src/tools/external_skills.rs, crates/daemon/tests/integration/*
- deterministic failure injection and rollback verification hardening.

coderabbitai · 2026-04-08T05:28:59Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Added workspace tracing deps and observability helpers; instrumented runtime turn/provider/tool flows with structured logs; introduced a read-only runtime-trajectory export/show CLI, artifact schema, session trajectory exporter, session-repo accessors, daemon CLI wiring, tests, and docs. Also added runtime-capability apply CLI and related artifacts.

Changes

Cohort / File(s)	Summary
Docs & Content Guidance `AGENTS.md`, `CLAUDE.md`, `docs/product-specs/index.md`, `docs/product-specs/runtime-trajectory.md`, `docs/ROADMAP.md`	Added LoongClaw-centric wording rule; added runtime-trajectory product spec and runtime-capability `apply` roadmap/spec updates.
Workspace deps `Cargo.toml`, `crates/app/Cargo.toml`, `crates/daemon/Cargo.toml`	Added `tracing` and `tracing-subscriber` to workspace deps and enabled `tracing.workspace = true` for crates.
Observability helpers `crates/app/src/observability.rs`, `crates/daemon/src/observability.rs`, `crates/app/src/lib.rs`	New app/daemon observability modules: JSON/key/error summarizers and daemon tracing init; wired app observability module.
Tracing instrumentation `crates/app/src/acp/manager.rs`, `crates/app/src/channel/mod.rs`, `crates/app/src/tools/mod.rs`, `crates/app/src/provider/...`, `crates/app/src/provider/runtime_binding.rs`	Added tracing/timing logs around ACP turns, inbound channel processing, tool execution, and provider failover; added `ProviderRuntimeBinding::as_str`.
Memory / SQLite accessors `crates/app/src/memory/sqlite.rs`, `crates/app/src/memory/mod.rs`	Added `PersistedConversationTurnRecord`, SQL to fetch full turn rows, `session_turn_records_direct`, and re-exported types (feature-gated).
Session repo & trajectory gating `crates/app/src/session/repository.rs`, `crates/app/src/session/mod.rs`	Added `list_all_events`, `upsert_session_terminal_outcome`, `ALL_EVENTS_PAGE_LIMIT`, and feature-gated `trajectory` module declaration.
Runtime Trajectory feature `crates/app/src/session/trajectory.rs`	New large module: artifact schema, export types, export mode, `export_runtime_trajectory`, mapping helpers, and aggregated statistics.
Daemon CLI wiring & runtime-trajectory CLI `crates/daemon/src/lib.rs`, `crates/daemon/src/main.rs`, `crates/daemon/src/runtime_trajectory_cli.rs`	Init tracing on startup, added `runtime-trajectory` CLI with `export`/`show`, handlers, file I/O, text rendering, and re-exports.
Daemon observability init `crates/daemon/src/observability.rs`	Tracing initialization: log format parsing, env-var precedence, env-filter resolution, and subscriber setup with tests.
CLI tests & integration `crates/daemon/tests/integration/*`, `crates/daemon/tests/integration/runtime_trajectory_cli.rs`	Added integration tests for runtime-trajectory export/show, round-trip, metric/rollup assertions; updated CLI parsing tests and small env isolation fixes.
Runtime-capability apply `crates/daemon/src/runtime_capability_cli.rs`, `crates/daemon/tests/integration/runtime_capability_cli.rs`	Added `apply` subcommand, apply/persistence models, idempotent materialization behavior, schema constants, and integration tests.
Other small edits `crates/app/src/channel/registry.rs`, `crates/app/src/memory/...`	Test env isolation changes and minor re-exports; mostly additive logging/timing instrumentation.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as "Daemon CLI (runtime-trajectory)"
    participant Traj as "session::trajectory::export_runtime_trajectory"
    participant Repo as "SessionRepository"
    participant DB as "SQLite DB"

    CLI->>Traj: export_runtime_trajectory(session_id, mode, config, exported_at)
    Traj->>Repo: load_session_summary_with_legacy_fallback(session_id)
    Repo->>DB: SELECT session summary
    DB-->>Repo: session_summary
    Repo-->>Traj: session_summary

    Traj->>Repo: resolve lineage root & collect session IDs (mode)
    Repo->>DB: SELECT sessions (lineage)
    DB-->>Repo: session_list
    Repo-->>Traj: session_list

    loop per session
        Traj->>Repo: session_turn_records_direct(session_id)
        Repo->>DB: SELECT turn rows for session
        DB-->>Repo: PersistedConversationTurnRecord[]
        Repo-->>Traj: turn_records

        Traj->>Repo: list_all_events(session_id)
        Repo->>DB: SELECT events/pages
        DB-->>Repo: SessionEventRecord[]
        Repo-->>Traj: session_events

        Traj->>Repo: load_terminal_outcome(session_id)
        Repo->>DB: SELECT terminal_outcomes
        DB-->>Repo: terminal_outcome?
        Repo-->>Traj: terminal_outcome

        Traj->>DB: SELECT approval_requests for session turns
        DB-->>Traj: approval_requests[]
    end

    Traj->>Traj: aggregate statistics & build artifact
    Traj-->>CLI: RuntimeTrajectoryArtifactDocument (JSON)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related issues

[Feature]: Add runtime trajectory artifacts for session replay and evaluation #1090 — Implements runtime-trajectory export/CLI and artifact objectives matching this PR.
[Feature]: add session search and trajectory artifact CLI surfaces #902 — Related: adds trajectory artifact and CLI surface similar to changes here.
[Feature]: Add governed runtime-capability apply executor for memory-stage profiles #702 — Related: runtime-capability apply objectives addressed by the new apply subcommand and artifact flow.

Possibly related PRs

feat: add session search and trajectory artifact CLIs #905 — Closely overlapping runtime-trajectory export and CLI implementation.
Canonicalize tool retry context across wrapped tool calls #828 — Overlaps observability/tracing additions and instrumentation across same crates.
feat(daemon): emit dry-run memory stage profile payloads #682 — Related to runtime_capability apply/planned-payload and CLI plumbing.

Suggested labels

docs

Suggested reviewers

gh-xj

Poem

🐇 I hop through logs with tiny feet,

I gather turns and rows complete,
From root to leaf I stitch the trail,
Export the tale beyond the vale,
A carrot cheer for trajectory neat!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.35% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	The PR successfully implements all coding requirements from `#1089` (runtime trajectory export CLI, stable JSON schema, lineage support, round-trip show command) and `#1092` (capability apply command, planned payloads, idempotent apply, target restriction to managed_skill/programmatic_flow/profile_note_addendum).
Out of Scope Changes check	✅ Passed	All changes are directly aligned with stated objectives: trajectory export surface, capability apply implementation, supporting observability/tracing additions, and related CLI/test/docs work. No unrelated refactoring or out-of-scope modifications detected.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'feat: close the governed capability ladder from plan to activate' accurately captures one of the two major features (capability apply), but omits the equally significant runtime trajectory export feature.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch work/refstudy-delivery-20260408

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (3)

crates/app/src/tools/mod.rs (1)
651-653: Only materialize payload metadata when this target will emit.

json_value_kind and top_level_json_keys now run on every core-tool call, even when loongclaw.tools logging is disabled. Since this sits on the hot execution path, it’s worth gating that work behind the active log level.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/app/src/tools/mod.rs` around lines 651 - 653, Only compute payload
metadata eagerly when loongclaw.tools logging is active: guard the calls to
crate::observability::json_value_kind(&request.payload) and
crate::observability::top_level_json_keys(&request.payload) with a check for the
loongclaw.tools log level (e.g. a
crate::observability::is_enabled("loongclaw.tools") or equivalent helper). If
the check is false, avoid calling those functions and set
payload_kind/payload_keys to None/empty (or the existing default) so the hot
path that handles request.payload (referenced as request.payload and
requested_tool_name) does not perform unnecessary work.
crates/app/src/session/repository.rs (1)
1508-1544: Consider consolidating with existing upsert_terminal_outcome to reduce duplication.

This new method is nearly identical to the existing upsert_terminal_outcome (lines 1249-1282), differing only in the session existence check: this method uses load_session_summary_with_legacy_fallback while the existing one uses load_session.

The distinction makes sense for trajectory export (supporting legacy sessions inferred from turn history), but the duplication is notable. Consider extracting a shared private helper that accepts a session-existence-check closure or boolean flag, or document why the split is intentional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/app/src/session/repository.rs` around lines 1508 - 1544, The two
functions upsert_session_terminal_outcome and upsert_terminal_outcome are nearly
identical except for how they validate session existence
(load_session_summary_with_legacy_fallback vs load_session); refactor by
extracting a private helper (e.g., fn upsert_terminal_outcome_inner(...,
session_exists_check: impl Fn(&str) -> Result<bool, String>) or a flag) that
performs normalization, payload encoding, DB upsert, and record construction,
and call that helper from both upsert_session_terminal_outcome and
upsert_terminal_outcome passing the appropriate existence-check closure (or
boolean) to remove duplication or alternatively add a short comment explaining
the intentional split if you choose not to refactor.
crates/app/src/session/trajectory.rs (1)
221-247: Performance note: Lineage depth computed per-session via DB traversal.

sort_runtime_trajectory_sessions calls repo.session_lineage_depth() for each session, where each call traverses the parent chain via multiple DB queries. For a lineage with N sessions and max depth D, this results in O(N × D) queries.

This is acceptable for typical session lineages but could become slow for deeply nested or very large lineage trees. If this becomes a bottleneck in practice, consider computing depths in a single pass by building an in-memory parent map from the already-loaded session summaries.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/app/src/session/trajectory.rs` around lines 221 - 247,
sort_runtime_trajectory_sessions currently calls
SessionRepository::session_lineage_depth for each SessionSummaryRecord which
causes O(N×D) DB traversals; instead, build an in-memory parent map from the
provided sessions (use session.session_id -> session.parent_session_id from the
SessionSummaryRecord list), then compute lineage depths with a single-pass
memoized traversal (or iterative topological/DFS with memoization) into
depth_by_session_id and use that map in the existing sort; keep the sort logic
unchanged and only replace the per-session repo calls with the in-memory depth
computation inside sort_runtime_trajectory_sessions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/app/src/memory/sqlite.rs`:
- Around line 1923-1925: The export reader currently calls row.get(1) into
session_turn_index as a non-null i64 which will error on NULL DB rows; change
the decode to read an Option<i64> (i.e., row.get::<_, Option<i64>>(1)) and
handle None defensively inside the export path (either treat missing
session_turn_index as a default/sentinel, skip legacy rows, or propagate a more
specific recoverable warning) in the function that performs export reads so
legacy/inconsistent rows do not cause a hard failure; update any subsequent uses
of session_turn_index to account for Option and convert/handle the value before
using it.

In `@crates/app/src/tools/mod.rs`:
- Around line 679-698: The logs for tool invocations always show
canonical_tool_name = "tool.invoke", which hides the actual inner tool (e.g.,
file.read, web.fetch); update both the success (tracing::debug!) and error
(tracing::warn!) blocks around the invoke handling to compute and include the
resolved inner tool name (derive from the canonical_name/payload or the value
used to dispatch the inner tool, e.g., call it resolved_tool_name or
inner_tool_name) and add it as a structured field (inner_tool_name =
%resolved_tool_name) alongside requested_tool_name and canonical_tool_name so
each log entry shows the real invoked tool (references: variables
canonical_name, requested_tool_name, payload_keys, outcome, error).

In `@crates/daemon/src/main.rs`:
- Line 42: The debug log currently serializes and logs the full parsed CLI
payload via tracing::debug!(..., command = ?cli.command), which may include
sensitive user-provided text; update the log to avoid emitting raw payloads by
logging a safe marker instead — for example log only the command variant/name or
a sanitized summary (e.g., cli.command.variant() or cli.command.name()) or
explicitly redact sensitive fields before logging; change the tracing::debug!
invocation to emit that sanitized value (or a fixed string like "CLI command
parsed") rather than ?cli.command.

---

Nitpick comments:
In `@crates/app/src/session/repository.rs`:
- Around line 1508-1544: The two functions upsert_session_terminal_outcome and
upsert_terminal_outcome are nearly identical except for how they validate
session existence (load_session_summary_with_legacy_fallback vs load_session);
refactor by extracting a private helper (e.g., fn
upsert_terminal_outcome_inner(..., session_exists_check: impl Fn(&str) ->
Result<bool, String>) or a flag) that performs normalization, payload encoding,
DB upsert, and record construction, and call that helper from both
upsert_session_terminal_outcome and upsert_terminal_outcome passing the
appropriate existence-check closure (or boolean) to remove duplication or
alternatively add a short comment explaining the intentional split if you choose
not to refactor.

In `@crates/app/src/session/trajectory.rs`:
- Around line 221-247: sort_runtime_trajectory_sessions currently calls
SessionRepository::session_lineage_depth for each SessionSummaryRecord which
causes O(N×D) DB traversals; instead, build an in-memory parent map from the
provided sessions (use session.session_id -> session.parent_session_id from the
SessionSummaryRecord list), then compute lineage depths with a single-pass
memoized traversal (or iterative topological/DFS with memoization) into
depth_by_session_id and use that map in the existing sort; keep the sort logic
unchanged and only replace the per-session repo calls with the in-memory depth
computation inside sort_runtime_trajectory_sessions.

In `@crates/app/src/tools/mod.rs`:
- Around line 651-653: Only compute payload metadata eagerly when
loongclaw.tools logging is active: guard the calls to
crate::observability::json_value_kind(&request.payload) and
crate::observability::top_level_json_keys(&request.payload) with a check for the
loongclaw.tools log level (e.g. a
crate::observability::is_enabled("loongclaw.tools") or equivalent helper). If
the check is false, avoid calling those functions and set
payload_kind/payload_keys to None/empty (or the existing default) so the hot
path that handles request.payload (referenced as request.payload and
requested_tool_name) does not perform unnecessary work.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4b329228-511f-429c-927e-07b7c802669b

📥 Commits

Reviewing files that changed from the base of the PR and between 90527de and 8d54cfe.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (29)

AGENTS.md
CLAUDE.md
Cargo.toml
crates/app/Cargo.toml
crates/app/src/acp/manager.rs
crates/app/src/channel/mod.rs
crates/app/src/channel/registry.rs
crates/app/src/lib.rs
crates/app/src/memory/mod.rs
crates/app/src/memory/sqlite.rs
crates/app/src/observability.rs
crates/app/src/provider/request_failover_runtime.rs
crates/app/src/provider/request_session_runtime.rs
crates/app/src/provider/runtime_binding.rs
crates/app/src/session/mod.rs
crates/app/src/session/repository.rs
crates/app/src/session/trajectory.rs
crates/app/src/tools/mod.rs
crates/daemon/Cargo.toml
crates/daemon/src/lib.rs
crates/daemon/src/main.rs
crates/daemon/src/observability.rs
crates/daemon/src/runtime_trajectory_cli.rs
crates/daemon/tests/integration/cli_tests.rs
crates/daemon/tests/integration/migrate_cli.rs
crates/daemon/tests/integration/mod.rs
crates/daemon/tests/integration/runtime_trajectory_cli.rs
docs/product-specs/index.md
docs/product-specs/runtime-trajectory.md

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/daemon/src/runtime_capability_cli.rs`:
- Around line 1379-1439:
persist_runtime_capability_apply_artifact/write_pretty_json_file_create_new
currently opens the final path with create_new(true) which can make the file
visible before the JSON is fully written, causing races; instead, write the
pretty-serialized bytes to a temporary file in the same directory (e.g., path
with a random/suffixed name), fsync the temp file (and optionally its parent
dir), then atomically rename (fs::rename) the temp file to the final path and
map an AlreadyExists error from rename to the same "already exists" string so
persist_runtime_capability_apply_artifact can still load and compare the
existing artifact; update write_pretty_json_file_create_new to perform these
steps and keep the same error messages and return types so callers like
persist_runtime_capability_apply_artifact continue to work unchanged.

In `@crates/daemon/tests/integration/runtime_capability_cli.rs`:
- Around line 2360-2366: The assertion compares report.output_path using
OS-native separators, which fails on Windows; normalize the path before checking
the suffix by passing report.output_path through normalized_path_text(...) (or
otherwise canonicalizing separators) and then assert that
normalized_path_text(&report.output_path).ends_with(&format!("managed_skills/{}.json",
report.applied_artifact.artifact_id)) so the check is platform independent;
update the assertion that references report.output_path and keep the same
message text.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8fd3e86e-0c0f-49ce-8b21-f46cc4be47eb

📥 Commits

Reviewing files that changed from the base of the PR and between 8d54cfe and d4d32f6.

📒 Files selected for processing (5)

crates/daemon/src/runtime_capability_cli.rs
crates/daemon/tests/integration/cli_tests.rs
crates/daemon/tests/integration/runtime_capability_cli.rs
docs/ROADMAP.md
docs/product-specs/runtime-capability.md

✅ Files skipped from review due to trivial changes (2)

docs/ROADMAP.md
docs/product-specs/runtime-capability.md

coderabbitai

♻️ Duplicate comments (1)

crates/daemon/src/runtime_capability_cli.rs (1)

1417-1478: ⚠️ Potential issue | 🔴 Critical

Publish the apply artifact atomically to prevent idempotency races.

write_pretty_json_file_create_new exposes the destination path immediately after create_new(true), before JSON is fully written. A concurrent apply can hit “already exists”, then fail parsing partially written content and incorrectly fail idempotent behavior.

💡 Suggested fix direction

- open final path with create_new(true)
- write JSON directly to final path
+ write full JSON to a temp file in the same directory
+ flush + sync temp file
+ publish to final path using an atomic no-clobber step
+ map "already exists" from publish step to current idempotency branch
+ best-effort cleanup of temp file on failure

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/daemon/src/runtime_capability_cli.rs` around lines 1417 - 1478,
persist_runtime_capability_apply_artifact can race because
write_pretty_json_file_create_new creates the target path before the JSON is
fully written; fix by making write_pretty_json_file_create_new write to a
temporary file in the same directory, flush and sync the temp file
(file.sync_all()), then attempt an atomic fs::rename(temp_path, path); if rename
fails because destination already exists, load the existing artifact (use
load_runtime_capability_apply_artifact) and compare to decide AlreadyApplied vs
error; ensure temp file is removed on error and create parent dirs as currently
done.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@crates/daemon/src/runtime_capability_cli.rs`:
- Around line 1417-1478: persist_runtime_capability_apply_artifact can race
because write_pretty_json_file_create_new creates the target path before the
JSON is fully written; fix by making write_pretty_json_file_create_new write to
a temporary file in the same directory, flush and sync the temp file
(file.sync_all()), then attempt an atomic fs::rename(temp_path, path); if rename
fails because destination already exists, load the existing artifact (use
load_runtime_capability_apply_artifact) and compare to decide AlreadyApplied vs
error; ensure temp file is removed on error and create parent dirs as currently
done.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 89207569-a044-46ba-b1cc-87b0ae67ffcc

📥 Commits

Reviewing files that changed from the base of the PR and between c320728 and c70246f.

📒 Files selected for processing (2)

crates/daemon/src/runtime_capability_cli.rs
crates/daemon/tests/integration/runtime_capability_cli.rs

✅ Files skipped from review due to trivial changes (1)

crates/daemon/tests/integration/runtime_capability_cli.rs

The public agent instructions now make the repository rule explicit: public issues, PRs, and public docs should stay LoongClaw-centric, while deeper cross-project comparisons belong in the internal knowledge base. This captures the requested boundary in the repo-local operator map so future delivery work does not repeat the same public wording mistake. Constraint: AGENTS.md and CLAUDE.md must stay mirrored in the same change Rejected: Add the rule to internal knowledge-base docs only | the mistake happened in public-repo delivery, so the repo-local instructions need the reminder Confidence: high Scope-risk: narrow Reversibility: clean Directive: Mention external projects in public-repo artifacts only when that reference is strictly necessary for user-facing understanding Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features Not-tested: Fix for pre-existing test failure channel::registry::tests::discord_status_splits_config_backed_send_and_stub_serve

LoongClaw already let operators propose, review, index, and plan runtime capabilities, but it still stopped one step short of a reusable incubator workflow. This change adds a read-only apply stage that materializes draft artifacts under the planned delivery surface for managed skills, programmatic flows, and profile-note addenda without mutating live runtime state. The implementation reuses the existing planner output, keeps writes idempotent when the artifact already matches, and extends integration coverage across all three supported target kinds plus the non-promotable gate. Constraint: Capability incubation must stay read-only and must not activate live runtime behavior during apply Rejected: Reuse the older memory-stage-profile-only apply path | it would diverge from the currently supported public target kinds and the governed incubator issue scope Confidence: high Scope-risk: moderate Reversibility: clean Directive: Keep future capability-apply payloads aligned with the target kinds exposed in the public runtime-capability surface unless the product spec changes first Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1 Not-tested: Updating the already-open source-repo PR body after adding the apply stage

The capability incubator now exposes a structured planned payload in the runtime-capability plan output, so operators can inspect the exact draft artifact shape before materializing it. The apply path reuses that payload when building managed-skill, programmatic-flow, and profile-note draft artifacts, which keeps the incubator ladder internally consistent instead of duplicating target-specific decisions in separate code paths. The integration suite now checks payload emission for ready, not-ready, and blocked families across the supported target kinds. Constraint: The incubator remains read-only; planning and apply must not activate live runtime behavior Rejected: Keep apply as metadata-only output without a draft payload contract | that would leave the plan/apply ladder underspecified and make later activation work harder to review Confidence: high Scope-risk: moderate Reversibility: clean Directive: If a future target kind is added, extend both the planned payload and apply coverage together so the incubator ladder stays aligned Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1; cargo test --workspace; cargo test --workspace --all-features Not-tested: Updating the source-repo PR description to mention the planned payload deepening

The product spec and roadmap now describe the deeper incubator contract that landed in the runtime-capability planner and apply flow. This keeps the public implementation and the documented ladder aligned, so operators can rely on the plan command surfacing a structured draft payload preview before the apply command materializes a governed artifact. The wording stays LoongClaw-centric and scoped to the current runtime-capability surface. Constraint: Public docs must describe only the capability surfaces that actually ship in the source repo Rejected: Leave the planner payload shape undocumented | would make the new incubator step harder to understand and review from product docs alone Confidence: high Scope-risk: narrow Reversibility: clean Directive: When the incubator ladder gains a new visible stage, update both the roadmap and the relevant product spec in the same change Tested: cargo test --workspace; cargo test --workspace --all-features Not-tested: Separate docs-only validation beyond the passing workspace test suites

The capability incubator now carries target-specific draft payloads through both the plan and apply stages. Managed-skill families emit a generated SKILL.md draft, programmatic-flow families emit a deterministic flow.json draft, and profile-note families emit an advisory addendum draft. This keeps the incubator ladder coherent: operators can inspect the exact payload in the runtime-capability plan output, then materialize the same payload with the runtime-capability apply command without hidden target-specific reconstruction. That is the smallest correct step before adding any future activation executor. Constraint: Capability incubation must remain read-only and must not assume a live activation surface already exists Rejected: Add activation first and leave draft payloads metadata-only | that would hide the artifact shape until after mutation and would couple activation logic to undocumented reconstruction rules Confidence: high Scope-risk: moderate Reversibility: clean Directive: If a target kind can be applied, its draft payload should be inspectable in the planner output before any future activation path is added Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1; cargo test --workspace; cargo test --workspace --all-features Not-tested: Updating the open PR description to mention self-contained draft payload contents

The incubator ladder now continues past apply. Operators can activate managed-skill and profile-note draft artifacts through an explicit runtime-capability command, while programmatic-flow activation stays fail-closed until a governed runtime surface exists. The same change makes draft payloads self-contained and testable across plan, apply, and activate, which keeps the ladder consistent and gives reviewers concrete artifacts to inspect before and after activation. Constraint: Activation must stay explicit, target-aware, and read-only by default Rejected: Pretend programmatic flows have a live activation surface | would create a fake-success path and hide the missing runtime seam Confidence: high Scope-risk: moderate Reversibility: clean Directive: Do not add a new activatable target kind unless the planner payload, apply artifact, and activate executor all land together Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1; cargo test --workspace; cargo test --workspace --all-features Not-tested: Updating the open PR description after adding activate

Activation already had explicit apply and idempotence lanes, but it still trusted successful tool/config mutations more than observed runtime state. This change makes managed-skill and profile-note activation verify the live target after mutation, surfaces verification evidence in the activation report, and carries rollback hints forward from the applied draft artifact. Constraint: programmatic_flow still has no governed activation surface and must remain fail-closed Rejected: trust tool success alone | leaves room for false-success activation reports Confidence: high Scope-risk: narrow Directive: keep each activation target paired with explicit post-apply verification before widening supported surfaces Tested: cargo fmt --all -- --check Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings Tested: cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1 Tested: cargo test --workspace Tested: cargo test --workspace --all-features Not-tested: manual CLI invocation outside integration fixtures

The activation lane could verify success, but operators still had no governed way to return a draft activation back to its recorded pre-activation state. This change persists activation records, adds an explicit rollback command, and makes supported targets prove both activation and rollback state instead of relying on best-effort manual cleanup. Constraint: programmatic_flow still lacks a governed activation host and must remain fail-closed for both activate and rollback Rejected: delete activation records after rollback | destroys the audit trail operators need for review Confidence: high Scope-risk: moderate Directive: any new activation target must ship with a persisted rollback payload and post-rollback verification before it is considered supported Tested: cargo fmt --all -- --check Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings Tested: cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1 Tested: cargo test --workspace Tested: cargo test --workspace --all-features Not-tested: manual CLI invocation of rollback against operator-managed real installations

The review follow-ups now fix the concrete observability and artifact robustness issues on the branch, and the surrounding regression tests stop relying on permission-bit behavior that can vary across environments. The result is a cleaner trajectory/apply surface plus deterministic config, external-skill, chat, doctor, restore, and browser-preview failure tests that still validate the intended rollback and fail-closed behavior. Constraint: Verification must stay truthful on contributor machines even when filesystem permission semantics differ Rejected: Leave permission-based tests as-is | they produced environment-dependent false positives and false negatives Confidence: high Scope-risk: moderate Reversibility: clean Directive: Prefer deterministic failure injection or structural invalid fixtures over permission-bit assumptions when testing rollback paths Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features; ./scripts/check_architecture_boundaries.sh; ./scripts/check_dep_graph.sh Not-tested: Fresh CI run on GitHub Actions after pushing this branch

The capability ladder cherry-picks landed on top of a newer daemon and test surface than the original branch targeted. This follow-up swaps the digest formatting onto the current hash output type, removes one stale unix-only import, and realigns the trajectory CLI parse expectation with the command shape that already ships on dev. Constraint: The rewritten PR branch must compile and test cleanly against current dev without reintroducing stale interface assumptions Rejected: Keep the old formatting and parser expectations | the rebased branch would fail before verification could even start Confidence: high Scope-risk: narrow Reversibility: clean Directive: When replaying older branch commits onto a moving CLI surface, prefer tiny compatibility shims over broad rewrites Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features; ./scripts/check_architecture_boundaries.sh; ./scripts/check_dep_graph.sh Not-tested: GitHub Actions run after force-updating the PR branch

The governance job compares the tracked April 2026 drift report against the current tree. Rebasing the capability ladder onto current dev changed the tools module metrics, so the tracked report needed one fresh regeneration to keep governance truthful. Constraint: Governance requires the checked-in monthly drift report to match the current tree exactly Rejected: Ignore the stale report and rely on CI output alone | governance is designed to fail closed on drift report freshness Confidence: high Scope-risk: narrow Reversibility: clean Directive: When rebasing large feature branches onto dev, regenerate the monthly architecture drift report before re-running governance Tested: bash scripts/generate_architecture_drift_report.sh docs/releases/architecture-drift-2026-04.md; bash scripts/check_architecture_drift_freshness.sh docs/releases/architecture-drift-2026-04.md Not-tested: Full CI rerun after pushing the refreshed report

A few daemon integration expectations still reflected older CLI help text and an earlier symlink failure surface, and one work-unit CLI round-trip needed the shared daemon environment lock to avoid cross-test interference. This keeps the rebased branch aligned with current dev behavior while preserving the intended fail-closed and rollback guarantees. Constraint: The rebased PR branch must pass the current dev integration suite without depending on stale help text or cross-test environment races Rejected: Leave the flaky expectations in place | the branch would stay red even though the underlying capability path was correct Confidence: high Scope-risk: narrow Reversibility: clean Directive: When rebasing onto a newer CLI surface, refresh help-text assertions and add shared daemon env locking where subprocess tests depend on process-global state Tested: cargo test --workspace; cargo test --workspace --all-features Not-tested: Fresh remote CI run after force-updating the PR branch

The governance docs check still treated the roadmap as if were the live enforcement seam. Current dev routes tool approval through policy extensions and execution-layer dispatch, so the roadmap now describes the active architecture instead of the deprecated hook. Constraint: Docs governance fails closed when roadmap wording drifts from the live architecture Rejected: Leave the deprecated hook name in roadmap prose | it would keep governance red and misdescribe the current policy path Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep roadmap references aligned with the active enforcement seam, and mention deprecated hooks only as compatibility notes Tested: scripts/bootstrap_release_local_artifacts.sh; LOONGCLAW_RELEASE_DOCS_STRICT=1 scripts/check-docs.sh Not-tested: Follow-up remote governance rerun after pushing this doc correction

Managed-skill activation succeeded locally but failed on Windows because canonicalized file roots and candidate paths used different prefix forms during the safe-path containment check. Switching those canonicalization points to keeps the path semantics the same while removing the verbatim-prefix mismatch that made in-root staging paths look like escapes. Constraint: Managed-skill activation must remain fail-closed while still working on Windows runners with verbatim path prefixes Rejected: Special-case only the activation staging path string | that would leave the underlying safe-path comparison inconsistent for other file-root callers Confidence: high Scope-risk: narrow Reversibility: clean Directive: When a path comparison is security-sensitive and cross-platform, normalize both sides through the same canonicalization strategy before checking containment Tested: cargo test --test integration integration::runtime_capability_cli::runtime_capability_activate_managed_skill_apply_installs_skill_and_is_idempotent -- --exact --nocapture; cargo test --test integration integration::runtime_capability_cli::runtime_capability_rollback_managed_skill_restores_pre_activation_state_and_is_idempotent -- --exact --nocapture Not-tested: Fresh Windows CI rerun after pushing this path normalization fix

…on paths The Windows CI failures showed two related issues in the safe-path gate: verbatim-prefixed canonical paths could compare unequal to non-prefixed roots, and the external-skill migration path went through a sibling containment helper with the same assumption. This change normalizes both file and migration path checks through the same simplified canonical form so in-root paths stay allowed while real escapes still fail closed. Constraint: Path sandboxing must stay fail-closed while remaining portable across Windows canonical path encodings Rejected: Patch only the managed-skill activation call site | the same mismatch also affected migration-driven external-skill installs Confidence: high Scope-risk: narrow Reversibility: clean Directive: Reuse one cross-platform normalization strategy for every file-root containment check instead of fixing individual callers ad hoc Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --test integration integration::migrate_cli::run_migrate_cli_apply_selected_mode_can_apply_external_skill_plan -- --exact --nocapture; cargo test --test integration integration::runtime_capability_cli::runtime_capability_activate_managed_skill_apply_installs_skill_and_is_idempotent -- --exact --nocapture; cargo test --test integration integration::runtime_capability_cli::runtime_capability_rollback_managed_skill_restores_pre_activation_state_and_is_idempotent -- --exact --nocapture; cargo test --test integration integration::work_unit_cli::work_unit_cli_create_claim_complete_and_archive_round_trip -- --exact --nocapture; cargo test --test integration integration::work_unit_cli::work_unit_cli_update_text_output_uses_snake_case_status_labels -- --exact --nocapture Not-tested: Fresh Windows GitHub Actions rerun after pushing this normalization fix

The Windows runners now get simplified canonical paths out of the file-root safety layer, which fixes false escape rejections but also changes the exact path text that some tests compare and some workspace-memory lookups key on. This follow-up normalizes the workspace-memory keying path and updates the affected tests to compare against the same normalized canonical form. Constraint: Cross-platform path-safety fixes must not leave Windows-only assertion or lookup mismatches behind Rejected: Revert the simplified path output just to satisfy old tests | that would reintroduce the original Windows containment bug Confidence: high Scope-risk: narrow Reversibility: clean Directive: When a path helper intentionally changes canonical form, update every lookup key and assertion that depends on that textual representation in the same change Tested: cargo test -p loongclaw-app tools::memory_tools::tests::memory_get_tool_strips_frontmatter_and_surfaces_workspace_metadata --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::memory_get_tool_returns_bounded_line_window_from_memory_file --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::memory_get_tool_uses_selected_memory_system_id_in_provenance --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::memory_get_tool_reads_requested_window_without_loading_invalid_tail --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::config_import_apply_mode_writes_target_config --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::feishu_messages_resource_get_tool_downloads_message_resource_to_safe_file_root --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::workspace_root_tests::file_read_uses_workspace_root_from_trusted_internal_payload --lib -- --exact --nocapture Not-tested: Full Windows CI rerun after pushing the normalized expectation updates

coderabbitai Bot reviewed Apr 8, 2026

View reviewed changes

Comment thread crates/app/src/memory/sqlite.rs Outdated

Comment thread crates/app/src/tools/mod.rs Outdated

Comment thread crates/daemon/src/main.rs Outdated

chumyin changed the title ~~feat: add governed runtime trajectory export~~ feat: add governed trajectory export and capability apply Apr 8, 2026

coderabbitai Bot reviewed Apr 8, 2026

View reviewed changes

Comment thread crates/daemon/src/runtime_capability_cli.rs

Comment thread crates/daemon/tests/integration/runtime_capability_cli.rs

coderabbitai Bot reviewed Apr 8, 2026

View reviewed changes

chumyin self-assigned this Apr 8, 2026

github-actions Bot added the migration Onboarding, legacy import, and migration flow. label Apr 8, 2026

chumyin changed the title ~~feat: add governed trajectory export and capability apply~~ feat: complete the governed trajectory and capability ladder Apr 8, 2026

github-actions Bot added the config Runtime config parsing, schema, and defaults. label Apr 8, 2026

chumyin force-pushed the work/refstudy-delivery-20260408 branch from 0ca8a7f to 3bace47 Compare April 8, 2026 11:57

chumyin changed the title ~~feat: complete the governed trajectory and capability ladder~~ feat: close the governed capability ladder from plan to activate Apr 8, 2026

chumyin added 2 commits April 8, 2026 05:02

chumyin added 10 commits April 8, 2026 05:02

chumyin force-pushed the work/refstudy-delivery-20260408 branch from 3bace47 to 8945bf2 Compare April 8, 2026 12:33

chumyin added 4 commits April 8, 2026 05:37

chumyin merged commit f97a769 into dev Apr 8, 2026
18 checks passed

chumyin deleted the work/refstudy-delivery-20260408 branch April 8, 2026 13:53

chumyin mentioned this pull request Apr 8, 2026

[Feature]: Establish a governed capability incubator for bounded self-improvement #1092

Closed

2 tasks

coderabbitai Bot mentioned this pull request Apr 11, 2026

fix(web): sync runtime-facing chat and settings follow-ups #1200

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: close the governed capability ladder from plan to activate#1098

feat: close the governed capability ladder from plan to activate#1098
chumyin merged 16 commits into
devfrom
work/refstudy-delivery-20260408

chumyin commented Apr 8, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Apr 8, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

chumyin commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Linked Issues

Change Type

Touched Areas

Risk Track

Validation

User-visible / Operator-visible Changes

Failure Recovery

Reviewer Focus

Uh oh!

coderabbitai Bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

chumyin commented Apr 8, 2026 •

edited

Loading

coderabbitai Bot commented Apr 8, 2026 •

edited

Loading