feat: close the governed capability ladder from plan to activate #1098
📝 Walkthrough
Added workspace tracing deps and observability helpers; instrumented runtime turn/provider/tool flows with structured logs; introduced a read-only runtime-trajectory export/show CLI, artifact schema, session trajectory exporter, session-repo accessors, daemon CLI wiring, tests, and docs. Also added runtime-capability
Sequence Diagram(s)
sequenceDiagram
participant CLI as "Daemon CLI (runtime-trajectory)"
participant Traj as "session::trajectory::export_runtime_trajectory"
participant Repo as "SessionRepository"
participant DB as "SQLite DB"
CLI->>Traj: export_runtime_trajectory(session_id, mode, config, exported_at)
Traj->>Repo: load_session_summary_with_legacy_fallback(session_id)
Repo->>DB: SELECT session summary
DB-->>Repo: session_summary
Repo-->>Traj: session_summary
Traj->>Repo: resolve lineage root & collect session IDs (mode)
Repo->>DB: SELECT sessions (lineage)
DB-->>Repo: session_list
Repo-->>Traj: session_list
loop per session
Traj->>Repo: session_turn_records_direct(session_id)
Repo->>DB: SELECT turn rows for session
DB-->>Repo: PersistedConversationTurnRecord[]
Repo-->>Traj: turn_records
Traj->>Repo: list_all_events(session_id)
Repo->>DB: SELECT events/pages
DB-->>Repo: SessionEventRecord[]
Repo-->>Traj: session_events
Traj->>Repo: load_terminal_outcome(session_id)
Repo->>DB: SELECT terminal_outcomes
DB-->>Repo: terminal_outcome?
Repo-->>Traj: terminal_outcome
Traj->>DB: SELECT approval_requests for session turns
DB-->>Traj: approval_requests[]
end
Traj->>Traj: aggregate statistics & build artifact
Traj-->>CLI: RuntimeTrajectoryArtifactDocument (JSON)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 3
🧹 Nitpick comments (3)
crates/app/src/tools/mod.rs (1)
651-653: Only materialize payload metadata when this target will emit.
`json_value_kind` and `top_level_json_keys` now run on every core-tool call, even when `loongclaw.tools` logging is disabled. Since this sits on the hot execution path, it’s worth gating that work behind the active log level.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/app/src/tools/mod.rs` around lines 651 - 653, Only compute payload metadata eagerly when loongclaw.tools logging is active: guard the calls to crate::observability::json_value_kind(&request.payload) and crate::observability::top_level_json_keys(&request.payload) with a check for the loongclaw.tools log level (e.g. a crate::observability::is_enabled("loongclaw.tools") or equivalent helper). If the check is false, avoid calling those functions and set payload_kind/payload_keys to None/empty (or the existing default) so the hot path that handles request.payload (referenced as request.payload and requested_tool_name) does not perform unnecessary work.
crates/app/src/session/repository.rs (1)
1508-1544: Consider consolidating with existing `upsert_terminal_outcome` to reduce duplication.
This new method is nearly identical to the existing `upsert_terminal_outcome` (lines 1249-1282), differing only in the session existence check: this method uses `load_session_summary_with_legacy_fallback` while the existing one uses `load_session`.
The distinction makes sense for trajectory export (supporting legacy sessions inferred from turn history), but the duplication is notable. Consider extracting a shared private helper that accepts a session-existence-check closure or boolean flag, or document why the split is intentional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/app/src/session/repository.rs` around lines 1508 - 1544, The two functions upsert_session_terminal_outcome and upsert_terminal_outcome are nearly identical except for how they validate session existence (load_session_summary_with_legacy_fallback vs load_session); refactor by extracting a private helper (e.g., fn upsert_terminal_outcome_inner(..., session_exists_check: impl Fn(&str) -> Result<bool, String>) or a flag) that performs normalization, payload encoding, DB upsert, and record construction, and call that helper from both upsert_session_terminal_outcome and upsert_terminal_outcome passing the appropriate existence-check closure (or boolean) to remove duplication or alternatively add a short comment explaining the intentional split if you choose not to refactor.
crates/app/src/session/trajectory.rs (1)
221-247: Performance note: Lineage depth computed per-session via DB traversal.
`sort_runtime_trajectory_sessions` calls `repo.session_lineage_depth()` for each session, where each call traverses the parent chain via multiple DB queries. For a lineage with N sessions and max depth D, this results in O(N × D) queries.
This is acceptable for typical session lineages but could become slow for deeply nested or very large lineage trees. If this becomes a bottleneck in practice, consider computing depths in a single pass by building an in-memory parent map from the already-loaded session summaries.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/app/src/session/trajectory.rs` around lines 221 - 247, sort_runtime_trajectory_sessions currently calls SessionRepository::session_lineage_depth for each SessionSummaryRecord which causes O(N×D) DB traversals; instead, build an in-memory parent map from the provided sessions (use session.session_id -> session.parent_session_id from the SessionSummaryRecord list), then compute lineage depths with a single-pass memoized traversal (or iterative topological/DFS with memoization) into depth_by_session_id and use that map in the existing sort; keep the sort logic unchanged and only replace the per-session repo calls with the in-memory depth computation inside sort_runtime_trajectory_sessions.
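A minimal sketch of the in-memory depth computation suggested above. `SessionSummary` is a hypothetical, trimmed-down stand-in for `SessionSummaryRecord`, and treating a session whose parent is not in the loaded set as depth 0 is an assumption; the real `session_lineage_depth` may define roots differently.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `SessionSummaryRecord` (only the two fields used here).
struct SessionSummary {
    session_id: String,
    parent_session_id: Option<String>,
}

/// Compute lineage depths from already-loaded summaries, with no DB queries.
/// Sessions without a known parent in `sessions` are treated as roots (depth 0).
fn lineage_depths(sessions: &[SessionSummary]) -> HashMap<String, usize> {
    let parent_of: HashMap<&str, Option<&str>> = sessions
        .iter()
        .map(|s| (s.session_id.as_str(), s.parent_session_id.as_deref()))
        .collect();

    let mut depths: HashMap<String, usize> = HashMap::with_capacity(sessions.len());
    for session in sessions {
        // Walk upward until we reach a root, an unknown parent, or a memoized ancestor.
        let mut chain: Vec<&str> = Vec::new();
        let mut current = session.session_id.as_str();
        // Depth that the topmost entry of `chain` will receive.
        let mut next_depth = 0usize;

        loop {
            if let Some(&known) = depths.get(current) {
                next_depth = known + 1;
                break;
            }
            chain.push(current);
            // Defensive guard: a cyclic parent chain would otherwise never end.
            if chain.len() > parent_of.len() {
                break;
            }
            match parent_of.get(current).copied().flatten() {
                Some(parent) if parent_of.contains_key(parent) => current = parent,
                _ => break,
            }
        }

        // Assign depths from the top of the chain down to the starting session.
        for id in chain.into_iter().rev() {
            depths.insert(id.to_string(), next_depth);
            next_depth += 1;
        }
    }
    depths
}
```

Each session is visited a bounded number of times because resolved ancestors are memoized, so the existing sort could read from the returned map instead of calling the repository per session.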
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/app/src/memory/sqlite.rs`:
- Around line 1923-1925: The export reader currently calls row.get(1) into
session_turn_index as a non-null i64 which will error on NULL DB rows; change
the decode to read an Option<i64> (i.e., row.get::<_, Option<i64>>(1)) and
handle None defensively inside the export path (either treat missing
session_turn_index as a default/sentinel, skip legacy rows, or propagate a more
specific recoverable warning) in the function that performs export reads so
legacy/inconsistent rows do not cause a hard failure; update any subsequent uses
of session_turn_index to account for Option and convert/handle the value before
using it.
In `@crates/app/src/tools/mod.rs`:
- Around line 679-698: The logs for tool invocations always show
canonical_tool_name = "tool.invoke", which hides the actual inner tool (e.g.,
file.read, web.fetch); update both the success (tracing::debug!) and error
(tracing::warn!) blocks around the invoke handling to compute and include the
resolved inner tool name (derive from the canonical_name/payload or the value
used to dispatch the inner tool, e.g., call it resolved_tool_name or
inner_tool_name) and add it as a structured field (inner_tool_name =
%resolved_tool_name) alongside requested_tool_name and canonical_tool_name so
each log entry shows the real invoked tool (references: variables
canonical_name, requested_tool_name, payload_keys, outcome, error).
In `@crates/daemon/src/main.rs`:
- Line 42: The debug log currently serializes and logs the full parsed CLI
payload via tracing::debug!(..., command = ?cli.command), which may include
sensitive user-provided text; update the log to avoid emitting raw payloads by
logging a safe marker instead — for example log only the command variant/name or
a sanitized summary (e.g., cli.command.variant() or cli.command.name()) or
explicitly redact sensitive fields before logging; change the tracing::debug!
invocation to emit that sanitized value (or a fixed string like "CLI command
parsed") rather than ?cli.command.
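One way to follow the last suggestion is to give the command type a payload-free label and log only that. The sketch below is illustrative: the `Command` enum, its variants, and the `name` method are hypothetical, since the real `cli.command` type is not shown in this review.

```rust
/// Hypothetical CLI command enum; variant fields may carry user-provided text.
#[derive(Debug)]
enum Command {
    Chat { prompt: String },
    RuntimeTrajectoryExport { session_id: String },
    Doctor,
}

impl Command {
    /// Stable label that never includes field contents, so it is safe to log.
    fn name(&self) -> &'static str {
        match self {
            Command::Chat { .. } => "chat",
            Command::RuntimeTrajectoryExport { .. } => "runtime-trajectory export",
            Command::Doctor => "doctor",
        }
    }
}

/// Build the log message from the label only, never from `{:?}` of the command.
fn parsed_command_log_line(command: &Command) -> String {
    format!("CLI command parsed: command={}", command.name())
}
```

With a helper like this, the debug call would record `command.name()` as the structured field instead of `?cli.command`.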
⛔ Files ignored due to path filters (1)
- Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (29)
- AGENTS.md
- CLAUDE.md
- Cargo.toml
- crates/app/Cargo.toml
- crates/app/src/acp/manager.rs
- crates/app/src/channel/mod.rs
- crates/app/src/channel/registry.rs
- crates/app/src/lib.rs
- crates/app/src/memory/mod.rs
- crates/app/src/memory/sqlite.rs
- crates/app/src/observability.rs
- crates/app/src/provider/request_failover_runtime.rs
- crates/app/src/provider/request_session_runtime.rs
- crates/app/src/provider/runtime_binding.rs
- crates/app/src/session/mod.rs
- crates/app/src/session/repository.rs
- crates/app/src/session/trajectory.rs
- crates/app/src/tools/mod.rs
- crates/daemon/Cargo.toml
- crates/daemon/src/lib.rs
- crates/daemon/src/main.rs
- crates/daemon/src/observability.rs
- crates/daemon/src/runtime_trajectory_cli.rs
- crates/daemon/tests/integration/cli_tests.rs
- crates/daemon/tests/integration/migrate_cli.rs
- crates/daemon/tests/integration/mod.rs
- crates/daemon/tests/integration/runtime_trajectory_cli.rs
- docs/product-specs/index.md
- docs/product-specs/runtime-trajectory.md
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/daemon/src/runtime_capability_cli.rs`:
- Around line 1379-1439:
persist_runtime_capability_apply_artifact/write_pretty_json_file_create_new
currently opens the final path with create_new(true) which can make the file
visible before the JSON is fully written, causing races; instead, write the
pretty-serialized bytes to a temporary file in the same directory (e.g., path
with a random/suffixed name), fsync the temp file (and optionally its parent
dir), then atomically rename (fs::rename) the temp file to the final path and
map an AlreadyExists error from rename to the same "already exists" string so
persist_runtime_capability_apply_artifact can still load and compare the
existing artifact; update write_pretty_json_file_create_new to perform these
steps and keep the same error messages and return types so callers like
persist_runtime_capability_apply_artifact continue to work unchanged.
In `@crates/daemon/tests/integration/runtime_capability_cli.rs`:
- Around line 2360-2366: The assertion compares report.output_path using
OS-native separators, which fails on Windows; normalize the path before checking
the suffix by passing report.output_path through normalized_path_text(...) (or
otherwise canonicalizing separators) and then assert that
normalized_path_text(&report.output_path).ends_with(&format!("managed_skills/{}.json",
report.applied_artifact.artifact_id)) so the check is platform independent;
update the assertion that references report.output_path and keep the same
message text.
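The platform-independent suffix check described above can be sketched as follows. `normalize_separators` is a hypothetical helper written for illustration; the repository's own `normalized_path_text` may have a different signature or do more than this.

```rust
/// Hypothetical helper: render a path with forward slashes only,
/// so suffix assertions behave the same on Windows and Unix.
fn normalize_separators(path_text: &str) -> String {
    path_text.replace('\\', "/")
}

/// Check that an output path ends with `managed_skills/<artifact_id>.json`
/// regardless of which separator the OS used when building the path.
fn ends_with_managed_skill_artifact(output_path: &str, artifact_id: &str) -> bool {
    let expected_suffix = format!("managed_skills/{}.json", artifact_id);
    normalize_separators(output_path).ends_with(&expected_suffix)
}
```

A test assertion could then call the second function with `report.output_path` and the artifact ID instead of comparing raw OS-native text.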
📒 Files selected for processing (5)
- crates/daemon/src/runtime_capability_cli.rs
- crates/daemon/tests/integration/cli_tests.rs
- crates/daemon/tests/integration/runtime_capability_cli.rs
- docs/ROADMAP.md
- docs/product-specs/runtime-capability.md
✅ Files skipped from review due to trivial changes (2)
- docs/ROADMAP.md
- docs/product-specs/runtime-capability.md
♻️ Duplicate comments (1)
crates/daemon/src/runtime_capability_cli.rs (1)
1417-1478: ⚠️ Potential issue | 🔴 Critical
Publish the apply artifact atomically to prevent idempotency races.
`write_pretty_json_file_create_new` exposes the destination path immediately after `create_new(true)`, before JSON is fully written. A concurrent `apply` can hit “already exists”, then fail parsing partially written content and incorrectly fail idempotent behavior.
💡 Suggested fix direction
- open final path with create_new(true)
- write JSON directly to final path
+ write full JSON to a temp file in the same directory
+ flush + sync temp file
+ publish to final path using an atomic no-clobber step
+ map "already exists" from publish step to current idempotency branch
+ best-effort cleanup of temp file on failure
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/daemon/src/runtime_capability_cli.rs` around lines 1417 - 1478, persist_runtime_capability_apply_artifact can race because write_pretty_json_file_create_new creates the target path before the JSON is fully written; fix by making write_pretty_json_file_create_new write to a temporary file in the same directory, flush and sync the temp file (file.sync_all()), then attempt an atomic fs::rename(temp_path, path); if rename fails because destination already exists, load the existing artifact (use load_runtime_capability_apply_artifact) and compare to decide AlreadyApplied vs error; ensure temp file is removed on error and create parent dirs as currently done.
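A minimal sketch of the temp-file-then-publish direction, using only the standard library. Two points are assumptions rather than what the branch does: the function name `publish_create_new` is hypothetical, and the publish step uses `fs::hard_link` instead of `fs::rename`, because `rename` silently replaces an existing destination on Unix while `hard_link` fails with `AlreadyExists`. This requires a filesystem that supports hard links.

```rust
use std::ffi::OsString;
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};

/// Per-process counter so concurrent calls pick distinct temp names.
static TEMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Write `bytes` to `final_path` so readers never observe a partial file.
/// Returns an `AlreadyExists` error if `final_path` is already present.
fn publish_create_new(final_path: &Path, bytes: &[u8]) -> io::Result<()> {
    let file_name = final_path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let dir = final_path.parent().unwrap_or_else(|| Path::new(""));

    // Temp file lives in the same directory so the publish step stays on one filesystem.
    let mut temp_name = OsString::from(".");
    temp_name.push(file_name);
    temp_name.push(format!(
        ".{}.{}.tmp",
        std::process::id(),
        TEMP_COUNTER.fetch_add(1, Ordering::Relaxed)
    ));
    let temp_path = dir.join(temp_name);

    let mut temp = OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&temp_path)?;

    // Fully write and sync the temp file, then publish it without clobbering.
    let result = temp
        .write_all(bytes)
        .and_then(|_| temp.sync_all())
        .and_then(|_| fs::hard_link(&temp_path, final_path));

    drop(temp);
    // Best-effort cleanup: the temp name is removed whether or not publishing succeeded.
    let _ = fs::remove_file(&temp_path);
    result
}
```

A caller in the position of `persist_runtime_capability_apply_artifact` could match on `io::ErrorKind::AlreadyExists` and then load and compare the existing artifact, as the comment describes.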
📒 Files selected for processing (2)
- crates/daemon/src/runtime_capability_cli.rs
- crates/daemon/tests/integration/runtime_capability_cli.rs
✅ Files skipped from review due to trivial changes (1)
- crates/daemon/tests/integration/runtime_capability_cli.rs
0ca8a7f to 3bace47
The public agent instructions now make the repository rule explicit: public issues, PRs, and public docs should stay LoongClaw-centric, while deeper cross-project comparisons belong in the internal knowledge base. This captures the requested boundary in the repo-local operator map so future delivery work does not repeat the same public wording mistake.
Constraint: AGENTS.md and CLAUDE.md must stay mirrored in the same change
Rejected: Add the rule to internal knowledge-base docs only | the mistake happened in public-repo delivery, so the repo-local instructions need the reminder
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Mention external projects in public-repo artifacts only when that reference is strictly necessary for user-facing understanding
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features
Not-tested: Fix for pre-existing test failure channel::registry::tests::discord_status_splits_config_backed_send_and_stub_serve
LoongClaw already let operators propose, review, index, and plan runtime capabilities, but it still stopped one step short of a reusable incubator workflow. This change adds a read-only apply stage that materializes draft artifacts under the planned delivery surface for managed skills, programmatic flows, and profile-note addenda without mutating live runtime state. The implementation reuses the existing planner output, keeps writes idempotent when the artifact already matches, and extends integration coverage across all three supported target kinds plus the non-promotable gate.
Constraint: Capability incubation must stay read-only and must not activate live runtime behavior during apply
Rejected: Reuse the older memory-stage-profile-only apply path | it would diverge from the currently supported public target kinds and the governed incubator issue scope
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep future capability-apply payloads aligned with the target kinds exposed in the public runtime-capability surface unless the product spec changes first
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1
Not-tested: Updating the already-open source-repo PR body after adding the apply stage
The capability incubator now exposes a structured planned payload in the runtime-capability plan output, so operators can inspect the exact draft artifact shape before materializing it. The apply path reuses that payload when building managed-skill, programmatic-flow, and profile-note draft artifacts, which keeps the incubator ladder internally consistent instead of duplicating target-specific decisions in separate code paths. The integration suite now checks payload emission for ready, not-ready, and blocked families across the supported target kinds.
Constraint: The incubator remains read-only; planning and apply must not activate live runtime behavior
Rejected: Keep apply as metadata-only output without a draft payload contract | that would leave the plan/apply ladder underspecified and make later activation work harder to review
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: If a future target kind is added, extend both the planned payload and apply coverage together so the incubator ladder stays aligned
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1; cargo test --workspace; cargo test --workspace --all-features
Not-tested: Updating the source-repo PR description to mention the planned payload deepening
The product spec and roadmap now describe the deeper incubator contract that landed in the runtime-capability planner and apply flow. This keeps the public implementation and the documented ladder aligned, so operators can rely on the plan command surfacing a structured draft payload preview before the apply command materializes a governed artifact. The wording stays LoongClaw-centric and scoped to the current runtime-capability surface.
Constraint: Public docs must describe only the capability surfaces that actually ship in the source repo
Rejected: Leave the planner payload shape undocumented | would make the new incubator step harder to understand and review from product docs alone
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When the incubator ladder gains a new visible stage, update both the roadmap and the relevant product spec in the same change
Tested: cargo test --workspace; cargo test --workspace --all-features
Not-tested: Separate docs-only validation beyond the passing workspace test suites
The capability incubator now carries target-specific draft payloads through both the plan and apply stages. Managed-skill families emit a generated SKILL.md draft, programmatic-flow families emit a deterministic flow.json draft, and profile-note families emit an advisory addendum draft. This keeps the incubator ladder coherent: operators can inspect the exact payload in the runtime-capability plan output, then materialize the same payload with the runtime-capability apply command without hidden target-specific reconstruction. That is the smallest correct step before adding any future activation executor.
Constraint: Capability incubation must remain read-only and must not assume a live activation surface already exists
Rejected: Add activation first and leave draft payloads metadata-only | that would hide the artifact shape until after mutation and would couple activation logic to undocumented reconstruction rules
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: If a target kind can be applied, its draft payload should be inspectable in the planner output before any future activation path is added
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1; cargo test --workspace; cargo test --workspace --all-features
Not-tested: Updating the open PR description to mention self-contained draft payload contents
The incubator ladder now continues past apply. Operators can activate managed-skill and profile-note draft artifacts through an explicit runtime-capability command, while programmatic-flow activation stays fail-closed until a governed runtime surface exists. The same change makes draft payloads self-contained and testable across plan, apply, and activate, which keeps the ladder consistent and gives reviewers concrete artifacts to inspect before and after activation.
Constraint: Activation must stay explicit, target-aware, and read-only by default
Rejected: Pretend programmatic flows have a live activation surface | would create a fake-success path and hide the missing runtime seam
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Do not add a new activatable target kind unless the planner payload, apply artifact, and activate executor all land together
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1; cargo test --workspace; cargo test --workspace --all-features
Not-tested: Updating the open PR description after adding activate
Activation already had explicit apply and idempotence lanes, but it still trusted successful tool/config mutations more than observed runtime state. This change makes managed-skill and profile-note activation verify the live target after mutation, surfaces verification evidence in the activation report, and carries rollback hints forward from the applied draft artifact.
Constraint: programmatic_flow still has no governed activation surface and must remain fail-closed
Rejected: trust tool success alone | leaves room for false-success activation reports
Confidence: high
Scope-risk: narrow
Directive: keep each activation target paired with explicit post-apply verification before widening supported surfaces
Tested: cargo fmt --all -- --check
Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings
Tested: cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1
Tested: cargo test --workspace
Tested: cargo test --workspace --all-features
Not-tested: manual CLI invocation outside integration fixtures
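The fail-closed and verify-after-mutation rules in this commit message can be illustrated with a small sketch. Every name here (`TargetKind`, `ActivationError`, `activate`, and the two closures) is hypothetical and chosen for illustration; the actual activation executor in `runtime_capability_cli.rs` is not shown on this page.

```rust
/// Hypothetical target kinds mirroring the three named in the commit messages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TargetKind {
    ManagedSkill,
    ProfileNote,
    ProgrammaticFlow,
}

#[derive(Debug, PartialEq, Eq)]
enum ActivationError {
    /// No governed activation surface exists for this target kind.
    Unsupported(TargetKind),
    /// The mutation itself reported a failure.
    MutationFailed(String),
    /// The mutation reported success, but the observed live state did not match.
    VerificationFailed,
}

/// Activate a target: refuse unsupported kinds before mutating anything,
/// then trust only the observed state, not the mutation's own success signal.
fn activate(
    kind: TargetKind,
    mutate: impl FnOnce() -> Result<(), String>,
    observe_live_state: impl FnOnce() -> bool,
) -> Result<(), ActivationError> {
    if kind == TargetKind::ProgrammaticFlow {
        return Err(ActivationError::Unsupported(kind));
    }
    mutate().map_err(ActivationError::MutationFailed)?;
    if observe_live_state() {
        Ok(())
    } else {
        Err(ActivationError::VerificationFailed)
    }
}
```

The ordering is the point of the sketch: the unsupported check runs before `mutate`, so a programmatic-flow request cannot leave partial changes behind, and a successful mutation alone never yields `Ok`.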
The activation lane could verify success, but operators still had no governed way to return a draft activation back to its recorded pre-activation state. This change persists activation records, adds an explicit rollback command, and makes supported targets prove both activation and rollback state instead of relying on best-effort manual cleanup.
Constraint: programmatic_flow still lacks a governed activation host and must remain fail-closed for both activate and rollback
Rejected: delete activation records after rollback | destroys the audit trail operators need for review
Confidence: high
Scope-risk: moderate
Directive: any new activation target must ship with a persisted rollback payload and post-rollback verification before it is considered supported
Tested: cargo fmt --all -- --check
Tested: cargo clippy --workspace --all-targets --all-features -- -D warnings
Tested: cargo test -p loongclaw-daemon runtime_capability --test integration -- --test-threads=1
Tested: cargo test --workspace
Tested: cargo test --workspace --all-features
Not-tested: manual CLI invocation of rollback against operator-managed real installations
The review follow-ups now fix the concrete observability and artifact robustness issues on the branch, and the surrounding regression tests stop relying on permission-bit behavior that can vary across environments. The result is a cleaner trajectory/apply surface plus deterministic config, external-skill, chat, doctor, restore, and browser-preview failure tests that still validate the intended rollback and fail-closed behavior.
Constraint: Verification must stay truthful on contributor machines even when filesystem permission semantics differ
Rejected: Leave permission-based tests as-is | they produced environment-dependent false positives and false negatives
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Prefer deterministic failure injection or structural invalid fixtures over permission-bit assumptions when testing rollback paths
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features; ./scripts/check_architecture_boundaries.sh; ./scripts/check_dep_graph.sh
Not-tested: Fresh CI run on GitHub Actions after pushing this branch
The capability ladder cherry-picks landed on top of a newer daemon and test surface than the original branch targeted. This follow-up swaps the digest formatting onto the current hash output type, removes one stale unix-only import, and realigns the trajectory CLI parse expectation with the command shape that already ships on dev.
Constraint: The rewritten PR branch must compile and test cleanly against current dev without reintroducing stale interface assumptions
Rejected: Keep the old formatting and parser expectations | the rebased branch would fail before verification could even start
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When replaying older branch commits onto a moving CLI surface, prefer tiny compatibility shims over broad rewrites
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --workspace; cargo test --workspace --all-features; ./scripts/check_architecture_boundaries.sh; ./scripts/check_dep_graph.sh
Not-tested: GitHub Actions run after force-updating the PR branch
The governance job compares the tracked April 2026 drift report against the current tree. Rebasing the capability ladder onto current dev changed the tools module metrics, so the tracked report needed one fresh regeneration to keep governance truthful.
Constraint: Governance requires the checked-in monthly drift report to match the current tree exactly
Rejected: Ignore the stale report and rely on CI output alone | governance is designed to fail closed on drift report freshness
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When rebasing large feature branches onto dev, regenerate the monthly architecture drift report before re-running governance
Tested: bash scripts/generate_architecture_drift_report.sh docs/releases/architecture-drift-2026-04.md; bash scripts/check_architecture_drift_freshness.sh docs/releases/architecture-drift-2026-04.md
Not-tested: Full CI rerun after pushing the refreshed report
A few daemon integration expectations still reflected older CLI help text and an earlier symlink failure surface, and one work-unit CLI round-trip needed the shared daemon environment lock to avoid cross-test interference. This keeps the rebased branch aligned with current dev behavior while preserving the intended fail-closed and rollback guarantees.
Constraint: The rebased PR branch must pass the current dev integration suite without depending on stale help text or cross-test environment races
Rejected: Leave the flaky expectations in place | the branch would stay red even though the underlying capability path was correct
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When rebasing onto a newer CLI surface, refresh help-text assertions and add shared daemon env locking where subprocess tests depend on process-global state
Tested: cargo test --workspace; cargo test --workspace --all-features
Not-tested: Fresh remote CI run after force-updating the PR branch
3bace47 to 8945bf2
The governance docs check still treated the roadmap as if were the live enforcement seam. Current dev routes tool approval through policy extensions and execution-layer dispatch, so the roadmap now describes the active architecture instead of the deprecated hook.
Constraint: Docs governance fails closed when roadmap wording drifts from the live architecture
Rejected: Leave the deprecated hook name in roadmap prose | it would keep governance red and misdescribe the current policy path
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep roadmap references aligned with the active enforcement seam, and mention deprecated hooks only as compatibility notes
Tested: scripts/bootstrap_release_local_artifacts.sh; LOONGCLAW_RELEASE_DOCS_STRICT=1 scripts/check-docs.sh
Not-tested: Follow-up remote governance rerun after pushing this doc correction
Managed-skill activation succeeded locally but failed on Windows because canonicalized file roots and candidate paths used different prefix forms during the safe-path containment check. Switching those canonicalization points to a simplified canonical form keeps the path semantics the same while removing the verbatim-prefix mismatch that made in-root staging paths look like escapes.

Constraint: Managed-skill activation must remain fail-closed while still working on Windows runners with verbatim path prefixes
Rejected: Special-case only the activation staging path string | that would leave the underlying safe-path comparison inconsistent for other file-root callers
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When a path comparison is security-sensitive and cross-platform, normalize both sides through the same canonicalization strategy before checking containment
Tested: cargo test --test integration integration::runtime_capability_cli::runtime_capability_activate_managed_skill_apply_installs_skill_and_is_idempotent -- --exact --nocapture; cargo test --test integration integration::runtime_capability_cli::runtime_capability_rollback_managed_skill_restores_pre_activation_state_and_is_idempotent -- --exact --nocapture
Not-tested: Fresh Windows CI rerun after pushing this path normalization fix
…on paths

The Windows CI failures showed two related issues in the safe-path gate: verbatim-prefixed canonical paths could compare unequal to non-prefixed roots, and the external-skill migration path went through a sibling containment helper with the same assumption. This change normalizes both file and migration path checks through the same simplified canonical form so in-root paths stay allowed while real escapes still fail closed.

Constraint: Path sandboxing must stay fail-closed while remaining portable across Windows canonical path encodings
Rejected: Patch only the managed-skill activation call site | the same mismatch also affected migration-driven external-skill installs
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Reuse one cross-platform normalization strategy for every file-root containment check instead of fixing individual callers ad hoc
Tested: cargo fmt --all -- --check; cargo clippy --workspace --all-targets --all-features -- -D warnings; cargo test --test integration integration::migrate_cli::run_migrate_cli_apply_selected_mode_can_apply_external_skill_plan -- --exact --nocapture; cargo test --test integration integration::runtime_capability_cli::runtime_capability_activate_managed_skill_apply_installs_skill_and_is_idempotent -- --exact --nocapture; cargo test --test integration integration::runtime_capability_cli::runtime_capability_rollback_managed_skill_restores_pre_activation_state_and_is_idempotent -- --exact --nocapture; cargo test --test integration integration::work_unit_cli::work_unit_cli_create_claim_complete_and_archive_round_trip -- --exact --nocapture; cargo test --test integration integration::work_unit_cli::work_unit_cli_update_text_output_uses_snake_case_status_labels -- --exact --nocapture
Not-tested: Fresh Windows GitHub Actions rerun after pushing this normalization fix
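The containment rule described in these two commits can be sketched as below. This is an illustration under stated assumptions, not the code in the PR: `normalized_canonical`, `strip_verbatim_prefix`, and `is_within_root` are hypothetical names, and the prefix handling only covers the plain `\\?\C:\...` disk form using the standard library. The point is that root and candidate go through the same normalization before the component-wise `starts_with` check, and any canonicalization error is surfaced so the caller can reject the path.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve symlinks and `..`, then drop the Windows
/// verbatim prefix so every caller compares the same textual form.
fn normalized_canonical(path: &Path) -> io::Result<PathBuf> {
    let canonical = std::fs::canonicalize(path)?;
    Ok(strip_verbatim_prefix(canonical))
}

#[cfg(windows)]
fn strip_verbatim_prefix(path: PathBuf) -> PathBuf {
    // Assumption: only the simple disk form is simplified; verbatim UNC
    // paths and non-UTF-8 paths are returned unchanged.
    let stripped = path
        .to_str()
        .and_then(|text| text.strip_prefix(r"\\?\"))
        .filter(|rest| !rest.starts_with("UNC\\"))
        .map(PathBuf::from);
    stripped.unwrap_or(path)
}

#[cfg(not(windows))]
fn strip_verbatim_prefix(path: PathBuf) -> PathBuf {
    // No verbatim prefixes exist outside Windows.
    path
}

/// Returns Ok(true) only when `candidate` resolves to a location inside
/// `root`. Errors (for example a missing path) are returned to the caller,
/// which should treat them as a rejection to stay fail-closed.
fn is_within_root(root: &Path, candidate: &Path) -> io::Result<bool> {
    let root = normalized_canonical(root)?;
    let candidate = normalized_canonical(candidate)?;
    // `Path::starts_with` compares whole components, so `/root-evil` is not
    // considered to be inside `/root`.
    Ok(candidate.starts_with(&root))
}
```

Because both sides pass through one helper, a future change to the canonical form affects root and candidate together, which is the property the directive above asks for.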
The Windows runners now get simplified canonical paths out of the file-root safety layer, which fixes false escape rejections but also changes the exact path text that some tests compare and some workspace-memory lookups key on. This follow-up normalizes the workspace-memory keying path and updates the affected tests to compare against the same normalized canonical form.

Constraint: Cross-platform path-safety fixes must not leave Windows-only assertion or lookup mismatches behind
Rejected: Revert the simplified path output just to satisfy old tests | that would reintroduce the original Windows containment bug
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When a path helper intentionally changes canonical form, update every lookup key and assertion that depends on that textual representation in the same change
Tested: cargo test -p loongclaw-app tools::memory_tools::tests::memory_get_tool_strips_frontmatter_and_surfaces_workspace_metadata --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::memory_get_tool_returns_bounded_line_window_from_memory_file --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::memory_get_tool_uses_selected_memory_system_id_in_provenance --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::memory_get_tool_reads_requested_window_without_loading_invalid_tail --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::config_import_apply_mode_writes_target_config --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::tests::feishu_messages_resource_get_tool_downloads_message_resource_to_safe_file_root --lib -- --exact --nocapture; cargo test -p loongclaw-app tools::workspace_root_tests::file_read_uses_workspace_root_from_trusted_internal_payload --lib -- --exact --nocapture
Not-tested: Full Windows CI rerun after pushing the normalized expectation updates
Summary
- runtime-capability apply support for managed_skill, programmatic_flow, and profile_note_addendum families.
- runtime-capability plan so operators can inspect structured draft payloads before materializing artifacts.
- runtime-capability activate support for managed-skill and profile-note targets, while keeping programmatic-flow activation fail-closed until a real runtime surface exists.
- dev.
Linked Issues
Change Type
Touched Areas
Risk Track
Validation
- cargo fmt --all -- --check
- cargo clippy --workspace --all-targets --all-features -- -D warnings
- cargo test --workspace
- cargo test --workspace --all-features
Commands and evidence:
User-visible / Operator-visible Changes
- loongclaw runtime-capability apply --root <path> --family-id <id> [--json]
- loongclaw runtime-capability activate --artifact <path> [--config <path>] [--apply] [--replace] [--json]
- runtime-capability plan now surfaces structured draft payload details that match the applied artifact content.
Failure Recovery
- memory_stage_profile payloads instead of the new target kinds
Reviewer Focus
- crates/daemon/src/runtime_capability_cli.rs
- crates/daemon/tests/integration/runtime_capability_cli.rs
- docs/product-specs/runtime-capability.md, docs/ROADMAP.md
- crates/app/src/config/runtime.rs, crates/app/src/tools/external_skills.rs, crates/daemon/tests/integration/*