Conversation
Collaborator
chupaSV
commented
Mar 4, 2026
- Add Docker CLI to container for sandbox mode detection
- Add Odoo WASM channel (channels-src/odoo)
- Add Default::default() to HttpCapability for forward compatibility
- Add Docker CLI to container for sandbox mode detection - Add Odoo WASM channel (channels-src/odoo) - Add Default::default() to HttpCapability for forward compatibility
chupaSV
pushed a commit
that referenced
this pull request
Mar 5, 2026
* refactor: extract shared assertion helpers to support/assertions.rs Move 5 assertion helpers from e2e_spot_checks.rs to a shared module. Add assert_all_tools_succeeded and assert_tool_succeeded for eliminating false positives in E2E tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add tool output capture via tool_results() accessor Extract (name, preview) from ToolResult status events in TestChannel and TestRig, enabling content assertions on tool outputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct tool parameters in 3 broken trace fixtures - tool_time.json: add missing "operation": "now" for time tool - robust_correct_tool.json: same fix - memory_full_cycle.json: change "path" to "target" for memory_write Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add tool success and output assertions to eliminate false positives Every E2E test that exercises tools now calls assert_all_tools_succeeded. Added tool output content assertions where tool results are predictable (time year, read_file content, memory_read content). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: capture per-tool timing from ToolStarted/ToolCompleted events Record Instant on ToolStarted and compute elapsed duration on ToolCompleted, wiring real timing data into collect_metrics() instead of hardcoded zeros. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: add RAII CleanupGuard for temp file/dir cleanup in tests Replace manual cleanup_test_dir() calls and inline remove_file() with Drop-based CleanupGuard that ensures cleanup even if a test panics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add Drop impl and graceful shutdown for TestRig Wrap agent_handle in Option so Drop can abort leaked tasks. Signal the channel shutdown before aborting for future cooperative shutdown. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: replace agent startup sleep with oneshot ready signal Use a oneshot channel fired in Channel::start() instead of a fixed 100ms sleep, eliminating the race condition on slow systems. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: replace fragile string-matching iteration limit with count-based detection Use tool completion count vs max_tool_iterations instead of scanning status messages for "iteration"/"limit" substrings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use assert_all_tools_succeeded for memory_full_cycle test Remove incorrect comment about memory_tree failing with empty path (it actually succeeds). Omit empty path from fixture and use the standard assert_all_tools_succeeded instead of per-tool assertions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: promote benchmark metrics types to library code Move TraceMetrics, ScenarioResult, RunResult, MetricDelta, and compare_runs() from tests/support/metrics.rs to src/benchmark/metrics.rs. Existing tests use re-export for backward compatibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add Scenario and Criterion types for agent benchmarking Scenario defines a task with input, success criteria, and resource limits. Criterion is an enum of programmatic checks (tool_used, response_contains, etc.) evaluated without LLM judgment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add initial benchmark scenario suite (12 scenarios across 5 categories) Scenarios cover tool_selection, tool_chaining, error_recovery, efficiency, and memory_operations. All loaded from JSON with deserialization validation test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add benchmark runner with BenchChannel and InstrumentedLlm BenchChannel is a minimal Channel implementation for benchmarks. InstrumentedLlm wraps any LlmProvider to capture per-call metrics. Runner creates a fresh agent per scenario, evaluates success criteria, and produces RunResult with timing, token, and cost metrics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add baseline management, reports, and benchmark entry point - baseline.rs: load/save/promote benchmark results - report.rs: format comparison reports with regression detection - benchmark_runner.rs: integration test with real LLM (feature-gated) - Add benchmark feature flag to Cargo.toml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: apply cargo fmt to benchmark module Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add multi-turn scenario types with setup, judge, ResponseNotContains Add BenchScenario, Turn, TurnAssertions, JudgeConfig, ScenarioSetup, WorkspaceSetup, SeedDocument types for multi-turn benchmark scenarios. Add ResponseNotContains criterion variant. Add TurnAssertions::to_criteria() converter for backward compat with existing evaluation engine. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add JSON scenario loader with recursive discovery and tag filter Add load_bench_scenarios() for the new BenchScenario format with recursive directory traversal and tag-based filtering. Create 4 initial trajectory scenarios across tool-selection, multi-turn, and efficiency categories. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): multi-turn runner with workspace seeding and per-turn metrics Add run_bench_scenario() that loops over BenchScenario turns, seeds workspace documents, collects per-turn metrics (tokens, tool calls, wall time), and evaluates per-turn assertions. Add TurnMetrics to metrics.rs and clear_for_next_turn() to BenchChannel. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add LLM-as-judge scoring with prompt formatting and score parsing Create judge.rs with format_judge_prompt, parse_judge_score, and judge_turn. Wire into run_bench_scenario for turns with judge config -- scores below min_score fail the turn. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add CLI subcommand (ironclaw benchmark) Add BenchmarkCommand with --tags, --scenario, --no-judge, --timeout, --update-baseline flags. Wire into Command enum and main.rs dispatch. Feature-gated behind benchmark flag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): per-scenario JSON output with full trajectory Add save_scenario_results() that writes per-scenario JSON files alongside the run summary. Each scenario gets its own file with turn_metrics trajectory. Update CLI to use new output format. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add ToolRegistry::retain_only and wire tool filtering in scenarios Add a retain_only() method to ToolRegistry that filters tools down to a given allowlist. Wire this into run_bench_scenario() so that when a scenario specifies a tools list in its setup, only those tools are available during the benchmark run. Includes two tests for the new method: one verifying filtering works and one verifying empty input is a no-op. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): wire identity overrides into workspace before agent start Add seed_identity() helper that writes identity files (IDENTITY.md, USER.md, etc.) into the workspace before the agent starts, so that workspace.system_prompt() picks them up. Wire it into run_bench_scenario() after workspace seeding. Include a test that verifies identity files are written and readable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add --parallel and --max-cost CLI flags Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(benchmark): use feature-conditional snapshot names for CLI help tests Prevents snapshot conflicts between default (no benchmark) and all-features (with benchmark) builds by using separate snapshot names per feature set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): parallel execution with JoinSet and budget cap enforcement Replace sequential loop in run_all_bench() with parallel execution using JoinSet + semaphore when config.parallel > 1. Add budget cap enforcement that skips remaining scenarios when max_total_cost_usd is exceeded. Track skipped count in RunResult.skipped_scenarios and display it in format_report(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add tool restriction and identity override test scenarios Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: fix formatting for Phase 3 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add SkillRegistry::retain_only and wire skill filtering in scenarios Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add --json flag for machine-readable output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: add GitHub Actions benchmark workflow (manual trigger) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(benchmark): remove in-tree benchmark harness, keep retain_only utilities Move benchmark-specific code out of ironclaw in preparation for the nearai/benchmarks trajectory adapter. This removes: - src/benchmark/ (runner, scenarios, metrics, judge, report, etc.) - src/cli/benchmark.rs and the Benchmark CLI subcommand - benchmarks/ data directory (scenarios + trajectories) - .github/workflows/benchmark.yml - The "benchmark" Cargo feature flag What remains: - ToolRegistry::retain_only() and SkillRegistry::retain_only() - Test support types (TraceMetrics, InstrumentedLlm) inlined into tests/support/ instead of re-exporting from the deleted module Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add README for LLM trace fixture format Documents the trajectory JSON format, response types, request hints, directory structure, and how to write new traces. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(test): unify trace format around turns, add multi-turn support Introduce TraceTurn type that groups user_input with LLM response steps, making traces self-contained conversation trajectories. Add run_trace() to TestRig for automatic multi-turn replay. Backward-compatible: flat "steps" JSON is deserialized as a single turn transparently. Includes all trace fixtures (spot, coverage, advanced), plan docs, and new e2e tests for steering, error recovery, long chains, memory, and prompt injection resilience. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): fix CI failures after merging main - Fix tool_json fixture: use "data" parameter (not "input") to match JsonTool schema - Fix status_events test: remove assertion for "time" tool that isn't in the fixture (only "echo" calls are used) - Allow dead_code in test support metrics/instrumented_llm modules (utilities for future benchmark tests) [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Working on recording traces and testing them * feat(test): add declarative expects to trace fixtures, split infra tests Add TraceExpects struct with 9 optional assertion fields (response_contains, tools_used, all_tools_succeeded, etc.) that can be declared in fixture JSON instead of hand-written Rust. Add verify_expects() and run_recorded_trace() so recorded trace tests become one-liners. Split trace infra tests (deserialization, backward compat) into tests/trace_format.rs which doesn't require the libsql feature gate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(test): add expects to all trace fixtures, simplify e2e tests Add declarative expects blocks to all 19 trace fixture JSONs across spot/, coverage/, advanced/, and root directories. Update all 8 e2e test files to use verify_trace_expects() / run_and_verify_trace(), replacing ~270 lines of hand-written assertions with fixture-driven verification. Tests that check things beyond expects (file content on disk, metrics, event ordering) keep those extra assertions alongside the declarative ones. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): adapt tests to AppBuilder refactor, fix formatting Update test files to work with refactored TestRigBuilder that uses AppBuilder::build_all() (removing with_tools/with_workspace methods). Update telegram_check fixture to use tool_list instead of echo. Fix cargo fmt issues in src/llm/mod.rs and src/llm/recording.rs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(test): deduplicate support unit tests into single binary Support modules (assertions, cleanup, test_channel, test_rig, trace_llm) had #[cfg(test)] mod tests blocks that were compiled and run 12 times — once per e2e test binary that declares `mod support;`. Extracted all 29 support unit tests into a dedicated `tests/support_unit_tests.rs` so they run exactly once. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix trailing newlines in support files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(test): unify trace types and fix recorded multi-turn replay Import shared types (TraceStep, TraceResponse, TraceToolCall, RequestHint, ExpectedToolResult, MemorySnapshotEntry, HttpExchange*) from ironclaw::llm::recording instead of redefining them in trace_llm.rs. Fix the flat-steps deserializer to split at UserInput boundaries into multiple turns, instead of filtering them out and wrapping everything into a single turn. This enables recorded multi-turn traces to be replayed as proper multi-turn conversations via run_trace(). [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): fix CI failures - unused imports and missing struct fields - Add #[allow(unused_imports)] on pub use re-exports in trace_llm.rs (types are re-exported for downstream test files, not used locally) - Add `..` to ToolCompleted pattern in test_channel.rs to match new `error` and `parameters` fields Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): fix CI failures after merging main - Add missing `error` and `parameters` fields to ToolCompleted constructors in support_unit_tests.rs - Add `..` to ToolCompleted pattern match in support_unit_tests.rs - Add #[allow(dead_code)] to CleanupGuard, LlmTrace impl, and TraceLlm impl (only used behind #[cfg(feature = "libsql")]) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Adding coverage running script * fix(test): address review feedback on E2E test infrastructure - Increase wait_for_responses polling to exponential backoff (50ms-500ms) and raise default timeout from 15s to 30s to reduce CI flakiness (#1) - Strengthen prompt_injection_resilience test with positive safety layer assertion via has_safety_warnings(), enable injection_check (#2) - Add assert_tool_order() helper and tools_order field in TraceExpects for verifying tool execution ordering in multi-step traces (#3) - Document TraceLlm sequential-call assumption for concurrency (#6) - Clean up CleanupGuard with PathKind enum instead of shotgun remove_file + remove_dir_all on every path (#8) - Fix coverage.sh: default to --lib only, fix multi-filter syntax, add COV_ALL_TARGETS option - Add coverage/ to .gitignore - Remove planning docs from PR [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review - use HashSet in retain_only, improve skill test - Use HashSet for O(N+M) lookup in SkillRegistry::retain_only and ToolRegistry::retain_only instead of linear scan - Strengthen test_retain_only_empty_is_noop in SkillRegistry to pre-populate with a skill before asserting the no-op behavior [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): revert incorrect safety layer assertion in injection test The safety layer sanitizes tool output, not user input. The injection test sends a malicious user message with no tools called, so the safety layer never fires. Reverted to the original test which correctly validates the LLM refuses via trace expects. Also fixed case-sensitive request hint ("ignore" -> "Ignore") to suppress noisy warning. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: clean stale profdata before coverage run Adds `cargo llvm-cov clean` before each run to prevent "mismatched data" warnings from stale instrumentation profiles. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix formatting in retain_only test [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
main to Odoo
chupaSV
pushed a commit
that referenced
this pull request
Mar 10, 2026
* test: add WIT compatibility tests for all WASM tools and channels Adds CI and integration tests to catch WIT interface breakage across all 14 WASM extensions (10 tools + 4 channels). Previously, changing wit/tool.wit or wit/channel.wit could silently break guest-side tools that weren't rebuilt until release time. Three new pieces: 1. scripts/build-wasm-extensions.sh — builds all WASM extensions from source by reading registry manifests. Used by CI and locally. 2. tests/wit_compat.rs — integration tests that compile and instantiate each .wasm binary against the current wasmtime host linker with stubbed host functions. Catches added/removed/renamed WIT functions, signature mismatches, and missing exports. Skips gracefully when artifacts aren't built so `cargo test` still passes standalone. 3. .github/workflows/test.yml — new wasm-wit-compat CI job that builds all extensions then runs instantiation tests on every PR. Added to the branch protection roll-up. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix rustfmt formatting in wit_compat tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review feedback on WIT compat tests - Switch build script from python3 to jq for JSON parsing, consistent with release.yml and avoids python3 dependency (#1, #7) - Use dirs::home_dir() instead of HOME env var for portability (#2) - Filter extensions by manifest "kind" field instead of path (#3) - Replace .flatten() with explicit error handling in dir iteration (#4, #5) - Split stub_tool_host_functions into stub_shared_host_functions + tool-only tool-invoke stub, since tool-invoke is not in channel WIT (#6) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
chupaSV
pushed a commit
that referenced
this pull request
Mar 10, 2026
) * feat: add inbound attachment support to WASM channel system Add attachment record to WIT interface and implement inbound media parsing across all four channel implementations (Telegram, Slack, WhatsApp, Discord). Attachments flow from WASM channels through EmittedMessage to IncomingMessage with validation (size limits, MIME allowlist, count caps) at the host boundary. - Add `attachment` record to `emitted-message` in wit/channel.wit - Add `IncomingAttachment` struct to channel.rs and re-export - Add host-side validation (20MB total, 10 max, MIME allowlist) - Telegram: parse photo, document, audio, video, voice, sticker - Slack: parse file attachments with url_private - WhatsApp: parse image, audio, video, document with captions - Discord: backward-compatible empty attachments - Update FEATURE_PARITY.md section 7 - Add fixture-based tests per channel and host integration tests [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: integrate outbound attachment support and reconcile WIT types (nearai#409) Reconcile PR nearai#409's outbound attachment work with our inbound attachment support into a unified design: WIT type split: - `inbound-attachment` in channel-host: metadata-only (id, mime_type, filename, size_bytes, source_url, storage_key, extracted_text) - `attachment` in channel: raw bytes (filename, mime_type, data) on agent-response for outbound sending Outbound features (from PR nearai#409): - `on-broadcast` WIT export for proactive messages without prior inbound - Telegram: multipart sendPhoto/sendDocument with auto photo→document fallback for files >10MB - wrapper.rs: `call_on_broadcast`, `read_attachments` from disk, attachment params threaded through `call_on_respond` - HTTP tool: `save_to` param for binary downloads to /tmp/ (50MB limit, path traversal protection, SSRF-safe redirect following) - Message tool: allow /tmp/ paths for attachments alongside base_dir - Credential env var fallback in inject_channel_credentials Channel updates: - All 4 channels implement on_broadcast (Telegram full, others stub) - Telegram: polling_enabled config, adjusted poll timeout - Inbound attachment types renamed to InboundAttachment in all channels Tests: 1965 passing (9 new), 0 clippy warnings [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add audio transcription pipeline and extensible WIT attachment design Add host-side transcription middleware (OpenAI Whisper) that detects audio attachments with inline data on incoming messages and transcribes them automatically. Refactor WIT inbound-attachment to use extras-json and a store-attachment-data host function instead of typed fields, so future attachment properties (dimensions, codec, etc.) don't require WIT changes that invalidate all channel plugins. - Add src/transcription/ module: TranscriptionProvider trait, TranscriptionMiddleware, AudioFormat enum, OpenAI Whisper provider - Add src/config/transcription.rs: TRANSCRIPTION_ENABLED/MODEL/BASE_URL - Wire middleware into agent message loop via AgentDeps - WIT: replace data + duration-secs with extras-json + store-attachment-data - Host: parse extras-json for well-known keys, merge stored binary data - Telegram: download voice files via store-attachment-data, add duration to extras-json, add /file/bot to HTTP allowlist, voice-only placeholder - Add reqwest multipart feature for Whisper API uploads - 5 regression tests for transcription middleware Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: wire attachment processing into LLM pipeline with multimodal image support Attachments on incoming messages are now augmented into user text via XML tags before entering the turn system, and images with data are passed as multimodal content parts (base64 data URIs) to LLM providers. This enables audio transcripts, document text, and image content to reach the LLM without changes to ChatMessage serialization or provider interfaces. - Add src/agent/attachments.rs with augment_with_attachments() and 9 unit tests - Add ContentPart/ImageUrl types to llm::provider with OpenAI-compatible serde - Carry image_content_parts transiently on Turn (skipped in serialization) - Update nearai_chat and rig_adapter to serialize multimodal content - Add 3 e2e tests verifying attachments flow through the full agent loop Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: CI failures — formatting, version bumps, and Telegram voice test - Fix cargo fmt formatting in attachments.rs, nearai_chat.rs, rig_adapter.rs, e2e_attachments.rs - Bump channel registry versions 0.1.0 → 0.2.0 (discord, slack, telegram, whatsapp) to satisfy version-bump CI check - Fix Telegram test_extract_attachments_voice: add missing required `duration` field to voice fixture JSON Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: bump WIT channel version to 0.3.0, fix Telegram voice test, add pre-commit hook - Bump wit/channel.wit package version 0.2.0 → 0.3.0 (interface changed with store-attachment-data) - Update WIT_CHANNEL_VERSION constant and registry wit_version fields to match - Fix Telegram test_extract_attachments_voice: gate voice download behind #[cfg(target_arch = "wasm32")] so host functions aren't called in native tests, update assertions for generated filename and extras_json duration - Add @0.3.0 linker stubs in wit_compat.rs - Add .githooks/pre-commit hook that runs scripts/check-version-bumps.sh when WIT or extension sources are staged - Symlink commit-msg regression hook into .githooks/ [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: extract voice download from extract_attachments into handle_message Move download_voice_file + store_attachment_data calls out of extract_attachments into a separate download_and_store_voice function called from handle_message. This keeps extract_attachments as a pure data-mapping function with no host calls, making it fully testable in native unit tests without #[cfg(target_arch)] gates. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review comments — security, correctness, and code quality Security fixes: - Add path validation to read_attachments (restrict to /tmp/) preventing arbitrary file reads from compromised tools - Escape XML special characters in attachment filenames, MIME types, and extracted text to prevent prompt injection via tag spoofing - Percent-encode file_id in Telegram getFile URL to prevent query injection - Clone SecretString directly instead of expose_secret().to_string() Correctness fixes: - Fix store_attachment_data overwrite accounting: subtract old entry size before adding new to prevent inflated totals and false rejections - Use max(reported, stored_size) for attachment size accounting to prevent WASM channels from under-reporting size_bytes to bypass limits - Add application/octet-stream to MIME allowlist (channels default unknown types to this) Code quality: - Extract send_response helper in Telegram, deduplicating on_respond and on_broadcast - Rename misleading Discord test to test_parse_slash_command_interaction - Fix .githooks/commit-msg to use relative symlink (portable across machines) [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add tool_upgrade command + fix TOCTOU in save_to path validation Add `tool_upgrade` — a new extension management tool that automatically detects and reinstalls WASM extensions with outdated WIT versions. Preserves authentication secrets during upgrade. Supports upgrading a single extension by name or all installed WASM tools/channels at once. Fix TOCTOU in `validate_save_to_path`: validate the path *before* creating parent directories, so traversal paths like `/tmp/../../etc/` cannot cause filesystem mutations outside /tmp before being rejected. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: unify WIT package version to 0.3.0 across tool.wit and all capabilities tool.wit and channel.wit share the `near:agent` package namespace, so they must declare the same version. Bumps tool.wit from 0.2.0 to 0.3.0 and updates all capabilities files and registry entries to match. Fixes `cargo component build` failure: "package identifier near:agent@0.2.0 does not match previous package name of near:agent@0.3.0" [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: move WIT file comments after package declaration WIT treats `//` comments before `package` as doc comments. When both tool.wit and channel.wit had header comments, the parser rejected them as "doc comments on multiple 'package' items". Move comments after the package declaration in both files. Also bumps tool registry versions to 0.2.0 to match the WIT 0.3.0 bump. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: display extension versions in gateway Extensions tab Add version field to InstalledExtension and RegistryEntry types, pipe through the web API (ExtensionInfo, RegistryEntryInfo), and render as a badge in the gateway UI for both installed and available extensions. For installed WASM extensions, version is read from the capabilities file with a fallback to the registry entry when the local file has no version (old installations). Bump all extension Cargo.toml and registry JSON versions from 0.1.0 to 0.2.0 to keep them in sync. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add document text extraction middleware for PDF, Office, and text files Extract text from document attachments (PDF, DOCX, PPTX, XLSX, RTF, plain text, code files) so the LLM can reason about uploaded documents. Uses pdf-extract for PDFs, zip+XML parsing for Office XML formats, and UTF-8 decode for text files. Wired into the agent loop after transcription middleware. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: download document files in Telegram channel for text extraction The DocumentExtractionMiddleware needs file bytes in the attachment `data` field, but only voice files were being downloaded. Document attachments (PDFs, DOCX, etc.) had empty `data` and a source_url with a credential placeholder that only works inside the WASM host's http_request. Add `download_and_store_documents()` that downloads non-voice, non-image, non-audio attachments via the existing two-step getFile→download flow and stores bytes via `store_attachment_data` for host-side extraction. Also rename `download_voice_file` → `download_telegram_file` since it's generic for any file_id. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: allow Office MIME types and increase file download limit for Telegram Two issues preventing document extraction from Telegram: 1. PPTX/DOCX/XLSX MIME types (application/vnd.*) were dropped by the WASM host attachment allowlist — add application/vnd., application/msword, and application/rtf prefixes. 2. Telegram file downloads over 10 MB failed with "Response body too large" — set max_response_bytes to 20 MB in Telegram capabilities. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: report document extraction errors back to user instead of silently skipping - Bump max_response_bytes to 50 MB for Telegram file downloads - When document extraction fails (too large, download error, parse error), set extracted_text to a user-friendly error message instead of leaving it None. This ensures the LLM tells the user what went wrong. - On Telegram download failure, set extracted_text with the error so the user sees feedback even when the file never reaches the extraction middleware. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: store extracted document text in workspace memory for search/recall After document extraction succeeds, write the extracted text to workspace memory at `documents/{date}/{filename}`. This enables: - Full-text and semantic search over past uploaded documents - Cross-conversation recall ("what did that PDF say?") - Automatic chunking and embedding via the workspace pipeline Documents are stored with metadata header (uploader, channel, date, MIME type). Error messages (extraction failures) are not stored — only successful extractions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: CI failures — formatting, unused assignment warning - Run cargo fmt on document_extraction and agent_loop modules - Suppress unused_assignments warning on trace_llm_ref (used only behind #[cfg(feature = "libsql")]) [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review comments — security, correctness, and code quality Security fixes: - Remove SSRF-prone download() from DocumentExtractionMiddleware (nearai#13) - Sanitize filenames in workspace path to prevent directory traversal (nearai#11) - Pre-check file size before reading in WASM wrapper to prevent OOM (#2) - Percent-encode file_id in Telegram source URLs (#7) Correctness fixes: - Clear image_content_parts on turn end to prevent memory leak (#1) - Find first *successful* transcription instead of first overall (#3) - Enforce data.len() size limit in document extraction (nearai#10) - Use UTF-8 safe truncation with char_indices() (nearai#12) Robustness & code quality: - Add 120s timeout to OpenAI Whisper HTTP client (#5) - Trim trailing slash from Whisper base_url (#6) - Allow ~/.ironclaw/ paths in WASM wrapper (#8) - Return error from on_broadcast in Slack/Discord/WhatsApp (nearai#9) - Fix doc comment in HTTP tool (#4) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: formatting — cargo fmt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address latest PR review — doc comments, error messages, version bumps - Fix DocumentExtractionMiddleware doc comment (no longer downloads from source_url) - Fix error message: "no inline data" instead of "no download URL" - Log error + fallback instead of silent unwrap_or_default on Whisper HTTP client - Bump all capabilities.json versions from 0.1.0 to 0.2.0 to match Cargo.toml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove unsupported profile: minimal from CI workflows [skip-regression-check] dtolnay/rust-toolchain@stable does not accept the 'profile' input (it was a parameter for the deprecated actions-rs/toolchain action). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: merge with latest main — resolve compilation errors and PR review nits - Add version: None to RegistryEntry/InstalledExtension test constructors - Fix MessageContent type mismatches in nearai_chat tests (String → MessageContent::Text) - Fix .contains() calls on MessageContent — use .as_text().unwrap() - Remove redundant trace_llm_ref = None assignment in test_rig - Check data size before clone in document extraction to avoid unnecessary allocation [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.