Commit d144484

feat: WASM channel attachments with LLM pipeline integration (nearai#596)
* feat: add inbound attachment support to WASM channel system

Add attachment record to WIT interface and implement inbound media parsing across all four channel implementations (Telegram, Slack, WhatsApp, Discord). Attachments flow from WASM channels through EmittedMessage to IncomingMessage with validation (size limits, MIME allowlist, count caps) at the host boundary.

- Add `attachment` record to `emitted-message` in wit/channel.wit
- Add `IncomingAttachment` struct to channel.rs and re-export
- Add host-side validation (20MB total, 10 max, MIME allowlist)
- Telegram: parse photo, document, audio, video, voice, sticker
- Slack: parse file attachments with url_private
- WhatsApp: parse image, audio, video, document with captions
- Discord: backward-compatible empty attachments
- Update FEATURE_PARITY.md section 7
- Add fixture-based tests per channel and host integration tests

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: integrate outbound attachment support and reconcile WIT types (nearai#409)

Reconcile PR nearai#409's outbound attachment work with our inbound attachment support into a unified design:

WIT type split:
- `inbound-attachment` in channel-host: metadata-only (id, mime_type, filename, size_bytes, source_url, storage_key, extracted_text)
- `attachment` in channel: raw bytes (filename, mime_type, data) on agent-response for outbound sending

Outbound features (from PR nearai#409):
- `on-broadcast` WIT export for proactive messages without prior inbound
- Telegram: multipart sendPhoto/sendDocument with auto photo→document fallback for files >10MB
- wrapper.rs: `call_on_broadcast`, `read_attachments` from disk, attachment params threaded through `call_on_respond`
- HTTP tool: `save_to` param for binary downloads to /tmp/ (50MB limit, path traversal protection, SSRF-safe redirect following)
- Message tool: allow /tmp/ paths for attachments alongside base_dir
- Credential env var fallback in inject_channel_credentials

Channel updates:
- All 4 channels implement on_broadcast (Telegram full, others stub)
- Telegram: polling_enabled config, adjusted poll timeout
- Inbound attachment types renamed to InboundAttachment in all channels

Tests: 1965 passing (9 new), 0 clippy warnings

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add audio transcription pipeline and extensible WIT attachment design

Add host-side transcription middleware (OpenAI Whisper) that detects audio attachments with inline data on incoming messages and transcribes them automatically. Refactor WIT inbound-attachment to use extras-json and a store-attachment-data host function instead of typed fields, so future attachment properties (dimensions, codec, etc.) don't require WIT changes that invalidate all channel plugins.

- Add src/transcription/ module: TranscriptionProvider trait, TranscriptionMiddleware, AudioFormat enum, OpenAI Whisper provider
- Add src/config/transcription.rs: TRANSCRIPTION_ENABLED/MODEL/BASE_URL
- Wire middleware into agent message loop via AgentDeps
- WIT: replace data + duration-secs with extras-json + store-attachment-data
- Host: parse extras-json for well-known keys, merge stored binary data
- Telegram: download voice files via store-attachment-data, add duration to extras-json, add /file/bot to HTTP allowlist, voice-only placeholder
- Add reqwest multipart feature for Whisper API uploads
- 5 regression tests for transcription middleware

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire attachment processing into LLM pipeline with multimodal image support

Attachments on incoming messages are now augmented into user text via XML tags before entering the turn system, and images with data are passed as multimodal content parts (base64 data URIs) to LLM providers. This enables audio transcripts, document text, and image content to reach the LLM without changes to ChatMessage serialization or provider interfaces.

- Add src/agent/attachments.rs with augment_with_attachments() and 9 unit tests
- Add ContentPart/ImageUrl types to llm::provider with OpenAI-compatible serde
- Carry image_content_parts transiently on Turn (skipped in serialization)
- Update nearai_chat and rig_adapter to serialize multimodal content
- Add 3 e2e tests verifying attachments flow through the full agent loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: CI failures — formatting, version bumps, and Telegram voice test

- Fix cargo fmt formatting in attachments.rs, nearai_chat.rs, rig_adapter.rs, e2e_attachments.rs
- Bump channel registry versions 0.1.0 → 0.2.0 (discord, slack, telegram, whatsapp) to satisfy version-bump CI check
- Fix Telegram test_extract_attachments_voice: add missing required `duration` field to voice fixture JSON

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: bump WIT channel version to 0.3.0, fix Telegram voice test, add pre-commit hook

- Bump wit/channel.wit package version 0.2.0 → 0.3.0 (interface changed with store-attachment-data)
- Update WIT_CHANNEL_VERSION constant and registry wit_version fields to match
- Fix Telegram test_extract_attachments_voice: gate voice download behind #[cfg(target_arch = "wasm32")] so host functions aren't called in native tests, update assertions for generated filename and extras_json duration
- Add @0.3.0 linker stubs in wit_compat.rs
- Add .githooks/pre-commit hook that runs scripts/check-version-bumps.sh when WIT or extension sources are staged
- Symlink commit-msg regression hook into .githooks/

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: extract voice download from extract_attachments into handle_message

Move download_voice_file + store_attachment_data calls out of extract_attachments into a separate download_and_store_voice function called from handle_message. This keeps extract_attachments as a pure data-mapping function with no host calls, making it fully testable in native unit tests without #[cfg(target_arch)] gates.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments — security, correctness, and code quality

Security fixes:
- Add path validation to read_attachments (restrict to /tmp/) preventing arbitrary file reads from compromised tools
- Escape XML special characters in attachment filenames, MIME types, and extracted text to prevent prompt injection via tag spoofing
- Percent-encode file_id in Telegram getFile URL to prevent query injection
- Clone SecretString directly instead of expose_secret().to_string()

Correctness fixes:
- Fix store_attachment_data overwrite accounting: subtract old entry size before adding new to prevent inflated totals and false rejections
- Use max(reported, stored_size) for attachment size accounting to prevent WASM channels from under-reporting size_bytes to bypass limits
- Add application/octet-stream to MIME allowlist (channels default unknown types to this)

Code quality:
- Extract send_response helper in Telegram, deduplicating on_respond and on_broadcast
- Rename misleading Discord test to test_parse_slash_command_interaction
- Fix .githooks/commit-msg to use relative symlink (portable across machines)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add tool_upgrade command + fix TOCTOU in save_to path validation

Add `tool_upgrade` — a new extension management tool that automatically detects and reinstalls WASM extensions with outdated WIT versions. Preserves authentication secrets during upgrade. Supports upgrading a single extension by name or all installed WASM tools/channels at once.

Fix TOCTOU in `validate_save_to_path`: validate the path *before* creating parent directories, so traversal paths like `/tmp/../../etc/` cannot cause filesystem mutations outside /tmp before being rejected.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: unify WIT package version to 0.3.0 across tool.wit and all capabilities

tool.wit and channel.wit share the `near:agent` package namespace, so they must declare the same version. Bumps tool.wit from 0.2.0 to 0.3.0 and updates all capabilities files and registry entries to match.

Fixes `cargo component build` failure: "package identifier near:agent@0.2.0 does not match previous package name of near:agent@0.3.0"

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move WIT file comments after package declaration

WIT treats `//` comments before `package` as doc comments. When both tool.wit and channel.wit had header comments, the parser rejected them as "doc comments on multiple 'package' items". Move comments after the package declaration in both files.

Also bumps tool registry versions to 0.2.0 to match the WIT 0.3.0 bump.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: display extension versions in gateway Extensions tab

Add version field to InstalledExtension and RegistryEntry types, pipe through the web API (ExtensionInfo, RegistryEntryInfo), and render as a badge in the gateway UI for both installed and available extensions.

For installed WASM extensions, version is read from the capabilities file with a fallback to the registry entry when the local file has no version (old installations).

Bump all extension Cargo.toml and registry JSON versions from 0.1.0 to 0.2.0 to keep them in sync.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add document text extraction middleware for PDF, Office, and text files

Extract text from document attachments (PDF, DOCX, PPTX, XLSX, RTF, plain text, code files) so the LLM can reason about uploaded documents. Uses pdf-extract for PDFs, zip+XML parsing for Office XML formats, and UTF-8 decode for text files. Wired into the agent loop after transcription middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: download document files in Telegram channel for text extraction

The DocumentExtractionMiddleware needs file bytes in the attachment `data` field, but only voice files were being downloaded. Document attachments (PDFs, DOCX, etc.) had empty `data` and a source_url with a credential placeholder that only works inside the WASM host's http_request.

Add `download_and_store_documents()` that downloads non-voice, non-image, non-audio attachments via the existing two-step getFile→download flow and stores bytes via `store_attachment_data` for host-side extraction.

Also rename `download_voice_file` → `download_telegram_file` since it's generic for any file_id.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: allow Office MIME types and increase file download limit for Telegram

Two issues preventing document extraction from Telegram:

1. PPTX/DOCX/XLSX MIME types (application/vnd.*) were dropped by the WASM host attachment allowlist — add application/vnd., application/msword, and application/rtf prefixes.
2. Telegram file downloads over 10 MB failed with "Response body too large" — set max_response_bytes to 20 MB in Telegram capabilities.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: report document extraction errors back to user instead of silently skipping

- Bump max_response_bytes to 50 MB for Telegram file downloads
- When document extraction fails (too large, download error, parse error), set extracted_text to a user-friendly error message instead of leaving it None. This ensures the LLM tells the user what went wrong.
- On Telegram download failure, set extracted_text with the error so the user sees feedback even when the file never reaches the extraction middleware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: store extracted document text in workspace memory for search/recall

After document extraction succeeds, write the extracted text to workspace memory at `documents/{date}/{filename}`. This enables:

- Full-text and semantic search over past uploaded documents
- Cross-conversation recall ("what did that PDF say?")
- Automatic chunking and embedding via the workspace pipeline

Documents are stored with metadata header (uploader, channel, date, MIME type). Error messages (extraction failures) are not stored — only successful extractions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: CI failures — formatting, unused assignment warning

- Run cargo fmt on document_extraction and agent_loop modules
- Suppress unused_assignments warning on trace_llm_ref (used only behind #[cfg(feature = "libsql")])

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments — security, correctness, and code quality

Security fixes:
- Remove SSRF-prone download() from DocumentExtractionMiddleware (nearai#13)
- Sanitize filenames in workspace path to prevent directory traversal (nearai#11)
- Pre-check file size before reading in WASM wrapper to prevent OOM (#2)
- Percent-encode file_id in Telegram source URLs (#7)

Correctness fixes:
- Clear image_content_parts on turn end to prevent memory leak (#1)
- Find first *successful* transcription instead of first overall (#3)
- Enforce data.len() size limit in document extraction (nearai#10)
- Use UTF-8 safe truncation with char_indices() (nearai#12)

Robustness & code quality:
- Add 120s timeout to OpenAI Whisper HTTP client (#5)
- Trim trailing slash from Whisper base_url (#6)
- Allow ~/.ironclaw/ paths in WASM wrapper (#8)
- Return error from on_broadcast in Slack/Discord/WhatsApp (nearai#9)
- Fix doc comment in HTTP tool (#4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: formatting — cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address latest PR review — doc comments, error messages, version bumps

- Fix DocumentExtractionMiddleware doc comment (no longer downloads from source_url)
- Fix error message: "no inline data" instead of "no download URL"
- Log error + fallback instead of silent unwrap_or_default on Whisper HTTP client
- Bump all capabilities.json versions from 0.1.0 to 0.2.0 to match Cargo.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unsupported profile: minimal from CI workflows [skip-regression-check]

dtolnay/rust-toolchain@stable does not accept the 'profile' input (it was a parameter for the deprecated actions-rs/toolchain action).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge with latest main — resolve compilation errors and PR review nits

- Add version: None to RegistryEntry/InstalledExtension test constructors
- Fix MessageContent type mismatches in nearai_chat tests (String → MessageContent::Text)
- Fix .contains() calls on MessageContent — use .as_text().unwrap()
- Remove redundant trace_llm_ref = None assignment in test_rig
- Check data size before clone in document extraction to avoid unnecessary allocation

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
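
The commit message says attachments are augmented into user text via XML tags, and that filenames, MIME types, and extracted text are escaped to prevent tag spoofing. A minimal sketch of that idea follows. The tag name `<attachment>`, its attribute names, and the `AttachmentSketch` type are assumptions for illustration only, not the actual `augment_with_attachments()` implementation in src/agent/attachments.rs.

```rust
/// Escape the five XML special characters so attachment metadata cannot
/// open or close tags inside the augmented prompt text.
fn escape_xml(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            other => out.push(other),
        }
    }
    out
}

/// Hypothetical attachment shape; the real struct carries more fields.
struct AttachmentSketch {
    filename: String,
    mime_type: String,
    extracted_text: Option<String>,
}

/// Append one escaped XML block per attachment to the user's text.
/// The `<attachment>` tag layout here is an assumption.
fn augment_with_attachments_sketch(user_text: &str, attachments: &[AttachmentSketch]) -> String {
    let mut out = String::from(user_text);
    for att in attachments {
        out.push_str(&format!(
            "\n<attachment filename=\"{}\" type=\"{}\">",
            escape_xml(&att.filename),
            escape_xml(&att.mime_type)
        ));
        if let Some(text) = &att.extracted_text {
            out.push_str(&escape_xml(text));
        }
        out.push_str("</attachment>");
    }
    out
}
```

With this shape, a document whose extracted text contains `</attachment><system>` reaches the model as inert escaped text instead of closing the tag early.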
1 parent 3079043 commit d144484
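
The `save_to` fix in the commit message is about ordering: validate the path before creating parent directories. The sketch below illustrates that ordering with a purely lexical check, assuming a Unix-style filesystem. The function names are hypothetical and the real `validate_save_to_path` may differ (for example, it may canonicalize paths or handle symlinks, which this sketch does not).

```rust
use std::fs;
use std::path::{Component, Path, PathBuf};

/// Lexically check that `requested` stays under /tmp.
/// Touches no filesystem state, so a rejected path causes no side effects.
fn validate_save_to_path_sketch(requested: &str) -> Result<PathBuf, String> {
    let path = Path::new(requested);
    if !path.is_absolute() {
        return Err(format!("save_to must be absolute: {}", requested));
    }
    let mut normalized = PathBuf::new();
    for component in path.components() {
        match component {
            Component::RootDir => normalized.push("/"),
            Component::CurDir => {}
            Component::Normal(part) => normalized.push(part),
            // Reject `..` outright instead of resolving it.
            Component::ParentDir | Component::Prefix(_) => {
                return Err(format!("save_to has a disallowed component: {}", requested));
            }
        }
    }
    // `starts_with` compares whole components, so /tmpfoo does not match /tmp.
    if normalized.starts_with("/tmp") && normalized.as_path() != Path::new("/tmp") {
        Ok(normalized)
    } else {
        Err(format!("save_to must be inside /tmp: {}", requested))
    }
}

/// Validate first, then mutate the filesystem.
fn save_download_sketch(requested: &str, bytes: &[u8]) -> Result<PathBuf, String> {
    let target = validate_save_to_path_sketch(requested)?; // 1. reject bad paths
    if let Some(parent) = target.parent() {
        fs::create_dir_all(parent).map_err(|e| e.to_string())?; // 2. only then mkdir
    }
    fs::write(&target, bytes).map_err(|e| e.to_string())?;
    Ok(target)
}
```

Because the validator is a pure function, a path such as `/tmp/../../etc/x` is rejected before `create_dir_all` runs.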

106 files changed: +5570 -419 lines changed

.githooks/commit-msg

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../scripts/commit-msg-regression.sh

.githooks/pre-commit

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Pre-commit hook: run version bump checks when WIT or extension sources change.
+# Install: git config core.hooksPath .githooks
+
+# Only run the check if relevant files are staged
+STAGED=$(git diff --cached --name-only)
+
+NEEDS_CHECK=false
+if echo "$STAGED" | grep -qE '^wit/|^channels-src/|^tools-src/'; then
+    NEEDS_CHECK=true
+fi
+
+if $NEEDS_CHECK; then
+    echo "pre-commit: checking version bumps..."
+    if ! ./scripts/check-version-bumps.sh; then
+        echo ""
+        echo "Commit blocked: version bump check failed."
+        echo "Bump versions in the relevant registry JSON and/or WIT package declaration."
+        echo "To bypass: git commit --no-verify"
+        exit 1
+    fi
+fi

.github/workflows/code_style.yml

Lines changed: 0 additions & 3 deletions
@@ -12,7 +12,6 @@ jobs:
       - name: Install Rust
         uses: dtolnay/rust-toolchain@stable
         with:
-          profile: minimal
           components: rustfmt
       - name: Check formatting
         run: cargo fmt --all -- --check
@@ -36,7 +35,6 @@ jobs:
       - name: Install Rust
         uses: dtolnay/rust-toolchain@stable
         with:
-          profile: minimal
           components: clippy
       - uses: Swatinem/rust-cache@v2
         with:
@@ -63,7 +61,6 @@ jobs:
       - name: Install Rust
         uses: dtolnay/rust-toolchain@stable
         with:
-          profile: minimal
           components: clippy
       - uses: Swatinem/rust-cache@v2
         with:

.github/workflows/test.yml

Lines changed: 0 additions & 6 deletions
@@ -25,7 +25,6 @@ jobs:
       - name: Install Rust
         uses: dtolnay/rust-toolchain@stable
         with:
-          profile: minimal
           targets: wasm32-wasip2
       - uses: Swatinem/rust-cache@v2
         with:
@@ -45,8 +44,6 @@ jobs:
         uses: actions/checkout@v6
       - name: Install Rust
         uses: dtolnay/rust-toolchain@stable
-        with:
-          profile: minimal
       - uses: Swatinem/rust-cache@v2
       - name: Run Telegram Channel Tests
         run: cargo test --manifest-path channels-src/telegram/Cargo.toml -- --nocapture
@@ -69,8 +66,6 @@ jobs:
         uses: actions/checkout@v6
       - name: Install Rust
         uses: dtolnay/rust-toolchain@stable
-        with:
-          profile: minimal
       - uses: Swatinem/rust-cache@v2
         with:
           key: windows-${{ matrix.name }}
@@ -86,7 +81,6 @@ jobs:
       - name: Install Rust
         uses: dtolnay/rust-toolchain@stable
         with:
-          profile: minimal
           targets: wasm32-wasip2
       - uses: Swatinem/rust-cache@v2
         with:

Cargo.lock

Lines changed: 129 additions & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 5 additions & 1 deletion
@@ -40,7 +40,7 @@ tokio-stream = { version = "0.1", features = ["sync"] }
 futures = "0.3"

 # HTTP client
-reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls-native-roots", "stream"] }
+reqwest = { version = "0.12", default-features = false, features = ["json", "multipart", "rustls-tls-native-roots", "stream"] }

 # Serialization
 serde = { version = "1", features = ["derive"] }
@@ -147,6 +147,10 @@ bollard = "0.18"
 flate2 = "1"
 tar = "0.4"

+# Document text extraction
+pdf-extract = "0.7"
+zip = { version = "2", default-features = false, features = ["deflate"] }
+
 # HTTP proxy for sandboxed network access
 hyper = { version = "1.5", features = ["server", "http1", "http2"] }
 hyper-util = { version = "0.1", features = ["server", "tokio", "http1", "http2"] }
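
Among the document-extraction fixes, the commit message lists "Use UTF-8 safe truncation with char_indices()". The sketch below shows one way such a truncation can work. The function name and the byte-based limit are assumptions; the actual middleware may cap by characters or use a different limit.

```rust
/// Truncate `text` to at most `max_bytes` bytes without splitting a
/// multi-byte UTF-8 character. Returns a borrowed prefix of the input.
fn truncate_utf8(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    // Track the end offset of the last character that still fits.
    let mut end = 0;
    for (idx, ch) in text.char_indices() {
        let next = idx + ch.len_utf8();
        if next > max_bytes {
            break;
        }
        end = next;
    }
    &text[..end]
}
```

Slicing with a raw byte offset (`&text[..max_bytes]`) panics when the offset falls inside a multi-byte character, which is the failure this approach avoids.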

FEATURE_PARITY.md

Lines changed: 16 additions & 3 deletions
@@ -119,7 +119,7 @@ This document tracks feature parity between IronClaw (Rust implementation) and O
 | Mention-based activation ||| bot_username + respond_to_all_group_messages |
 | Per-group tool policies ||| Allow/deny specific tools |
 | Thread isolation ||| Separate sessions per thread |
-| Per-channel media limits || 🚧 | Caption support for media; no size limits |
+| Per-channel media limits || | Attachment type in WIT; max 10 per msg, 20MB total, MIME allowlist |
 | Typing indicators || 🚧 | TUI + Telegram typing/actionable status prompts; richer parity pending |
 | Per-channel ackReaction config ||| Customizable acknowledgement reactions |
 | Group session priming ||| Member roster injected for context |
@@ -248,19 +248,32 @@ This document tracks feature parity between IronClaw (Rust implementation) and O

 | Feature | OpenClaw | IronClaw | Priority | Notes |
 |---------|----------|----------|----------|-------|
+| WIT inbound-attachment type | N/A || P1 | `inbound-attachment` record in channel-host (id, mime_type, filename, size_bytes, source_url, storage_key, extracted_text) |
+| WIT outbound attachment type | N/A || P1 | `attachment` record in channel (filename, mime_type, data) on `agent-response` |
+| WIT on-broadcast export | N/A || P1 | Proactive message sending without prior incoming message |
+| IncomingMessage attachments | N/A || P1 | `IncomingAttachment` struct on `IncomingMessage`, populated from WASM channels |
+| OutgoingResponse attachments | N/A || P1 | File paths on `OutgoingResponse`, read from disk and sent as WIT attachments |
+| Attachment security (size/MIME) | N/A || P1 | Inbound: max 10, 20MB total, MIME allowlist. Outbound: 50MB total |
+| Telegram media parsing ||| P1 | Photo, document, audio, video, voice, sticker parsed and emitted as attachments |
+| Telegram media sending ||| P1 | sendPhoto/sendDocument multipart upload, auto photo→document fallback >10MB |
+| Slack file parsing ||| P1 | `files` array from Events API parsed into attachments |
+| WhatsApp media parsing ||| P1 | Image, audio, video, document parsed with caption as extracted_text |
+| Discord attachment parsing ||| P2 | Discord interaction payloads don't include file attachments (needs message events) |
+| HTTP tool save_to | N/A || P1 | Download binary files to /tmp/ for attachment sending (50MB limit, path traversal protection) |
+| Credential env var fallback | N/A || P2 | Channels can use env vars (e.g., TELEGRAM_BOT_TOKEN) when secrets store not configured |
 | Image processing (Sharp) ||| P2 | Resize, format convert |
 | Configurable image resize dims ||| P2 | Per-agent dimension config |
 | Multiple images per tool call ||| P2 | Single tool invocation, multiple images |
 | Audio transcription ||| P2 | |
 | Video support ||| P3 | |
 | PDF parsing ||| P2 | pdfjs-dist |
-| MIME detection || | P2 | |
+| MIME detection || | P2 | MIME allowlist in host validates attachment types |
 | Media caching ||| P3 | |
 | Vision model integration ||| P2 | Image understanding |
 | TTS (Edge TTS) ||| P3 | Text-to-speech |
 | TTS (OpenAI) ||| P3 | |
 | Incremental TTS playback ||| P3 | iOS progressive playback |
-| Sticker-to-image || | P3 | Telegram stickers |
+| Sticker-to-image || | P3 | Telegram stickers emitted as image/webp attachments |

 ### Owner: _Unassigned_
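
The inbound limits listed in the table (max 10 attachments, 20MB total, MIME allowlist) and the `max(reported, stored_size)` accounting from the commit message can be sketched as a host-side filter. Several details here are assumptions: the struct shape, the exact allowlist beyond the prefixes named in the commit message (`application/octet-stream`, `application/vnd.`, `application/msword`, `application/rtf`), reading 20MB as 20 MiB, and dropping offending attachments instead of rejecting the whole message.

```rust
const MAX_ATTACHMENTS: usize = 10;
const MAX_TOTAL_BYTES: u64 = 20 * 1024 * 1024; // assumption: 20MB read as 20 MiB

// Assumed allowlist; only the last four prefixes are named in the commit message.
const ALLOWED_MIME_PREFIXES: &[&str] = &[
    "image/", "audio/", "video/", "text/", "application/pdf",
    "application/octet-stream", "application/vnd.",
    "application/msword", "application/rtf",
];

/// Hypothetical stand-in for the host's inbound attachment type.
#[derive(Debug, Clone, PartialEq)]
struct InboundAttachmentSketch {
    mime_type: String,
    size_bytes: u64, // size reported by the WASM channel
    data: Vec<u8>,   // bytes actually stored by the host
}

fn mime_allowed(mime: &str) -> bool {
    let mime = mime.to_ascii_lowercase();
    ALLOWED_MIME_PREFIXES.iter().any(|p| mime.starts_with(*p))
}

/// Keep attachments that pass the MIME allowlist, count cap, and total-size cap.
fn validate_attachments(input: Vec<InboundAttachmentSketch>) -> Vec<InboundAttachmentSketch> {
    let mut accepted = Vec::new();
    let mut total: u64 = 0;
    for att in input {
        if accepted.len() >= MAX_ATTACHMENTS {
            break; // count cap reached
        }
        if !mime_allowed(&att.mime_type) {
            continue; // MIME type not on the allowlist
        }
        // Use the larger of reported and stored size so a channel cannot
        // under-report size_bytes to slip past the total limit.
        let effective = att.size_bytes.max(att.data.len() as u64);
        match total.checked_add(effective) {
            Some(t) if t <= MAX_TOTAL_BYTES => {
                total = t;
                accepted.push(att);
            }
            _ => {} // would exceed the total-size cap: drop it
        }
    }
    accepted
}
```

Taking the maximum of the two sizes means an attachment that reports 1 byte but stores 21 MiB is still counted at its stored size.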

channels-src/discord/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "discord-channel"
-version = "0.1.0"
+version = "0.2.0"
 edition = "2021"
 description = "Discord channel for IronClaw"
 license = "MIT OR Apache-2.0"

channels-src/discord/discord.capabilities.json

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 {
-  "version": "0.1.0",
-  "wit_version": "0.2.0",
+  "version": "0.2.0",
+  "wit_version": "0.3.0",
   "type": "channel",
   "name": "discord",
   "description": "Discord Gateway/Webhook channel for handling slash commands, buttons, and messages",
0 commit comments

Comments
 (0)