NIP-AE: agent engrams (kind:30174): core memory injection + sprout mem CLI #593

Open: tlongwell-block wants to merge 9 commits into main from sami/nip-ae-minimal
@tlongwell-block
Collaborator

Summary

Implements NIP-AE (Agent Engrams, kind 30174) as the smallest viable surface across relay, CLI, and ACP harness. The goal: let an owner write a small, durable "core memory" that their agent reads once per new session, with no new daemons, no new crates, and minimal moving parts.

What this gives you

  • sprout mem set core "I am Sami. Be terse." — writes a NIP-44-encrypted, parameterized-replaceable note addressed to your agent.
  • The agent fetches that note at session creation and injects it as a prompt section between [System] and [Context]. One fetch per session, fail-open on every error path.
  • No core yet? The agent gets a short onboarding nudge instead, so it learns to ask the user about themselves and create one.

How it's structured

sprout-core::engram — pure primitives, no I/O. Shared by CLI and ACP.

  • Conversation key (NIP-44 v2 symmetric, so either party reads with their own seckey).
  • d-tag = lower_hex(HMAC-SHA256(K_c, "agent-memory/v1/d-tag\0" || slug)) — spec vectors pinned byte-for-byte.
  • Body parse/serialize with strict duplicate-key rejection at any depth (serde Visitor, not a hand-rolled scanner).
  • Envelope build + head selection (created_at desc, then event-id desc; tombstones honoured).
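The head-selection rule in the last bullet can be sketched as follows. This is a minimal illustration, not the code in sprout-core::engram: the `Candidate` struct, its `tombstone` flag, and both function names are hypothetical, and "tombstones honoured" is read here as "a winning tombstone means the slug has no live value".

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified view of a decrypted kind:30174 event.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub id: String,      // lowercase-hex event id
    pub created_at: u64, // unix seconds
    pub tombstone: bool, // assumed: true when the body marks a deletion
}

/// Order two candidates: newer created_at wins, ties go to the larger id.
fn head_order(a: &Candidate, b: &Candidate) -> Ordering {
    a.created_at
        .cmp(&b.created_at)
        .then_with(|| a.id.cmp(&b.id))
}

/// The winning head among all candidates for one d-tag, if any.
pub fn select_head(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().max_by(|a, b| head_order(a, b))
}

/// A winning tombstone yields no value; it never falls back to an
/// older, superseded event.
pub fn live_head(candidates: &[Candidate]) -> Option<&Candidate> {
    select_head(candidates).filter(|c| !c.tombstone)
}
```

Comparing ids as strings matches a numeric comparison only because event ids are fixed-length lowercase hex.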

Relay — kind 30174 added to ALL_KINDS, the per-kind scope allowlist (UsersWrite, same group as KIND_READ_STATE), and is_global_only_kind. A new validate_engram_envelope rejects malformed events (≠1 d, ≠1 p, non-lowercase-hex d, empty content) before they reach NIP-33 replacement, so a bad event can't poison the storage head and become invisible to #p readers.
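A rough sketch of the four envelope checks listed above (exactly one d tag, exactly one p tag, lowercase-hex d, non-empty content). The function name, the error enum, the tag representation as `Vec<String>`, and the order of the checks are all assumptions made for illustration; the relay's actual validate_engram_envelope may differ.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum EnvelopeError {
    DTagCount(usize),
    PTagCount(usize),
    DNotLowerHex,
    EmptyContent,
}

fn is_lower_hex(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Collect the first value of every tag named `name`.
/// A tag with no value still counts, with an empty value.
fn tag_values<'a>(tags: &'a [Vec<String>], name: &str) -> Vec<&'a str> {
    tags.iter()
        .filter(|t| t.first().map(String::as_str) == Some(name))
        .map(|t| t.get(1).map(String::as_str).unwrap_or(""))
        .collect()
}

/// Reject malformed envelopes before they can take part in replacement.
pub fn validate_envelope_sketch(
    tags: &[Vec<String>],
    content: &str,
) -> Result<(), EnvelopeError> {
    let d = tag_values(tags, "d");
    let p = tag_values(tags, "p");
    if d.len() != 1 {
        return Err(EnvelopeError::DTagCount(d.len()));
    }
    if p.len() != 1 {
        return Err(EnvelopeError::PTagCount(p.len()));
    }
    if !is_lower_hex(d[0]) {
        return Err(EnvelopeError::DNotLowerHex);
    }
    if content.is_empty() {
        return Err(EnvelopeError::EmptyContent);
    }
    Ok(())
}
```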

sprout mem CLI: ls | get | set | rm. Slug shorthand normalises foo → mem/foo; core is reserved and rm core is refused. set reads the value from stdin when given -. submit_engram parses the relay's {accepted, message} so a duplicate: response (same-second NIP-33 dominated write) surfaces as a Conflict instead of a silent "wrote".
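The slug rules can be illustrated with a short sketch. Only the foo → mem/foo shorthand, the reserved core slug, and the refusal of rm core come from this PR; the function names, the error type, and the character set accepted as a valid slug are assumptions.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SlugError {
    Invalid(String),
    CoreIsReserved,
}

/// Normalise CLI input to a full slug: "core" stays as-is,
/// "foo" and "mem/foo" both become "mem/foo".
pub fn normalize_slug(input: &str) -> Result<String, SlugError> {
    if input == "core" {
        return Ok("core".to_string());
    }
    let name = input.strip_prefix("mem/").unwrap_or(input);
    // Assumed validity rule (not taken from the PR):
    // non-empty, lowercase ASCII letters, digits, '-' and '_'.
    let valid = !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-' || c == '_');
    if valid {
        Ok(format!("mem/{}", name))
    } else {
        Err(SlugError::Invalid(input.to_string()))
    }
}

/// `rm` accepts any valid slug except the reserved core.
pub fn check_rm(input: &str) -> Result<String, SlugError> {
    let slug = normalize_slug(input)?;
    if slug == "core" {
        Err(SlugError::CoreIsReserved)
    } else {
        Ok(slug)
    }
}
```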

ACP harness — at new-session creation, one synchronous fetch + decrypt of the core engram, cached per channel in the rendered prompt section. Re-fetched only when the session is invalidated. On transport errors we return None (no section) rather than the onboarding nudge — a flaky relay shouldn't gaslight the agent into thinking its memory is empty.
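The mapping from fetch outcome to prompt section might look like the sketch below. The `CoreFetch` enum, the `[Core Memory]` header, the nudge text, and the simplified `format_prompt` signature are hypothetical; only the three behaviours (inject when present, nudge when absent, no section on transport error) and the placement between [System] and [Context] come from the description above.

```rust
/// Hypothetical outcome of the one-per-session core fetch.
pub enum CoreFetch {
    Present(String),
    Absent,
    TransportError,
}

// Placeholder text; the real nudge wording is not shown in this PR description.
const ONBOARDING_NUDGE: &str = "[Core Memory]\n(no core memory yet; ask the user about themselves)";

/// Render the section that is cached per channel.
/// A transport error yields no section at all, not the nudge.
pub fn render_core_section(outcome: &CoreFetch) -> Option<String> {
    match outcome {
        CoreFetch::Present(body) => Some(format!("[Core Memory]\n{}", body)),
        CoreFetch::Absent => Some(ONBOARDING_NUDGE.to_string()),
        CoreFetch::TransportError => None,
    }
}

/// Emit the core section after [System] and before [Context].
pub fn format_prompt(system: &str, core_section: Option<&str>, context: &str) -> String {
    let mut out = format!("[System]\n{}\n", system);
    if let Some(section) = core_section {
        out.push_str(section);
        out.push('\n');
    }
    out.push_str(&format!("[Context]\n{}\n", context));
    out
}
```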

Test plan

  • 13 engram unit tests in sprout-core, including the spec's K_c vector and three d-tag vectors verified byte-for-byte.
  • 6 envelope-validation tests in the relay.
  • 2 format_prompt injection tests in the ACP queue (core present → injected; absent → nudge).
  • Scope-allowlist coverage extended.
  • Live end-to-end against a local relay per TESTING.md:
    • ls (empty) → set core → get core → set foo (stdin) → set mem/bar → ls (two entries) → rm foo → get foo exits non-zero with tombstoned: → ls shows only mem/bar → rm core refused → invalid slug rejected.
    • Fresh agent with core set: harness logs injected NIP-AE core section ... section_len=60.
    • Fresh agent without core: harness logs onboarding nudge.

Notes for review

  • Codex reviewed an earlier revision at 7/10 and flagged five issues; all are addressed in this PR. See commit for specifics: visitor-based dup detection, conflict surfacing, fail-closed-on-transport-error, envelope pre-validation, structured NotFound/Conflict exit codes.
  • Diffstat: 16 files, +1709 / -41. The bulk is sprout-core/src/engram.rs (~835 lines, mostly tests + spec vectors).
  • No new crates. No new daemons. The CLI uses the same SproutClient as everything else; the harness uses the existing RestClient::query.

Out of scope (intentionally)

  • Mid-session refresh of the core engram.
  • Owner-side UI for browsing/editing engrams (the CLI is enough for now; a desktop surface can come later using the same sprout-core::engram API).
  • Engram kinds beyond core / mem/* (the d-tag derivation is slug-agnostic; future kinds slot in without protocol changes).

Closes the NIP-AE implementation thread.

tlongwell-block pushed a commit that referenced this pull request May 15, 2026
Codex P2 findings on PR #593:

1. Non-NIP44 content slipped past the relay envelope check. A signed
   kind:30174 with valid d/p tags but content like 'x' won NIP-33
   replacement against a valid head and was then silently discarded by
   readers — silently erasing memory.

2. Uppercase-hex p tags were accepted. Readers query #p with the
   lowercase hex of the owner pubkey (byte-exact tag match), so an
   uppercase-tagged event that won replacement became invisible to
   readers — same bricking pattern.

Tighten validate_engram_envelope to:
- require lowercase hex for the p tag (consistent with the existing
  rule on d)
- validate that content is a syntactically plausible NIP-44 v2
  payload: standard base64 alphabet, length multiple of 4, decoded
  length >= 99 bytes, first decoded byte = 0x02 (version prefix).

Relay-side sanity check only — the MAC and decryption still happen at
the reader. The point is to refuse obvious junk before it can
supersede a valid head.

+6 regression tests; canonical-accepts fixture updated to use a real-
shape NIP-44 v2 sample.
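For reference, the tightened content check can be approximated with the standard library alone. This is a sketch of the four stated conditions (standard base64 alphabet, length multiple of 4, decoded length >= 99, first decoded byte 0x02), not the relay's code; the function names are hypothetical, and unlike a full decoder it does not verify that trailing padding bits are canonical.

```rust
/// Lowercase-hex check, as required for the d and p tag values.
pub fn is_lowercase_hex(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Map one standard-alphabet base64 character to its 6-bit value.
fn b64_value(c: u8) -> Option<u8> {
    match c {
        b'A'..=b'Z' => Some(c - b'A'),
        b'a'..=b'z' => Some(c - b'a' + 26),
        b'0'..=b'9' => Some(c - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Syntactic plausibility only: no MAC check, no decryption.
pub fn looks_like_nip44_v2(content: &str) -> bool {
    let bytes = content.as_bytes();
    if bytes.is_empty() || bytes.len() % 4 != 0 {
        return false;
    }
    // Up to two '=' padding characters are allowed, only at the end.
    let pad = bytes.iter().rev().take_while(|&&b| b == b'=').count();
    if pad > 2 {
        return false;
    }
    let body = &bytes[..bytes.len() - pad];
    if !body.iter().all(|&b| b64_value(b).is_some()) {
        return false;
    }
    let decoded_len = bytes.len() / 4 * 3 - pad;
    if decoded_len < 99 {
        return false;
    }
    // First decoded byte: 6 bits from char 0, top 2 bits from char 1.
    let first = (b64_value(body[0]).unwrap() << 2) | (b64_value(body[1]).unwrap() >> 4);
    first == 0x02
}
```

A content value such as 'x' fails on the length check alone, which is the case described in finding 1.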
tlongwell-block pushed a commit that referenced this pull request May 15, 2026
Second codex review pass on PR #593 flagged two more P2s:

1. (engram_fetch.rs) When the relay returns kind:30174 events addressed
   to the agent but none decrypt (wrong key, MAC failure, body schema
   mismatch, or an event injected by another party that happened to be
   p-tagged at this agent), the previous code returned Ok(None) — which
   the harness then renders as the onboarding nudge, inviting the agent
   to overwrite a real-but-unreadable core.

   Now distinguish three outcomes:
     - empty array            → Ok(None)     (confirmed absence; nudge)
     - >=1 event decrypts     → use winning head
     - non-empty, none decrypt→ Err          (fail closed; no section)

   Extracted the post-query decode logic into a pure decode_core_body()
   helper so it's unit-testable without mocking RestClient. Added 5
   tests: empty-array-absent, valid-core-returns-profile,
   undecryptable-is-err-not-absent (the regression), non-core-body-is-
   absent, unparseable-candidates-is-err.

2. (commands/mem.rs) The module doc comment claimed `sprout mem` and
   `sprout mem ls` were equivalent, but the clap wiring requires a
   subcommand so bare `sprout mem` exits with a usage error. Drop the
   false claim — bare-group-shows-help is the convention across the
   other 12 subcommand groups; adding a default action just for mem
   would be inconsistent.
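The three outcomes from finding 1 can be sketched as a pure function. This is a simplified stand-in for the extracted helper, not its actual signature: `decode_core_sketch` and `DecodeError` are hypothetical names, `try_decrypt` stands in for NIP-44 decryption plus body parsing, the input is assumed to be already ordered head-first, and the non-core-body case covered by the PR's tests is left out.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct DecodeError;

/// Empty input is a confirmed absence. Non-empty input in which nothing
/// decrypts is an error, so the caller can fail closed instead of
/// showing the onboarding nudge.
pub fn decode_core_sketch<F>(
    ciphertexts: &[&str],
    try_decrypt: F,
) -> Result<Option<String>, DecodeError>
where
    F: Fn(&str) -> Option<String>,
{
    if ciphertexts.is_empty() {
        return Ok(None); // confirmed absence: nudge
    }
    match ciphertexts.iter().find_map(|&c| try_decrypt(c)) {
        Some(body) => Ok(Some(body)), // first readable candidate wins
        None => Err(DecodeError),     // events exist but none are readable
    }
}
```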
tlongwell-block pushed a commit that referenced this pull request May 15, 2026
Third codex pass on PR #593 flagged that engram events were stored as
global, so any authenticated relay member could REQ
`{"kinds":[30174]}` and harvest:
- the encrypted ciphertext (no plaintext but still a fingerprint)
- the public `#p` (owner pubkey)
- the public `#d` (HMAC-derived per-slug fingerprint)
- timestamps (write-activity patterns)

Together that leaks who-pairs-with-which-agent + when they're active.
Strictly speaking the NIP-AE design encrypts content for confidentiality
but assumes the relay enforces read gating on the event metadata.

Add a new `engram_filters_authorized` predicate alongside the existing
`p_gated_filters_authorized`. A filter that can match KIND_AGENT_ENGRAM
must satisfy at least one of:
  - `authors` non-empty AND every entry == authed (agent reading own), or
  - `#p`     non-empty AND every entry == authed (owner reading addressed-to-self).
Specific-event-ids lookups (`ids: [...]`) are exempt — knowing the id
implies prior authorization.

Hook into all four read paths:
  - WS REQ (historical + live subscription registration)
  - WS COUNT
  - HTTP /query
  - HTTP /count

+9 unit tests:
  agent_querying_own, owner_querying, owner_no_authors, ids_lookup,
  skips_non_engram_kinds (positive); unrelated_reader,
  bare_kind_filter, wildcard_kind_filter, mixed_authors_with_unauthed
  (negative).
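A minimal sketch of the predicate described above, assuming a simplified filter type. The `Filter` struct, its field names, and the function names are hypothetical, and a missing `kinds` field is treated as the wildcard case; the relay's real filter type and engram_filters_authorized may be shaped differently.

```rust
pub const KIND_AGENT_ENGRAM: u32 = 30174;

/// Hypothetical, simplified NIP-01 filter.
#[derive(Default)]
pub struct Filter {
    pub ids: Vec<String>,
    pub kinds: Option<Vec<u32>>, // None means "any kind"
    pub authors: Vec<String>,
    pub p_tags: Vec<String>, // values of "#p"
}

fn can_match_engram(f: &Filter) -> bool {
    match &f.kinds {
        None => true,
        Some(kinds) => kinds.contains(&KIND_AGENT_ENGRAM),
    }
}

/// Non-empty, and every entry equals the authenticated pubkey.
fn all_are(values: &[String], authed: &str) -> bool {
    !values.is_empty() && values.iter().all(|v| v.as_str() == authed)
}

pub fn engram_filter_allowed(f: &Filter, authed: &str) -> bool {
    if !can_match_engram(f) {
        return true; // filter cannot return engrams; nothing to gate
    }
    if !f.ids.is_empty() {
        return true; // specific-id lookups are exempt
    }
    all_are(&f.authors, authed) || all_are(&f.p_tags, authed)
}

/// A request is authorized only if every one of its filters is.
pub fn engram_filters_allowed(filters: &[Filter], authed: &str) -> bool {
    filters.iter().all(|f| engram_filter_allowed(f, authed))
}
```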
Implements NIP-AE (Agent Engrams, kind:30174) as the smallest viable
surface across relay, CLI, and ACP harness.

* sprout-core::engram — pure crypto + parsing primitives shared by CLI
  and ACP harness. Conversation key, d-tag HMAC, body parse/serialize
  with duplicate-key rejection, envelope build, head selection. Pinned
  spec vectors (K_c, three d-tags) verified byte-for-byte.

* Relay: kind 30174 added to ALL_KINDS, the per-kind scope allowlist
  (UsersWrite, same group as KIND_READ_STATE), and is_global_only_kind.
  NIP-33 plumbing (replace_parameterized_event) handles the rest.

* sprout mem CLI: ls/get/set/rm. Slug shorthand normalises 'foo' to
  'mem/foo'; 'core' is reserved. set reads stdin with '-'. Monotonic
  created_at + tombstone semantics per spec. Symmetric decrypt — either
  party (owner or agent) reads with their own seckey.

* ACP harness: at new-session creation, fire one synchronous fetch +
  decrypt of the core engram and cache the rendered prompt section per
  channel. If no core exists or any error occurs, inject the onboarding
  nudge so the agent learns to bootstrap itself. format_prompt() emits
  the section after [System] and before [Context]. No mid-session
  refresh — only re-fetched when a session is invalidated.

* Tests: 13 engram unit tests including spec vectors, 2 format_prompt
  injection tests, scope-allowlist coverage extended.

Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
The round-4 fix added p_gated/engram_filters_authorized gates to the
WS REQ historical-delivery branch, WS COUNT, HTTP /query, and HTTP
/count — but missed the WS REQ NIP-50 search branch, which intercepts
before reaching the gate. Since kind:30174 envelopes are indexed in
Typesense (only NIP-17 gift wraps are skipped), an authenticated
relay member could send

    {"search":"*","kinds":[30174]}

and harvest every engram ciphertext + owner #p + slug #d fingerprint
on the relay, leaking the metadata the round-4 gate was specifically
written to protect.

Fix: move the two filter-auth checks above the search early-return.
The same reordering also closes the equivalent search-bypass for
the pre-existing P_GATED_KINDS (observer frames, member notifications)
which are likewise globally stored, indexed, and were previously
only gated on the non-search path.

+4 regression tests asserting the gate rejects search-shaped attack
filters and still allows authored search.

Found by codex review round 5.

Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>

* origin/main:
  dev-mcp: add view_image tool (#602)
  fix(relay,desktop): only advertise NIP-43 when enforced; probe pairing by supported_nips (#601)
  fix(desktop): derive unread state from NIP-RS + relay catch-up only (#599)
  docs(testing): rewrite TESTING.md for current API and CLI-first workflow (#597)
  fix(agent): fix OpenAI-compat request body serialization and max_tokens (#595)
  feat(desktop): per-persona and per-agent env var overrides (#594)
tlongwell-block and others added 2 commits May 15, 2026 17:47
The HTTP /query NIP-50 search path in handle_bridge_search pushed only
kind/authors/time/channel into Typesense and applied a channel-access
post-filter, but did not enforce the rest of the requesting filter
against the fetched events. The WS NIP-50 path does (handlers/req.rs).

For NIP-AE this meant the engram read gate (which authorizes the
*filter*: kind=30174 with author=self or #p=self) was bypassed for
/query specifically: an authorized search like
  {"search":"foo","kinds":[30174],"#p":[owner_self]}
could return text-matching engram envelopes whose #p belongs to a
different owner (or an authors=[agent_self] search could return
events authored by other agents), because Typesense doesn't see #p
and the post-filter wasn't running.

Fix: extract the per-hit acceptance logic into search_hit_accepted()
and call sprout_core::filter::filters_match against the current
filter before channel-access and dedup. This mirrors the WS post-
filter at handlers/req.rs and locks the bridge to the same NIP-01
semantics.
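The per-hit acceptance order (full filter match first, then channel access) can be illustrated roughly as below. The `Hit` and `SearchFilter` types and both function names are hypothetical stand-ins; the real helper calls sprout_core::filter::filters_match, whose signature is not shown here. Treating a hit without a channel as globally visible is also an assumption.

```rust
/// Hypothetical search hit, reduced to the fields the gate needs.
pub struct Hit {
    pub kind: u32,
    pub author: String,
    pub p_tags: Vec<String>,
    pub channel: Option<String>, // None: assumed to mean a global event
}

/// Hypothetical filter; an empty list means "no constraint".
pub struct SearchFilter {
    pub kinds: Vec<u32>,
    pub authors: Vec<String>,
    pub p_tags: Vec<String>,
}

/// Re-check the constraints the search index never saw.
fn filter_matches(f: &SearchFilter, h: &Hit) -> bool {
    (f.kinds.is_empty() || f.kinds.contains(&h.kind))
        && (f.authors.is_empty() || f.authors.contains(&h.author))
        && (f.p_tags.is_empty() || h.p_tags.iter().any(|p| f.p_tags.contains(p)))
}

/// Accept a hit only if it matches the requesting filter and the
/// requester can access its channel.
pub fn hit_accepted(f: &SearchFilter, h: &Hit, accessible_channels: &[String]) -> bool {
    if !filter_matches(f, h) {
        return false;
    }
    match &h.channel {
        None => true,
        Some(ch) => accessible_channels.contains(ch),
    }
}
```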

Tests: three unit tests covering the leak — mismatched #p tag,
mismatched author, and channel scope — exercising the helper that
owns the fix. Full suites Mari named also green: engram (17),
engram_envelope (12), engram_gate (12), engram_fetch (5).

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
Bring the PR up to date with origin/main. Two conflicts resolved:

* crates/sprout-acp/src/main.rs — main extracted the binary body into
  sprout-acp/src/lib.rs (commit 70cb53e, "Add Sprig all-in-one agent
  binary"). Took main's 3-line shim verbatim and replayed Sami's three
  hunks against lib.rs instead:
    - declare `mod engram_fetch;`
    - clone `startup_owner` for `OwnerCache::new` so we can also use it
      for the PromptContext below
    - thread `agent_keys` + `agent_owner_pubkey` into PromptContext
      construction (needed by the NIP-AE core fetch in pool.rs)

* Cargo.lock + desktop/src-tauri/Cargo.lock — took theirs; cargo check
  --workspace produced no further changes (all engram deps were already
  present on main).

Verified:
* cargo check --workspace — clean
* cargo clippy --workspace --all-targets -- -D warnings — clean
* cargo test -p sprout-core -p sprout-acp -p sprout-cli -p sprout-relay
  — all green, including the 17 engram tests, 5 engram_fetch tests
  (covering the d7842a0 fail-closed regression), and 13 format_prompt
  tests (the two new agent_core injection cases included).

Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
Signed-off-by: Tyler Longwell <tlongwell@squareup.com>