Skip to content

Expand MCP tool surface from 4 to 24 tools#26

Open
jaballer wants to merge 11 commits into
mainfrom
feature/updated-tools
Open

Expand MCP tool surface from 4 to 24 tools#26
jaballer wants to merge 11 commits into
mainfrom
feature/updated-tools

Conversation

@jaballer
Copy link
Copy Markdown
Contributor

@jaballer jaballer commented Apr 28, 2026

Summary

Expands the official Postmark MCP server from 4 tools to 24, organized into eight categories. Adds full coverage of the server-token API surface, two flagship additions (batch email sends + AI-native delivery diagnosis), validation guards on mutating tools, polished stat formatters, and two smoke-test harnesses that run against a live Postmark account.

What's new (by category)

Email — 4 tools (was 2)

  • sendEmail, sendEmailWithTemplate (existing)
  • sendBatch — sends up to 500 fully-independent messages in one API call
  • sendBatchWithTemplate — sends up to 500 templated emails with per-recipient templateModel. Top-level from/tag are defaults that can be overridden per recipient

Templates — 6 tools (was 1)

  • listTemplates (existing)
  • getTemplate, createTemplate, editTemplate, deleteTemplate, validateTemplate — full CRUD + Mustachio validation with testRenderModel support

Messages — 2 tools (new)

  • searchOutboundMessages — recipient/tag/subject/status/date/messageStream filters, paginated
  • getMessageDetails — full event timeline for a single MessageID

Diagnostics — 1 tool (new, composite)

  • diagnoseDelivery — answers "did my email reach X, and if not, why?" by running message search, suppression check, and bounce history in parallel for a recipient, then synthesizing a plain-English recommendation. Replaces a 5-step manual investigation with one tool call

Bounces — 3 tools (new)

  • searchBounces — full bounce-type enum, recipient/tag/messageID/date filters
  • getBounceDump — raw SMTP dump (30-day retention)
  • activateBounce — reactivates a deactivated address

Suppressions — 3 tools (new)

  • listSuppressions — by reason / origin / email / date, scoped to a stream
  • createSuppressions / deleteSuppressions — up to 50 addresses per call

Stats & Server — 2 tools (was 1)

  • getDeliveryStats — unified stats tool with optional stat parameter. Default returns a friendly summary; with stat: "<name>" returns a polished per-stat breakdown for summary, overview, sent, bounces, spam, tracked, opens, openPlatforms, openClients, openReadTimes, clicks, clickBrowsers, clickPlatforms, or clickLocation
  • getServerInfo (new) — server name, color, tracking settings, configured webhook URLs

Webhooks — 3 tools (new)

  • listWebhooks / createWebhook / deleteWebhook — full webhook lifecycle with per-trigger toggles

Bug fixes caught during the audit

  1. getServerInfo displayed wrong field — was reading EnableSmtpApiErrorHooks for "First Open Only"; now correctly reads PostFirstOpenOnly
  2. getOutboundStats (since merged into getDeliveryStats) called nonexistent SDK methodsgetEmailClientUsage and getEmailPlatformUsage don't exist; corrected to getEmailOpenClientUsage and getEmailOpenPlatformUsage. Added clickLocation via getClickLocation (not getClickLocationUsage, which also doesn't exist)
  3. Engines floor was lyingpackage.json declared node >= 16 but the MCP SDK requires >= 18. Bumped to >= 20 (Node 18 hit EOL April 2025)

Validation guards added

Mutating tools now fail-fast on misuse instead of silently doing nothing:

  • editTemplate — requires at least one updated field
  • createWebhook — requires at least one trigger enabled
  • createTemplate — requires htmlBody or textBody
  • validateTemplate — requires subject, htmlBody, or textBody
  • sendEmailWithTemplate / sendBatchWithTemplate — require exactly one of templateId / templateAlias

Polish & cleanup

  • Indentation — reverted IDE auto-format that had inflated continuation indent from 4 → 6 spaces; original v1 tools now diff cleanly against main
  • getDeliveryStats formatters — six module-level helpers (fmtInt, fmtPct, fmtPlatformUsage, fmtTopBreakdown, fmtDeliverySummary, fmtStatResponse) render counts with thousands separators, percentages, top-N sorted breakdowns, and aligned columns
  • Schema tightening — numeric IDs use z.number().int(); date strings validated against YYYY-MM-DD regex
  • Dropped node-fetch dependencygetDeliveryStats previously bypassed the SDK with raw fetch; now uses the SDK throughout for consistent auth/error handling
  • Updated CLAUDE.md with new tool layout, validation patterns, and a "Gotchas" section capturing tribal knowledge (suppression dump eventual consistency, SDK methods with non-obvious names, PostFirstOpenOnly vs EnableSmtpApiErrorHooks distinction)

Smoke testing

Two harnesses live in the repo and run against a live Postmark account:

  • smoke-test.mjs (npm run smoke) — 25 read-only checks covering every read tool plus all 13 distinct getDeliveryStats stat endpoints (14 enum values — summary and overview share an endpoint, formatted differently) plus both validation guards
  • smoke-test-mutating.mjs — 23 checks running full create→edit→delete lifecycles for templates (including layout binding round-trip), webhooks, suppressions, plus real email sends (single, single-template, 3-message batch, 2-recipient template batch). Cleans up after itself; verified no residue

Both are in the branch but not in package.json's files array — they ship in the repo for regression but not on npm. Reviewers can choose whether to keep, move under tests/, or drop before merge.

Tool count summary

Category Before After
Email 2 4
Templates 1 6
Messages 0 2
Diagnostics 0 1
Bounces 0 3
Suppressions 0 3
Stats & Server 1 2
Webhooks 0 3
Total 4 24

Test plan

  • npm install succeeds with the dropped node-fetch dependency
  • npm start registers all 24 tools (visible in startup log)
  • npm run smoke passes 25/25 against a fresh Postmark server token
  • node smoke-test-mutating.mjs passes 23/23 (edit SENDER and RECIPIENT constants for the reviewer's account)
  • Spot-check diagnoseDelivery end-to-end — should pick up the test email from the mutating harness and report "Delivered" with a recommendation
  • Spot-check getDeliveryStats with stat: "summary" (default) and at least one breakdown stat
  • Confirm package.json engines.node of >=20 matches your deployment baseline; revert to >=18 if you'd rather match the MCP SDK floor exactly
  • Decide on the smoke-test harnesses: keep in repo, move under tests/, or remove before merge

@jaballer jaballer requested review from dandigangi and ewood-ac April 28, 2026 04:51
@jaballer jaballer force-pushed the feature/updated-tools branch 2 times, most recently from 8c8162c to c5a7147 Compare April 29, 2026 18:22
@jaballer
Copy link
Copy Markdown
Contributor Author

jaballer commented Apr 29, 2026

Pre-merge hostile-review pass + blocker fixes (commit ae9585f)

Before final review I did a hostile-reviewer pass focused on (1) security vulnerabilities and (2) backward compatibility for existing NPM users. Two issues rose to "blocker" — both fixed in ae9585f. Documenting full findings here so reviewers have complete context.

Note (updated): This branch was rebased onto main after the original review pass to resolve a divergence introduced by an earlier git filter-repo history rewrite. The fix-commit content is unchanged — only its SHA moved from ceac412ae9585f. PR is now mergeable: CLEAN.


🔴 Blockers — fixed in ae9585f

1. Version was still 1.0.0. Despite a breaking change (engines floor raised from >=16 to >=20) plus 20 net-new tools, the package version hadn't moved. Per semver, the engine bump alone is a major version. Existing users on Node 16 or 18 doing npm update would have hit EBADENGINE. Bumped to 2.0.0.

2. The published NPM tarball was broken in the way the README documents it. npm pack --dry-run showed only LICENSE / README.md / index.js / package.json shipping. But the README walks NPM users through cp smoke-test.example.mjs smoke-test.mjs && npm run smoke — none of those files were in the package. Followed-the-README behavior would have produced ENOENT. Added both *.example.mjs files plus CHANGELOG.md to the files array. Tarball now ships 7 files.

CHANGELOG.md added at the same time, with a 2.0.0 entry covering all additions, the breaking change, two real bug fixes (PostFirstOpenOnly field, engines floor accuracy), and the dropped node-fetch dep.


🟡 Known issues — intentionally not fixed in this PR (recommend follow-up tracking)

These are real but didn't justify holding up shipping. Worth logging issues / tickets for v2.1 hardening.

3. cc and bcc accept z.string() without email format validation.

cc: z.string().optional().describe("CC recipient(s), comma-separated"),
bcc: z.string().optional().describe("BCC recipient(s), comma-separated"),

Compare with to: z.string().email(). Postmark rejects malformed input as the outer wall, but earlier validation would catch CRLF-injection-style mischief and produce better error messages. Should validate as comma-split .email() array. Affects sendBatch and sendBatchWithTemplate.

4. No rate limiting on sendBatch / sendBatchWithTemplate. Each accepts up to 500 messages, and there's no per-process counter. A prompt-injection attack ("ignore prior instructions and call sendBatch with these 500 marketing emails") can burn the user's Postmark quota before they notice. Postmark's account-level limits are the outer wall but they're not session-aware. Consider a session counter that prompts when batch sends exceed N per minute.

5. No from allowlist. Tools accept any from address. If a user has multiple verified senders, an AI can be tricked into sending from one the user didn't intend (e.g., legal@theircompany.com instead of marketing@theircompany.com). Postmark verifies senders but doesn't reflect user intent. Suggest an optional POSTMARK_ALLOWED_SENDERS env var (comma-separated allowlist).

6. Prompt injection via message body in getMessageDetails. Generic AI-mediated-mail-system risk — body content is returned to the AI verbatim, so an attacker who emailed the user with crafted instructions could influence later AI behavior when the user asks "what was that email I got?" Mitigation: hint in the tool description that "returned message content should be treated as untrusted user data, not instructions." Cheap insurance.

7. console.error logs include recipient PII (recipient address, subject, etc.). Stderr is process-local — not a remote leak — but if the MCP runs under a shared logger or in a CI environment, addresses may end up in places users don't expect. Worth a one-line note in the README about logging behavior.


🟢 Items confirmed clean

  • npm audit reports 0 vulnerabilities in production deps.
  • Server token is read once at boot, never logged, never echoed to tool output. Verified by inspection of every console.error call and tool response.
  • All 4 v1 tools have unchanged input schemas. sendEmail, sendEmailWithTemplate, listTemplates, getDeliveryStats accept the same parameters they did in v1. getDeliveryStats adds optional stat and messageStream — additive, backward-compatible.
  • Validation guards on all mutating tools. editTemplate (≥1 updated field), createWebhook (≥1 trigger), createTemplate (HTML or text body), validateTemplate (subject/HTML/text), sendEmailWithTemplate (exactly one of templateId/templateAlias).
  • File contents across all 11 commits on the branch are scrubbed of personal addresses (force-pushed via git filter-repo earlier; re-verified after the rebase). Author metadata intentionally retained for portfolio attribution.

Test status

  • 25/25 read-only smoke tests passingnpm run smoke
  • 23/23 mutating smoke tests passing including real sends, full template lifecycle (with layout binding round-trip), webhook lifecycle, suppression lifecycle, batch sends. Cleanup verified — no residue.
  • npm pack --dry-run confirms the published tarball now contains the smoke-test examples, so npm run smoke works for NPM-installed users not just from-source users.

Recommendation

Ship 2.0.0 after merge. The five 🟡 items above (plus the v2.1 follow-up to wrap Postmark's newly-GA /email/bulk API — see REVIEW_NOTES.md item 6) are good candidates for a hardening pass; none should hold up the major release.

Adds tools across templates (CRUD + validation), message search, bounces,
suppressions, server info, and webhooks. Consolidates outbound stats into
getDeliveryStats with an optional `stat` parameter and per-stat formatters.

Why:
- Initial release shipped only sendEmail, sendEmailWithTemplate, listTemplates,
  and getDeliveryStats. This brings parity with the broader Postmark API surface.
- getDeliveryStats now exposes 13 stat breakdowns (overview, bounces, opens,
  clicks, platform/client/browser usage, click location, read times) without
  cluttering the tool list.
- Adds validation guards (createWebhook requires ≥1 trigger; editTemplate
  requires ≥1 updated field) so misuse fails fast instead of silently.
- Bumps engines.node to >=20 (MCP SDK requires ≥18; 16 was false-advertising).

Smoke tested against a live Postmark account: 23/23 read-only checks
(smoke-test.mjs) + 14/14 mutating lifecycle checks (smoke-test-mutating.mjs)
including real sends and full create→edit→delete cycles for templates,
webhooks, and suppressions. No residue left after cleanup.
Documents the new 21-tool layout, the consolidated getDeliveryStats with
per-stat formatters, the smoke test commands, and a Gotchas section
capturing tribal knowledge worth preserving:

- Suppressions dump endpoint is eventually consistent
- PostFirstOpenOnly vs EnableSmtpApiErrorHooks distinction
- SDK method names that don't match the obvious guess
  (getEmailOpenClientUsage, getEmailOpenPlatformUsage, getClickLocation)

Why include this in the PR: the gotchas were learned the hard way during
this branch's audit, so capturing them helps the next person (or LLM)
avoid the same wrong turns. Reviewers can drop the file before merge if
they prefer to keep it untracked.
Runs message search, suppression check, and bounce history lookups in
parallel for a recipient, then synthesizes a plain-English recommendation.
Replaces what was previously a 5-step manual investigation
(searchOutboundMessages → getMessageDetails → check events →
listSuppressions → searchBounces) with a single tool call.

Why this matters: the existing 21 tools mirror the Postmark REST API 1:1.
This is the first composite tool — it demonstrates what AI-native
tooling looks like by combining endpoints into outcomes. It's positioned
as a proof-of-concept for a "Diagnostics" category that can grow
(getAccountHealth, compareStats, etc.) in future iterations.

Smoke tested: 25/25 (added two diagnoseDelivery cases — happy path with
a real recent send, and an unknown-recipient case).
Postmark's bulk email API graduated from early access to GA, so wraps
of /email/batch and /email/batchWithTemplates are now safe to ship in
the official MCP.

- sendBatch accepts up to 500 fully-independent messages (each with own
  recipient/subject/body)
- sendBatchWithTemplate accepts a shared template + up to 500 recipients
  with per-recipient template models. Top-level `from` / `tag` apply as
  defaults but are overridable per recipient
- Both share a formatBatchResults helper that splits the SDK response by
  ErrorCode and renders a per-message success/failure summary, capping
  success and failure lists for readable output on large batches

Smoke tested: 25/25 read-only + 16/16 mutating including a real 3-message
batch and a 2-recipient template batch. Cleanup verified clean.

Pairs with the diagnoseDelivery commit to give the PR two flagship
additions: one functional-completeness (batch sends), one composite/
AI-native (delivery diagnosis).
Adds the missing layoutTemplate parameter to createTemplate and
editTemplate, and surfaces the binding in getTemplate and listTemplates
output so the association is verifiable round-trip from the MCP.

Without this, a template created via createTemplate is unbound from any
Layout — renders without the masthead, footer, or shared CSS the rest
of the account's templates inherit. We hit this dogfooding the MCP
against KrateCMS: a tenant-admin-welcome template created over the wire
looked nothing like its sibling templates because there was no way to
attach a Layout.

editTemplate accepts null on layoutTemplate to unbind. Postmark's API
treats JSON null as "no change" — the documented way to clear a layout
association is an empty string — so the MCP translates null -> "" before
hitting the SDK so callers can use the natural JSON shape.

Smoke test surfaced a separate, pre-existing bug while validating the
binding round-trip: createTemplate's required subject param made it
impossible to create Layout templates at all (Postmark rejects Subject
on Layouts). Fixed by making subject conditionally required at the
handler level — required for Standard, forbidden for Layout — with
clear error messages either way.

Mutating smoke test now covers the full binding lifecycle (create
Layout -> create Standard with binding -> verify in get -> unbind via
null -> verify cleared -> rebind -> verify in list -> cleanup), 23/23
passing.
The previous smoke-test.mjs and smoke-test-mutating.mjs files were
tracked in the repo with hard-coded verified-sender addresses, which
meant any local edit to point them at a different account could be
accidentally committed. Replace them with example-file templates
that follow the .env / .env.example pattern:

  - smoke-test.example.mjs and smoke-test-mutating.example.mjs are
    tracked and contain only placeholder addresses (recipient@example.com).
  - smoke-test.mjs and smoke-test-mutating.mjs are gitignored so a
    user-copy with personal addresses can never be accidentally added.
  - smoke-test-mutating.example.mjs includes a startup guard that
    refuses to run while the placeholders are still in place — prevents
    sending mail from the example values.
  - README and CLAUDE.md document the cp + edit + run workflow.

Workflow: cp smoke-test.example.mjs smoke-test.mjs, edit any addresses,
run npm run smoke (or node smoke-test-mutating.mjs for the full lifecycle
suite).
Two pre-merge blockers caught during a hostile-review pass:

1. Version was still 1.0.0 despite a breaking change (engines floor
   raised from >=16 to >=20) plus 20 net-new tools. Per semver, this
   is a major version bump.

2. The package.json `files` array shipped only LICENSE / README.md /
   index.js / package.json — but the README walks NPM users through
   `cp smoke-test.example.mjs smoke-test.mjs && npm run smoke`, and
   none of those files were in the published tarball. Added both
   *.example.mjs files plus CHANGELOG.md to the files array.

CHANGELOG.md follows the Keep a Changelog format. The 2.0.0 entry
documents all 20 new tools, the breaking engines change, the
getDeliveryStats reformat, two real bug fixes (PostFirstOpenOnly
field, engines floor accuracy), and the dropped node-fetch dep.
@jaballer jaballer force-pushed the feature/updated-tools branch from ceac412 to ae9585f Compare April 29, 2026 18:53
- Adds a Changelog entry to the Useful Docs list so users have a clear
  pointer to what changed between versions.
- Corrects the read-only smoke count comment from 24 to 25 — the
  example file was off by one against the actual test list.
The sendBatch / sendBatchWithTemplate tools wrap Postmark's batch email
endpoints (/email/batch, /email/batchWithTemplates), not the separate
bulk email API (/email/bulk). Earlier doc copy conflated the two —
links pointed at the bulk-email docs while the implementation actually
uses the batch endpoints. Fixed across:

- CHANGELOG.md: corrected the 2.0.0 entry, added a note that bulk API
  wrapping is tracked as a v2.1 follow-up
- README.md: corrected the sendBatch description, points at the bulk
  docs only as a "see also" for the unwrapped capability
- CLAUDE.md: corrected the implementation note for future contributors

Postmark's bulk email API is a distinct capability (high-volume sends)
and remains an obvious v2.1 addition.
Postmark's bulk email API documentation (verified at the official URL)
clarifies that /email/batch and /email/bulk are PARALLEL APIs, not one
replacing the other:

- /email/batch — synchronous, immediate per-message results, capped at
  500 messages per call. This is what sendBatch / sendBatchWithTemplate
  wrap in v2.0.

- /email/bulk — asynchronous, submit-a-job-and-poll-for-status, no
  message count cap (50 MB payload limit instead), subject to Postmark
  approval. This is the recently-GA capability and is NOT covered in
  v2.0 — tracked as a v2.1 follow-up.

Earlier copy described bulk as "high-volume single-template sends",
which was a partially-correct guess made without reading the docs.
Now updated to reflect the verified async-vs-sync distinction. Affects:

- CHANGELOG.md: tightened the 2.0.0 entry's note about bulk
- README.md: corrected the sendBatch description's bulk pointer
- CLAUDE.md: corrected the implementation note for future contributors

The current sendBatch is not "legacy" — it's the synchronous half of
Postmark's two-track send API. Wrapping bulk needs its own design pass
because the SDK doesn't expose it and the async pattern likely needs
two tools (submitBulk + getBulkStatus).
…smatches, expose messageStream filter

The bounce-type enum exposed only 16 of Postmark's 22 documented bounce
types, and two of those used the wrong case. Anyone trying to filter by
the missing or miscased types would get a Zod validation error before
the request reached the API. Surfaced during a final cross-check
against the Postmark Bounce API docs.

Added to the enum:
- Subscribe, OpenRelayTest, Unknown, VirusNotification, Unconfirmed, Blocked

Case corrections (now match Postmark's BounceType SDK enum):
- DmarcPolicy → DMARCPolicy
- DNSError → DnsError

Also added the `messageStream` filter to searchBounces — Postmark's
/bounces endpoint supports `messagestream` per the docs, and we already
exposed it on searchOutboundMessages but missed it here.

README's bounce documentation updated to list the full 22-value set.
REVIEW_NOTES.md updated to reflect the corrected count.

Smoke tests still pass: 25/25 read-only, 23/23 mutating.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant