Evaluate anything through a synthetic audience before it goes live. Zero dependencies. Self-configuring.
Synthetic Panel creates a matrix of personas × contexts × metrics — simulating how different people react to your content in different situations. It runs inside Claude Code, spawns parallel evaluation agents, and produces a scored report with actionable risks and strengths.
Requires Claude Code (CLI, desktop app, or IDE extension). The evaluation engine spawns 5 parallel agents — a feature only available in Claude Code. This skill will not work in ChatGPT, Claude.ai web chat, or other LLM interfaces without agent spawning support.
Clone the skill into your project's .claude/skills/ directory:
git clone https://github.com/Todmy/synthetic-panel.git .claude/skills/synthetic-panelOr for personal use across all projects:
git clone https://github.com/Todmy/synthetic-panel.git ~/.claude/skills/synthetic-panelThe skill appears as /synthetic-panel in Claude Code.
/synthetic-panel
The skill detects missing data and starts an interactive setup:
What are you evaluating?
1. Social media posts (LinkedIn, Twitter/X, newsletters)
2. Code (PRs, architecture decisions)
3. Product features / proposals
4. Marketing copy (ads, emails, landing pages)
5. Custom — I'll describe my use case
Pick a preset or describe your domain. The wizard generates personas, contexts, and metrics — ready in under a minute.
Test social media posts, newsletter drafts, or blog articles before publishing. The panel simulates your actual audience segments — who stops scrolling, who comments, who shares, who ignores.
/synthetic-panel "Full-stack developer goes outdated. Now, there's something new..."
Reach Score: 67/100 | Buyer Score: 86/100
STRENGTHS:
Sr Dev→Product (7%) — ego 4.5. Will comment about outsource reality.
Dev Advocate (3%) — Will repost with their own take.
RISKS:
HR Manager (3%) — Developer jargon excludes them entirely.
Simulate how your team would review a PR. Different reviewers catch different things — a security engineer spots vulnerabilities, a junior dev flags readability, an architect questions design choices.
/synthetic-panel "PR: Refactor auth middleware to support OAuth2 + SAML..."
Quality Score: 58/100 | Risk Score: 72/100
STRENGTHS:
Tech Lead — architectural_fit 7.2. Clean separation of concerns.
DevOps — deployment impact low, no infra changes needed.
RISKS:
Security Engineer — security_concerns 8.1. Token storage needs review.
QA — test_coverage_need 7.5. No integration tests for SAML flow.
Before pitching a feature to stakeholders, test how different roles react. CTOs care about feasibility, PMs about roadmap fit, sales about deal enablement, support about ticket reduction.
/synthetic-panel "Proposal: AI-powered search that learns from user behavior..."
Approval Score: 45/100 | Priority Score: 38/100
STRENGTHS:
End User — user_impact 6.8. Solves a daily frustration.
Sales Rep — competitive_advantage 5.9. Differentiator in demos.
RISKS:
CTO — feasibility 3.2. "Where's the technical spec? This is a wish list."
Support — concerns: "Who handles when AI search gives wrong results?"
Test headlines, email subject lines, landing page copy, or ad variations across customer segments. Find out which message resonates before spending ad budget.
/synthetic-panel "Stop losing deals to competitors who demo better than you."
Impact Score: 52/100 | Conversion Score: 61/100
STRENGTHS:
VP Sales persona — emotional_resonance 7.1. "This is my exact problem."
Startup founder — click_intent 6.3. Wants to see the product.
RISKS:
Enterprise buyer — trust_signal 2.8. "Sounds like every other SaaS pitch."
Test how investors, board members, or conference attendees would react to your slides or talking points.
Before sending a company-wide announcement (reorg, policy change, new tool rollout), simulate how different teams react.
Test whether your job posting attracts the right candidates or scares them away. Simulate how senior devs, career changers, and passive candidates perceive it.
If multiple people will judge your work, the panel can simulate them first. Define your personas, contexts, and metrics — the engine handles the rest.
1. Parallel Agents — Your personas are split into batches of 4. Each batch runs as a parallel Claude Code agent, evaluating content across all contexts simultaneously. A full 20×20 panel completes in ~2 minutes.
2. Action-Value Scoring — Not all engagement is equal. Each metric has a configurable weight reflecting its real-world impact:
action_value = Σ(metric × weight)
For social media: a comment (×5) drives more reach than a like (×1). For code review: requesting changes (×8) is a stronger signal than approving (×1). You define the weights.
3. Hook Gate — Evaluators can't engage with content they didn't read. If attention metrics score below threshold, action scores are proportionally suppressed:
hook_gate = min(1.0, attention / threshold)
4. Log Scaling — First few engaged personas matter most. Diminishing returns beyond the core audience:
score = k × ln(1 + predicted_reach)
5. Calibration — Feed real performance data after each evaluation. The panel parses numbers from natural language, computes a weighted_score, and tracks synthetic↔real correlation:
weighted_score = reactions×1 + comments×5 + reposts×10 + saves×2
Weights are configurable via action_weights in panel/data/config.json. Minimum 4 posts with both synthetic and real scores needed for correlation. Ask "How accurate is the panel?" to get a Pearson correlation report (r > 0.7 = strong, 0.5–0.7 = moderate, < 0.5 = needs more data).
Tested on 12 social media posts: Spearman rank correlation 0.951 (11 of 12 posts ranked within ±1 of real performance).
Everything is conversational. No separate commands needed:
"Why did the CTO score low?" → analyzes evaluation data
"Here's a revised version: ..." → re-evaluates + shows comparison
"Add real metrics: 45K views, 89 likes" → calibrates against reality
"How accurate is the panel?" → shows correlation report
"Add a persona: QA Engineer" → updates panel config
"Switch to code review preset" → reconfigures the panel
"Show history" → lists past evaluations
/synthetic-panel setup
Switch presets, add personas, change metrics, adjust weights.
.claude/skills/synthetic-panel/ ← the skill (cloned from this repo)
├── SKILL.md ← routing + init wizard (~500 words)
└── engine.md ← evaluation engine + presets (loaded on demand)
panel/ ← auto-generated by init wizard
├── data/
│ ├── config.json ← domain settings, weights, score names
│ ├── personas.json ← evaluator profiles
│ ├── contexts.json ← situational modifiers
│ └── metrics.json ← evaluation dimensions with anchors
└── evaluations/ ← scorecard reports (auto-generated)
All data files are plain JSON — edit them directly or through conversation.
Each operation consumes tokens. Numbers below are upper bounds measured on real runs with 20 personas × 15 contexts.
| Operation | Tokens (up to) | What happens |
|---|---|---|
| Setup (Init Wizard) | ~150K | Reads presets, generates 4 JSON files (config, personas, contexts, metrics), creates directories |
| One evaluation run | ~400K | 5 parallel agents (~60K each) + orchestration, aggregation, report generation |
| Calibration (per post) | ~60K | Single agent evaluates all personas × contexts, computes reach/buyer scores |
Estimated cost per evaluation run:
- Claude Opus: up to ~$8
- Claude Sonnet: up to ~$2
Full calibration (12 posts): up to 720K tokens total ($12 Opus, ~$3 Sonnet).
Token usage scales with panel size. Fewer personas or contexts = fewer tokens.
- Claude Code (CLI, desktop app, or IDE extension)
- That's it.
MIT