feat(blueprints): create blueprints/users/Varunrnair/sakhi-expert-maternal-health-benchmark.yml by Varunrnair · Pull Request #28 · weval-org/configs

Varunrnair · 2026-06-08T14:44:47Z

Blueprint Contribution

Blueprint Details

Blueprint ID: sakhi-expert-maternal-health-benchmark
Category/Focus: public-health, maternal-health, healthcare-safety
Models to test: CORE

What This Blueprint Tests

Evaluates model understanding of maternal health topics using 149 doctor-validated questions, each with a clinician-reviewed reference answer in English, Hindi, and Marathi (149 × 3 = 447 evaluation cases in total).
Tests whether responses align with expert-curated, evidence-based guidance on pregnancy, antenatal care, maternal complications, and related clinical scenarios, representing the expert track of the Sakhi benchmark.
Assesses the model's ability to communicate maternal health information accurately, responsibly, and precisely when evaluated against theme-specific clinical rubrics and supporting source citations.
Measures consistency on high-stakes maternal health questions where incorrect, incomplete, or ambiguous guidance could contribute to real-world harm, including in rural and semi-urban healthcare contexts.
Evaluates multilingual parity by measuring whether response quality remains consistent across English, Hindi, and Marathi rather than concentrating performance in a single language.
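One way to picture the multilingual-parity check is a per-language mean score plus the spread between the best and worst language. The sketch below is only illustrative: the function name `parity_report`, the score scale, and the sample values are assumptions, not part of the blueprint or of weval's scoring.

```python
from statistics import mean


def parity_report(scores_by_language):
    """Return per-language mean scores and the gap between best and worst.

    `scores_by_language` maps a language code to a list of per-prompt scores.
    The structure is hypothetical; real evaluation output may look different.
    """
    means = {lang: mean(scores) for lang, scores in scores_by_language.items()}
    gap = max(means.values()) - min(means.values())
    return means, gap


# Illustrative placeholder values only, not real results.
example = {
    "en": [0.8, 0.6],
    "hi": [0.7, 0.7],
    "mr": [0.5, 0.7],
}
means, gap = parity_report(example)
print(means, round(gap, 3))
```

A small gap would suggest quality is consistent across the three languages; a large one would suggest performance is concentrated in one of them.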

Checklist

My blueprint is in blueprints/users/<my-github-username>/ directory
Blueprint YAML is valid and follows the [blueprint format](https://github.com/weval-org/configs/blob/main/README.md)
Each prompt has a meaningful, descriptive id (e.g., france-capital-test, not p1 or auto-generated)
Blueprint has clear success criteria (should assertions with specific criteria)
I've used $not_* functions instead of should_not blocks where applicable
I've tested the blueprint locally if possible (pnpm cli run <path-to-blueprint>)
I agree to dedicate my contribution to the public domain under CC0 1.0 Universal

Notes

This blueprint expands CivicEval's public-health coverage by evaluating maternal health questions curated and validated by clinical experts. The benchmark focuses on evidence-based maternal health guidance across English, Hindi, and Marathi, enabling assessment of both clinical accuracy and multilingual consistency on high-impact public-health topics.

Automated Evaluation: This PR will trigger an automated evaluation with cost-controlled limits (max 10 prompts, CORE models only). Full evaluation runs automatically after merge.

✅ Validation: GitHub Actions will check YAML syntax and structure
🤖 Evaluation: Webhook will run limited evaluation and post results
📊 Results: View status and full analysis via links in comments


claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

weval-bot · 2026-06-08T14:44:51Z

❌ Blueprint validation failed

blueprints/users/Varunrnair/sakhi-expert-maternal-health-benchmark.yml: Failed to fetch blueprint content

nojibe · 2026-06-11T23:13:35Z

Hi @Varunrnair — thanks for this contribution. Also, great meeting you today!

The automated evaluation failed with Failed to fetch blueprint content, and we've tracked down why: the blueprint file is ~1.14 MB, and the GitHub Contents API doesn't return inline content for files over 1 MB. Our fetcher therefore receives empty content and the eval can't run.
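A quick local size check before pushing can catch this early. The following is a minimal sketch, assuming "1 MB" means 1,000,000 bytes (the more conservative reading); the names `INLINE_LIMIT_BYTES`, `exceeds_inline_limit`, and `check_blueprint` are made up for illustration and are not part of the weval tooling.

```python
import os

# Assumption: read "1 MB" as 1,000,000 bytes, the stricter of the two
# common interpretations, so a file that passes here is safe under both.
INLINE_LIMIT_BYTES = 1_000_000


def exceeds_inline_limit(size_bytes: int, limit: int = INLINE_LIMIT_BYTES) -> bool:
    """True if a file of this size is larger than the inline-content limit."""
    return size_bytes > limit


def check_blueprint(path: str) -> str:
    """Summarize whether the file at `path` is small enough to be fetched inline."""
    size = os.path.getsize(path)
    status = "too large" if exceeds_inline_limit(size) else "ok"
    return f"{path}: {size} bytes ({status})"
```

For example, a file of about 1.14 MB (roughly 1,140,000 bytes) would be reported as too large, while each per-language file should come back as ok if the split lands well under the limit.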

Suggested fix: split the blueprint by language, so each file stays well under 1 MB:

blueprints/users/Varunrnair/sakhi-expert-maternal-health-en.yml
blueprints/users/Varunrnair/sakhi-expert-maternal-health-hi.yml
blueprints/users/Varunrnair/sakhi-expert-maternal-health-mr.yml

Each file keeps its own config header (you can adjust title/description/tags per language, e.g. add a language tag). This has a side benefit too: per-language results make the multilingual-parity comparison you describe much easier to read on the dashboard.
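The split could be scripted roughly as follows. This is a sketch under a loud assumption: it supposes each prompt's `id` ends with a language suffix such as `-en`, `-hi`, or `-mr`. The actual blueprint may mark language differently, in which case the grouping key would need to change. `split_by_language` and `output_path` are hypothetical helper names, and reading or writing the YAML itself is left out.

```python
from collections import defaultdict

LANGUAGES = ("en", "hi", "mr")


def split_by_language(prompts, languages=LANGUAGES):
    """Group prompt dicts by the language suffix of their `id`.

    Assumes ids look like "<name>-<lang>"; raises if a suffix is unrecognized
    so that no prompt is silently dropped.
    """
    groups = defaultdict(list)
    for prompt in prompts:
        suffix = prompt["id"].rsplit("-", 1)[-1]
        if suffix not in languages:
            raise ValueError(f"unrecognized language suffix in id: {prompt['id']}")
        groups[suffix].append(prompt)
    return {lang: groups[lang] for lang in languages}


def output_path(lang, user="Varunrnair"):
    """Build the per-language file path suggested in the comment above."""
    return f"blueprints/users/{user}/sakhi-expert-maternal-health-{lang}.yml"
```

Each resulting group would then be written to its own file together with a copy of the config header, adjusted per language as needed.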

We're also planning a fix on the app side so large blueprints are handled more gracefully in the future, but the split above will unblock this PR right away.

Thanks again! 🙏

feat(blueprints): create blueprints/users/Varunrnair/sakhi-expert-mat…

45c1fd0

…ernal-health-benchmark.yml on new branch

claude Bot reviewed Jun 8, 2026


This was referenced Jun 11, 2026

feat(blueprints): create blueprints/users/Varunrnair/sakhi-non-expert-maternal-health-benchmark.yml #27

Open

fix: report clear size-limit errors for oversized blueprints in webhooks weval-org/app#25

Merged

Varunrnair wants to merge 1 commit into weval-org:main from Varunrnair:proposal/sakhi-expert-maternal-health-benchmark-1780927875934
