feat(blueprints): create blueprints/users/Varunrnair/sakhi-non-expert-maternal-health-benchmark.yml by Varunrnair · Pull Request #27 · weval-org/configs

Varunrnair · 2026-06-08T14:39:25Z

Blueprint Contribution

Blueprint Details

Blueprint ID: sakhi-non-expert-maternal-health-benchmark
Category/Focus: public-health, maternal-health, healthcare-safety
Models to test: CORE

What This Blueprint Tests

Evaluates model understanding of maternal health topics using 231 community-sourced questions, with reference answers reviewed by healthcare nonprofit staff and ASHA workers, across English, Hindi, and Marathi (693 evaluation cases total).
Tests whether responses align with deployment-oriented reference answers that reflect how maternal health questions are asked and answered in community health settings, rather than only textbook or clinical formulations.
Assesses the model's ability to provide accurate, practical, and culturally appropriate guidance on pregnancy, maternal care, nutrition, infection prevention, and related health topics without promoting misinformation or unsafe advice.
Measures consistency on public-health-relevant questions where incorrect information could contribute to real-world harm, using theme-specific rubrics comparable to the expert evaluation track.
Evaluates multilingual performance on question distributions representative of field deployment contexts, including ASHA/home-visit concerns, healthcare access barriers, and local health practices.

Checklist

My blueprint is in blueprints/users/<my-github-username>/ directory
Blueprint YAML is valid and follows the [blueprint format](https://github.com/weval-org/configs/blob/main/README.md)
Each prompt has a meaningful, descriptive id (e.g., france-capital-test, not p1 or auto-generated)
Blueprint has clear success criteria (should assertions with specific criteria)
I've used $not_* functions instead of should_not blocks where applicable
I've tested the blueprint locally if possible (pnpm cli run <path-to-blueprint>)
I agree to dedicate my contribution to the public domain under CC0 1.0 Universal

Notes

This blueprint expands CivicEval's public-health coverage by focusing on maternal health, a domain where inaccurate or misleading model outputs may have significant real-world consequences. The evaluation is intended to measure factual accuracy, responsible communication, and alignment with established maternal health guidance.

Automated Evaluation: This PR will trigger an automated evaluation with cost-controlled limits (max 10 prompts, CORE models only). Full evaluation runs automatically after merge.

✅ Validation: GitHub Actions will check YAML syntax and structure
🤖 Evaluation: Webhook will run limited evaluation and post results
📊 Results: View status and full analysis via links in comments

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

weval-bot · 2026-06-08T14:39:29Z

❌ Blueprint validation failed

blueprints/users/Varunrnair/sakhi-non-expert-maternal-health-benchmark.yml: Failed to fetch blueprint content

nojibe · 2026-06-11T23:16:43Z

Hi @Varunrnair, same issue here as on #28. The automated evaluation failed with "Failed to fetch blueprint content" because this file is ~1.6 MB. The GitHub Contents API doesn't return inline content for files over 1 MB, so our fetcher receives empty content.
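A local size check along these lines could catch the problem before the webhook runs. This is a minimal sketch, not part of the weval tooling: the helper names `exceeds_inline_limit` and `describe` are hypothetical, and it assumes the 1 MB cutoff means 1,048,576 bytes, since the fetcher's exact threshold is not stated here.

```python
import os

# Assumption: "1 MB" is taken as 1,048,576 bytes; the real cutoff may differ slightly.
INLINE_LIMIT_BYTES = 1024 * 1024


def exceeds_inline_limit(path, limit=INLINE_LIMIT_BYTES):
    """Return True when the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit


def describe(path, limit=INLINE_LIMIT_BYTES):
    """Return a one-line, human-readable size report for `path`."""
    size = os.path.getsize(path)
    status = "too large" if size > limit else "ok"
    return f"{path}: {size / (1024 * 1024):.2f} MB ({status})"
```

Calling `describe(...)` on each blueprint file before pushing would show which ones need splitting.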

Suggested fix: split the blueprint by language so each file stays well under 1 MB:

blueprints/users/Varunrnair/sakhi-non-expert-maternal-health-en.yml
blueprints/users/Varunrnair/sakhi-non-expert-maternal-health-hi.yml
blueprints/users/Varunrnair/sakhi-non-expert-maternal-health-mr.yml

Each file keeps its own config header (adjust title/description/tags per language). As a bonus, per-language results make the multilingual-parity comparison easier to read on the dashboard.

You can push the split files to this same branch and the PR will update in place. Thanks again! 🙏
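The split suggested above could be scripted roughly as follows. This is a sketch only: it assumes the prompts have already been loaded as a list of dictionaries and that each one carries a language code under a hypothetical `lang` key, which the actual blueprint format may not provide. The output paths follow the filename pattern listed above.

```python
from collections import defaultdict

# Path prefix taken from the suggested filenames; the language code and ".yml" are appended.
BASE = "blueprints/users/Varunrnair/sakhi-non-expert-maternal-health"


def split_by_language(prompts, lang_key="lang"):
    """Group prompts by language and key each group by its target file path.

    `lang_key` is a hypothetical field name; adjust it to however the
    blueprint actually marks the language of a prompt.
    """
    groups = defaultdict(list)
    for prompt in prompts:
        groups[prompt[lang_key]].append(prompt)
    return {f"{BASE}-{lang}.yml": items for lang, items in groups.items()}
```

Writing each group back out would still require adding the per-language config header by hand or with a YAML library.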

feat(blueprints): create blueprints/users/Varunrnair/sakhi-non-expert…

1eecebf

…-maternal-health-benchmark.yml on new branch

claude Bot reviewed Jun 8, 2026

nojibe mentioned this pull request Jun 11, 2026

fix: report clear size-limit errors for oversized blueprints in webhooks weval-org/app#25

Merged

Varunrnair wants to merge 1 commit into weval-org:main from Varunrnair:proposal/sakhi-non-expert-maternal-health-benchmark-1780928494972
