
feat(blueprints): create blueprints/users/Varunrnair/sakhi-non-expert-maternal-health-benchmark.yml #27

Open: Varunrnair wants to merge 1 commit into weval-org:main from Varunrnair:proposal/sakhi-non-expert-maternal-health-benchmark-1780928494972


@Varunrnair (Contributor)

Blueprint Contribution

Blueprint Details

  • Blueprint ID: sakhi-non-expert-maternal-health-benchmark
  • Category/Focus: public-health, maternal-health, healthcare-safety
  • Models to test: CORE

What This Blueprint Tests

  • Evaluates model understanding of maternal health topics using 231 community-sourced questions, with reference answers reviewed by healthcare nonprofit staff and ASHA workers, across English, Hindi, and Marathi (693 evaluation cases total).
  • Tests whether responses align with deployment-oriented reference answers that reflect how maternal health questions are asked and answered in community health settings, rather than only textbook or clinical formulations.
  • Assesses the model's ability to provide accurate, practical, and culturally appropriate guidance on pregnancy, maternal care, nutrition, infection prevention, and related health topics without promoting misinformation or unsafe advice.
  • Measures consistency on public-health-relevant questions where incorrect information could contribute to real-world harm, using theme-specific rubrics comparable to the expert evaluation track.
  • Evaluates multilingual performance on question distributions representative of field deployment contexts, including ASHA/home-visit concerns, healthcare access barriers, and local health practices.

Checklist

  • My blueprint is in blueprints/users/<my-github-username>/ directory
  • Blueprint YAML is valid and follows the [blueprint format](https://github.com/weval-org/configs/blob/main/README.md)
  • Each prompt has a meaningful, descriptive id (e.g., france-capital-test, not p1 or auto-generated)
  • Blueprint has clear success criteria (should assertions with specific criteria)
  • I've used $not_* functions instead of should_not blocks where applicable
  • I've tested the blueprint locally if possible (pnpm cli run <path-to-blueprint>)
  • I agree to dedicate my contribution to the public domain under CC0 1.0 Universal

Notes

This blueprint expands CivicEval's public-health coverage by focusing on maternal health, a domain where inaccurate or misleading model outputs may have significant real-world consequences. The evaluation is intended to measure factual accuracy, responsible communication, and alignment with established maternal health guidance.


Automated Evaluation: This PR will trigger an automated evaluation with cost-controlled limits (max 10 prompts, CORE models only). Full evaluation runs automatically after merge.

  • Validation: GitHub Actions will check YAML syntax and structure
  • 🤖 Evaluation: Webhook will run limited evaluation and post results
  • 📊 Results: View status and full analysis via links in comments

…-maternal-health-benchmark.yml on new branch

@claude (Bot) left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@weval-bot (Bot) commented Jun 8, 2026

Blueprint validation failed

  • blueprints/users/Varunrnair/sakhi-non-expert-maternal-health-benchmark.yml: Failed to fetch blueprint content

@nojibe commented Jun 11, 2026
Copy Markdown

Hi @Varunrnair, this is the same issue as on #28. The automated evaluation failed with "Failed to fetch blueprint content" because this file is ~1.6 MB, and the GitHub Contents API does not return inline content for files over 1 MB, so our fetcher receives an empty body.

Suggested fix: split the blueprint by language so each file stays well under 1 MB:

  • blueprints/users/Varunrnair/sakhi-non-expert-maternal-health-en.yml
  • blueprints/users/Varunrnair/sakhi-non-expert-maternal-health-hi.yml
  • blueprints/users/Varunrnair/sakhi-non-expert-maternal-health-mr.yml

Each file keeps its own config header (adjust title/description/tags per language). As a bonus, per-language results make the multilingual-parity comparison easier to read on the dashboard.

You can push the split files to this same branch and the PR will update in place. Thanks again! 🙏
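The per-language split suggested above can be sketched in Python. This is a hypothetical helper, not part of weval or its CLI: it assumes each prompt's id ends in a language suffix (`-en`, `-hi`, `-mr`), which this blueprint may not actually use, and it treats the 1 MB limit as 1,048,576 bytes. Writing each group back out as YAML with its own config header is left out.

```python
"""Hypothetical helper for splitting a multilingual blueprint by language.

Assumptions (not taken from the weval docs): each prompt is a dict whose
"id" ends in a language suffix such as "-en", "-hi" or "-mr", and the
inline-content limit is treated as 1 MiB.
"""
from collections import defaultdict

# Assumed threshold; GitHub describes the limit only as "1 MB".
INLINE_LIMIT_BYTES = 1024 * 1024

# Language codes matching the per-language file names suggested above.
LANGUAGES = ("en", "hi", "mr")


def language_of(prompt_id: str) -> str:
    """Return the language code from a hypothetical '-<lang>' id suffix."""
    suffix = prompt_id.rsplit("-", 1)[-1]
    if suffix not in LANGUAGES:
        raise ValueError(f"no known language suffix in id {prompt_id!r}")
    return suffix


def split_by_language(prompts: list[dict]) -> dict[str, list[dict]]:
    """Group prompts into one list per language, preserving input order."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for prompt in prompts:
        groups[language_of(prompt["id"])].append(prompt)
    return dict(groups)


def fits_inline(serialized: str) -> bool:
    """True if the UTF-8 encoded text is below the assumed inline limit."""
    return len(serialized.encode("utf-8")) < INLINE_LIMIT_BYTES


prompts = [
    {"id": "anc-visits-en", "prompt": "..."},
    {"id": "anc-visits-hi", "prompt": "..."},
    {"id": "iron-folic-acid-mr", "prompt": "..."},
]
groups = split_by_language(prompts)
print(sorted(groups))  # ['en', 'hi', 'mr']
```

The size check counts encoded bytes rather than characters on purpose: Devanagari characters take 3 bytes each in UTF-8, so the Hindi and Marathi files will be noticeably larger than the English one for the same number of questions.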
