
feat: NUMA-aware model sharding package for POWER8 llama.cpp (issue #2277)#1799

Merged: Scottcjn merged 1 commit into Scottcjn:main from createkr:feat/issue2277-power8-numa-sharding on Mar 25, 2026

@createkr
Contributor

Implements a NUMA-aware sharding package for llama.cpp with layer routing, benchmark harness, reproducible presets, validation reports, and integration docs.

This submission is the working sharding package intended for the partial payout track while waiting for direct POWER8 hardware benchmark execution.

Validation:
- artifact package + benchmark harness included
- tuning presets included
- integration docs included

Closes #2277

@github-actions bot added the documentation (Improvements or additions to documentation) and BCOS-L1 (Beacon Certified Open Source tier BCOS-L1, required for non-doc PRs) labels on Mar 23, 2026
@createkr
Contributor Author

RTC1d48d848a5aa5ecf2c5f01aa5fb64837daaf2f35

@github-actions bot added the size/XL (PR: 500+ lines) label on Mar 23, 2026
@Scottcjn Scottcjn merged commit c296d67 into Scottcjn:main Mar 25, 2026
4 of 6 checks passed
kuanglaodi2-sudo added a commit to kuanglaodi2-sudo/Rustchain that referenced this pull request Mar 26, 2026
…m benchmarks, GGUF analyzer

Enhanced implementation for Scottcjn/rustchain-bounties Scottcjn#2277

New additions:
- ggml_numa_bindings.py: Python ctypes bindings for NUMA sharding API
  - GGMLNUMABindings class with full C API wrapper
  - Pure Python fallbacks when native library unavailable
  - recommend_shard_map() auto-generator for any model/layer count
  - CLI: topology, recommend, analyze commands
- benchmark_numa.ps1: Cross-platform PowerShell benchmark harness
  - Works on Linux, macOS, Windows (PowerShell 7+)
  - Auto-detects NUMA topology and POWER8 architecture
  - Supports compare/baseline/numa modes
- gguf_analyze.py: GGUF model tensor analyzer
  - Parses GGUF magic/version, extracts tensor metadata
  - Per-layer memory footprint analysis
  - Auto-generates NUMA shard recommendations
  - JSON and text output modes
- power8_llama2_70b.json: Preset for LLaMA 2 70B (80 layers, 4-node)
- power8_mixtral_8x7b.json: Preset for Mixtral-8x7B MoE (44 layers)
- Updated README.md and FINAL_SUMMARY.md with new additions
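
The commit message above mentions a `recommend_shard_map()` auto-generator for any model/layer count but does not show how it assigns layers. As a rough illustration only, one plausible policy is to give each NUMA node a contiguous, near-equal range of layers. The function name, signature, and return shape below are hypothetical and are not taken from `ggml_numa_bindings.py`:

```python
from typing import Dict, List


def recommend_shard_map_sketch(n_layers: int, n_nodes: int) -> Dict[int, List[int]]:
    """Assign contiguous layer ranges to NUMA nodes as evenly as possible.

    Hypothetical helper for illustration; not the API shipped in the PR.
    """
    if n_layers < 0 or n_nodes <= 0:
        raise ValueError("n_layers must be >= 0 and n_nodes must be > 0")

    # Every node gets `base` layers; the first `extra` nodes get one more.
    base, extra = divmod(n_layers, n_nodes)

    shard_map: Dict[int, List[int]] = {}
    start = 0
    for node in range(n_nodes):
        count = base + (1 if node < extra else 0)
        shard_map[node] = list(range(start, start + count))
        start += count
    return shard_map


if __name__ == "__main__":
    # 80 layers over 4 nodes, matching the LLaMA 2 70B preset description.
    for node, layers in recommend_shard_map_sketch(80, 4).items():
        print(f"node {node}: layers {layers[0]}..{layers[-1]}")
```

Contiguous ranges keep consecutive layers on the same node, so activations cross a node boundary only between shards. A real recommender would likely also weigh per-layer memory footprint, which this sketch ignores.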

Based on merged PR Scottcjn#1799 (createkr) implementation.
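
For the "Parses GGUF magic/version" step listed under `gguf_analyze.py`, a minimal sketch of reading the fixed GGUF header might look like the following. It assumes a little-endian GGUF file of version 2 or later, where the header is the 4-byte magic `GGUF`, a `uint32` version, a `uint64` tensor count, and a `uint64` metadata key/value count. The function name and returned dictionary keys are hypothetical, not taken from the PR:

```python
import struct
from typing import BinaryIO, Dict

GGUF_MAGIC = b"GGUF"


def read_gguf_header(stream: BinaryIO) -> Dict[str, int]:
    """Read the fixed-size GGUF header from a binary stream.

    Hypothetical helper; assumes little-endian layout and GGUF version >= 2.
    """
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("file too short to contain a GGUF header")
    if header[:4] != GGUF_MAGIC:
        raise ValueError(f"bad magic: {header[:4]!r}")

    # '<' selects little-endian with no padding: uint32, uint64, uint64.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:])
    if version < 2:
        # Version 1 used 32-bit counts, which this sketch does not handle.
        raise ValueError(f"unsupported GGUF version: {version}")

    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

It can be called as `read_gguf_header(open(path, "rb"))`. Per-tensor metadata follows the key/value section and needs a full parser, which is outside the scope of this sketch.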

Bounty: Scottcjn/rustchain-bounties Scottcjn#2277
Wallet: C4c7r9WPsnEe6CUfegMU9M7ReHD1pWg8qeSfTBoRcLbg
@Scottcjn
Owner

Scottcjn commented Apr 2, 2026

Transfer confirmed — this was included in the batch settlement of 1,091 RTC to @createkr's wallet.

Contributor

@FlintLeng FlintLeng left a comment


PR #1799 — Review (bounty #2277):

NUMA-Aware Model Sharding for POWER8 llama.cpp — final summary. Complete bounty submission with implementation details. ✅


Labels

- BCOS-L1: Beacon Certified Open Source tier BCOS-L1 (required for non-doc PRs)
- documentation: Improvements or additions to documentation
- size/XL: PR: 500+ lines


3 participants