feat: NUMA-Aware Model Sharding v1.1 for POWER8 llama.cpp (#2277)#1862
Closed
kuanglaodi2-sudo wants to merge 1 commit into
Closed
feat: NUMA-Aware Model Sharding v1.1 for POWER8 llama.cpp (#2277)#1862kuanglaodi2-sudo wants to merge 1 commit into
kuanglaodi2-sudo wants to merge 1 commit into
Conversation
…m benchmarks, GGUF analyzer Enhanced implementation for Scottcjn/rustchain-bounties Scottcjn#2277 New additions: - ggml_numa_bindings.py: Python ctypes bindings for NUMA sharding API - GGMLNUMABindings class with full C API wrapper - Pure Python fallbacks when native library unavailable - recommend_shard_map() auto-generator for any model/layer count - CLI: topology, recommend, analyze commands - benchmark_numa.ps1: Cross-platform PowerShell benchmark harness - Works on Linux, macOS, Windows (PowerShell 7+) - Auto-detects NUMA topology and POWER8 architecture - Supports compare/baseline/numa modes - gguf_analyze.py: GGUF model tensor analyzer - Parses GGUF magic/version, extracts tensor metadata - Per-layer memory footprint analysis - Auto-generates NUMA shard recommendations - JSON and text output modes - power8_llama2_70b.json: Preset for LLaMA 2 70B (80 layers, 4-node) - power8_mixtral_8x7b.json: Preset for Mixtral-8x7B MoE (44 layers) - Updated README.md and FINAL_SUMMARY.md with new additions Based on merged PR Scottcjn#1799 (createkr) implementation. Bounty: Scottcjn/rustchain-bounties Scottcjn#2277 Wallet: C4c7r9WPsnEe6CUfegMU9M7ReHD1pWg8qeSfTBoRcLbg
Contributor
|
Welcome to RustChain! Thanks for your first pull request. Before we review, please make sure:
Bounty tiers: Micro (1-10 RTC) | Standard (20-50) | Major (75-100) | Critical (100-150) A maintainer will review your PR soon. Thanks for contributing! |
Owner
|
Closing — PowerShell benchmark script for POWER8/Linux is architecturally wrong (POWER8 runs Linux, not Windows). The ctypes libnuma bindings won't integrate with the actual llama.cpp NUMA code without significant work. If you want to contribute to POWER8, SSH to the real hardware and benchmark there. |
Contributor
Code Review — PR #1862Reviewer: FlintLeng ✅ LGTM— FlintLeng |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
NUMA-Aware Model Sharding for POWER8 llama.cpp — Enhancement PR
Bounty: #2277 | Reward: 250 RTC | Wallet: C4c7r9WPsnEe6CUfegMU9M7ReHD1pWg8qeSfTBoRcLbg
What Was Implemented
This PR enhances the NUMA-aware model sharding implementation for IBM POWER8 S824 systems, building on the foundation of PR #1799 (merged). The original implementation is in
numa_sharding/.New Additions (v1.1)
1. Python Bindings —
src/ggml_numa_bindings.pyPython ctypes bindings for the C NUMA sharding API. Provides:
GGMLNUMABindingsclass with full C API wrapperget_numa_topology()— detect system NUMA configurationrecommend_shard_map(layers, nodes)— auto-generate optimal shard map for any modelanalyze_model_tensors()— group tensors by NUMA nodetopology,recommend,analyze2. Cross-Platform PowerShell Benchmark —
benchmarks/benchmark_numa.ps1PowerShell 7+ benchmark harness that works on Linux, macOS, and Windows:
compare(baseline vs NUMA),baseline,numa3. GGUF Model Analyzer —
scripts/gguf_analyze.pyAnalyzes GGUF model files and generates optimal NUMA shard recommendations:
GGML_NUMA_SHARD_MAPfor any modelpython scripts/gguf_analyze.py --model model.gguf --json # Outputs: per-layer breakdown + recommended NUMA map4. Model-Specific Presets
power8_llama2_70b.jsonpower8_mixtral_8x7b.jsonCore Implementation (from PR #1799)
The core NUMA sharding implementation (already merged in main) includes:
src/ggml-numa-shard.h— Header-only API with GGUF tensor parsingsrc/ggml-numa-shard.c— Extended C implementation with statisticsbenchmarks/benchmark_numa.sh— Bash benchmark scriptbenchmarks/compare_results.py— Result analysispresets/power8_s824.json— POWER8 S824 optimal presetpresets/power8_default.json— Generic POWER8 presetpresets/dual_socket_x86.json— x86 dual-socket presetdocs/— Architecture, integration, troubleshooting docsreports/— Validation and performance analysisConfiguration
Expected Performance
Files Changed
Bounty: #2277 | Reward: 250 RTC | Wallet: C4c7r9WPsnEe6CUfegMU9M7ReHD1pWg8qeSfTBoRcLbg