Scottcjn · kuanglaodi2-sudo · Mar 26, 2026
diff --git a/numa_sharding/FINAL_SUMMARY.md b/numa_sharding/FINAL_SUMMARY.md
@@ -357,7 +357,90 @@ The NUMA-aware model sharding implementation for POWER8 llama.cpp is complete an
 
 ---
 
-*Final Summary Version: 1.0.0*  
-*Date: 2026-03-23*  
+*Final Summary Version: 1.1.0*  
+*Date: 2026-03-26*  
 *Bounty: Scottcjn/rustchain-bounties #2277*  
 *Status: Ready for Hardware Validation*
+
+---
+
+## Enhanced Additions (v1.1.0)
+
+### New Files Added
+
+| File | Purpose |
+|------|---------|
+| `src/ggml_numa_bindings.py` | Python ctypes bindings for the C library |
+| `benchmarks/benchmark_numa.ps1` | Cross-platform PowerShell benchmark script |
+| `scripts/gguf_analyze.py` | GGUF model tensor analyzer |
+| `presets/power8_llama2_70b.json` | LLaMA 2 70B preset (80 layers) |
+| `presets/power8_mixtral_8x7b.json` | Mixtral-8x7B MoE preset |
+
+### Python Bindings (`ggml_numa_bindings.py`)
+
+Provides Python-level access to NUMA sharding:
+- `GGMLNUMABindings` class with full C API wrapper
+- Pure Python fallbacks when native library unavailable
+- `get_numa_topology()` - detect system NUMA config
+- `recommend_shard_map(layers, nodes)` - auto-generate optimal map
+- `analyze_model_tensors()` - group tensors by NUMA node
+- CLI: `python ggml_numa_bindings.py topology|recommend|analyze`
+
+```python
+from ggml_numa_bindings import GGMLNUMABindings, recommend_shard_map
+
+numa = GGMLNUMABindings()
+numa.init("0-8:1,9-20:3,21-31:2")
+node = numa.assign_tensor("blk.15.attn_q.weight")
+# → 3 (Node 3, attention layer)
+
+# Auto-generate for any model
+shard_map = recommend_shard_map(num_layers=80, num_nodes=4)
+# → "0-20:1,21-53:3,54-79:2"
+```
+
+### PowerShell Benchmark (`benchmark_numa.ps1`)
+
+Cross-platform benchmark harness:
+- Works on Linux, macOS, and Windows (PowerShell 7+)
+- Supports `compare`, `baseline`, and `numa` modes
+- JSON output parsing with fallback grep
+- NUMA topology auto-detection
+- POWER8-aware defaults
+
+```powershell
+# Windows/Linux/macOS
+pwsh benchmark_numa.ps1 -ModelPath model.gguf -Mode compare -Threads 64
+
+# NUMA-sharded only
+pwsh benchmark_numa.ps1 -ModelPath model.gguf -Mode numa -NUMAConfig "0-8:1,9-20:3,21-31:2"
+```
+
+### GGUF Model Analyzer (`scripts/gguf_analyze.py`)
+
+Analyzes GGUF model files to generate optimal shard maps:
+- GGUF magic/version detection
+- Tensor name extraction from binary
+- Per-layer memory footprint estimation
+- Auto-generates NUMA shard recommendations
+- JSON and text output modes
+
+```bash
+python scripts/gguf_analyze.py --model tinyllama-1.1b.Q4_K_M.gguf --json
+# Output: recommended NUMA map + per-layer breakdown
+
+python scripts/gguf_analyze.py --model llama-2-70b.Q4_K_M.gguf
+# Output: text report with layer groupings for 80-layer model
+```
+
+### Model-Specific Presets
+
+**`power8_llama2_70b.json`** - LLaMA 2 70B (80 layers):
+- Node 1: Layers 0-20 (early/embedding)
+- Node 3: Layers 21-53 (attention-heavy middle)
+- Node 2: Layers 54-79 (FFN-heavy late)
+
+**`power8_mixtral_8x7b.json`** - Mixtral MoE (44 layers):
+- Handles expert routing awareness across nodes
+- MoE-specific routing strategy documentation
+- Expert distribution across all 4 nodes
diff --git a/numa_sharding/README.md b/numa_sharding/README.md
@@ -253,15 +253,21 @@ void *ptr = GGML_NUMA_MALLOC(size, node);
 numa_sharding/
 ├── src/
 │   ├── ggml-numa-shard.h      # Header-only API (main deliverable)
-│   └── ggml-numa-shard.c      # Extended implementation
+│   ├── ggml-numa-shard.c      # Extended C implementation
+│   └── ggml_numa_bindings.py  # Python ctypes bindings (NEW)
 ├── benchmarks/
-│   ├── benchmark_numa.sh      # Automated benchmark script
+│   ├── benchmark_numa.sh      # Automated benchmark script (bash)
+│   ├── benchmark_numa.ps1     # Cross-platform PowerShell benchmark (NEW)
 │   ├── compare_results.py     # Result analysis script
 │   └── expected_results.json  # Expected baseline numbers
 ├── presets/
 │   ├── power8_s824.json       # POWER8 S824 tuning preset
 │   ├── power8_default.json    # Generic POWER8 preset
+│   ├── power8_llama2_70b.json # LLaMA 2 70B preset (NEW)
+│   ├── power8_mixtral_8x7b.json # Mixtral MOE preset (NEW)
 │   └── dual_socket_x86.json   # x86 dual-socket preset
+├── scripts/
+│   └── gguf_analyze.py        # GGUF model tensor analyzer (NEW)
 ├── reports/
 │   ├── validation_report.md   # Validation results
 │   └── performance_analysis.md # Detailed performance analysis
@@ -341,6 +347,93 @@ This implementation is provided as part of the rustchain-bounties program.
 
 ---
 
-**Version:** 1.0.0  
-**Date:** 2026-03-23  
+**Version:** 1.1.0  
+**Date:** 2026-03-26  
 **Bounty:** Scottcjn/rustchain-bounties #2277
+
+---
+
+## Python Integration
+
+### Python Bindings (`src/ggml_numa_bindings.py`)
+
+Python ctypes bindings for NUMA sharding:
+
+```python
+from ggml_numa_bindings import GGMLNUMABindings, recommend_shard_map, get_numa_topology
+
+# Detect system topology
+topo = get_numa_topology()
+print(f"NUMA nodes: {topo['num_nodes']}")  # → 4
+
+# Auto-generate shard map for any model
+shard_map = recommend_shard_map(num_layers=80, num_nodes=4)
+# → "0-20:1,21-53:3,54-79:2"
+
+# Use bindings
+numa = GGMLNUMABindings()
+numa.init(shard_map)
+node = numa.assign_tensor("blk.15.attn_q.weight", layer_idx=15)
+# → 3
+numa.bind(addr=0x1000, size=4096, node=node)
+numa.cleanup()
+```
+
+CLI:
+```bash
+python src/ggml_numa_bindings.py topology
+python src/ggml_numa_bindings.py recommend --layers 32 --nodes 4
+```
+
+---
+
+## GGUF Model Analyzer
+
+Analyze any GGUF model to generate optimal NUMA shard maps:
+
+```bash
+# Text report
+python scripts/gguf_analyze.py --model model.gguf
+
+# JSON output
+python scripts/gguf_analyze.py --model model.gguf --json
+
+# With verbose
+python scripts/gguf_analyze.py --model model.gguf --verbose
+```
+
+Output includes per-layer memory footprint, tensor type distribution, and recommended NUMA node assignments.
+
+---
+
+## Cross-Platform Benchmark
+
+### PowerShell (Windows/Linux/macOS)
+
+```powershell
+pwsh benchmarks/benchmark_numa.ps1 -ModelPath model.gguf -Mode compare
+pwsh benchmarks/benchmark_numa.ps1 -ModelPath model.gguf -Mode numa -NUMAConfig "0-8:1,9-20:3,21-31:2"
+```
+
+### Shell (Linux/macOS)
+
+```bash
+bash benchmarks/benchmark_numa.sh -m model.gguf --compare
+```
+
+---
+
+## Model-Specific Presets
+
+Additional presets beyond v1.0:
+
+| Preset | Model | Layers | Nodes |
+|--------|-------|--------|-------|
+| `power8_llama2_70b.json` | LLaMA 2 70B | 80 | 4 |
+| `power8_mixtral_8x7b.json` | Mixtral-8x7B MoE | 44 | 4 |
+
+Load a preset:
+```bash
+export GGML_NUMA_SHARD_MAP=$(jq -r '.numa_shard_config.value' \
+    presets/power8_llama2_70b.json)
+```