feat: NUMA-Aware Model Sharding for POWER8 llama.cpp - Bounty #2277 #1976
Add configurable personality engine for BoTTube AI streaming agents.
- tools/bottube_personality.py (~240 lines)
  - PersonalityEngine class with 6 traits: humor, formality, verbosity, enthusiasm, sarcasm, empathy (each 0-1)
  - 5 presets: professor, comedian, supportive, edgy, zen
  - load_personality(config_dict) with preset + per-trait overrides
  - style_text(), generate_greeting(), generate_sign_off()
  - react_to_comment() with sentiment detection
  - get_mood() / mood_shift() with SQLite persistence
  - Supports :memory: DB for testing
- tools/bottube_personality_demo.py (~60 lines)
  - Demos all 5 presets, mood shifts, and custom trait overrides
- tests/test_personality.py (~80 lines)
  - 28 pytest tests, all passing
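The "preset + per-trait overrides" behaviour of load_personality(config_dict) can be pictured as "start from a preset, then overwrite individual traits, clamped to 0-1". The sketch below is a hypothetical illustration, not the PR's code: the config keys "preset" and "overrides", the Traits dataclass, the function name load_personality_sketch, and the preset values are all assumptions, and only two of the five presets are filled in.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict


@dataclass(frozen=True)
class Traits:
    # The six trait names come from the description above; each is a float in [0, 1].
    humor: float = 0.5
    formality: float = 0.5
    verbosity: float = 0.5
    enthusiasm: float = 0.5
    sarcasm: float = 0.5
    empathy: float = 0.5


# Hypothetical preset values for illustration only; the PR's actual numbers are not shown.
PRESETS: Dict[str, Traits] = {
    "professor": Traits(humor=0.2, formality=0.9, verbosity=0.8, sarcasm=0.1),
    "comedian": Traits(humor=0.9, formality=0.1, enthusiasm=0.8, sarcasm=0.6),
}

TRAIT_NAMES = {f.name for f in fields(Traits)}


def load_personality_sketch(config: Dict[str, Any]) -> Traits:
    """Start from an optional preset, then apply per-trait overrides clamped to [0, 1]."""
    base = PRESETS[config["preset"]] if "preset" in config else Traits()
    overrides = {}
    for name, value in config.get("overrides", {}).items():
        if name not in TRAIT_NAMES:
            raise ValueError(f"unknown trait: {name}")
        overrides[name] = min(1.0, max(0.0, float(value)))
    # dataclasses.replace builds a new frozen instance; the preset itself is untouched.
    return replace(base, **overrides)
```

Keeping presets as immutable dataclass instances means an override can never leak back into the shared preset table.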
Closes #2284
- ggml-numa-shard.h: header-only C library, mbind() tensor placement
- numa_shard_bench.c: per-node bandwidth benchmark, flat vs sharded
- numa_shard_config.py: auto-generate optimal GGML_NUMA_SHARD_MAP
- README_NUMA.md: POWER8 topology docs, build instructions, perf data
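One detail any mbind()-based tensor placement has to handle: mbind(2) fails with EINVAL when the start address is not page-aligned, so a tensor's byte range must be widened to page boundaries before it is bound. The helper below is a hypothetical sketch of that arithmetic (the name mbind_range is not from the PR); it assumes offsets are measured from a page-aligned mapping base, such as the address mmap() returns.

```python
import mmap

# System page size: typically 4 KiB on x86-64, and often 64 KiB on ppc64 Linux kernels.
PAGE_SIZE = mmap.PAGESIZE


def mbind_range(offset: int, nbytes: int, page_size: int = PAGE_SIZE) -> tuple:
    """Widen a tensor's byte range [offset, offset + nbytes) to whole pages.

    mbind(2) rejects a start address that is not page-aligned, so the range
    handed to it must begin on a page boundary; here the end is also rounded
    up so the length covers complete pages.
    Returns (aligned_start, aligned_length).
    """
    if nbytes <= 0:
        raise ValueError("tensor must be non-empty")
    start = offset - offset % page_size                   # round down
    end = -(-(offset + nbytes) // page_size) * page_size  # round up (ceiling division)
    return start, end - start
```

With large pages, neighbouring tensors often share a boundary page, and that page can live on only one node; in general the most recent policy set on that range applies, so a real placement either aligns tensor boundaries or accepts the overlap.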
Welcome to RustChain! Thanks for your first pull request. Before we review, please make sure:
Bounty tiers: Micro (1-10 RTC) | Standard (20-50) | Major (75-100) | Critical (100-150).
A maintainer will review your PR soon. Thanks for contributing!
Payment confirmed: LaphoqueRC was paid via an on-chain RTC transfer as part of a batch settlement. Total paid to date: 2,155 RTC across all contributions. Thank you for the work.
Code Review: PR #1976
Reviewer: FlintLeng
✅ LGTM. Approved.
FlintLeng left a comment:
PR #1976 — Review:
NUMA-Aware Model Sharding for POWER8: per-layer NUMA placement of llama.cpp tensor memory on multi-socket systems. Instead of a flat mmap(), this pins transformer layers to specific NUMA nodes based on access patterns and measured per-node bandwidth. A significant performance optimization for POWER8 mining nodes. ✅
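The core idea of pinning layers by measured bandwidth can be sketched as: read the layer index out of each GGUF tensor name, then hand each node a share of layers proportional to its bandwidth. This is a hypothetical illustration, not the API of ggml-numa-shard.h: the function names, the fastest-first contiguous ordering, and the "default node" for non-layer tensors (embeddings, output) are assumptions, and access patterns are not modelled, only bandwidth. The "blk.<N>." prefix is the naming llama.cpp uses for per-layer tensors in GGUF files.

```python
import re
from typing import Dict, List, Optional

# llama.cpp GGUF files name per-layer tensors "blk.<N>.<suffix>", e.g. "blk.12.attn_q.weight";
# non-layer tensors such as "token_embd.weight" or "output.weight" do not match.
_BLK = re.compile(r"^blk\.(\d+)\.")


def layer_of(tensor_name: str) -> Optional[int]:
    """Return the transformer layer index encoded in a GGUF tensor name, or None."""
    m = _BLK.match(tensor_name)
    return int(m.group(1)) if m else None


def assign_layers(n_layers: int, node_bw: Dict[int, float]) -> List[int]:
    """Map layer index -> NUMA node, giving each node a contiguous run of layers
    roughly proportional to its measured bandwidth (largest-remainder rounding)."""
    total = sum(node_bw.values())
    nodes = sorted(node_bw, key=node_bw.get, reverse=True)  # fastest node first
    quotas = {n: n_layers * node_bw[n] / total for n in nodes}
    counts = {n: int(quotas[n]) for n in nodes}
    # Hand the layers lost to truncation to the nodes with the largest fractional remainders.
    leftover = n_layers - sum(counts.values())
    for n in sorted(nodes, key=lambda n: quotas[n] - counts[n], reverse=True)[:leftover]:
        counts[n] += 1
    plan: List[int] = []
    for n in nodes:
        plan.extend([n] * counts[n])
    return plan


def node_for_tensor(name: str, plan: List[int], default_node: int) -> int:
    """Pick the node for one tensor; non-layer tensors go to a caller-chosen default."""
    layer = layer_of(name)
    return plan[layer] if layer is not None else default_node
```

For example, with the Node 0 and Node 3 figures from the problem statement below and a hypothetical 32-layer model, assign_layers(32, {0: 220.0, 3: 420.0}) places layers 0-20 on Node 3 and layers 21-31 on Node 0.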
NUMA-Aware Model Sharding — Bounty #2277 (250 RTC)
Problem
A POWER8 S824 has 512 GB of RAM across 4 NUMA nodes, and llama.cpp's flat mmap scatters model pages across them with no regard for node speed. Node 0 delivers roughly half the bandwidth of Node 3 (220 vs. 420 MB/s). When hot layers land on slow nodes, an estimated 30-40% of available bandwidth is wasted.
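A rough way to see where a loss of this size can come from: if a hot layer's bytes are spread over nodes and read one after another, total time is the sum of fraction/bandwidth, so the effective bandwidth is a weighted harmonic mean. The sketch below assumes, purely for illustration, that a hot layer is split 50/50 between Node 0 and Node 3 with no overlap between reads; real access is more concurrent, and nodes 1 and 2 are ignored.

```python
from typing import List, Tuple


def effective_bandwidth(placement: List[Tuple[float, float]]) -> float:
    """Bandwidth seen when streaming bytes spread over several nodes.

    placement is [(fraction_of_bytes, node_bandwidth), ...]. Assuming the bytes are
    read one after another (no overlap between nodes), total time is
    sum(fraction / bandwidth), so the effective bandwidth is the weighted harmonic mean.
    """
    if abs(sum(f for f, _ in placement) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return 1.0 / sum(f / bw for f, bw in placement)


mixed = effective_bandwidth([(0.5, 220.0), (0.5, 420.0)])  # hot layer split across Node 0 and Node 3
pinned = effective_bandwidth([(1.0, 420.0)])               # hot layer entirely on Node 3
loss = 1.0 - mixed / pinned
```

Under these assumptions mixed comes out at about 288.75 MB/s and loss at about 0.31, i.e. roughly 31% below the all-on-Node-3 case, near the low end of the 30-40% estimate.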
Solution
Header-only C library that parses GGUF tensor names, assigns layers to NUMA nodes, and pins their memory to the fast nodes via mbind(2).
Deliverables
- llm/ggml-numa-shard.h
- llm/numa_shard_bench.c
- llm/numa_shard_config.py
- llm/README_NUMA.md
Config
Key Design Decisions
#ifdef __linux__ + #ifdef __powerpc__ guards: compiles cleanly on x86
Expected Performance
~1.3-1.5x improvement in pp512 (512-token prompt processing) throughput, from keeping hot tensors off Node 0.
Closes #2277
RTC Wallet:
RTC2fe3c33c77666ff76a1cd0999fd4466ee81250ff