d-o-hub · May 13, 2026
@@ -1,6 +1,6 @@
 ---
 name: benchmarking-perf
-description: "Run and analyze criterion benchmarks for performance-sensitive changes. Use when optimizing hot paths, validating perf targets, or comparing baselines. Covers SIMD, connection pooling, batch APIs, and caching."
+description: "Run and analyze criterion benchmarks for performance-sensitive changes. Use when optimizing hot paths, validating perf targets, or comparing baselines."
 ---
 
 # Benchmarking & Performance
@@ -26,72 +26,48 @@ description: "Run and analyze criterion benchmarks for performance-sensitive cha
 | 10,000 | ~1.6ms |
 | 50,000 | ~3.7ms |
 
+Scales sublinearly: 500x more concepts is only about 3x slower.
+
 ## Workflow
 
 ### 1. Save Baseline Before Changes
 ```bash
 export CARGO_TERM_PROGRESS_WHEN=never
-cargo bench --bench benchmark -- --save-baseline before
-```
-
-### 2. Compare Against Baseline
-```bash
-cargo bench --bench benchmark -- --baseline before
-```
-
-### 3. Interpret Results
-- Green = faster, Red = slower
-- Changes > 5% in hot paths require investigation
-
-## SIMD Optimization
-
-```rust
-#[cfg(feature = "simd")]
-use std::simd::u128x2;
-
-pub fn cosine_similarity_simd(&self, other: &Self) -> f32 {
-    // Use u128x2 for parallel operations
-    // Fall back to scalar for WASM/non-SIMD targets
-}
+cargo bench --bench benchmark -- --save-baseline before 2>&1 | grep -E "(^test |time:|Benchmark|bench_|[0-9]+\.[0-9]+ µs)" | tail -30
 ```
 
-Always provide scalar fallback for non-SIMD targets. Gate with feature flags.
-
-## Connection Pooling
-
-Use `deadpool` for async connection pooling, gated for remote Turso only.
-Keep per-operation model for local SQLite.
+### 2. Make Changes
 
-## Batch API Pattern
-
-```rust
-pub async fn inject_concepts(&self, concepts: &[(String, HVec10240)]) -> Result<()> {
-    // Validate all inputs first → Batch insert → Single transaction for DB
-}
+### 3. Compare Against Baseline
+```bash
+export CARGO_TERM_PROGRESS_WHEN=never
+cargo bench --bench benchmark -- --baseline before 2>&1 | grep -E "(time:|Benchmark|bench_|change:)" | tail -30
 ```
 
-## Caching Pattern
-- Prefer cached values as `Arc<[T]>` for cheap `Arc::clone` hits
-- Avoid keying caches via temporary `Vec` materializations; hash fixed-size words/arrays directly
-- Cache hit rate target: >80% for repeated access patterns
+### 4. Interpret Results
+- Look for `time: [lower estimate upper]` in criterion output; the middle value is the point estimate and the outer two bound its confidence interval.
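+- Green = faster, Red = slower. Colors typically appear only on a terminal; in piped output, read the sign of the `change:` percentages instead.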
+- Changes > 5% in hot paths require investigation.
 
 ## Adding a New Benchmark
 Edit `benches/benchmark.rs`. Follow existing patterns:
 ```rust
 fn bench_my_operation(c: &mut Criterion) {
+    // Setup outside the closure
     let data = prepare_data();
+
     c.bench_function("my_operation", |b| {
-        b.iter(|| my_operation(black_box(&data)))
+        b.iter(|| {
+            // Only the measured code here
+            black_box(my_operation(black_box(&data)))
+        })
     });
 }
 ```
 Add to `criterion_group!` at the bottom.
 
 ## Gotchas
-- Never `--baseline` without first `--save-baseline` with the same name
-- Don't capture mutable state by reference in criterion closures
-- Use `black_box()` on inputs AND outputs to prevent dead-code elimination
-- Reservoir benchmarks use `new_seeded(..., 42)` for reproducibility
-
-## LOC Constraint
-All files must remain ≤ 500 lines. Refactor to new modules if needed.
+- Never `--baseline` without first `--save-baseline` with the same name.
+- Don't capture mutable state by reference in criterion closures.
+- Use `black_box()` on inputs AND outputs to prevent dead-code elimination.
+- Reservoir benchmarks use `new_seeded(..., 42)` for reproducibility.
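
The interpretation step above (read the `time:` line, investigate changes over 5%) can be sketched in plain Rust. This is a minimal illustration using only the standard library, not part of criterion or of this repository: `TimeEstimate`, `parse_time_line`, `percent_change`, and `needs_investigation` are hypothetical names, and the parser assumes the usual three value/unit pairs inside square brackets.

```rust
// Hypothetical helpers for illustration; not part of criterion or this repo.

/// Bounds and point estimate from one criterion `time:` line, in nanoseconds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeEstimate {
    pub lower_ns: f64,
    pub estimate_ns: f64,
    pub upper_ns: f64,
}

/// Converts a unit suffix into a nanosecond multiplier.
fn unit_to_ns(unit: &str) -> Option<f64> {
    match unit {
        "ps" => Some(1e-3),
        "ns" => Some(1.0),
        "µs" | "us" => Some(1e3),
        "ms" => Some(1e6),
        "s" => Some(1e9),
        _ => None,
    }
}

/// Parses a line such as `time:   [1.6021 ms 1.6105 ms 1.6198 ms]`.
/// Returns `None` unless the brackets hold exactly three value/unit pairs.
pub fn parse_time_line(line: &str) -> Option<TimeEstimate> {
    let start = line.find('[')? + 1;
    let end = line.find(']')?;
    let tokens: Vec<&str> = line.get(start..end)?.split_whitespace().collect();
    if tokens.len() != 6 {
        return None;
    }
    let mut ns = [0.0_f64; 3];
    for (slot, pair) in ns.iter_mut().zip(tokens.chunks(2)) {
        let value: f64 = pair[0].parse().ok()?;
        *slot = value * unit_to_ns(pair[1])?;
    }
    Some(TimeEstimate {
        lower_ns: ns[0],
        estimate_ns: ns[1],
        upper_ns: ns[2],
    })
}

/// Relative change of `current_ns` versus `baseline_ns`, in percent.
/// Positive means slower, negative means faster.
pub fn percent_change(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns * 100.0
}

/// Applies the 5% hot-path rule: a change larger than 5% in either
/// direction is flagged for investigation.
pub fn needs_investigation(baseline_ns: f64, current_ns: f64) -> bool {
    percent_change(baseline_ns, current_ns).abs() > 5.0
}
```

The sketch flags changes in both directions because the rule speaks of "changes", not only regressions; an unexpected speedup can also point to code that was optimized away.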
@@ -0,0 +1,149 @@
+---
+name: iterative-refinement
+description: "Test-fix-validate loops for complex changes: run tests, identify failures, fix, validate, repeat until green. Use for red-green-refactor cycles."
+---
+
+# Iterative Refinement
+
+Test-fix-validate loops for complex changes requiring multiple iterations.
+
+## When to Use
+
+- Complex changes spanning multiple files
+- Refactoring with test coverage
+- New features requiring TDD approach
+- Bug fixes needing verification
+
+## Do NOT Use
+
+- Single-file simple changes
+- Documentation-only updates
+- Changes without existing test coverage
+
+## Process
+
+```
+┌─────────────────────────────────────────────────┐
+│  RED:    Run tests, identify failures           │
+│  GREEN:  Apply minimal fix to pass              │
+│  REFACTOR: Optimize while keeping green         │
+│  REPEAT until tests pass and coverage is met    │
+└─────────────────────────────────────────────────┘
+```
+
+## Loop Parameters
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `MAX_ITERATIONS` | 10 | Maximum refinement cycles |
+| `COVERAGE_THRESHOLD` | 80% | Minimum coverage to accept |
+| `BENCH_BASELINE` | main | Criterion baseline to compare against (must first be saved with `--save-baseline main`) |
+
+## Step-by-Step Execution
+
+### Phase 1: RED - Identify Failures
+
+```bash
+# Run full test suite (no --quiet here: it collapses per-test lines to dots, and the grep below needs them)
+cargo test --all-features 2>&1 | tee test-output.txt
+
+# Parse failures
+grep -E "^test .* FAILED" test-output.txt
+
+# For specific test debugging (--test selects an integration test target under tests/)
+cargo test --test <test_name> -- --nocapture
+```
+
+### Phase 2: GREEN - Apply Fix
+
+1. Analyze failure output
+2. Identify root cause (not symptom)
+3. Apply minimal fix to make test pass
+4. Do NOT refactor yet
+
+```bash
+# Quick validation
+cargo check --message-format=short
+cargo test --all-features --quiet
+```
+
+### Phase 3: REFACTOR - Optimize
+
+Once tests pass:
+1. Identify code smells
+2. Apply refactoring patterns
+3. Run tests after each change
+4. Verify no regression
+
+### Phase 4: VALIDATE
+
+```bash
+# Full gate sequence
+./scripts/validate.sh
+
+# Coverage check (if configured)
+cargo tarpaulin --all-features --out Stdout
+
+# Benchmark comparison (requires a baseline previously saved with --save-baseline main)
+cargo bench --bench benchmark -- --baseline main
+```
+
+## Failure Categories
+
+| Category | Pattern | Approach |
+|----------|---------|----------|
+| **Logic error** | Assertion mismatch | Fix implementation logic |
+| **Type error** | Compilation failure | Fix types, add conversions |
+| **Timeout** | Test exceeds limit | Optimize algorithm |
+| **Race condition** | Flaky test | Add synchronization |
+| **Environment** | Missing dependency | Fix setup, add config |
+
+## Iteration Tracking
+
+Track each iteration:
+
+| Iteration | Phase | Action | Result |
+|-----------|-------|--------|--------|
+| 1 | RED | Run tests | 3 failures |
+| 1 | GREEN | Fix imports | Passes |
+| 2 | REFACTOR | Extract function | Passes |
+| 3 | VALIDATE | Coverage check | 85% covered |
+
+## Exit Criteria
+
+Stop when ALL conditions are met:
+- [ ] All tests pass
+- [ ] Coverage meets threshold
+- [ ] No clippy warnings
+- [ ] No performance regression greater than 10%
+- [ ] LOC gates satisfied
+
+## Example Workflow
+
+```
+User: "Refactor the reservoir module"
+
+Agent: [Iterative Refinement activated]
+Iteration 1:
+  RED: cargo test → 2 failures in reservoir
+  GREEN: Fix borrow checker issue
+  Result: Tests pass
+
+Iteration 2:
+  REFACTOR: Extract metrics to separate file
+  RED: cargo test → 0 failures
+  Result: Still green
+
+Iteration 3:
+  VALIDATE: ./scripts/validate.sh
+  Result: All gates pass, 92% coverage
+
+Final: Refactoring complete after 3 iterations
+```
+
+## Anti-Patterns to Avoid
+
+- Writing tests to pass buggy code
+- Skipping failing tests instead of fixing
+- Large refactors without incremental commits
+- Ignoring performance regressions
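
The loop and its exit criteria can be sketched as plain Rust, assuming the defaults from the Loop Parameters table. Everything here is hypothetical and for illustration only: `failed_tests`, `GateReport`, and `refine` are not part of the repository, and the caller is assumed to supply the measurements (for example from `cargo test`, tarpaulin, clippy, and criterion output).

```rust
// Hypothetical names throughout; only the default values come from the
// Loop Parameters and Exit Criteria sections.

pub const MAX_ITERATIONS: u32 = 10;
pub const COVERAGE_THRESHOLD: f64 = 80.0; // percent
pub const MAX_REGRESSION: f64 = 10.0; // percent

/// Extracts test names from libtest lines of the form `test <name> ... FAILED`.
/// The summary line `test result: FAILED. ...` does not match and is skipped.
pub fn failed_tests(output: &str) -> Vec<&str> {
    output
        .lines()
        .filter_map(|line| line.strip_prefix("test ")?.strip_suffix(" ... FAILED"))
        .collect()
}

/// Measurements gathered by one pass through the validation gates.
#[derive(Debug, Clone, PartialEq)]
pub struct GateReport {
    pub failed_tests: usize,
    pub coverage_percent: f64,
    pub clippy_warnings: usize,
    pub regression_percent: f64,
    pub loc_gates_ok: bool,
}

impl GateReport {
    /// Returns the exit criteria that are not yet satisfied.
    pub fn unmet(&self) -> Vec<&'static str> {
        let mut unmet = Vec::new();
        if self.failed_tests > 0 {
            unmet.push("tests");
        }
        if self.coverage_percent < COVERAGE_THRESHOLD {
            unmet.push("coverage");
        }
        if self.clippy_warnings > 0 {
            unmet.push("clippy");
        }
        if self.regression_percent > MAX_REGRESSION {
            unmet.push("performance");
        }
        if !self.loc_gates_ok {
            unmet.push("loc");
        }
        unmet
    }
}

/// Runs `iterate` until every exit criterion holds or the budget is spent.
/// Returns the 1-based iteration that went green, or `None` on exhaustion.
pub fn refine<F>(mut iterate: F) -> Option<u32>
where
    F: FnMut(u32) -> GateReport,
{
    (1..=MAX_ITERATIONS).find(|&i| iterate(i).unmet().is_empty())
}
```

Returning `None` when the budget runs out, rather than looping on, keeps the `MAX_ITERATIONS` cap explicit and gives the caller a clear signal to stop and escalate.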