Merged
70 changes: 23 additions & 47 deletions .agents/skills/benchmarking-perf/SKILL.md
@@ -1,6 +1,6 @@
---
name: benchmarking-perf
description: "Run and analyze criterion benchmarks for performance-sensitive changes. Use when optimizing hot paths, validating perf targets, or comparing baselines. Covers SIMD, connection pooling, batch APIs, and caching."
description: "Run and analyze criterion benchmarks for performance-sensitive changes. Use when optimizing hot paths, validating perf targets, or comparing baselines."
---

# Benchmarking & Performance
@@ -26,72 +26,48 @@ description: "Run and analyze criterion benchmarks for performance-sensitive cha
| 10,000 | ~1.6ms |
| 50,000 | ~3.7ms |

Excellent scalability: 500x more concepts only 3x slower.

## Workflow

### 1. Save Baseline Before Changes
```bash
export CARGO_TERM_PROGRESS_WHEN=never
cargo bench --bench benchmark -- --save-baseline before
```

### 2. Compare Against Baseline
```bash
cargo bench --bench benchmark -- --baseline before
```

### 3. Interpret Results
- Green = faster, Red = slower
- Changes > 5% in hot paths require investigation

## SIMD Optimization

```rust
#[cfg(feature = "simd")]
use std::simd::u128x2;

pub fn cosine_similarity_simd(&self, other: &Self) -> f32 {
// Use u128x2 for parallel operations
// Fall back to scalar for WASM/non-SIMD targets
}
cargo bench --bench benchmark -- --save-baseline before 2>&1 | grep -E "(^test |time:|Benchmark|bench_|[0-9]+\.[0-9]+ µs)" | tail -30
```

Always provide scalar fallback for non-SIMD targets. Gate with feature flags.

## Connection Pooling

Use `deadpool` for async connection pooling, gated for remote Turso only.
Keep per-operation model for local SQLite.
### 2. Make Changes

## Batch API Pattern

```rust
pub async fn inject_concepts(&self, concepts: &[(String, HVec10240)]) -> Result<()> {
// Validate all inputs first → Batch insert → Single transaction for DB
}
### 3. Compare Against Baseline
```bash
export CARGO_TERM_PROGRESS_WHEN=never
cargo bench --bench benchmark -- --baseline before 2>&1 | grep -E "(time:|Benchmark|bench_|change:)" | tail -30
```

## Caching Pattern
- Prefer cached values as `Arc<[T]>` for cheap `Arc::clone` hits
- Avoid keying caches via temporary `Vec` materializations; hash fixed-size words/arrays directly
- Cache hit rate target: >80% for repeated access patterns
### 4. Interpret Results
- Look for `time: [lower estimate upper]` in criterion output; the middle value is the point estimate.
- Green = faster, Red = slower.
- Changes > 5% in hot paths require investigation.
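
As a rough illustration of the interpretation step, the sketch below parses a criterion `time:` line and applies the 5% rule. The function names are hypothetical, and it assumes "changes" means a change in either direction, measured on the point estimate.

```rust
/// Converts a time unit as printed by criterion into a nanosecond multiplier.
/// Both micro-sign spellings and the ASCII fallback `us` are accepted.
fn unit_to_ns(unit: &str) -> Option<f64> {
    match unit {
        "ps" => Some(1e-3),
        "ns" => Some(1.0),
        "us" | "\u{b5}s" | "\u{3bc}s" => Some(1e3),
        "ms" => Some(1e6),
        "s" => Some(1e9),
        _ => None,
    }
}

/// Parses a line containing `time: [lower estimate upper]` and returns the
/// three values in nanoseconds, or `None` if the line has another shape.
fn parse_time_line(line: &str) -> Option<[f64; 3]> {
    let rest = line.split_once("time:")?.1;
    let inner = rest.split_once('[')?.1.split_once(']')?.0;
    let tokens: Vec<&str> = inner.split_whitespace().collect();
    if tokens.len() != 6 {
        return None; // expected three `value unit` pairs
    }
    let mut out = [0.0; 3];
    for (i, pair) in tokens.chunks(2).enumerate() {
        let value: f64 = pair[0].parse().ok()?;
        out[i] = value * unit_to_ns(pair[1])?;
    }
    Some(out)
}

/// Relative change in percent; positive means `after` is slower.
fn relative_change_pct(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns - before_ns) / before_ns * 100.0
}

/// Applies the 5% rule (assumption: in either direction).
fn needs_investigation(before_ns: f64, after_ns: f64) -> bool {
    relative_change_pct(before_ns, after_ns).abs() > 5.0
}
```

Comparing the point estimates alone ignores the confidence interval, so criterion's own `change:` line remains the better signal when it is available.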

## Adding a New Benchmark
Edit `benches/benchmark.rs`. Follow existing patterns:
```rust
fn bench_my_operation(c: &mut Criterion) {
// Setup outside the closure
let data = prepare_data();

c.bench_function("my_operation", |b| {
b.iter(|| my_operation(black_box(&data)))
b.iter(|| {
// Only the measured code here
black_box(my_operation(black_box(&data)))
})
});
}
```
Add to `criterion_group!` at the bottom.

## Gotchas
- Never `--baseline` without first `--save-baseline` with the same name
- Don't capture mutable state by reference in criterion closures
- Use `black_box()` on inputs AND outputs to prevent dead-code elimination
- Reservoir benchmarks use `new_seeded(..., 42)` for reproducibility

## LOC Constraint
All files must remain ≤ 500 lines. Refactor to new modules if needed.
- Never `--baseline` without first `--save-baseline` with the same name.
- Don't capture mutable state by reference in criterion closures.
- Use `black_box()` on inputs AND outputs to prevent dead-code elimination.
- Reservoir benchmarks use `new_seeded(..., 42)` for reproducibility.
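
The `black_box()` gotcha can be illustrated without criterion. The following is a minimal sketch using `std::hint::black_box` and a hand-rolled timing loop; `sum_of_squares` and `time_workload` are hypothetical stand-ins, not code from this repository, and the loop is no substitute for criterion's statistics.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload standing in for the code under test.
fn sum_of_squares(data: &[u64]) -> u64 {
    data.iter().map(|x| x * x).sum()
}

/// Times `iters` calls of the workload. The input is wrapped in `black_box`
/// so the compiler is discouraged from precomputing the result, and the
/// output is wrapped so the call is not discarded as dead code.
fn time_workload(data: &[u64], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(sum_of_squares(black_box(data)));
    }
    start.elapsed()
}
```

The data is prepared by the caller, so setup cost stays outside the timed region, mirroring the "setup outside the closure" pattern.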
149 changes: 149 additions & 0 deletions .agents/skills/iterative-refinement/SKILL.md
@@ -0,0 +1,149 @@
---
name: iterative-refinement
description: "Test-fix-validate loops for complex changes: run tests, identify failures, fix, validate, repeat until green. Use for red-green-refactor cycles."
---

# Iterative Refinement

Test-fix-validate loops for complex changes requiring multiple iterations.

## When to Use

- Complex changes spanning multiple files
- Refactoring with test coverage
- New features requiring TDD approach
- Bug fixes needing verification

## Do NOT Use

- Single-file simple changes
- Documentation-only updates
- Changes without existing test coverage

## Process

```
┌─────────────────────────────────────────────────┐
│ RED: Run tests, identify failures │
│ GREEN: Apply minimal fix to pass │
│ REFACTOR: Optimize while keeping green │
│ REPEAT until coverage passes and clean │
└─────────────────────────────────────────────────┘
```

## Loop Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `MAX_ITERATIONS` | 10 | Maximum refinement cycles |
| `COVERAGE_THRESHOLD` | 80% | Minimum coverage to accept |
| `BENCH_BASELINE` | main | Branch to compare benchmarks |
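
One way these parameters might bound the loop is sketched below. `LoopParams` and `run_refinement` are hypothetical names; the defaults are taken from the table, and the `cycle` closure stands in for one full RED/GREEN/REFACTOR pass that reports whether everything is green.

```rust
/// Loop parameters with the defaults listed in the table.
#[derive(Debug, Clone, PartialEq)]
struct LoopParams {
    max_iterations: u32,
    coverage_threshold: f64, // percent
    bench_baseline: String,  // branch or baseline name
}

impl Default for LoopParams {
    fn default() -> Self {
        Self {
            max_iterations: 10,
            coverage_threshold: 80.0,
            bench_baseline: String::from("main"),
        }
    }
}

/// Runs `cycle` with a 1-based iteration number until it returns `true`
/// or `max_iterations` is reached. Returns the iteration that succeeded,
/// or `None` if the cap was hit first.
fn run_refinement<F>(params: &LoopParams, mut cycle: F) -> Option<u32>
where
    F: FnMut(u32) -> bool,
{
    (1..=params.max_iterations).find(|&i| cycle(i))
}
```

Returning `None` when the cap is hit makes the "gave up" case explicit, so a caller cannot mistake it for success.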

## Step-by-Step Execution

### Phase 1: RED - Identify Failures

```bash
# Run full test suite
cargo test --all-features --quiet 2>&1 | tee test-output.txt

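# Parse failures (with --quiet the harness prints one character per test, so this matches only the final `test result: FAILED` summary line)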
grep -E "^test .* FAILED" test-output.txt

# For specific test debugging
cargo test --test <test_name> -- --nocapture
```
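
If the failure list needs to be consumed programmatically, a parser along these lines could work. It is a sketch that assumes the harness's per-test line format, `test <name> ... FAILED`, which is printed when the test harness is not in quiet mode; `failed_tests` is a hypothetical helper.

```rust
/// Extracts the names of failed tests from captured test output.
/// Only per-test lines of the form `test <name> ... FAILED` are matched;
/// the `test result: ...` summary line has no ` ... ` separator and is skipped.
fn failed_tests(output: &str) -> Vec<&str> {
    output
        .lines()
        .filter_map(|line| {
            let rest = line.strip_prefix("test ")?;
            let (name, status) = rest.split_once(" ... ")?;
            (status.trim() == "FAILED").then_some(name)
        })
        .collect()
}
```

The returned names can be passed back to `cargo test` as filters to rerun only the failing tests.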

### Phase 2: GREEN - Apply Fix

1. Analyze failure output
2. Identify root cause (not symptom)
3. Apply minimal fix to make test pass
4. Do NOT refactor yet

```bash
# Quick validation
cargo check --message-format=short
cargo test --all-features --quiet
```

### Phase 3: REFACTOR - Optimize

Once tests pass:
1. Identify code smells
2. Apply refactoring patterns
3. Run tests after each change
4. Verify no regression

### Phase 4: VALIDATE

```bash
# Full gate sequence
./scripts/validate.sh

# Coverage check (if configured)
cargo tarpaulin --all-features --out Stdout

# Benchmark comparison
cargo bench --bench benchmark -- --baseline main
```

## Failure Categories

| Category | Pattern | Approach |
|----------|---------|----------|
| **Logic error** | Assertion mismatch | Fix implementation logic |
| **Type error** | Compilation failure | Fix types, add conversions |
| **Timeout** | Test exceeds limit | Optimize algorithm |
| **Race condition** | Flaky test | Add synchronization |
| **Environment** | Missing dependency | Fix setup, add config |

## Iteration Tracking

Track each iteration:

| Iteration | Phase | Action | Result |
|-----------|-------|--------|--------|
| 1 | RED | Run tests | 3 failures |
| 1 | GREEN | Fix imports | Passes |
| 2 | REFACTOR | Extract function | Passes |
| 3 | VALIDATE | Coverage check | 85% covered |
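
A tracking record like the rows above could be modeled as follows. This is only a sketch; the `Phase` and `IterationRecord` types and the `to_markdown_row` helper are hypothetical, not an existing API.

```rust
/// Phases of the refinement loop.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Red,
    Green,
    Refactor,
    Validate,
}

impl Phase {
    /// Label as it appears in the tracking table.
    fn label(self) -> &'static str {
        match self {
            Phase::Red => "RED",
            Phase::Green => "GREEN",
            Phase::Refactor => "REFACTOR",
            Phase::Validate => "VALIDATE",
        }
    }
}

/// One row of the iteration log.
struct IterationRecord {
    iteration: u32,
    phase: Phase,
    action: String,
    result: String,
}

/// Renders a record as a Markdown table row matching the table's columns.
fn to_markdown_row(r: &IterationRecord) -> String {
    format!(
        "| {} | {} | {} | {} |",
        r.iteration,
        r.phase.label(),
        r.action,
        r.result
    )
}
```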

## Exit Criteria

Stop when ALL conditions are met:
- [ ] All tests pass
- [ ] Coverage meets threshold
- [ ] No clippy warnings
- [ ] No performance regression greater than 10%
- [ ] LOC gates satisfied
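
The checklist can be read as a single predicate over the gate results. The sketch below assumes the results have already been collected into a hypothetical `GateStatus` struct; how each value is obtained (test run, coverage tool, clippy, benchmarks) is outside its scope.

```rust
/// Snapshot of the gate results after one iteration (hypothetical type).
struct GateStatus {
    tests_pass: bool,
    coverage_pct: f64,
    clippy_warnings: u32,
    /// Positive values mean slower than the baseline.
    perf_regression_pct: f64,
    loc_gates_ok: bool,
}

/// Returns the criteria that are not yet met, in checklist order.
/// An empty list means the loop may stop.
fn unmet_criteria(
    s: &GateStatus,
    coverage_threshold: f64,
    max_regression_pct: f64,
) -> Vec<&'static str> {
    let mut unmet = Vec::new();
    if !s.tests_pass {
        unmet.push("tests");
    }
    if s.coverage_pct < coverage_threshold {
        unmet.push("coverage");
    }
    if s.clippy_warnings > 0 {
        unmet.push("clippy");
    }
    if s.perf_regression_pct > max_regression_pct {
        unmet.push("performance");
    }
    if !s.loc_gates_ok {
        unmet.push("loc");
    }
    unmet
}
```

Returning the list of unmet gates rather than a bare boolean gives the next iteration something concrete to target.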

## Example Workflow

```
User: "Refactor the reservoir module"

Agent: [Iterative Refinement activated]
Iteration 1:
RED: cargo test → 2 failures in reservoir
GREEN: Fix borrow checker issue
Result: Tests pass

Iteration 2:
REFACTOR: Extract metrics to separate file
RED: cargo test → 0 failures
Result: Still green

Iteration 3:
VALIDATE: ./scripts/validate.sh
Result: All gates pass, 92% coverage

Final: Refactoring complete after 3 iterations
```

## Anti-Patterns to Avoid

- Writing tests to pass buggy code
- Skipping failing tests instead of fixing
- Large refactors without incremental commits
- Ignoring performance regressions