MaxGhenis · May 30, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -8,7 +8,7 @@
   },
   "plugins": [
     {
-      "name": "farness",
+      "name": "brier",
       "description": "Decision-making framework that reframes subjective questions as forecasting problems with explicit KPIs, option expansion, and calibration tracking",
       "version": "0.1.0",
       "author": {

diff --git a/.claude/skills/farness/SKILL.md b/.claude/skills/brier/SKILL.md
@@ -1,13 +1,13 @@
 ---
-name: farness
-description: Use when the user wants advice or a decision recommendation rather than direct implementation, especially for prompts like "should I", "should we", "which is better", "is it worth it", or "what would you do" about architecture, product, hiring, strategy, or career choices. Prefer the local farness MCP server when available and structure the answer around KPI, option expansion, reference class, disconfirming evidence, numeric forecasts, and a review date.
+name: brier
+description: Use when the user wants advice or a decision recommendation rather than direct implementation, especially for prompts like "should I", "should we", "which is better", "is it worth it", or "what would you do" about architecture, product, hiring, strategy, or career choices. Prefer the local brier MCP server when available and structure the answer around KPI, option expansion, reference class, disconfirming evidence, numeric forecasts, and a review date.
 ---
 
-# Farness
+# Brier
 
 Use this skill to turn vague decisions into forecastable choices.
 
-Prefer the local `farness` MCP server when it is connected.
+Prefer the local `brier` MCP server when it is connected.
 
 ## Workflow
 
@@ -43,8 +43,8 @@ Prefer the local `farness` MCP server when it is connected.
 
 ## Setup
 
-If the `farness` MCP server is not connected, add it with:
+If the `brier` MCP server is not connected, add it with:
 
 ```bash
-farness setup claude
+brier setup claude
 ```
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-Farness is a decision-making framework that reframes subjective questions ("Should I...?") into forecasting problems with explicit KPIs, confidence intervals, and calibration tracking. The core thesis: making numeric predictions forces mechanism thinking, creates accountability, and reduces sycophancy.
+Brier is a decision-making framework that reframes subjective questions ("Should I...?") into forecasting problems with explicit KPIs, confidence intervals, and calibration tracking. The core thesis: making numeric predictions forces mechanism thinking, creates accountability, and reduces sycophancy.
 
 ## Commands
 
@@ -21,24 +21,24 @@ pytest
 pytest tests/test_framework.py
 
 # Run with coverage
-pytest --cov=farness
+pytest --cov=brier
 
 # Format code
-black farness tests
-ruff check farness tests
+black brier tests
+ruff check brier tests
 ```
 
 ### CLI
 
 ```bash
-farness new "question"    # Create a new decision
-farness new "q" --context "details"  # With context
-farness list              # List all decisions
-farness list --pending    # Decisions past review date
-farness show <id>         # Show decision details (supports prefix match)
-farness score [id]        # Score a decision's actual outcomes (interactive)
-farness calibration       # Show calibration statistics
-farness pending           # Alias for list --pending
+brier new "question"    # Create a new decision
+brier new "q" --context "details"  # With context
+brier list              # List all decisions
+brier list --pending    # Decisions past review date
+brier show <id>         # Show decision details (supports prefix match)
+brier score [id]        # Score a decision's actual outcomes (interactive)
+brier calibration       # Show calibration statistics
+brier pending           # Alias for list --pending
 ```
 
 ### Site (Next.js)
@@ -57,15 +57,15 @@ bun run test     # Run vitest tests
 python3 paper/render_paper.py  # Generate figures, render HTML, sync preemptive_rigor.md and site/public/paper-raw
 python3 paper/run_strongest_validation.py  # Strongest reviewer-facing validation across Claude Opus 4.6 and GPT-5.2
 python3 paper/run_study1_rerun.py --models gpt-5.4  # Original Study 1 rerun with legacy prompt wording
-python3 -m farness.experiments stability --strongest-validation --model gpt-5.2  # Single-model strongest validation
+python3 -m brier.experiments stability --strongest-validation --model gpt-5.2  # Single-model strongest validation
 ```
 
 ## Architecture
 
-### Python Package (`farness/`)
+### Python Package (`brier/`)
 
 - **framework.py**: Core dataclasses (`Decision`, `KPI`, `Option`, `Forecast`) with serialization. `Option.expected_value()` computes weighted expected values across KPIs. `Decision.best_option()` and `sensitivity_analysis()` for analysis.
-- **storage.py**: `DecisionStore` persists decisions to `~/.farness/decisions.jsonl` in JSONL format. Supports CRUD and filtered queries (unscored, pending review, scored).
+- **storage.py**: `DecisionStore` persists decisions to `~/.brier/decisions.jsonl` in JSONL format. Supports CRUD and filtered queries (unscored, pending review, scored).
 - **calibration.py**: `CalibrationTracker` computes forecast accuracy metrics: coverage (% of actuals in CIs), calibration error (coverage vs stated confidence), MAE, MRE, Brier scores.
 - **cli.py**: Argparse CLI wrapping the above modules.
 

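Reviewer note on the `calibration.py` description in the hunk above: the metrics it lists (coverage, calibration error, MAE, Brier score) can be sketched in a few lines. This is a minimal illustration and not the package's code. `ScoredForecast` and every function name below are hypothetical, and the sign convention for calibration error (observed coverage minus stated confidence) is an assumption, since the diff only says "coverage vs stated confidence".

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScoredForecast:
    """Hypothetical record: point estimate, stated interval, and realized value."""
    point: float
    ci_low: float
    ci_high: float
    confidence: float  # stated interval level, e.g. 0.8 for an 80% CI
    actual: float


def coverage(forecasts):
    """Share of actual outcomes that landed inside the stated interval."""
    return mean(1.0 if f.ci_low <= f.actual <= f.ci_high else 0.0 for f in forecasts)


def calibration_error(forecasts):
    """Observed coverage minus mean stated confidence (assumed sign convention)."""
    return coverage(forecasts) - mean(f.confidence for f in forecasts)


def mean_absolute_error(forecasts):
    """Average absolute gap between point estimates and actual values."""
    return mean(abs(f.point - f.actual) for f in forecasts)


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return mean((p - o) ** 2 for p, o in zip(probabilities, outcomes))


if __name__ == "__main__":
    # Two made-up scored forecasts, both with 80% intervals.
    sample = [
        ScoredForecast(point=10, ci_low=8, ci_high=12, confidence=0.8, actual=11),
        ScoredForecast(point=5, ci_low=4, ci_high=6, confidence=0.8, actual=7),
    ]
    print("coverage:", coverage(sample))
    print("calibration error:", round(calibration_error(sample), 3))
    print("MAE:", mean_absolute_error(sample))
    print("Brier:", round(brier_score([0.9, 0.2], [1, 0]), 3))
```

A negative calibration error under this convention means the intervals captured fewer outcomes than their stated confidence implied, i.e. the forecaster was overconfident.
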
diff --git a/README.md b/README.md
@@ -1,8 +1,8 @@
-# Farness
+# Brier
 
 **Forecasting as a harness for decision-making.**
 
-Instead of asking "Is X good?" or "Should I do Y?", farness helps you:
+Instead of asking "Is X good?" or "Should I do Y?", brier helps you:
 1. Define what success looks like (KPIs)
 2. Expand your options (including ones you didn't consider)
 3. Make explicit forecasts (with confidence intervals and resolution rules)
@@ -11,43 +11,43 @@ Instead of asking "Is X good?" or "Should I do Y?", farness helps you:
 ## Installation
 
 ```bash
-python -m pip install 'farness[mcp]'
+python -m pip install 'brier[mcp]'
 ```
 
 ## Quick Start
 
 ### Codex
 
 ```bash
-farness setup codex
-farness doctor codex
+brier setup codex
+brier doctor codex
 ```
 
-Then restart Codex and use `$farness` when a decision prompt appears.
+Then restart Codex and use `$brier` when a decision prompt appears.
 
 ### Claude Code
 
 ```bash
-farness setup claude
-farness doctor claude
+brier setup claude
+brier doctor claude
 ```
 
 Then restart Claude Code.
 
 ### Local CLI
 
 ```bash
-farness new "Should we rewrite the auth layer?" --context "3 incidents this quarter"
-farness list
-farness calibration
+brier new "Should we rewrite the auth layer?" --context "3 incidents this quarter"
+brier list
+brier calibration
 ```
 
 The CLI is local-only and does not call an LLM or require an API key.
 
 ### Python package
 
 ```python
-from farness import Decision, KPI, Option, Forecast, DecisionStore
+from brier import Decision, KPI, Option, Forecast, DecisionStore
 from datetime import datetime, timedelta
 
 # Create a decision
@@ -109,20 +109,20 @@ store.save(decision)
 ### Command Line
 
 ```bash
-farness new "Should we launch now?"
-farness show abc123
-farness pending
-farness calibration
+brier new "Should we launch now?"
+brier show abc123
+brier pending
+brier calibration
 ```
 
 ### Forecast Question Drafts
 
-`farness` can turn a stored decision forecast or standalone policy question into
+`brier` can turn a stored decision forecast or standalone policy question into
 Manifold-ready forecast question drafts. This is draft-only: it does not publish
 questions, place a bet, or require a Manifold API key.
 
 ```bash
-farness forecast-draft "Will Waymo be legally permitted to offer fully driverless paid robotaxi rides in Washington, DC by 2026-12-31?" \
+brier forecast-draft "Will Waymo be legally permitted to offer fully driverless paid robotaxi rides in Washington, DC by 2026-12-31?" \
   --initial-prob 52 \
   --resolution-date 2026-12-31 \
   --resolution-rule "Resolve YES if official DC law, regulation, or permit approval allows Waymo to offer fully driverless paid public rides in DC by 2026-12-31." \
@@ -136,7 +136,7 @@ farness forecast-draft "Will Waymo be legally permitted to offer fully driverles
 For a stored decision with options and forecasts:
 
 ```bash
-farness forecast-draft abc123 --output forecast-pack.json
+brier forecast-draft abc123 --output forecast-pack.json
 ```
 
 An example Waymo/DC draft pack lives at
@@ -148,7 +148,7 @@ way.
 
 ### AI Agent Workflows
 
-`farness` is not tied to Claude. The Claude Code plugin is the most integrated path today, but the framework also works with Codex and other coding agents that can follow structured instructions or run shell commands.
+`brier` is not tied to Claude. The Claude Code plugin is the most integrated path today, but the framework also works with Codex and other coding agents that can follow structured instructions or run shell commands.
 
 For agent-agnostic setup and prompt guidance, see [`docs/agent-workflows.md`](docs/agent-workflows.md).
 
@@ -157,84 +157,84 @@ For agent-agnostic setup and prompt guidance, see [`docs/agent-workflows.md`](do
 The default builder path is package-first:
 
 ```bash
-python -m pip install 'farness[mcp]'
-farness setup codex
-farness doctor codex
+python -m pip install 'brier[mcp]'
+brier setup codex
+brier doctor codex
 ```
 
 For source installs during development:
 
 ```bash
-python -m pip install -e /path/to/farness
+python -m pip install -e /path/to/brier
 ```
 
 #### MCP server
 
 If you want a native tool interface instead of prompt copy-paste, install the package and run the MCP server locally:
 
 ```bash
-python -m pip install 'farness[mcp]'
-farness-mcp
+python -m pip install 'brier[mcp]'
+brier-mcp
 ```
 
-It exposes tools for creating, listing, retrieving, saving, and scoring decisions, plus resources/prompts for the farness workflow.
+It exposes tools for creating, listing, retrieving, saving, and scoring decisions, plus resources/prompts for the brier workflow.
 
 To register it in Codex as a local MCP server:
 
 ```bash
-farness setup codex
-farness doctor codex
+brier setup codex
+brier doctor codex
 ```
 
-This installs the packaged Codex skill and registers the MCP server with the same Python interpreter that launched `farness`.
+This installs the packaged Codex skill and registers the MCP server with the same Python interpreter that launched `brier`.
 
 #### Claude Code local skill + MCP
 
 Claude Code can use the same local MCP server and a local skill wrapper:
 
 ```bash
-python -m pip install 'farness[mcp]'
-farness setup claude
-farness doctor claude
+python -m pip install 'brier[mcp]'
+brier setup claude
+brier doctor claude
 ```
 
 This installs the packaged Claude skill and registers the MCP server in user scope.
 
 The plugin path still works if you prefer the slash-command workflow:
 
 ```bash
-claude plugin marketplace add MaxGhenis/farness
-claude plugin install farness@maxghenis-plugins
+claude plugin marketplace add MaxGhenis/brier
+claude plugin install brier@maxghenis-plugins
 ```
 
-Then either use the local `farness` skill or `/farness:decide` if you installed the plugin.
+Then either use the local `brier` skill or `/brier:decide` if you installed the plugin.
 
 #### Repair and reset
 
 If setup drifted or a skill was modified locally:
 
 ```bash
-farness doctor codex --fix
-farness doctor claude --fix
+brier doctor codex --fix
+brier doctor claude --fix
 ```
 
 If you want to remove the local integration and start over:
 
 ```bash
-farness uninstall codex
-farness setup codex
+brier uninstall codex
+brier setup codex
 ```
 
 or:
 
 ```bash
-farness uninstall claude
-farness setup claude
+brier uninstall claude
+brier setup claude
 ```
 
 ## The Framework
 
-Farness implements a structured decision process:
+Brier implements a structured decision process:
 
 1. **KPI Definition** - What outcomes actually matter? Make them measurable.
    Add outcome type, resolution date, resolution rule, and data source when possible.
@@ -262,8 +262,8 @@ Farness implements a structured decision process:
 ## Development
 
 ```bash
-git clone https://github.com/MaxGhenis/farness
-cd farness
+git clone https://github.com/MaxGhenis/brier
+cd brier
 pip install -e ".[dev,experiments]"
 pytest
 python -m build
@@ -277,19 +277,19 @@ Paper build:
 python3 paper/render_paper.py  # Regenerates figures, HTML, Markdown, and site/public/paper-raw
 python3 paper/run_strongest_validation.py  # Runs the strongest reviewer-facing validation on Claude Opus 4.6 and GPT-5.2
 python3 paper/run_study1_rerun.py --models gpt-5.4  # Reruns the original Study 1 design with legacy prompt wording
-python3 -m farness.experiments stability --strongest-validation --model gpt-5.2  # Single-model equivalent
+python3 -m brier.experiments stability --strongest-validation --model gpt-5.2  # Single-model equivalent
 ```
 
 ### Publishing to PyPI
 
 The package is published to PyPI from GitHub Releases using PyPI Trusted Publishing.
 
 **Setup (one-time):**
-1. In PyPI, open the `farness` project publishing settings:
-   - `https://pypi.org/manage/project/farness/settings/publishing/`
+1. In PyPI, open the `brier` project publishing settings:
+   - `https://pypi.org/manage/project/brier/settings/publishing/`
 2. Add a GitHub Actions trusted publisher with:
    - Owner: `MaxGhenis`
-   - Repository name: `farness`
+   - Repository name: `brier`
    - Workflow name: `publish.yml`
    - Environment name: leave blank unless you later add a GitHub environment
 

diff --git a/TODO-paper-revisions.md b/TODO-paper-revisions.md
@@ -1,22 +1,22 @@
-# Farness paper revisions — March 15, 2026
+# Brier paper revisions — March 15, 2026
 
 ## Priority 1: Narrative fixes
 
-- [x] **Reframe convergence finding**: "farness starts closer to where both end up after probing" — not divergence, not overshoot. Both conditions converge on similar final values; farness just starts closer. Change throughout abstract, Section 5.5, Section 6.3, Section 7.
-- [x] **Introduce farness properly**: "I introduce farness, a structured decision framework" not "I evaluate a framework called farness." This paper IS the introduction. Add footnote linking to GitHub/site.
+- [x] **Reframe convergence finding**: "Brier starts closer to where both end up after probing" — not divergence, not overshoot. Both conditions converge on similar final values; Brier just starts closer. Change throughout abstract, Section 5.5, Section 6.3, Section 7.
+- [x] **Introduce Brier properly**: "I introduce Brier, a structured decision framework" not "I evaluate a framework called Brier." This paper IS the introduction. Add footnote linking to GitHub/site.
 - [x] **Drop "pre-registered" claims**: Replace with "analysis code was committed prior to data collection (December 2025; experiments ran February 2026)." No formal pre-registration exists — just git history (commits 50e93d4, bfd1aae predate experiment runs).
 
 ## Priority 2: Graphs (desperately needed)
 
 - [ ] **Update magnitude box/violin plots**: by condition, for each model
 - [ ] **Per-scenario forest plot**: effect sizes with CIs for each scenario
-- [ ] **Convergence visualization**: show initial→final for naive vs farness on 2-3 scenarios, illustrating "farness starts closer to where both end up"
+- [ ] **Convergence visualization**: show initial→final for naive vs Brier on 2-3 scenarios, illustrating "Brier starts closer to where both end up"
 - [ ] **Sycophancy bar chart**: Claude vs GPT-5.2 update magnitude on sycophancy scenario — the most dramatic finding
 
 ## Priority 3: Content additions
 
-- [ ] **Concrete example**: Pick one scenario (e.g., sunk_cost_project), show actual responses from naive and farness conditions, before and after probing. Raw text excerpts.
-- [ ] **Sycophancy deep-dive**: GPT-5.2 naive updates by 466.7 leads on average under sycophantic pressure (1000→1300-1400). Claude: zero update. Farness on GPT: 108.3. This is the clearest finding in the paper and currently buried.
+- [ ] **Concrete example**: Pick one scenario (e.g., sunk_cost_project), show actual responses from naive and Brier conditions, before and after probing. Raw text excerpts.
+- [ ] **Sycophancy deep-dive**: GPT-5.2 naive updates by 466.7 leads on average under sycophantic pressure (1000→1300-1400). Claude: zero update. Brier on GPT: 108.3. This is the clearest finding in the paper and currently buried.
 - [ ] **Run symmetric sycophancy test**: Current test only pushes "higher." Add "I think it should be lower" version to confirm framework resists pressure in both directions. ~12 API calls, ~$5.
 
 ## Priority 4: Technical fixes
@@ -36,9 +36,9 @@
 
 ## Key data points for reference
 
-- Claude mixed-effects: farness = -4.17 (p<0.001), CoT = -0.56 (p=0.34)
-- GPT mixed-effects: farness = -37.0 (p=0.009), CoT = -29.7 (p=0.036)
-- GPT sycophancy (adversarial_sycophancy): naive mean update = 466.7 leads, farness = 108.3, Claude naive = 0.0
+- Claude mixed-effects: Brier = -4.17 (p<0.001), CoT = -0.56 (p=0.34)
+- GPT mixed-effects: Brier = -37.0 (p=0.009), CoT = -29.7 (p=0.036)
+- GPT sycophancy (adversarial_sycophancy): naive mean update = 466.7 leads, Brier = 108.3, Claude naive = 0.0
 - Scenarios use different units: percentages (most), weeks (planning), leads (sycophancy)
 - Analysis code: commits 50e93d4 (Dec 19) and bfd1aae (Dec 20), experiments: Feb 16-18
 - Skill optimization loop was running (PID 20928) — check if it finished and apply the optimized description
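
Reviewer note on the sycophancy figures in the TODO above: the gap between the naive and framework conditions is easier to read as a relative reduction. The sketch below only does arithmetic on the two means quoted in the notes (466.7 and 108.3 leads). The variable names are illustrative, and the derived ratio is not a figure reported in the paper.

```python
# Mean update magnitudes quoted in the notes (GPT-5.2, adversarial_sycophancy scenario).
naive_update = 466.7      # naive condition, leads
framework_update = 108.3  # framework condition, leads

# Absolute and relative reduction in update magnitude under the framework condition.
absolute_reduction = naive_update - framework_update
relative_reduction = absolute_reduction / naive_update

print(f"absolute reduction: {absolute_reduction:.1f} leads")  # 358.4 leads
print(f"relative reduction: {relative_reduction:.1%}")        # 76.8%
```
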