24 commits
b9cbbba
chore(site): upgrade deps, adopt brier-design tokens, add universe di…
MaxGhenis May 30, 2026
b2c1504
Rename package + tooling: farness -> brier (Brier 0)
MaxGhenis May 30, 2026
081e03d
Rename web apps: farness -> brier (site + forecast-api)
MaxGhenis May 30, 2026
d563d63
Rename research paper + figures: farness -> Brier (Brier 0)
MaxGhenis May 30, 2026
dc03a5d
Preserve 'farness' as the frozen experiment condition key
MaxGhenis May 30, 2026
22a3846
Rebrand stragglers: TODO doc, diagram CSS class, figure annotations
MaxGhenis May 30, 2026
ce8360a
Rebuild universe diagram: add the Brier line (model lineage 0 to 5)
MaxGhenis May 30, 2026
975506f
Universe diagram: side-by-side agent + Almanac pair; redirect farness…
MaxGhenis May 30, 2026
a1f4259
Brand the deployed site as the Brier Almanac
MaxGhenis May 30, 2026
ef3867a
Remove the Install button from the site header
MaxGhenis May 30, 2026
18601ac
Universe diagram: widen PolicyEngine and Microplex to the pair width
MaxGhenis May 30, 2026
48c62be
Replace stale farness social card with Brier Almanac og-image
MaxGhenis Jun 3, 2026
2ac8612
Strip legacy farness decision-framework sections from Almanac homepage
MaxGhenis Jun 3, 2026
55443cc
Rebrand forecasting app: Brier Almanac -> Thesis
MaxGhenis Jun 5, 2026
04ff4b7
Rebrand universe diagram: Brier Almanac -> Thesis, brier institute ->…
MaxGhenis Jun 5, 2026
bd30828
Universe diagram: equalize Brier-1 and Thesis box heights
MaxGhenis Jun 5, 2026
6b24d1e
Universe diagram: widen Axiom nodes to match PolicyEngine, drop RuleSpec
MaxGhenis Jun 5, 2026
bc9a667
Universe diagram: remove USC citation constellation from Axiom backgr…
MaxGhenis Jun 5, 2026
c2207c8
Replace /thesis decision-framework essay with /about vision page
MaxGhenis Jun 6, 2026
71623d3
Make /about a listed, typical About page; fix logo label to Thesis
MaxGhenis Jun 6, 2026
ee99337
Refer to the org as "The Thesis Institute" in prose/titles
MaxGhenis Jun 6, 2026
11edfb3
About: stop conflating Thesis with Axiom's deterministic work
MaxGhenis Jun 6, 2026
556d4db
Hide Research from the app nav
MaxGhenis Jun 6, 2026
b6c37a4
Reframe Axiom on the About page: a tool Thesis uses, not a half of Th…
MaxGhenis Jun 7, 2026
2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
@@ -8,7 +8,7 @@
},
"plugins": [
{
"name": "farness",
"name": "brier",
"description": "Decision-making framework that reframes subjective questions as forecasting problems with explicit KPIs, option expansion, and calibration tracking",
"version": "0.1.0",
"author": {
@@ -1,13 +1,13 @@
---
name: farness
description: Use when the user wants advice or a decision recommendation rather than direct implementation, especially for prompts like "should I", "should we", "which is better", "is it worth it", or "what would you do" about architecture, product, hiring, strategy, or career choices. Prefer the local farness MCP server when available and structure the answer around KPI, option expansion, reference class, disconfirming evidence, numeric forecasts, and a review date.
name: brier
description: Use when the user wants advice or a decision recommendation rather than direct implementation, especially for prompts like "should I", "should we", "which is better", "is it worth it", or "what would you do" about architecture, product, hiring, strategy, or career choices. Prefer the local brier MCP server when available and structure the answer around KPI, option expansion, reference class, disconfirming evidence, numeric forecasts, and a review date.
---

# Farness
# Brier

Use this skill to turn vague decisions into forecastable choices.

Prefer the local `farness` MCP server when it is connected.
Prefer the local `brier` MCP server when it is connected.

## Workflow

@@ -43,8 +43,8 @@ Prefer the local `farness` MCP server when it is connected.

## Setup

If the `farness` MCP server is not connected, add it with:
If the `brier` MCP server is not connected, add it with:

```bash
farness setup claude
brier setup claude
```
30 changes: 15 additions & 15 deletions CLAUDE.md
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Project Overview

Farness is a decision-making framework that reframes subjective questions ("Should I...?") into forecasting problems with explicit KPIs, confidence intervals, and calibration tracking. The core thesis: making numeric predictions forces mechanism thinking, creates accountability, and reduces sycophancy.
Brier is a decision-making framework that reframes subjective questions ("Should I...?") into forecasting problems with explicit KPIs, confidence intervals, and calibration tracking. The core thesis: making numeric predictions forces mechanism thinking, creates accountability, and reduces sycophancy.

## Commands

@@ -21,24 +21,24 @@ pytest
pytest tests/test_framework.py

# Run with coverage
pytest --cov=farness
pytest --cov=brier

# Format code
black farness tests
ruff check farness tests
black brier tests
ruff check brier tests
```

### CLI

```bash
farness new "question" # Create a new decision
farness new "q" --context "details" # With context
farness list # List all decisions
farness list --pending # Decisions past review date
farness show <id> # Show decision details (supports prefix match)
farness score [id] # Score a decision's actual outcomes (interactive)
farness calibration # Show calibration statistics
farness pending # Alias for list --pending
brier new "question" # Create a new decision
brier new "q" --context "details" # With context
brier list # List all decisions
brier list --pending # Decisions past review date
brier show <id> # Show decision details (supports prefix match)
brier score [id] # Score a decision's actual outcomes (interactive)
brier calibration # Show calibration statistics
brier pending # Alias for list --pending
```

### Site (Next.js)
@@ -57,15 +57,15 @@ bun run test # Run vitest tests
python3 paper/render_paper.py # Generate figures, render HTML, sync preemptive_rigor.md and site/public/paper-raw
python3 paper/run_strongest_validation.py # Strongest reviewer-facing validation across Claude Opus 4.6 and GPT-5.2
python3 paper/run_study1_rerun.py --models gpt-5.4 # Original Study 1 rerun with legacy prompt wording
python3 -m farness.experiments stability --strongest-validation --model gpt-5.2 # Single-model strongest validation
python3 -m brier.experiments stability --strongest-validation --model gpt-5.2 # Single-model strongest validation
```

## Architecture

### Python Package (`farness/`)
### Python Package (`brier/`)

- **framework.py**: Core dataclasses (`Decision`, `KPI`, `Option`, `Forecast`) with serialization. `Option.expected_value()` computes weighted expected values across KPIs. `Decision.best_option()` and `sensitivity_analysis()` for analysis.
- **storage.py**: `DecisionStore` persists decisions to `~/.farness/decisions.jsonl` in JSONL format. Supports CRUD and filtered queries (unscored, pending review, scored).
- **storage.py**: `DecisionStore` persists decisions to `~/.brier/decisions.jsonl` in JSONL format. Supports CRUD and filtered queries (unscored, pending review, scored).
- **calibration.py**: `CalibrationTracker` computes forecast accuracy metrics: coverage (% of actuals in CIs), calibration error (coverage vs stated confidence), MAE, MRE, Brier scores.
- **cli.py**: Argparse CLI wrapping the above modules.
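
The calibration metrics listed for `calibration.py` can be sketched in plain Python. This is a hypothetical illustration, not the package's `CalibrationTracker` API: the `ResolvedForecast` shape, the function names, and the sign convention for calibration error (observed coverage minus mean stated confidence) are all assumptions.

```python
# Hypothetical sketch of the metrics described for calibration.py.
# ResolvedForecast and these function names are assumptions for illustration,
# not the brier package's actual CalibrationTracker API.
from dataclasses import dataclass


@dataclass
class ResolvedForecast:
    point: float       # point estimate
    low: float         # lower bound of the stated interval
    high: float        # upper bound of the stated interval
    confidence: float  # stated interval confidence, e.g. 0.8 for an 80% CI
    actual: float      # observed outcome after resolution


def _mean(values):
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)


def coverage(forecasts):
    """Share of actuals that landed inside their stated intervals."""
    return _mean(f.low <= f.actual <= f.high for f in forecasts)


def calibration_error(forecasts):
    """Observed coverage minus mean stated confidence (negative = overconfident)."""
    return coverage(forecasts) - _mean(f.confidence for f in forecasts)


def mean_absolute_error(forecasts):
    """Average absolute gap between point estimates and actuals."""
    return _mean(abs(f.point - f.actual) for f in forecasts)


def mean_relative_error(forecasts):
    """Absolute error scaled by |actual|; forecasts with actual == 0 are skipped."""
    return _mean(
        abs(f.point - f.actual) / abs(f.actual) for f in forecasts if f.actual != 0
    )


def brier_score(probabilities, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    return _mean((p - o) ** 2 for p, o in zip(probabilities, outcomes))
```

For example, two 80% intervals where only one actual lands inside give coverage 0.5 and a calibration error of about -0.3, which reads as overconfidence. The Brier score here assumes binary outcomes coded as 0 or 1.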

98 changes: 49 additions & 49 deletions README.md
@@ -1,8 +1,8 @@
# Farness
# Brier

**Forecasting as a harness for decision-making.**

Instead of asking "Is X good?" or "Should I do Y?", farness helps you:
Instead of asking "Is X good?" or "Should I do Y?", brier helps you:
1. Define what success looks like (KPIs)
2. Expand your options (including ones you didn't consider)
3. Make explicit forecasts (with confidence intervals and resolution rules)
@@ -11,43 +11,43 @@ Instead of asking "Is X good?" or "Should I do Y?", farness helps you:
## Installation

```bash
python -m pip install 'farness[mcp]'
python -m pip install 'brier[mcp]'
```

## Quick Start

### Codex

```bash
farness setup codex
farness doctor codex
brier setup codex
brier doctor codex
```

Then restart Codex and use `$farness` when a decision prompt appears.
Then restart Codex and use `$brier` when a decision prompt appears.

### Claude Code

```bash
farness setup claude
farness doctor claude
brier setup claude
brier doctor claude
```

Then restart Claude Code.

### Local CLI

```bash
farness new "Should we rewrite the auth layer?" --context "3 incidents this quarter"
farness list
farness calibration
brier new "Should we rewrite the auth layer?" --context "3 incidents this quarter"
brier list
brier calibration
```

The CLI is local-only and does not call an LLM or require an API key.

### Python package

```python
from farness import Decision, KPI, Option, Forecast, DecisionStore
from brier import Decision, KPI, Option, Forecast, DecisionStore
from datetime import datetime, timedelta

# Create a decision
@@ -109,20 +109,20 @@ store.save(decision)
### Command Line

```bash
farness new "Should we launch now?"
farness show abc123
farness pending
farness calibration
brier new "Should we launch now?"
brier show abc123
brier pending
brier calibration
```

### Forecast Question Drafts

`farness` can turn a stored decision forecast or standalone policy question into
`brier` can turn a stored decision forecast or standalone policy question into
Manifold-ready forecast question drafts. This is draft-only: it does not publish
questions, place a bet, or require a Manifold API key.

```bash
farness forecast-draft "Will Waymo be legally permitted to offer fully driverless paid robotaxi rides in Washington, DC by 2026-12-31?" \
brier forecast-draft "Will Waymo be legally permitted to offer fully driverless paid robotaxi rides in Washington, DC by 2026-12-31?" \
--initial-prob 52 \
--resolution-date 2026-12-31 \
--resolution-rule "Resolve YES if official DC law, regulation, or permit approval allows Waymo to offer fully driverless paid public rides in DC by 2026-12-31." \
@@ -136,7 +136,7 @@ farness forecast-draft "Will Waymo be legally permitted to offer fully driverles
For a stored decision with options and forecasts:

```bash
farness forecast-draft abc123 --output forecast-pack.json
brier forecast-draft abc123 --output forecast-pack.json
```

An example Waymo/DC draft pack lives at
@@ -148,7 +148,7 @@ way.

### AI Agent Workflows

`farness` is not tied to Claude. The Claude Code plugin is the most integrated path today, but the framework also works with Codex and other coding agents that can follow structured instructions or run shell commands.
`brier` is not tied to Claude. The Claude Code plugin is the most integrated path today, but the framework also works with Codex and other coding agents that can follow structured instructions or run shell commands.

For agent-agnostic setup and prompt guidance, see [`docs/agent-workflows.md`](docs/agent-workflows.md).

@@ -157,84 +157,84 @@ For agent-agnostic setup and prompt guidance, see [`docs/agent-workflows.md`](do
The default builder path is package-first:

```bash
python -m pip install 'farness[mcp]'
farness setup codex
farness doctor codex
python -m pip install 'brier[mcp]'
brier setup codex
brier doctor codex
```

For source installs during development:

```bash
python -m pip install -e /path/to/farness
python -m pip install -e /path/to/brier
```

#### MCP server

If you want a native tool interface instead of prompt copy-paste, install the package and run the MCP server locally:

```bash
python -m pip install 'farness[mcp]'
farness-mcp
python -m pip install 'brier[mcp]'
brier-mcp
```

It exposes tools for creating, listing, retrieving, saving, and scoring decisions, plus resources/prompts for the farness workflow.
It exposes tools for creating, listing, retrieving, saving, and scoring decisions, plus resources/prompts for the brier workflow.

To register it in Codex as a local MCP server:

```bash
farness setup codex
farness doctor codex
brier setup codex
brier doctor codex
```

This installs the packaged Codex skill and registers the MCP server with the same Python interpreter that launched `farness`.
This installs the packaged Codex skill and registers the MCP server with the same Python interpreter that launched `brier`.

#### Claude Code local skill + MCP

Claude Code can use the same local MCP server and a local skill wrapper:

```bash
python -m pip install 'farness[mcp]'
farness setup claude
farness doctor claude
python -m pip install 'brier[mcp]'
brier setup claude
brier doctor claude
```

This installs the packaged Claude skill and registers the MCP server in user scope.

The plugin path still works if you prefer the slash-command workflow:

```bash
claude plugin marketplace add MaxGhenis/farness
claude plugin install farness@maxghenis-plugins
claude plugin marketplace add MaxGhenis/brier
claude plugin install brier@maxghenis-plugins
```

Then either use the local `farness` skill or `/farness:decide` if you installed the plugin.
Then either use the local `brier` skill or `/brier:decide` if you installed the plugin.

#### Repair and reset

If setup drifted or a skill was modified locally:

```bash
farness doctor codex --fix
farness doctor claude --fix
brier doctor codex --fix
brier doctor claude --fix
```

If you want to remove the local integration and start over:

```bash
farness uninstall codex
farness setup codex
brier uninstall codex
brier setup codex
```

or:

```bash
farness uninstall claude
farness setup claude
brier uninstall claude
brier setup claude
```

## The Framework

Farness implements a structured decision process:
Brier implements a structured decision process:

1. **KPI Definition** - What outcomes actually matter? Make them measurable.
Add outcome type, resolution date, resolution rule, and data source when possible.
@@ -262,8 +262,8 @@ Farness implements a structured decision process:
## Development

```bash
git clone https://github.com/MaxGhenis/farness
cd farness
git clone https://github.com/MaxGhenis/brier
cd brier
pip install -e ".[dev,experiments]"
pytest
python -m build
@@ -277,19 +277,19 @@ Paper build:
python3 paper/render_paper.py # Regenerates figures, HTML, Markdown, and site/public/paper-raw
python3 paper/run_strongest_validation.py # Runs the strongest reviewer-facing validation on Claude Opus 4.6 and GPT-5.2
python3 paper/run_study1_rerun.py --models gpt-5.4 # Reruns the original Study 1 design with legacy prompt wording
python3 -m farness.experiments stability --strongest-validation --model gpt-5.2 # Single-model equivalent
python3 -m brier.experiments stability --strongest-validation --model gpt-5.2 # Single-model equivalent
```

### Publishing to PyPI

The package is published to PyPI from GitHub Releases using PyPI Trusted Publishing.

**Setup (one-time):**
1. In PyPI, open the `farness` project publishing settings:
- `https://pypi.org/manage/project/farness/settings/publishing/`
1. In PyPI, open the `brier` project publishing settings:
- `https://pypi.org/manage/project/brier/settings/publishing/`
2. Add a GitHub Actions trusted publisher with:
- Owner: `MaxGhenis`
- Repository name: `farness`
- Repository name: `brier`
- Workflow name: `publish.yml`
- Environment name: leave blank unless you later add a GitHub environment

18 changes: 9 additions & 9 deletions TODO-paper-revisions.md
@@ -1,22 +1,22 @@
# Farness paper revisions — March 15, 2026
# Brier paper revisions — March 15, 2026

## Priority 1: Narrative fixes

- [x] **Reframe convergence finding**: "farness starts closer to where both end up after probing" — not divergence, not overshoot. Both conditions converge on similar final values; farness just starts closer. Change throughout abstract, Section 5.5, Section 6.3, Section 7.
- [x] **Introduce farness properly**: "I introduce farness, a structured decision framework" not "I evaluate a framework called farness." This paper IS the introduction. Add footnote linking to GitHub/site.
- [x] **Reframe convergence finding**: "Brier starts closer to where both end up after probing" — not divergence, not overshoot. Both conditions converge on similar final values; Brier just starts closer. Change throughout abstract, Section 5.5, Section 6.3, Section 7.
- [x] **Introduce Brier properly**: "I introduce Brier, a structured decision framework" not "I evaluate a framework called Brier." This paper IS the introduction. Add footnote linking to GitHub/site.
- [x] **Drop "pre-registered" claims**: Replace with "analysis code was committed prior to data collection (December 2025; experiments ran February 2026)." No formal pre-registration exists — just git history (commits 50e93d4, bfd1aae predate experiment runs).

## Priority 2: Graphs (desperately needed)

- [ ] **Update magnitude box/violin plots**: by condition, for each model
- [ ] **Per-scenario forest plot**: effect sizes with CIs for each scenario
- [ ] **Convergence visualization**: show initial→final for naive vs farness on 2-3 scenarios, illustrating "farness starts closer to where both end up"
- [ ] **Convergence visualization**: show initial→final for naive vs Brier on 2-3 scenarios, illustrating "Brier starts closer to where both end up"
- [ ] **Sycophancy bar chart**: Claude vs GPT-5.2 update magnitude on sycophancy scenario — the most dramatic finding

## Priority 3: Content additions

- [ ] **Concrete example**: Pick one scenario (e.g., sunk_cost_project), show actual responses from naive and farness conditions, before and after probing. Raw text excerpts.
- [ ] **Sycophancy deep-dive**: GPT-5.2 naive updates by 466.7 leads on average under sycophantic pressure (1000→1300-1400). Claude: zero update. Farness on GPT: 108.3. This is the clearest finding in the paper and currently buried.
- [ ] **Concrete example**: Pick one scenario (e.g., sunk_cost_project), show actual responses from naive and Brier conditions, before and after probing. Raw text excerpts.
- [ ] **Sycophancy deep-dive**: GPT-5.2 naive updates by 466.7 leads on average under sycophantic pressure (1000→1300-1400). Claude: zero update. Brier on GPT: 108.3. This is the clearest finding in the paper and currently buried.
- [ ] **Run symmetric sycophancy test**: Current test only pushes "higher." Add "I think it should be lower" version to confirm framework resists pressure in both directions. ~12 API calls, ~$5.

## Priority 4: Technical fixes
@@ -36,9 +36,9 @@

## Key data points for reference

- Claude mixed-effects: farness = -4.17 (p<0.001), CoT = -0.56 (p=0.34)
- GPT mixed-effects: farness = -37.0 (p=0.009), CoT = -29.7 (p=0.036)
- GPT sycophancy (adversarial_sycophancy): naive mean update = 466.7 leads, farness = 108.3, Claude naive = 0.0
- Claude mixed-effects: Brier = -4.17 (p<0.001), CoT = -0.56 (p=0.34)
- GPT mixed-effects: Brier = -37.0 (p=0.009), CoT = -29.7 (p=0.036)
- GPT sycophancy (adversarial_sycophancy): naive mean update = 466.7 leads, Brier = 108.3, Claude naive = 0.0
- Scenarios use different units: percentages (most), weeks (planning), leads (sycophancy)
- Analysis code: commits 50e93d4 (Dec 19) and bfd1aae (Dec 20), experiments: Feb 16-18
- Skill optimization loop was running (PID 20928) — check if it finished and apply the optimized description