Commit 60919e4

initial commit for whitespace code
1 parent bbf8c3a commit 60919e4

188 files changed

Lines changed: 1942 additions & 37261 deletions

AGENTS.md

Lines changed: 32 additions & 32 deletions
@@ -4,7 +4,7 @@ The ONLY reason you should stop is if you have a blocking question for the user.

 ---

-# TrimTrailingWhitespace - AI Agent Guide
+# Whitespace - AI Agent Guide

 Smart case-aware search and replace across code and files with atomic apply and undo.

@@ -22,7 +22,7 @@ They should be mentioned in any package files, copyright notices, etc.

 ## Deliverables

-- Rust core library and CLI (`trim-trailing-whitespace-core`, `trim-trailing-whitespace-cli`)
+- Rust core library and CLI (`whitespace-core`, `whitespace-cli`)
 - VS Code extension (TypeScript) that shells out to the CLI
 - MCP server (TypeScript) that wraps the CLI and exposes tools for Cursor or other MCP clients

@@ -45,16 +45,16 @@ They should be mentioned in any package files, copyright notices, etc.
 - JSON for plan and history
 - TypeScript for VS Code extension and MCP wrapper
 - `ts-rs` for automatic TypeScript type generation from Rust structs
-- Types are generated as ambient `.d.ts` files in `trim-trailing-whitespace-core/bindings/`
+- Types are generated as ambient `.d.ts` files in `whitespace-core/bindings/`
 - VS Code and MCP projects include these bindings directly in their tsconfig
 - Optional Node bindings in future via `napi-rs` if needed

 ## Functional scope

 - Case styles: snake_case, kebab-case, camelCase, PascalCase, SCREAMING_SNAKE_CASE, Title Case, Train-Case, dot.case
-- Plan: generate all old variants, map to new variants, create a single search program, scan once, write `.trim-trailing-whitespace/plan.json`
+- Plan: generate all old variants, map to new variants, create a single search program, scan once, write `.whitespace/plan.json`
 - Apply: update file contents, then rename files and directories, all atomically
-- Undo and redo: `.trim-trailing-whitespace/history.json` with checksums
+- Undo and redo: `.whitespace/history.json` with checksums
 - Conflicts: re-validate hunks, auto-resolve simple formatting shifts, stop on real conflicts unless forced
 - Respect ignore files by default (`.gitignore`, `.ignore`, `.rgignore`, `.rnignore`), allow include and exclude globs
 - Exclude binary files by default
@@ -68,10 +68,10 @@ They should be mentioned in any package files, copyright notices, etc.

 ## Repo layout

-- `trim-trailing-whitespace-core` - core logic
-- `trim-trailing-whitespace-cli` - CLI frontend
-- `trim-trailing-whitespace-mcp` - MCP server
-- `trim-trailing-whitespace-vscode` - VS Code extension
+- `whitespace-core` - core logic
+- `whitespace-cli` - CLI frontend
+- `whitespace-mcp` - MCP server
+- `whitespace-vscode` - VS Code extension
 - `docs` - Starlight documentation

 ## Agent roles and behavior
@@ -97,27 +97,27 @@ They should be mentioned in any package files, copyright notices, etc.
 - Snapshot tests for plans and diffs
 - Fuzz tests for regex generation to prevent backtracking issues
 - Cross platform tests including Windows path edge cases
-- Lock signal integration tests rely on the `wait_for_lock_state` polling helper (see `trim-trailing-whitespace-cli/src/test_lock_signals.rs`); avoid reintroducing fixed sleeps that cause macOS flakes
+- Lock signal integration tests rely on the `wait_for_lock_state` polling helper (see `whitespace-cli/src/test_lock_signals.rs`); avoid reintroducing fixed sleeps that cause macOS flakes

 ## CLI contract

-Binary: `trim-trailing-whitespace`
+Binary: `whitespace`

 Commands:

-- `trim-trailing-whitespace plan <old> <new> [opts]`
+- `whitespace plan <old> <new> [opts]`
 - `--include` `--exclude` `--respect-gitignore` (default true, respects all ignore files)
 - `--rename-files` `--rename-dirs` (default true)
 - `--styles=<list>`
 - `--preview table|diff|matches|summary|none` (human-readable preview)
 - `--output summary|json` (machine-readable output)
 - `--plan-out`
 - `-u/-uu/-uuu` (unrestricted levels to control ignore file handling)
-- `trim-trailing-whitespace apply [--plan PATH | --id ID] [--atomic true] [--commit]`
-- `trim-trailing-whitespace undo <id>`
-- `trim-trailing-whitespace redo <id>`
-- `trim-trailing-whitespace history [--limit N]`
-- `trim-trailing-whitespace status`
+- `whitespace apply [--plan PATH | --id ID] [--atomic true] [--commit]`
+- `whitespace undo <id>`
+- `whitespace redo <id>`
+- `whitespace history [--limit N]`
+- `whitespace status`

 Exit codes:

@@ -128,15 +128,15 @@ Exit codes:

 ## Data formats

-`.trim-trailing-whitespace/plan.json`
+`.whitespace/plan.json`

 - `{ id, created_at, old, new, styles[], includes[], excludes[], matches[], renames[], stats, version }`

-`.trim-trailing-whitespace/history.json`
+`.whitespace/history.json`

 - append only with checksums and revert info

-`.trim-trailing-whitespace/config.toml`
+`.whitespace/config.toml`

 - Project configuration including atomic identifiers
 - `atomic = ["DocSpring", "GitHub", "GitHub"]` - identifiers treated as indivisible units
@@ -151,7 +151,7 @@ Exit codes:
 6. For file and directory names, detect and schedule renames with depth ordering
 7. Emit `plan.json` and fast summary stats

-- `--ignore-ambiguous` now prunes ambiguous identifiers (plain words that map to multiple styles) before matching or renaming; make sure new search variants respect this toggle and keep tests in `trim-trailing-whitespace-core/tests/ignore_ambiguous_test.rs` passing.
+- `--ignore-ambiguous` now prunes ambiguous identifiers (plain words that map to multiple styles) before matching or renaming; make sure new search variants respect this toggle and keep tests in `whitespace-core/tests/ignore_ambiguous_test.rs` passing.

 - Implementation note: `scan_repository_multi` pre-filters candidate files with an `AhoCorasick` automaton, processes them in parallel via `rayon`, and only runs the expensive compound identifier scan on lines discovered by direct variant hits or token heuristics. When adjusting matching logic, keep the `token_line_hits` bookkeeping in sync with the `additional_lines` fed into `find_enhanced_matches`.

@@ -163,12 +163,12 @@ Boundary rules

 ## Ignore file handling

-TrimTrailingWhitespace respects multiple ignore file formats:
+Whitespace respects multiple ignore file formats:

 - `.gitignore` - Standard Git ignore patterns
 - `.ignore` - Generic ignore file (compatible with ripgrep)
 - `.rgignore` - Ripgrep-specific ignore patterns
-- `.rnignore` - TrimTrailingWhitespace-specific ignore patterns (useful for excluding files from renaming without affecting Git)
+- `.rnignore` - Whitespace-specific ignore patterns (useful for excluding files from renaming without affecting Git)

 The unrestricted levels (`-u` flag) control ignore behavior:

@@ -195,7 +195,7 @@ The unrestricted levels (`-u` flag) control ignore behavior:

 - Node TypeScript wrapper around the CLI
 - Tools: `plan`, `apply`, `undo`, `history`, `preview`
-- Installed via `npx`, expects `trim-trailing-whitespace` on PATH
+- Installed via `npx`, expects `whitespace` on PATH
 - Codex CLI uses `~/.codex/config.toml`; ensure docs include the TOML MCP example

 ## Coding standards
@@ -226,7 +226,7 @@ The unrestricted levels (`-u` flag) control ignore behavior:
 ## Package Manager Requirements

 - **ALWAYS use pnpm, NEVER npm** - All JavaScript/TypeScript projects in this repo use pnpm
-- This applies to all subdirectories: trim-trailing-whitespace-mcp, trim-trailing-whitespace-vscode, trim-trailing-whitespace-core, docs
+- This applies to all subdirectories: whitespace-mcp, whitespace-vscode, whitespace-core, docs
 - When showing commands, always use `pnpm` not `npm`
 - Examples: `pnpm install`, `pnpm test`, `pnpm build`

@@ -265,24 +265,24 @@ If the user insists after your pushback, then proceed, but always voice concerns

 ## Testing Guidelines

-- **ALWAYS use the --dry-run flag when testing the trim-trailing-whitespace CLI** to avoid creating unwanted plan files and modifications
-- When running test commands with trim-trailing-whitespace, use: `./target/debug/trim-trailing-whitespace plan ... --dry-run`
-- This prevents the creation of `.trim-trailing-whitespace/plan.json` files during testing
+- **ALWAYS use the --dry-run flag when testing the whitespace CLI** to avoid creating unwanted plan files and modifications
+- When running test commands with whitespace, use: `./target/debug/whitespace plan ... --dry-run`
+- This prevents the creation of `.whitespace/plan.json` files during testing

 ### CI Self-Hosting Testing

-- **CRITICAL: Never use "trim-trailing-whitespace" or project name patterns in test content**
-- Our CI includes e2e tests that rename the entire project (trim-trailing-whitespace → <alternative-protected-string> → trim-trailing-whitespace)
+- **CRITICAL: Never use "whitespace" or project name patterns in test content**
+- Our CI includes e2e tests that rename the entire project (whitespace → <alternative-protected-string> → whitespace)
 - Any test content containing the project name will be modified during CI, potentially breaking tests if you are not careful.
-- **Mostly use generic names like "testword", "module", "config" instead of "trim-trailing-whitespace"**
+- **Mostly use generic names like "testword", "module", "config" instead of "whitespace"**
 - **Use "renamed_renaming_tool" NOT the alternative protected string in tests**
 - The alternative protected string is only allowed in files matching `.rnignore` entries:
 - `.github/workflows/`
 - `docs/src/content/docs/index.mdx`
 - `docs/src/assets/case-studies/`
 - `docs/src/content/docs/case-studies/`
 - All other test files should use "renamed_renaming_tool" as the target replacement string
-- This prevents CI failures when trim-trailing-whitespace tests itself and ensures clean self-hosting testing
+- This prevents CI failures when whitespace tests itself and ensures clean self-hosting testing

 ## DO NOT REDIRECT STDERR

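
Reviewer note on the "Case styles" and "Plan" bullets in AGENTS.md: the guide lists eight case styles and says the plan step generates every variant of the old identifier. The sketch below shows one way that variant generation could look. It is an illustration only, not the code in this repository: `split_words`, `capitalize`, and `variants` are hypothetical names, and acronym handling (for example the `atomic` identifiers from `config.toml`) is deliberately left out.

```rust
/// Hypothetical helper: split an identifier into lowercase words on
/// `_`, `-`, `.`, spaces, and lower-to-upper camelCase boundaries.
fn split_words(ident: &str) -> Vec<String> {
    let mut words = Vec::new();
    let mut current = String::new();
    let mut prev_lower = false;
    for ch in ident.chars() {
        if matches!(ch, '_' | '-' | '.' | ' ') {
            // Explicit separator: close the current word.
            if !current.is_empty() {
                words.push(std::mem::take(&mut current));
            }
            prev_lower = false;
            continue;
        }
        // An uppercase letter after a lowercase letter or digit starts a new word.
        if ch.is_uppercase() && prev_lower && !current.is_empty() {
            words.push(std::mem::take(&mut current));
        }
        prev_lower = ch.is_lowercase() || ch.is_ascii_digit();
        current.extend(ch.to_lowercase());
    }
    if !current.is_empty() {
        words.push(current);
    }
    words
}

/// Hypothetical helper: uppercase the first character of a word.
fn capitalize(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    }
}

/// Produce the eight styles named in AGENTS.md as (style name, variant) pairs.
fn variants(ident: &str) -> Vec<(&'static str, String)> {
    let words = split_words(ident);
    let caps: Vec<String> = words.iter().map(|w| capitalize(w)).collect();
    let camel = match words.split_first() {
        Some((first, _)) => first.clone() + &caps[1..].concat(),
        None => String::new(),
    };
    vec![
        ("snake_case", words.join("_")),
        ("kebab-case", words.join("-")),
        ("camelCase", camel),
        ("PascalCase", caps.concat()),
        ("SCREAMING_SNAKE_CASE", words.join("_").to_uppercase()),
        ("Title Case", caps.join(" ")),
        ("Train-Case", caps.join("-")),
        ("dot.case", words.join(".")),
    ]
}
```

Calling `variants("oldName")` and `variants("new_name")` and zipping the two lists by style would give the old-to-new mapping that the plan step describes.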

BLOG_POST.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+# Blog Post Outline: How I Built the World’s Fastest Whitespace Trimmer
+
+## Working Title
+- How I Built the World’s Fastest Whitespace Trimmer
+
+## Hook & Motivation
+- lefthook-driven workflow; editor auto-trim usually good enough.
+- Recent surge of generated/AI-written code sneaking in trailing spaces and missing final newlines.
+- Existing solutions (bash script, Python pre-commit hook, npm package, other Rust tool) would work but feel unsatisfying.
+- Reading Bun’s performance blog post rekindled the idea: build something absurdly fast for fun and for the blog.
+
+## Background & Context
+- DocSpring focus: useful open source tools/blog posts for developers; no interest in marketing fluff.
+- Personal curiosity about algorithms/performance despite limited formal CS background.
+- Renamify boilerplate: copied repo to `trim-trailing-whitespace`, used Renamify to rewrite identifiers.
+- Initial cleanup: delete MCP server, VS Code extension, legacy docs, all original Rust code; keep Taskfiles, lefthook, CI scaffolding.
+
+## Early Ideas & Questions
+- Bloom filter cache: auto-sized, tuned on DocSpring repo; handle false positives via resizes.
+- Where to store cache? Options: temp dir, `.git`, OS cache dirs.
+- Compiling `.gitignore`: convert patterns into compact machine-friendly representation; rebuild on mtime changes/additions.
+- Per-file delta detection: subdivide files into segments; can we avoid reading every byte each run?
+- Quick research (ChatGPT): deleting arbitrary bytes without rewriting is mostly impossible; file rewrite is still best.
+
+## Defining the Scope
+- Deliverables: Rust core crate + CLI only, whitespace transformations only.
+- Respect Git’s ignore configuration (`.gitignore`, `.git/info/exclude`, global excludes).
+- No IDE/MCP integrations; project is a one-off experiment tied to the blog post.
+- Performance-first mindset: compile-time feature flags to toggle optimisations, benchmarking pipeline to show improvements.
+
+## Architecture Plan Highlights
+- `whitespace-core`: transcode/trim logic, line ending normalization, tab/space conversion, SIMD implementations.
+- `whitespace-cli`: Clap-based binary, recursive walker, binary detection, `--check` mode, feature toggles.
+- Cache design:
+- Cache paths per OS; repo ID via blake3-128 of canonical root + volume info.
+- Files: `keys.bin` (sorted u128 keys), `meta.bin` (header, OS journal checkpoints, repo stats), `lock` guard.
+- Key formula: blake3-128(path, dev, ino, size, mtime_ns, ctime_ns, mode).
+- Racy guard: rescan when `mtime_sec` == `index_write_time_sec`.
+- Change detection: macOS FSEvents, Windows USN Journal, Linux fallback walk.
+
+## Implementation Strategy
+- Build naive baseline implementation first (byte-by-byte, no cache, always rewrite).
+- Add SIMD and other optimisations behind cargo features.
+- Introduce compile-time feature matrix (baseline → simd → parallel walk → cache → mmap) for benchmarks.
+- Use hyperfine to benchmark each tier on `/Users/ndbroadbent/code/docspring` (cold vs warm cache).
+- Document results with charts comparing stages (baseline vs optimised).
+
+## Content Sections (Rough)
+1. Introduction + why the tool exists.
+2. Scaffolding and cleaning the repo.
+3. Exploring performance ideas in the car-side brain dump (Bloom filters, gitignore compilation, cache placement).
+4. Lessons from talking to AI (file rewrite reality, SIMD suggestions, other micro-optimisations).
+5. Designing the definitive cache (keys, meta, OS journals, racy guard).
+6. Naive baseline implementation and first measurements.
+7. Iterative optimisations with feature flags + benchmark plots.
+8. Integrating with lefthook and everyday workflow.
+9. Final thoughts (was it worth it? fun factor? what’s next optional, but keep minimal until work is done).
+
+## Notes & TODOs
+- Insert references/links: Bun performance blog, API client post, lefthook, DocSpring.
+- Collect benchmark data with hyperfine once implementations land.
+- Generate diagrams for cache structure (`keys.bin`, `meta.bin`).
+- Keep tone conversational, emphasise experimentation over production dogma.
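
Reviewer note on the "Implementation Strategy" section of BLOG_POST.md: the outline calls for a naive baseline that works byte-by-byte with no cache. The following is a minimal sketch of what such a baseline could look like, assuming "whitespace" means trailing spaces and tabs and that a missing final newline should be added. The function name `trim_trailing_whitespace` is hypothetical and not taken from this commit, which contains only the outline.

```rust
/// Hypothetical baseline: strip trailing spaces and tabs from every line
/// and make sure non-empty output ends with a newline.
/// Existing line endings (`\n` or `\r\n`) are preserved as found.
fn trim_trailing_whitespace(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() + 1);
    for line in input.split_inclusive(|&b| b == b'\n') {
        // Separate the line ending (if any) from the line content.
        let (content, ending): (&[u8], &[u8]) = if line.ends_with(b"\r\n") {
            (&line[..line.len() - 2], &b"\r\n"[..])
        } else if line.ends_with(b"\n") {
            (&line[..line.len() - 1], &b"\n"[..])
        } else {
            (line, &b""[..])
        };
        // Walk backwards over trailing spaces and tabs, one byte at a time.
        let mut end = content.len();
        while end > 0 && matches!(content[end - 1], b' ' | b'\t') {
            end -= 1;
        }
        out.extend_from_slice(&content[..end]);
        out.extend_from_slice(ending);
    }
    // Add the missing final newline, but leave empty files empty.
    if !out.is_empty() && !out.ends_with(b"\n") {
        out.push(b'\n');
    }
    out
}
```

A `--check` mode, as mentioned under "Architecture Plan Highlights", could compare the returned buffer with the input and report a difference instead of rewriting the file. The later tiers in the outline (SIMD, parallel walk, cache, mmap) would replace the inner loop and the I/O around it rather than this contract.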

Cargo.toml

Lines changed: 4 additions & 3 deletions
@@ -1,14 +1,14 @@
 [workspace]
-members = ["trim-trailing-whitespace-core", "trim-trailing-whitespace-cli"]
+members = ["whitespace-core", "whitespace-cli"]
 resolver = "2"

 [workspace.package]
 version = "0.4.1"
 edition = "2021"
 authors = ["DocSpring"]
 license = "MIT"
-repository = "https://github.com/DocSpring/trim-trailing-whitespace"
-homepage = "https://github.com/DocSpring/trim-trailing-whitespace"
+repository = "https://github.com/DocSpring/whitespace"
+homepage = "https://github.com/DocSpring/whitespace"

 [workspace.dependencies]
 # Core dependencies
@@ -31,6 +31,7 @@ nu-ansi-term = "0.50"
 tempfile = "3"
 sha2 = "0.10"
 clap = { version = "4.5", features = ["derive", "cargo", "env"] }
+once_cell = "1.19"

 # Dev dependencies
 proptest = "1.4"
