docs(zeb-139): KL-retrofit findings — attractor holds (cheap-win confound)#258

Open
jenglund wants to merge 2 commits into main from zeblith/zeb-139-findings

Conversation

@jenglund

@jenglund jenglund commented Apr 20, 2026

Summary

Findings doc + spec status update for ZEB-139. The 4-cell matrix run completed; verdict is attractor HOLDS under λ=0.5 KL+CE on the cross-arch TinyLlama setup.

TL;DR

  • Cells 1+2 (no-router baseline) reproduce ZEB-136 to within 0.0002 nats (val_loss=4.5546 vs ZEB-136's 4.5546 / 4.5544). Sanity check passes.
  • Cells 3+4 (router-on, KL=0.5, real vs shuffled) produced val_loss=4.5636 vs 4.5637, Δ-diff = −0.0001 nats — essentially zero.
  • Forensic: cross_run_cos engram_logits = +0.9999 between cells 3 and 4. Smoking gun for the cheap-win confound: KL forces both routers to the same content-independent average distribution.
  • All five spec §7 fingerprint thresholds say the attractor HOLDS:

| Metric | Threshold (broken if…) | ZEB-136 (no KL) | ZEB-139 (KL=0.5) |
|---|---|---|---|
| engram_logit_entropy | < log V − 0.1 = 10.27 | 10.3735 | 10.3467 |
| α | outside [0.14, 0.20] | 0.1644 | 0.1762 |
| Cross-run cosine | < 0.7 | +0.7979 | +0.9999 |
| Δ-diff | ≥ +0.001 nats | +0.0002 | −0.0001 |
| W_align ‖·‖_F | > 2× init | 1.91 | 1.35 |
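
The threshold table can be checked mechanically. The following is a minimal sketch, not the spec's own tooling: it assumes V = 32000 (consistent with the sidecar shape reported in the commit message and with log V − 0.1 = 10.27), assumes a single tripped threshold is enough to call the attractor broken, and omits the W_align row because the init norm is not stated here. The function and dictionary names are hypothetical.

```python
import math

VOCAB_SIZE = 32000  # assumption: matches the (10000, 32000) sidecar shape


def fingerprint_breaks(metrics, vocab_size=VOCAB_SIZE):
    """Return, per metric, whether its 'broken if' threshold is tripped."""
    return {
        "engram_logit_entropy": metrics["engram_logit_entropy"] < math.log(vocab_size) - 0.1,
        "alpha": not (0.14 <= metrics["alpha"] <= 0.20),
        "cross_run_cos": metrics["cross_run_cos"] < 0.7,
        "delta_diff": metrics["delta_diff"] >= 0.001,
    }


def verdict(metrics):
    # Assumption: any single tripped threshold counts as "broken".
    return "BROKEN" if any(fingerprint_breaks(metrics).values()) else "HOLDS"


# Values copied from the table above (W_align row omitted, see lead-in).
ZEB_136 = {"engram_logit_entropy": 10.3735, "alpha": 0.1644,
           "cross_run_cos": 0.7979, "delta_diff": 0.0002}
ZEB_139 = {"engram_logit_entropy": 10.3467, "alpha": 0.1762,
           "cross_run_cos": 0.9999, "delta_diff": -0.0001}

if __name__ == "__main__":
    for name, row in (("ZEB-136", ZEB_136), ("ZEB-139", ZEB_139)):
        print(name, verdict(row), fingerprint_breaks(row))
```

Under these assumptions neither column trips any of the four checked thresholds, which matches the HOLDS verdict.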

What this contributes to the bigger picture

Per spec §11 outer matrix, ZEB-139 fills the "KL-retrofit Holds" row. ZEB-138 (same-arch teacher, CE-only) is the orthogonal axis still pending KRILE's Harmony-474M handoff. Once both rows land:

| ZEB-139 | ZEB-138 | Combined |
|---|---|---|
| Holds (this PR) | Breaks | Teacher-arch dominates; pursue same-arch + capgap, deprioritize KL+CE |
| Holds (this PR) | Holds | Structural ceiling at 40M (Gemini §7 steelman); next axis is multi-layer non-linear W_align or backbone unfreezing |

Bonus diagnostic (worth flagging)

Max LM-head row cos jumped from 0.22 (ZEB-136 without KL) to 0.78 (ZEB-139 with KL). The router IS aligning with vocab directions; it's just aligning ALL positions with the SAME average direction (cross_run_cos ≈ 1.0). The KL term is doing what it was designed to do (push the router toward the teacher's distribution); the teacher's distribution just turns out to be roughly position-independent at the corpus average, so the result is content-blind. A future variant could try a temperature on the router-side softmax (spec §12 q2) or a per-token KL mask that down-weights frequent-token positions to force per-position attention.
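
To make the content-blind reading concrete, here is a toy sketch of why a cross-run cosine near 1 points to a shared, content-independent direction. It assumes cross_run_cos is a per-position cosine similarity between the two runs' engram logit vectors, averaged over positions; the probe's exact definition is not given in this PR, and the vectors below are invented for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_cross_run_cos(logits_a, logits_b):
    """Average per-position cosine between two runs' logit vectors."""
    pairs = list(zip(logits_a, logits_b))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy case 1: both routers emit the same direction at every position
# (only the scale differs), i.e. content-blind output.
average_direction = [2.0, 1.0, 0.5, 0.1]
run_real = [average_direction for _ in range(3)]
run_shuf = [[1.5 * x for x in average_direction] for _ in range(3)]
content_blind = mean_cross_run_cos(run_real, run_shuf)

# Toy case 2: each router tracks its own (different) per-position target.
run_real_dep = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
run_shuf_dep = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
content_dependent = mean_cross_run_cos(run_real_dep, run_shuf_dep)

if __name__ == "__main__":
    print(f"content-blind: {content_blind:.4f}")
    print(f"content-dependent: {content_dependent:.4f}")
```

In the toy, routers that follow different per-position targets score a low cosine, while routers that ignore position and emit one shared direction score essentially 1, which is the pattern reported for cells 3 and 4.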

Operational note (worth capturing for next operator)

First matrix attempt failed at cell 3 because the local main repo dir was checked out on the stale zeblith/zeb-138-same-arch-teacher branch (predates PR #257 by several commits). The venv's ct87 editable install therefore imported a train.py without the new --engram-skip-to-logit / --kl-lambda flags. Cells 1+2 succeeded incidentally (no-router code path is identical across branches). Resolved by git checkout main && git pull, then re-ran cells 3+4 only (each cell initializes independently from zeta_ctrl_2048, so no chaining was lost). Doc captures the recipe so the next operator doesn't repeat it.
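
A pre-flight check along these lines could catch the stale-checkout failure before cell 1 starts. This is a hedged sketch, not the recipe captured in the findings doc: `current_branch`, `missing_flags`, and `preflight` are hypothetical helpers, and the caller is assumed to supply the help text of whatever train.py the venv actually imports, since the exact invocation is not given here. The only external command used is `git rev-parse --abbrev-ref HEAD`.

```python
import subprocess

# Flags named in the operational note above.
REQUIRED_FLAGS = ("--engram-skip-to-logit", "--kl-lambda")


def current_branch(repo_dir):
    """Name of the branch checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def missing_flags(help_text, required=REQUIRED_FLAGS):
    """Required CLI flags that do not appear in the given help text."""
    return [flag for flag in required if flag not in help_text]


def preflight(repo_dir, help_text, expected_branch="main"):
    """Collect human-readable problems; an empty list means OK to launch."""
    problems = []
    branch = current_branch(repo_dir)
    if branch != expected_branch:
        problems.append(f"repo is on {branch!r}, expected {expected_branch!r}")
    for flag in missing_flags(help_text):
        problems.append(f"train.py help text lacks {flag}")
    return problems
```

Checking the help text as well as the branch matters here because the failure came from what the editable install imported, not only from which branch name was checked out.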

Test plan

  • Findings doc renders cleanly (markdown headers, tables, code blocks)
  • All numbers in the doc trace to the source artifacts (val_loss from CSVs / training logs, fingerprint metrics from forensics/router_on_kl.txt, ZEB-136 baselines from /home/zebli/work/LOCAL/zeb136/)
  • Spec doc STATUS line updated from "DRAFT — blocked on PR #254" to "COMPLETE — see findings"

Doc-only PR; no code changes, no tests to run.

🤖 Generated with Claude Code


Note

Low Risk
Low risk because this PR only adds/updates documentation and does not change runtime code paths or data handling.

Overview
Adds a new findings writeup docs/findings/2026-04-19-zeb-139-kl-retrofit.md documenting the completed ZEB-139 4-cell KL+CE experiment (including a λ=0.9 closure run) and its key outcome: the maximum-entropy attractor holds with evidence of a content-independent “cheap-win” collapse.

Updates the ZEB-139 design spec docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md status from draft/blocked to complete, linking to the findings and summarizing the final verdict + discriminator metric.

Reviewed by Cursor Bugbot for commit b4a371b. Bugbot is set up for automated code reviews on this repo. Configure here.

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive experiment findings document detailing methodology, results matrix, diagnostic metrics, and forensic artifacts for the KL-retrofit study.
    • Updated the experimental specification from draft to complete with quantitative validation (near-identical cross-run behavior) and closure notes for the KL-retrofit axis, plus recommended next steps.

…-win confound)

ZEB-139 4-cell matrix run on TinyLlama oracle + sidecar produced by
PR #255. KL+CE training (λ=0.5) at cells 3+4 did NOT escape the
maximum-entropy attractor. cross_run_cos between real-oracle and
shuffled-oracle KL+router cells = +0.9999 — the smoking gun for the
cheap-win confound: KL forces both routers to the same
content-independent average distribution rather than learning
per-position content routing.

Per spec §11 outer matrix: ZEB-139 contribution is "Holds". Combined
with ZEB-138's pending verdict, points to either teacher-arch
dominance (if ZEB-138 breaks) or the structural-ceiling steelman
(if ZEB-138 also holds).

Sanity checks all passed:

- Cell 1+2 (no-router baseline) reproduces ZEB-136's val_loss to within
  0.0002 nats (4.5546 vs ZEB-136's 4.5546 / 4.5544)
- Oracle PCA explained_variance_ratio_total = 0.9338690864205668,
  bit-identical to ZEB-136's stored value (proves the GPU-side
  index_add_ accumulator from PR #255's perf fix produces the same
  Welford means as the original CPU path)
- Sidecar shape (10000, 32000) bf16, 10000/10000 rows populated,
  shape-matched to engram_table

Operational note in the doc: first matrix attempt failed at cell 3
because the local main repo dir was on a stale branch
(zeblith/zeb-138-same-arch-teacher); the venv's ct87 editable install
therefore imported a train.py without the new flags. Resolved by
git checkout main && git pull, then re-ran cells 3+4 only (each cell
initializes independently, so no chaining was lost). Doc captures the recipe
to avoid recurrence.

Spec doc STATUS line updated from "DRAFT — blocked on PR #254"
to "COMPLETE — see findings".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Apr 20, 2026

PR author is in the excluded authors list.

@codeant-ai

codeant-ai Bot commented Apr 20, 2026

CodeAnt AI is reviewing your PR.

@coderabbitai

coderabbitai Bot commented Apr 20, 2026

Walkthrough

This PR adds a new findings document reporting the ZEB-139 “KL-Retrofit Objective-Axis” experiment and updates the design spec status from DRAFT to COMPLETE, recording quantitative results (including cross_run_cos ≈ +0.9999), experiment matrix, forensic metrics, and a λ=0.9 closure run.

Changes

| Cohort / File(s) | Summary |
|---|---|
| ZEB-139 KL-Retrofit Documentation: `docs/findings/2026-04-19-zeb-139-kl-retrofit.md`, `docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md` | Added a new findings page with setup, merged-PR prerequisites, 4-cell experiment matrix, val_loss and diagnostic matrices, KL trajectory and forensic metrics; updated spec header from DRAFT → COMPLETE and recorded attractor confirmation (cross_run_cos = +0.9999). |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested labels

size:XL

Poem

🐰 Hop, hop, the findings arrive so spry,

KL and routers twirl beneath the sky,
Real or shuffled, outputs match in tune,
Cross-run cosines gleam like a moon,
A retrofit tale from a curious bunny’s rune.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: documenting ZEB-139 KL-retrofit findings showing the attractor holds at λ=0.5, with the parenthetical noting the key insight (content-independent confound). It is concise, directly related to the core changeset, and uses descriptive language. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@qodo-code-review

Review Summary by Qodo

ZEB-139 KL-retrofit findings: attractor holds, cheap-win confound confirmed

📝 Documentation


Walkthroughs

Description
• Comprehensive findings document for ZEB-139 KL-retrofit experiment on TinyLlama
• Confirms attractor HOLDS at λ=0.5 with cross_run_cos=+0.9999 indicating cheap-win confound
• KL forces both real and shuffled routers to identical content-independent average distribution
• Spec status updated from DRAFT to COMPLETE with verdict locked in for outer matrix
Diagram
flowchart LR
  A["ZEB-139 Experiment<br/>4-cell matrix<br/>KL+CE training"] --> B["Cell 3+4 Results<br/>val_loss: 4.5636 vs 4.5637<br/>Δ-diff: -0.0001 nats"]
  B --> C["Forensic Analysis<br/>cross_run_cos: +0.9999<br/>KL trajectory: 1.27→1.22"]
  C --> D["Verdict: Attractor HOLDS<br/>Cheap-win confound confirmed<br/>KL forces average distribution"]
  D --> E["Spec §11 Matrix<br/>Awaiting ZEB-138 result<br/>Determines next axis"]

File Changes

1. docs/findings/2026-04-19-zeb-139-kl-retrofit.md 📝 Documentation +166/-0

Complete ZEB-139 findings with attractor verdict and confound analysis

• New comprehensive findings document (166 lines) documenting complete ZEB-139 experiment results
• 4-cell matrix results showing router-on KL+CE cells converged to identical val_loss (4.5636 vs
 4.5637)
• Forensic analysis revealing cross_run_cos=+0.9999 between real and shuffled oracle routers,
 confirming cheap-win confound
• Five fingerprint metrics all confirm attractor HOLDS, with detailed threshold comparisons to
 ZEB-136
• KL trajectory analysis showing monotonic decrease (1.27→1.22 nats) but content-independent
 convergence
• Open questions section with recommendations for λ-sweep, same-arch teacher follow-up, and
 multi-layer W_align exploration
• Operational notes documenting stale-checkout issue and resolution procedure

docs/findings/2026-04-19-zeb-139-kl-retrofit.md


2. docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md 📝 Documentation +1/-1

Spec status update to COMPLETE with findings reference

• Updated STATUS header from DRAFT to COMPLETE with link to findings document
• Added verdict summary: attractor HOLDS at λ=0.5 with cross_run_cos=+0.9999 confirmation
• Documented cheap-win confound mechanism and matrix cell assignment
• Noted pending ZEB-138 result dependency for outer matrix completion

docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md



@qodo-code-review

qodo-code-review Bot commented Apr 20, 2026

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0)



Remediation recommended

1. Local-only artifact references 🐞 Bug ⚙ Maintainability
Description
The findings doc hard-codes artifacts and helper scripts as living under an author-specific absolute
path, which makes the write-up non-reproducible for other readers and future operators. As written,
the doc can’t be followed from a clean repo checkout without out-of-band knowledge of that local
directory layout.
Code

docs/findings/2026-04-19-zeb-139-kl-retrofit.md[R147-158]

+## Artifacts
+
+All under `/home/zebli/work/LOCAL/zeb139/`:
+
+- **Oracle**: `artifacts/oracle_tinyllama_10k.safetensors` (4.9 MB, [10K, 128] f32) and `_shuffled_seed0.safetensors`
+- **Teacher-logits sidecar**: `artifacts/oracle_tinyllama_10k_teacher_logits.safetensors` (611 MB, [10K, 32K] bf16) and `_shuffled_seed0_teacher_logits.safetensors`
+- **Stats**: `artifacts/oracle_tinyllama_10k.safetensors.stats.json` (PCA explained variance, populated rows, hash seeds)
+- **Per-cell training logs (CSV)**: `logs/run{1..4}_*.csv` (200 rows × 36 columns each, including the new `kl_loss` column)
+- **Per-cell checkpoints**: `checkpoints/zeb139_router_{off,on}_{real,shuf}{,_kl}/checkpoint.pt`
+- **Forensic outputs**: `forensics/router_off_no_kl.txt` (full 10-probe battery, pair A) and `forensics/router_on_kl.txt` (skip-to-logit diagnostics, pair B)
+- **Scripts**: `scripts/shuffle_oracle_and_sidecar.py`, `scripts/run_4cell_matrix.sh` (cells 1-4), `scripts/run_cells_3_and_4.sh` (re-run after stale-checkout fix), `scripts/run_forensics.sh`
+
Evidence
The findings doc states that all artifacts/scripts are under a machine-local absolute path, and then
lists paths that are not anchored to the repository (e.g., scripts/*.sh). That makes the
documented artifacts location and reproduction steps dependent on a specific workstation filesystem
rather than the repo itself.

docs/findings/2026-04-19-zeb-139-kl-retrofit.md[147-159]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The findings document hard-codes an author-specific absolute path (`/home/...`) for artifacts and lists helper scripts as if they’re accessible from that location. This makes the doc non-reproducible for other readers working from a clean repo checkout.

### Issue Context
This PR is a docs-only update; the primary value of the change is long-term auditability/reproducibility of ZEB-139.

### Fix Focus Areas
- docs/findings/2026-04-19-zeb-139-kl-retrofit.md[147-159]

### Suggested changes
- Replace `/home/zebli/work/LOCAL/zeb139/` with either:
 - a repo-relative path convention (e.g. `./LOCAL/zeb139/` with a short note “not checked into git”), or
 - a stable artifact location (e.g. internal blob store/S3 URL) if that’s the intended sharing mechanism.
- For each listed artifact/script, clarify whether it is:
 - generated by commands runnable from this repo (include the exact commands), or
 - a local-only helper (state explicitly and avoid implying it exists in-repo).
- If the helper scripts are meant to be shared/reused, add them to the repository under an appropriate directory (e.g. `training/scripts/` or `scripts/`) and update the doc to reference the in-repo paths.



2. Ambiguous probe module path 🐞 Bug ⚙ Maintainability
Description
The findings doc references the forensic probe as scripts.probe_skip_to_logit, but the
repository’s probe lives under training/scripts/probe_skip_to_logit.py, making the invocation
ambiguous from repo root. This can cause operators to fail to locate/run the probe when reproducing
the reported metrics.
Code

docs/findings/2026-04-19-zeb-139-kl-retrofit.md[68]

+Pulled from `forensics/router_on_kl.txt` (probe: `scripts.probe_skip_to_logit`):
Evidence
The findings doc names the probe as scripts.probe_skip_to_logit, while the repo contains the probe
script under training/scripts/probe_skip_to_logit.py, and that script expects repo-relative
imports managed by adding the training/ directory onto sys.path. Without clarifying the working
directory / invocation, readers may not be able to run the probe as referenced.

docs/findings/2026-04-19-zeb-139-kl-retrofit.md[68-69]
training/scripts/probe_skip_to_logit.py[1-34]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The doc references the probe as `scripts.probe_skip_to_logit`, but the actual script is located at `training/scripts/probe_skip_to_logit.py`. Without a clear invocation, reproducing the forensics becomes error-prone.

### Issue Context
`training/scripts/probe_skip_to_logit.py` manipulates `sys.path` relative to the `training/` directory, so the command used to run it matters.

### Fix Focus Areas
- docs/findings/2026-04-19-zeb-139-kl-retrofit.md[68-69]

### Suggested changes
- Update the probe reference to a repo-root runnable command, for example:
 - `python training/scripts/probe_skip_to_logit.py --real-ckpt ...` 
 - (or) `cd training && python -m scripts.probe_skip_to_logit ...`
- If you keep the module-style name (`scripts.probe_skip_to_logit`), explicitly document the required working directory (`cd training`) so readers can reproduce consistently.




@codeant-ai codeant-ai Bot added the size:L label (This PR changes 100-499 lines, ignoring generated files) Apr 20, 2026
@codeant-ai

codeant-ai Bot commented Apr 20, 2026


Note

Low Risk
Low risk because this PR only adds/updates documentation; no runtime code paths change. Review risk is limited to correctness/clarity of the recorded experimental results and conclusions.

Overview
Documents the completed ZEB-139 KL-retrofit experiment by adding a new findings report (docs/findings/2026-04-19-zeb-139-kl-retrofit.md) with the 4-cell matrix results, key metrics, and the conclusion that the attractor holds under KL(P_router || P_teacher) at λ=0.5 (including evidence of the cheap-win confound via near-identical real vs shuffled runs).

Updates the ZEB-139 design spec status from draft to complete, linking to the findings and summarizing the locked-in verdict for the spec §11 outcome matrix pending ZEB-138.

Reviewed by Cursor Bugbot for commit 9900737. Bugbot is set up for automated code reviews on this repo. Configure here.


CodeAnt-AI Description

Mark the ZEB-139 spec as complete and add the final findings

What Changed

  • Updated the ZEB-139 design spec from draft to complete and linked it to the findings document
  • Added a new findings report showing that λ=0.5 KL+CE did not break the attractor on the cross-arch TinyLlama setup
  • Recorded the key outcome that real and shuffled runs converged to nearly identical router outputs, pointing to the same content-independent pattern
  • Included the run matrix, checks, artifacts, and next-step notes in the new findings doc

Impact

✅ Clearer experiment status
✅ Faster access to the final ZEB-139 result
✅ Easier follow-up on the next experiment path


@codeant-ai

codeant-ai Bot commented Apr 20, 2026

CodeAnt AI finished reviewing your PR.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/findings/2026-04-19-zeb-139-kl-retrofit.md`:
- Line 61: The sentence claiming "to 4 decimal places" is inaccurate for the
comparison 4.5546 vs 4.5544; update the wording in that sentence (the one that
mentions 4.5546 / 4.5544) to either "within 0.0002" or "to 3 decimal places"
(e.g., replace "to 4 decimal places" with "within 0.0002") so the baseline
reproducibility claim is numerically precise.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: df753444-0555-4d53-bd46-05a0ea539b70

📥 Commits

Reviewing files that changed from the base of the PR and between e239720 and 9900737.

📒 Files selected for processing (2)
  • docs/findings/2026-04-19-zeb-139-kl-retrofit.md
  • docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
🧰 Additional context used
🪛 LanguageTool
docs/findings/2026-04-19-zeb-139-kl-retrofit.md

[style] ~87-~87: Consider using a different adverb to strengthen your wording.
Context: ...ator: when two router models trained on completely different per-position teacher targets ...

(COMPLETELY_ENTIRELY)

🔇 Additional comments (2)
docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md (1)

3-3: Status update and quantitative verdict are clear and well-anchored.

This succinctly captures completion state, key metric (cross_run_cos), and how the result maps into §11.

docs/findings/2026-04-19-zeb-139-kl-retrofit.md (1)

79-90: Fingerprint interpretation and cheap-win confound conclusion are well-supported.

The threshold table plus cross_run_cos = +0.9999 provides a clear, evidence-based discriminator outcome.


**Two observations from the matrix alone**:

1. The router-off baseline reproduces ZEB-136's cells 1+2 to 4 decimal places (4.5546 vs ZEB-136's 4.5546 / 4.5544). Sanity check passes — the data path and frozen-backbone init are unchanged.

⚠️ Potential issue | 🟡 Minor

Fix precision wording in the baseline reproducibility claim.

Line 61 says “to 4 decimal places,” but 4.5546 vs 4.5544 differs at the 4th decimal. Suggest rewording to “within 0.0002” or “to 3 decimal places” for exactness.
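
The precision claim can be checked exactly with decimal arithmetic. The sketch below is illustrative only (the helper `agree` is hypothetical) and uses the two ZEB-136 baseline values quoted in this comment. It also shows why "within 0.0002" is the safer of the two suggested wordings: the values agree to 3 decimal places only under truncation, not under rounding.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

a = Decimal("4.5546")
b = Decimal("4.5544")

# Exact absolute difference, free of binary floating-point noise.
abs_diff = abs(a - b)


def agree(x, y, places, rounding):
    """True if x and y are equal after quantizing to `places` decimals."""
    quantum = Decimal(1).scaleb(-places)  # e.g. places=3 -> Decimal("0.001")
    return x.quantize(quantum, rounding=rounding) == y.quantize(quantum, rounding=rounding)


if __name__ == "__main__":
    print("abs diff:", abs_diff)
    print("4 places, rounded:  ", agree(a, b, 4, ROUND_HALF_EVEN))
    print("3 places, rounded:  ", agree(a, b, 3, ROUND_HALF_EVEN))
    print("3 places, truncated:", agree(a, b, 3, ROUND_DOWN))
```

Rounded to three places the values become 4.555 and 4.554, so stating the maximum difference avoids the ambiguity.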

✏️ Proposed doc fix
-1. The router-off baseline reproduces ZEB-136's cells 1+2 to 4 decimal places (4.5546 vs ZEB-136's 4.5546 / 4.5544). Sanity check passes — the data path and frozen-backbone init are unchanged.
+1. The router-off baseline closely reproduces ZEB-136's cells 1+2 (4.5546 vs ZEB-136's 4.5546 / 4.5544; max Δ=0.0002). Sanity check passes — the data path and frozen-backbone init are unchanged.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/findings/2026-04-19-zeb-139-kl-retrofit.md` at line 61, The sentence
claiming "to 4 decimal places" is inaccurate for the comparison 4.5546 vs
4.5544; update the wording in that sentence (the one that mentions 4.5546 /
4.5544) to either "within 0.0002" or "to 3 decimal places" (e.g., replace "to 4
decimal places" with "within 0.0002") so the baseline reproducibility claim is
numerically precise.

…closed

Per spec §12 q1, ran cells 3+4 again at --kl-lambda 0.9 to nail down
the λ-sensitivity signal before retiring the KL-only experimental
axis at 40M. Same setup as λ=0.5 except for the λ value and output
paths; ~30 min wall time.

Headline numbers:

  λ          Cell 3 (real)    Cell 4 (shuf)    Δ-diff       vs ZEB-136
  0 (none)   4.5545           4.5543           +0.0002      —
  0.5        4.5636           4.5637           −0.0001      +0.009
  0.9        4.5907           4.5912           −0.0005      +0.036

Two clean monotonic patterns:
1. Higher λ → val_loss strictly worse. KL pressure increasingly
   hurts the LM objective.
2. Δ-diff stays at noise across all λ values. No content-dependence
   emerges no matter how hard we crank KL.
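
The derived columns in the table above can be reproduced from the val_loss values. This is a worked-arithmetic sketch under two assumptions that fit the printed numbers but are not stated in the commit message: Δ-diff is taken as real minus shuffled, and "vs ZEB-136" is taken as the real-oracle cell minus the λ=0 real-oracle cell. The names `SWEEP`, `delta_diff`, and `vs_baseline` are hypothetical.

```python
from decimal import Decimal

# (real, shuffled) val_loss per lambda, copied from the table above.
SWEEP = {
    "0": (Decimal("4.5545"), Decimal("4.5543")),
    "0.5": (Decimal("4.5636"), Decimal("4.5637")),
    "0.9": (Decimal("4.5907"), Decimal("4.5912")),
}


def delta_diff(lam):
    """Assumed definition: val_loss(real) - val_loss(shuffled)."""
    real, shuf = SWEEP[lam]
    return real - shuf


def vs_baseline(lam):
    """Assumed definition: real-cell val_loss minus the lambda=0 real cell."""
    return SWEEP[lam][0] - SWEEP["0"][0]


if __name__ == "__main__":
    for lam in SWEEP:
        print(f"lambda={lam}: delta_diff={delta_diff(lam):+}  vs_baseline={vs_baseline(lam):+}")
```

Under these definitions every Δ-diff stays below the +0.001 nats break threshold, while the gap to the λ=0 baseline grows with λ.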

Forensic fingerprint (skip-to-logit probe at λ=0.9):
  cross_run_cos engram_logits  =  +1.0000  (was +0.9999 at λ=0.5)
  max LM-head row |cos|        =   0.9257  (was 0.78 at λ=0.5)
  ||W_align||_F                 =   0.58    (was 1.35 at λ=0.5)
  engram_logit_entropy          =  10.3039  (was 10.3467; still
                                            well above 10.27 break
                                            threshold)
  alpha                         =   0.1762  (saturated, λ-independent
                                            above 0.5)

cross_run_cos = +1.0000 between real-oracle and shuffled-oracle cells
at λ=0.9 is the dispositive cheap-win signature. Higher KL pressure
intensifies the lever rather than escapes the attractor.

Curious side observation: g5 (L5 engram gate alpha) flipped sign
between λ=0.5 (+0.40) and λ=0.9 (-0.41). Different optimization
regime, same content-blind destination — suggests the "match the
corpus average" attractor is robust across optimizer trajectories.

Doc updates:
- TL;DR mentions both λ values now; net verdict unchanged
- Open question §1 (λ-sweep) marked DONE, points at the new section
- New section "λ=0.9 closure run" with full λ-sweep matrix, fingerprint
  comparison, and the optimization-regime observation
- Artifacts section lists the new λ=0.9 checkpoints, CSVs, forensic
  output, and run script

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/findings/2026-04-19-zeb-139-kl-retrofit.md (1)

63-63: ⚠️ Potential issue | 🟡 Minor

Precision wording is still numerically inaccurate.

“to 4 decimal places” is not correct for 4.5546 vs 4.5544; use “within 0.0002” (or “to 3 decimal places”).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/findings/2026-04-19-zeb-139-kl-retrofit.md` at line 63, The wording "to
4 decimal places" is numerically incorrect for the values 4.5546 vs 4.5544; edit
the sentence that currently reads "to 4 decimal places" (the router-off baseline
reproduction line) and replace it with either "within 0.0002" or "to 3 decimal
places" so the precision claim matches the actual numeric difference.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/findings/2026-04-19-zeb-139-kl-retrofit.md`:
- Around line 218-229: Replace the hard-coded absolute paths under
/home/zebli/work/LOCAL/zeb139/ with repo-relative paths (e.g., artifacts/... ,
logs/... , checkpoints/... , forensics/... , scripts/...) or use an
environment-placeholder like ${ZEB139_ROOT} for the top-level directory
referenced in this document; update the listed entries such as
artifacts/oracle_tinyllama_10k.safetensors,
artifacts/oracle_tinyllama_10k_teacher_logits.safetensors, logs/run{1..4}_*.csv,
checkpoints/zeb139_router_{off,on}_..., forensics/router_on_kl.txt and scripts/*
accordingly and add one short note that a local mount point (e.g.,
${ZEB139_ROOT} -> /home/zebli/work/LOCAL/zeb139) may be required for reproducing
locally.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 40dff53e-645d-4a4b-a57a-b17c55248a90

📥 Commits

Reviewing files that changed from the base of the PR and between 9900737 and b4a371b.

📒 Files selected for processing (1)
  • docs/findings/2026-04-19-zeb-139-kl-retrofit.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cursor Bugbot
🧰 Additional context used
🪛 LanguageTool
docs/findings/2026-04-19-zeb-139-kl-retrofit.md

[style] ~89-~89: Consider using a different adverb to strengthen your wording.
Context: ...ator: when two router models trained on completely different per-position teacher targets ...

(COMPLETELY_ENTIRELY)

Comment on lines +218 to +229
All under `/home/zebli/work/LOCAL/zeb139/`:

- **Oracle**: `artifacts/oracle_tinyllama_10k.safetensors` (4.9 MB, [10K, 128] f32) and `_shuffled_seed0.safetensors`
- **Teacher-logits sidecar**: `artifacts/oracle_tinyllama_10k_teacher_logits.safetensors` (611 MB, [10K, 32K] bf16) and `_shuffled_seed0_teacher_logits.safetensors`
- **Stats**: `artifacts/oracle_tinyllama_10k.safetensors.stats.json` (PCA explained variance, populated rows, hash seeds)
- **Per-cell training logs (CSV)**: `logs/run{1..4}_*.csv` (λ=0.5) and `logs/run{3,4}_router_on_{real,shuf}_kl09.csv` (λ=0.9 closure), 200 rows × 36 columns each, including the `kl_loss` column
- **Per-cell checkpoints**: `checkpoints/zeb139_router_{off,on}_{real,shuf}{,_kl,_kl09}/checkpoint.pt`
- **Forensic outputs**: `forensics/router_off_no_kl.txt` (full 10-probe battery, pair A), `forensics/router_on_kl.txt` (skip-to-logit diagnostics, λ=0.5), `forensics/router_on_kl09.txt` (skip-to-logit diagnostics, λ=0.9 closure)
- **Scripts**: `scripts/shuffle_oracle_and_sidecar.py`, `scripts/run_4cell_matrix.sh` (cells 1-4), `scripts/run_cells_3_and_4.sh` (re-run after stale-checkout fix), `scripts/run_cells_3_and_4_lambda09.sh` (λ=0.9 closure), `scripts/run_forensics.sh`

ZEB-136's prior forensics (`/home/zebli/work/LOCAL/zeb136/forensics/router_on.txt`) are the direct comparison point for the ZEB-139 (KL+CE) vs ZEB-136 (CE-only) contrast.


🧹 Nitpick | 🔵 Trivial

Use repo-relative or environment-agnostic artifact paths.

Hard-coded local absolute paths make the findings harder to reproduce for other operators. Prefer repo-relative paths (or a ${ZEB139_ROOT} placeholder) and one short note for the local mount.
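
One way the placeholder approach could look in a helper script is sketched below. This assumes an environment variable named `ZEB139_ROOT` (the name the comment proposes); the function `zeb139_path` is hypothetical and does not exist in the repo.

```python
import os
from pathlib import Path


def zeb139_path(relative: str) -> Path:
    """Resolve an artifact path against $ZEB139_ROOT.

    Falls back to the current directory when the variable is unset, so
    repo-relative paths keep working without any local mount.
    """
    root = Path(os.environ.get("ZEB139_ROOT", "."))
    return root / relative


# Example: point the root at a local mount, then resolve a documented artifact.
os.environ["ZEB139_ROOT"] = "/home/zebli/work/LOCAL/zeb139"
print(zeb139_path("forensics/router_on_kl.txt"))
```

With this pattern the findings doc only needs to list the relative paths plus a single note on where `ZEB139_ROOT` points locally.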


codeant-ai Bot commented Apr 28, 2026

User description

Summary

Findings doc + spec status update for ZEB-139. The 4-cell matrix run completed; verdict is attractor HOLDS under λ=0.5 KL+CE on the cross-arch TinyLlama setup.

TL;DR

  • Cells 1+2 (no-router baseline) reproduce ZEB-136 to within 0.0002 nats (val_loss=4.5546). Sanity check passes.
  • Cells 3+4 (router-on, KL=0.5, real vs shuffled) produced val_loss=4.5636 vs 4.5637, Δ-diff = −0.0001 nats — essentially zero.
  • Forensic: cross_run_cos engram_logits = +0.9999 between cells 3 and 4. Smoking gun for the cheap-win confound: KL forces both routers to the same content-independent average distribution.
  • All five spec §7 fingerprint thresholds say the attractor HOLDS:
| Metric | Threshold (broken if…) | ZEB-136 (no KL) | ZEB-139 (KL=0.5) |
|---|---|---|---|
| engram_logit_entropy | < log V − 0.1 = 10.27 | 10.3735 | 10.3467 |
| α | outside [0.14, 0.20] | 0.1644 | 0.1762 |
| Cross-run cosine | < 0.7 | +0.7979 | +0.9999 |
| Δ-diff | ≥ +0.001 nats | +0.0002 | −0.0001 |
| W_align ‖·‖_F | > 2× init | 1.91 | 1.35 |
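
The five "broken if" conditions above can be applied mechanically. The sketch below does that for the ZEB-139 column under two labeled assumptions: the vocabulary size is taken as 32,000 (consistent with log V = 10.3735), and the W_align value is read as a ratio to the init norm. The function name `fingerprint_broken` is illustrative, not from the repo.

```python
import math

VOCAB_SIZE = 32_000  # assumption: matches log V = 10.3735 in the table


def fingerprint_broken(entropy, alpha, cross_run_cos, delta_diff, w_align_ratio):
    """Map each metric to True if its 'broken if' condition is met."""
    return {
        "engram_logit_entropy": entropy < math.log(VOCAB_SIZE) - 0.1,
        "alpha": not (0.14 <= alpha <= 0.20),
        "cross_run_cos": cross_run_cos < 0.7,
        "delta_diff": delta_diff >= 0.001,
        # assumption: the reported value is already norm / init norm
        "w_align_norm": w_align_ratio > 2.0,
    }


zeb139 = fingerprint_broken(10.3467, 0.1762, 0.9999, -0.0001, 1.35)
for metric, broken in zeb139.items():
    print(f"{metric}: {'BROKEN' if broken else 'HOLDS'}")
```

Every condition evaluates to "not broken" for the ZEB-139 values, which matches the stated verdict that the attractor holds.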

What this contributes to the bigger picture

Per spec §11 outer matrix, ZEB-139 fills the "KL-retrofit Holds" row. ZEB-138 (same-arch teacher, CE-only) is the orthogonal axis still pending KRILE's Harmony-474M handoff. Once both rows land:

| ZEB-139 | ZEB-138 | Combined |
|---|---|---|
| Holds (this PR) | Breaks | Teacher-arch dominates; pursue same-arch + capgap, deprioritize KL+CE |
| Holds (this PR) | Holds | Structural ceiling at 40M (Gemini §7 steelman); next axis is multi-layer non-linear W_align or backbone unfreezing |

Bonus diagnostic (worth flagging)

Notably, max LM-head row cos jumped from 0.22 (ZEB-136 without KL) to 0.78 (ZEB-139 with KL). The router IS aligning with vocab directions — it's just aligning ALL positions with the SAME average direction (cross_run_cos=1.0). The KL term is doing what it was designed to do (push the router toward the teacher's distribution); the teacher's distribution just turns out to be roughly position-independent at the corpus average, so the result is content-blind. A future variant could try a temperature on the router-side softmax (spec §12 q2) or a per-token KL mask that down-weights frequent-token positions to force per-position attention.
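
To make the cross-run cosine reading concrete, here is a small illustration in plain Python. How the forensics script actually computes `cross_run_cos` is not shown in this PR, so the vectors and the `cosine` helper below are purely hypothetical stand-ins.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical "corpus average" logit direction that both routers collapse to.
corpus_average = [2.0, 0.5, -1.0, 0.1]
router_real = [x + 0.01 for x in corpus_average]  # trained on real targets
router_shuf = [x - 0.01 for x in corpus_average]  # trained on shuffled targets

# Content-blind case: both runs emit nearly the same vector.
print(cosine(router_real, router_shuf))

# Content-dependent case: runs that attend to different tokens diverge.
print(cosine([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]))
```

A cosine near 1 between runs trained on real and shuffled targets is what one would expect if the output does not depend on the per-position teacher signal.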

Operational note (worth capturing for next operator)

First matrix attempt failed at cell 3 because the local main repo dir was checked out on the stale zeblith/zeb-138-same-arch-teacher branch (which predates PR #257 by several commits). The venv's ct87 editable install therefore imported a train.py without the new --engram-skip-to-logit / --kl-lambda flags. Cells 1+2 succeeded incidentally (the no-router code path is identical across branches). Resolved by running git checkout main && git pull, then re-running cells 3+4 only (each cell initializes independently from zeta_ctrl_2048, so no chaining was lost). The doc captures the recipe so the next operator doesn't repeat the mistake.
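
A pre-flight guard of roughly the following shape could catch a stale checkout before any cell is launched. This is a sketch only: it assumes the training entry point prints its options in `--help` output, and `missing_flags` is a hypothetical helper, not part of ct87.

```python
REQUIRED_FLAGS = ["--engram-skip-to-logit", "--kl-lambda"]


def missing_flags(help_text: str, required: list[str]) -> list[str]:
    """Return the required flags that do not appear in the CLI help text."""
    return [flag for flag in required if flag not in help_text]


# Hypothetical help output from a stale train.py that predates the new flags.
stale_help = "usage: train.py [-h] [--lr LR] [--steps STEPS]"

absent = missing_flags(stale_help, REQUIRED_FLAGS)
if absent:
    print(f"stale checkout? train.py is missing: {', '.join(absent)}")
```

In practice the help text would come from invoking the installed entry point, so the check exercises the same editable install the matrix run will import.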

Test plan

  • Findings doc renders cleanly (markdown headers, tables, code blocks)
  • All numbers in the doc trace to the source artifacts (val_loss from CSVs / training logs, fingerprint metrics from forensics/router_on_kl.txt, ZEB-136 baselines from /home/zebli/work/LOCAL/zeb136/)
  • Spec doc STATUS line updated from "DRAFT — blocked on PR feat(oracle): harmony:<path> teacher URI for ZEB-138 same-arch oracle extraction #254" to "COMPLETE — see findings"

Doc-only PR; no code changes, no tests to run.

🤖 Generated with Claude Code


Note

Low Risk
Low risk because this PR only adds/updates documentation and does not change runtime code paths or data handling.

Overview
Adds a new findings writeup docs/findings/2026-04-19-zeb-139-kl-retrofit.md documenting the completed ZEB-139 4-cell KL+CE experiment (including a λ=0.9 closure run) and its key outcome: the maximum-entropy attractor holds with evidence of a content-independent “cheap-win” collapse.

Updates the ZEB-139 design spec docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md status from draft/blocked to complete, linking to the findings and summarizing the final verdict + discriminator metric.

Reviewed by Cursor Bugbot for commit b4a371b. Bugbot is set up for automated code reviews on this repo. Configure here.

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive experiment findings document detailing methodology, results matrix, diagnostic metrics, and forensic artifacts for the KL-retrofit study.
    • Updated the experimental specification from draft to complete with quantitative validation (near-identical cross-run behavior) and closure notes for the KL-retrofit axis, plus recommended next steps.

CodeAnt-AI Description

Mark ZEB-139 as complete and add the experiment findings

What Changed

  • Updated the ZEB-139 design spec from draft to complete and linked it to the findings report
  • Added a new findings document with the final result: KL+CE at λ=0.5 did not escape the attractor on the TinyLlama setup
  • Included the follow-up λ=0.9 run, which showed the same content-blind outcome and no benefit from stronger KL pressure
  • Recorded the key user-facing takeaway: the KL-only path is closed, and the output stays the same even when the teacher data is shuffled
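
For readers unfamiliar with the objective, a KL+CE loss at a given λ can be sketched as below. The exact weighting used in ZEB-139 is not stated in this PR; the form `CE + λ·KL(teacher ‖ router)` and the name `kl_plus_ce` are assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_plus_ce(router_logits, teacher_logits, target, kl_lambda):
    """Cross-entropy on the target token plus lambda * KL(teacher || router)."""
    log_q = log_softmax(router_logits)   # router (student) distribution
    log_p = log_softmax(teacher_logits)  # teacher distribution
    ce = -log_q[target]
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return ce + kl_lambda * kl


# When the router already matches the teacher, the KL term vanishes.
matched = kl_plus_ce([0.0, 0.0], [0.0, 0.0], target=0, kl_lambda=0.5)
mismatched = kl_plus_ce([0.0, 0.0], [3.0, -3.0], target=0, kl_lambda=0.5)
print(matched, mismatched)
```

Under this form the KL term is minimized by matching whatever distribution the teacher supplies, which is why a position-independent teacher average would yield a content-blind router.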

Impact

✅ Clearer experiment status
✅ Faster decision on the KL-only path
✅ Fewer follow-up runs on the same dead-end setup

🔄 Retrigger CodeAnt AI Review

Details

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

codeant-ai Bot commented Apr 28, 2026

Sequence Diagram

This PR documents the completed KL-retrofit experiment: generating teacher-logits sidecars, running a 4-cell matrix with router on and off under KL+CE, and concluding that the engram attractor holds and the KL-only axis at 40M is closed.

sequenceDiagram
    participant Researcher
    participant ExperimentRunner
    participant TeacherModel
    participant Metrics

    Researcher->>ExperimentRunner: Launch KL retrofit runs with lambda values
    ExperimentRunner->>TeacherModel: Generate oracle and teacher logits sidecar
    ExperimentRunner->>ExperimentRunner: Train 4 cell matrix (router off and on, real and shuffled)
    ExperimentRunner->>Metrics: Collect validation loss and router fingerprints
    Metrics-->>ExperimentRunner: Report matching real and shuffled behavior and high cross run cosine
    ExperimentRunner-->>Researcher: Conclude attractor holds and KL only axis is closed

Generated by CodeAnt AI

codeant-ai Bot commented Apr 28, 2026

CodeAnt-AI Description

Mark the ZEB-139 KL-retrofit experiment complete with final findings

What Changed

  • Added the full ZEB-139 findings report, including the final verdict that the KL+CE setup did not escape the attractor on the cross-arch TinyLlama run
  • Recorded the key outcome that real and shuffled teacher targets produced nearly identical results, showing the KL signal collapsed into a content-blind average
  • Updated the design spec status from draft to complete and linked it to the findings report

Impact

✅ Clearer experiment status
✅ Faster review of ZEB-139 results
✅ Easier follow-up on the next experiment axis

codeant-ai Bot commented Apr 28, 2026

Sequence Diagram

This PR documents the completed KL-retrofit experiment: generating teacher logits, running the 4-cell KL+CE training matrix, probing router behavior, and concluding that the maximum-entropy attractor still holds.

sequenceDiagram
    participant Researcher
    participant OracleJob
    participant TrainingMatrix
    participant Forensics
    participant SpecDoc

    Researcher->>OracleJob: Generate oracle and teacher logits sidecar
    Researcher->>TrainingMatrix: Run 4-cell KL plus CE training with real and shuffled sidecars
    TrainingMatrix-->>Forensics: Emit val loss and router outputs for all cells
    Forensics->>Forensics: Compute cross run cosine and fingerprint metrics
    Forensics-->>Researcher: Verdict attractor holds under KL
    Researcher->>SpecDoc: Update spec status and record ZEB-139 findings

Generated by CodeAnt AI

Comment on lines +83 to +84
| `engram_logit_entropy` | < log(V) − 0.1 = 10.27 | 10.3735 (= log V) | **10.3467** | **HOLDS** (Δ from log V = 0.027, well above the 0.1 threshold for "broken") |
| `α` | outside [0.14, 0.20] | 0.1644 | **0.1762** | **HOLDS** (still inside attractor band) |

🟠 Architect Review — HIGH

The entropy-threshold explanation is directionally wrong: the table marks "broken if engram_logit_entropy < log(V) − 0.1 = 10.27", but the narrative calls Δ from log V = 0.027 "well above" the 0.1 break threshold, inverting the inequality and mis-stating what counts as a break.

Suggestion: Reword the verdict text so it correctly states that Δ=0.027 is well below the 0.1 break threshold (or equivalently that entropy must stay >10.27 to hold) and apply the same convention consistently, including in the λ=0.9 entropy row.
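
Written out with the values from the quoted table rows, the arithmetic behind this comment is:

$$
\log V - H_{\text{engram}} = 10.3735 - 10.3467 = 0.0268 < 0.1
\quad\Longrightarrow\quad
H_{\text{engram}} = 10.3467 > 10.27
$$

The break condition is entropy falling below log V − 0.1, so a gap of about 0.027 is well under the 0.1 needed to count as broken, and the metric holds.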

codeant-ai Bot commented Apr 29, 2026

CodeAnt-AI Description

Mark the ZEB-139 KL-retrofit experiment as complete and record the final findings

What Changed

  • Added the final findings document for ZEB-139 with the experiment setup, results, and verdict
  • Reports that KL+CE at λ=0.5 did not break the attractor: real and shuffled runs ended with nearly identical loss and router outputs
  • Adds the λ=0.9 closure run, showing the same content-blind outcome with worse loss at higher KL pressure
  • Updates the design spec status to complete and points to the new findings doc

Impact

✅ Clearer ZEB-139 experiment status
✅ Faster review of the final KL-retrofit verdict
✅ Easier comparison of real vs shuffled training results

codeant-ai Bot commented Apr 29, 2026

Sequence Diagram

This PR documents the completed ZEB-139 KL-retrofit experiment, where KL+CE training with teacher logits is run and analyzed, concluding the attractor still holds and updating the design spec status to complete.

sequenceDiagram
    participant Researcher
    participant TrainingPipeline
    participant MetricsProbe
    participant SpecDoc

    Researcher->>TrainingPipeline: Run ZEB-139 KL retrofit matrix with teacher logits
    TrainingPipeline-->>MetricsProbe: Output losses and router traces for all cells
    MetricsProbe->>MetricsProbe: Compare real vs shuffled cells and compute fingerprints
    MetricsProbe-->>Researcher: Conclude attractor holds and KL only axis closed
    Researcher->>SpecDoc: Record findings and mark spec complete

Generated by CodeAnt AI

codeant-ai Bot commented Apr 29, 2026

CodeAnt AI finished running the review.

codeant-ai Bot commented Apr 29, 2026

CodeAnt AI is running the review.

codeant-ai Bot added and removed the size:L label (This PR changes 100-499 lines, ignoring generated files) on Apr 29, 2026

codeant-ai Bot commented Apr 29, 2026

User description

Operational note (worth capturing for next operator)

The first matrix attempt failed at cell 3 because the local main repo directory was checked out on the stale zeblith/zeb-138-same-arch-teacher branch, which predates PR #257 by several commits. The venv's ct87 editable install therefore imported a train.py without the new --engram-skip-to-logit / --kl-lambda flags. Cells 1+2 succeeded incidentally because the no-router code path is identical across branches. Resolved with git checkout main && git pull, then re-ran cells 3+4 only; each cell initializes independently from zeta_ctrl_2048, so no chaining was lost. The findings doc captures the recipe so the next operator doesn't repeat the mistake.
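
The failure mode above (a stale checkout silently supplying an older train.py) can be caught before any cell is launched. The following is a minimal sketch, not the recipe from the findings doc: it assumes the expected branch is main and that the operator captures the `--help` text of train.py by whatever invocation the repo uses. The helper names `missing_flags`, `current_branch`, and `preflight` are hypothetical; only the two flag names come from the note above.

```python
import subprocess

# Flags named in the operational note; everything else here is illustrative.
REQUIRED_FLAGS = ("--engram-skip-to-logit", "--kl-lambda")


def missing_flags(help_text, required=REQUIRED_FLAGS):
    """Return the required flags that do not appear in a CLI --help dump.

    This is a coarse substring check, so a longer flag sharing the same
    prefix would also count as a match.
    """
    return [flag for flag in required if flag not in help_text]


def current_branch(repo_dir="."):
    """Return the branch checked out in repo_dir via `git rev-parse`."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def preflight(help_text, branch, expected_branch="main"):
    """Raise RuntimeError if the checkout looks stale; return None otherwise."""
    problems = []
    if branch != expected_branch:
        problems.append(f"on branch {branch!r}, expected {expected_branch!r}")
    gone = missing_flags(help_text)
    if gone:
        problems.append("train.py lacks flags: " + ", ".join(gone))
    if problems:
        raise RuntimeError("; ".join(problems))
```

An operator would call `preflight(help_text, current_branch(repo_dir))` once before cell 1, so that a stale branch fails immediately instead of at cell 3.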

Test plan

  • Findings doc renders cleanly (markdown headers, tables, code blocks)
  • All numbers in the doc trace to the source artifacts (val_loss from CSVs / training logs, fingerprint metrics from forensics/router_on_kl.txt, ZEB-136 baselines from /home/zebli/work/LOCAL/zeb136/)
  • Spec doc STATUS line updated from "DRAFT — blocked on PR feat(oracle): harmony:<path> teacher URI for ZEB-138 same-arch oracle extraction #254" to "COMPLETE — see findings"

Doc-only PR; no code changes, no tests to run.

🤖 Generated with Claude Code


Note

Low Risk
Low risk because this PR only adds/updates documentation and does not change runtime code paths or data handling.

Overview
Adds a new findings writeup docs/findings/2026-04-19-zeb-139-kl-retrofit.md documenting the completed ZEB-139 4-cell KL+CE experiment (including a λ=0.9 closure run) and its key outcome: the maximum-entropy attractor holds with evidence of a content-independent “cheap-win” collapse.

Updates the ZEB-139 design spec docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md status from draft/blocked to complete, linking to the findings and summarizing the final verdict + discriminator metric.

Reviewed by Cursor Bugbot for commit b4a371b.

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive experiment findings document detailing methodology, results matrix, diagnostic metrics, and forensic artifacts for the KL-retrofit study.
    • Updated the experimental specification from draft to complete with quantitative validation (near-identical cross-run behavior) and closure notes for the KL-retrofit axis, plus recommended next steps.
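
The "near-identical cross-run behavior" mentioned above refers to the cross-run cosine between the router outputs of the real and shuffled cells. How the repo computes cross_run_cos is not shown in this PR, so the following is only one plausible sketch: it assumes each run is a list of per-position logit rows and averages the per-position cosine. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cross_run_cos(run_a, run_b):
    """Mean per-position cosine between two runs' logit rows (assumed layout)."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and cover the same positions")
    return sum(cosine(a, b) for a, b in zip(run_a, run_b)) / len(run_a)


# A content-blind router emits the same average row at every position, so
# the real and shuffled cells agree almost exactly even though their
# teacher targets differ.
average_row = [0.2, 0.5, 0.3]
real_cell = [average_row, average_row]
shuffled_cell = [average_row, average_row]
similarity = cross_run_cos(real_cell, shuffled_cell)
```

Under this reading, a value near 1 between cells 3 and 4 indicates that the router output does not depend on which teacher targets it was trained against.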

CodeAnt-AI Description

Mark the ZEB-139 KL-retrofit spec as complete and add the findings report

What Changed

  • Updated the ZEB-139 design spec status from draft/blocked to complete and linked it to the final findings doc.
  • Added a new findings report with the experiment outcome, showing that KL+CE at λ=0.5 and λ=0.9 still stays in the attractor and does not separate real vs shuffled teacher targets.
  • Recorded the key user-facing conclusion for the project: the KL-only path at 40M is closed, and the next step depends on the pending ZEB-138 result.
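
The verdict summarized above rests on the five spec §7 fingerprint thresholds listed in the PR description. As a rough sketch, they can be encoded as "broken if" predicates and applied to the reported numbers. The values are copied from the PR description's table; treating the W_align figures as ratios to the init norm is an assumption (the table states the threshold as "> 2× init" without units), and the dictionary keys and function names are hypothetical.

```python
# Reported fingerprints, copied from the PR description's table.
ZEB_136 = {  # no KL
    "entropy": 10.3735,
    "alpha": 0.1644,
    "cross_run_cos": 0.7979,
    "delta_diff": 0.0002,
    "w_align_ratio": 1.91,  # assumed to be a ratio to the init norm
}
ZEB_139 = {  # KL = 0.5
    "entropy": 10.3467,
    "alpha": 0.1762,
    "cross_run_cos": 0.9999,
    "delta_diff": -0.0001,
    "w_align_ratio": 1.35,  # assumed to be a ratio to the init norm
}

# Each predicate returns True when the attractor would count as broken.
BROKEN_IF = {
    "entropy": lambda v: v < 10.27,             # < log V - 0.1
    "alpha": lambda v: not (0.14 <= v <= 0.20),  # outside [0.14, 0.20]
    "cross_run_cos": lambda v: v < 0.7,
    "delta_diff": lambda v: v >= 0.001,          # nats
    "w_align_ratio": lambda v: v > 2.0,          # > 2x init
}


def broken_metrics(metrics):
    """Return the sorted names of fingerprints that cross their threshold."""
    return sorted(name for name, test in BROKEN_IF.items() if test(metrics[name]))


def attractor_holds(metrics):
    """True when no fingerprint crosses its 'broken' threshold."""
    return not broken_metrics(metrics)
```

With the reported numbers, `attractor_holds(ZEB_139)` and `attractor_holds(ZEB_136)` both return `True`, matching the statement that all five thresholds say the attractor holds.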

Impact

✅ Clearer experiment status
✅ Faster review of ZEB-139 results
✅ Quicker next-step planning for the 40M KL path

codeant-ai Bot commented Apr 29, 2026

Sequence Diagram

This PR documents the completed ZEB-139 experiment, where KL+CE training is applied to router cells with real and shuffled teacher logits to test whether the engram attractor can be broken; the findings show both cells collapse to the same content-blind distribution, so the attractor holds and the KL-only axis is closed.

sequenceDiagram
    participant Researcher
    participant Experiment
    participant TeacherModel
    participant Router
    participant Metrics

    Researcher->>Experiment: Launch ZEB-139 KL+CE 4-cell matrix
    Experiment->>TeacherModel: Load oracle and teacher logits sidecars (real and shuffled)
    Experiment->>Router: Train router with CE and KL router teacher objective
    Experiment->>Router: Run real oracle cell and shuffled oracle cell
    Router-->>Metrics: Report router outputs and validation losses
    Metrics-->>Researcher: cross_run_cos near 1 and matched losses (attractor holds, KL axis closed)

Generated by CodeAnt AI

codeant-ai Bot commented Apr 29, 2026

CodeAnt AI finished running the review.
