From 99007376b2a9aab2e8f12c18627d970ec1d3ce28 Mon Sep 17 00:00:00 2001
From: Jake Englund
Date: Sun, 19 Apr 2026 21:50:15 -0700
Subject: [PATCH 1/2] docs(zeb-139): KL-retrofit findings — attractor holds at λ=0.5 (cheap-win confound)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ZEB-139 4-cell matrix run on the TinyLlama oracle + teacher-logits sidecar produced by PR #255. KL+CE training (λ=0.5) at cells 3+4 did NOT escape the maximum-entropy attractor.

cross_run_cos between the real-oracle and shuffled-oracle KL+router cells = +0.9999, the smoking gun for the cheap-win confound: KL forces both routers to the same content-independent average distribution rather than learning per-position content routing.

Per spec §11 outer matrix, ZEB-139's contribution is "Holds". Combined with ZEB-138's pending verdict, this points to either teacher-arch dominance (if ZEB-138 breaks) or the structural-ceiling steelman (if ZEB-138 also holds).

Sanity checks all passed:
- Cells 1+2 (no-router baseline) reproduce ZEB-136's val_loss to 4 decimals (4.5546 vs ZEB-136's 4.5546 / 4.5544)
- Oracle PCA explained_variance_ratio_total = 0.9338690864205668, bit-identical to ZEB-136's stored value (proves the GPU-side index_add_ accumulator from PR #255's perf fix produces the same Welford means as the original CPU path)
- Sidecar shape (10000, 32000) bf16, 10000/10000 rows populated, shape-matched to engram_table

Operational note in the doc: the first matrix attempt failed at cell 3 because the local main repo dir was on a stale branch (zeblith/zeb-138-same-arch-teacher); the venv's ct87 editable install therefore imported a train.py without the new flags. Resolved by `git checkout main && git pull`, then re-ran cells 3+4 only (each cell inits independently, so no chaining was lost). The doc captures the recipe to avoid recurrence.
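The Welford-means sanity check above compares two ways of accumulating a per-row mean over hashed keys: an incremental update on every hit (the original CPU path) versus summing values and counts per row and dividing once at the end (the pattern a GPU `index_add_` accumulator follows; in PyTorch the scatter step is `Tensor.index_add_(0, index, source)`). Below is a minimal pure-Python sketch with hypothetical helper names, assuming one scalar per row (the real tables hold a vector per row, updated elementwise). It shows the two agree to within floating-point rounding; it does not by itself guarantee bit-identical results.

```python
import random


def welford_means(keys, values, n_rows):
    """Per-row running mean, updated incrementally on every hit (Welford-style)."""
    means = [0.0] * n_rows
    counts = [0] * n_rows
    for k, x in zip(keys, values):
        counts[k] += 1
        means[k] += (x - means[k]) / counts[k]
    return means, counts


def scatter_sum_means(keys, values, n_rows):
    """Accumulate per-row sums and counts, then divide once at the end."""
    sums = [0.0] * n_rows
    counts = [0] * n_rows
    for k, x in zip(keys, values):
        sums[k] += x  # plays the role of sums.index_add_(0, keys, values) in PyTorch
        counts[k] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return means, counts


rng = random.Random(0)
N_ROWS = 16
keys = [rng.randrange(N_ROWS) for _ in range(5_000)]
values = [rng.gauss(0.0, 1.0) for _ in keys]

w_means, w_counts = welford_means(keys, values, N_ROWS)
s_means, s_counts = scatter_sum_means(keys, values, N_ROWS)
max_abs_diff = max(abs(a - b) for a, b in zip(w_means, s_means))
print(w_counts == s_counts, max_abs_diff < 1e-12)  # True True
```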
Spec doc STATUS line updated from "DRAFT — blocked on PR #254" to "COMPLETE — see findings".

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../2026-04-19-zeb-139-kl-retrofit.md | 166 ++++++++++++++++++
 .../2026-04-18-zeb-139-kl-retrofit-design.md | 2 +-
 2 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 docs/findings/2026-04-19-zeb-139-kl-retrofit.md

diff --git a/docs/findings/2026-04-19-zeb-139-kl-retrofit.md b/docs/findings/2026-04-19-zeb-139-kl-retrofit.md
new file mode 100644
index 00000000..f474608a
--- /dev/null
+++ b/docs/findings/2026-04-19-zeb-139-kl-retrofit.md
@@ -0,0 +1,166 @@
+# ZEB-139 — KL-Retrofit Objective-Axis Experiment Findings
+
+**Date:** 2026-04-19
+**Linear:** [ZEB-139](https://linear.app/zeblith/issue/ZEB-139/kl-retrofit-experiment-objective-axis-diagnostic-for-engram-attractor)
+**Spec:** `docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md`
+**Foundation:** [PR #257](https://github.com/zeblithic/harmony/pull/257) (ZEB-134 revival + ZEB-139 KL+CE) and [PR #255](https://github.com/zeblithic/harmony/pull/255) (`--save-teacher-logits` sidecar producer)
+**Prior art:** [ZEB-134](https://linear.app/zeblith/issue/ZEB-134) (Skip-to-Logit, CE-only, attractor observed), [ZEB-136](https://linear.app/zeblith/issue/ZEB-136) (TinyLlama cross-arch, attractor observed)
+
+---
+
+## TL;DR
+
+**Adding a Memory-Decoder-style `KL(P_router || P_teacher)` term at λ=0.5 did NOT escape the maximum-entropy attractor on the cross-arch TinyLlama setup.** Both the real-oracle and shuffled-oracle KL+router cells converged to essentially identical val_loss (4.5636 vs 4.5637, Δ-diff = -0.0001 nats) and produced a router output with `cross_run_cos = +0.9999` between the two cells — the smoking gun for the cheap-win confound. KL forced both routers to the same content-independent average distribution rather than learning per-position content routing.
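A toy illustration of why the shuffled sidecar cannot separate cells 3 and 4 once the router ignores its input. A router that emits one fixed distribution q at every position is scored by the KL averaged over rows, and an average over rows does not change when the rows are permuted. For the `KL(P_router || P_teacher)` direction the best fixed q is the normalized geometric mean of the teacher rows (q ∝ exp of the mean log-probability); for the opposite direction it would be the arithmetic mean. Both are permutation-invariant. This is a minimal pure-Python sketch with hypothetical names and a toy 8-token vocabulary, assuming every row is hit equally often; real row-hit frequencies are not uniform, so in training the match is only approximate.

```python
import math
import random


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def best_constant_router(teacher_log_probs):
    """Fixed q minimizing the mean of KL(q || p_i): the normalized geometric mean of the p_i."""
    n = len(teacher_log_probs)
    mean_log = [sum(col) / n for col in zip(*teacher_log_probs)]
    return [math.exp(x) for x in log_softmax(mean_log)]


def mean_kl_from(q, teacher_log_probs):
    """Average of KL(q || p_i) over the teacher rows."""
    per_row = (sum(qv * (math.log(qv) - lp) for qv, lp in zip(q, row)) for row in teacher_log_probs)
    return sum(per_row) / len(teacher_log_probs)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0.0)


rng = random.Random(0)
VOCAB, ROWS = 8, 50
real = [log_softmax([rng.gauss(0.0, 2.0) for _ in range(VOCAB)]) for _ in range(ROWS)]
perm = list(range(ROWS))
rng.shuffle(perm)
shuf = [real[i] for i in perm]  # same rows, different row indices

# Per-position targets differ between the real and shuffled tables...
moved = [i for i in range(ROWS) if perm[i] != i]
per_row_cos = sum(
    cosine([math.exp(x) for x in real[i]], [math.exp(x) for x in shuf[i]]) for i in moved
) / len(moved)
print(f"mean per-row cosine, real vs shuffled targets: {per_row_cos:.2f}")

# ...but the best content-blind router is the same for both tables.
q_real = best_constant_router(real)
q_shuf = best_constant_router(shuf)
print(round(cosine(q_real, q_shuf), 9))  # 1.0

# For this KL direction the geometric mean beats the arithmetic mean.
arith = [sum(math.exp(x) for x in col) / ROWS for col in zip(*real)]
print(mean_kl_from(q_real, real) <= mean_kl_from(arith, real))  # True

# Upper end of the entropy scale: a uniform router over the 32K vocab.
print(round(entropy([1.0 / 32_000] * 32_000), 4))  # 10.3735
```

The last line is the `log(vocab) = 10.3735` ceiling that the `engram_logit_entropy` readings are measured against.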
+
+**Per spec §11 outer matrix this is the "KL-retrofit attractor HOLDS" outcome.** Combined with whatever ZEB-138 produces on the orthogonal teacher-architecture axis, it points toward either teacher-arch dominance (if ZEB-138 breaks) or a structural ceiling at 40M (if ZEB-138 also holds — Gemini §7 steelman).
+
+---
+
+## Setup
+
+### Code prereqs (both merged to main 2026-04-19)
+
+- **PR #255**: `--save-teacher-logits` flag added to `generate_oracle_table.py`. Welford-means the teacher's full LM-head outputs (vocab=32000) keyed by the same xxhash row indices as the existing oracle. Sidecar is `[10K, 32K] bf16` ≈ 640 MB (≈ 610 MiB). Throughput recovered from a 24× regression via a GPU-side `index_add_` accumulator (CPU `np.add.at` on a `[10K, 32K] f64` master is fundamentally bandwidth-bound at ~10 GB/s).
+- **PR #257**: ZEB-134's `SkipToLogitEngramRouter` revived (W_align d_model→d_model + log_alpha scalar + frozen LM-head reuse), and ZEB-139's KL+CE wired into `train.py`: `--kl-lambda` + `--oracle-teacher-logits` flags, per-token-normalized `F.kl_div(log_p_teacher, log_p_router, log_target=True).sum(-1).mean()`. Passing the router's log-probs as `target` (with `log_target=True` marking them as log-space) gets the spec'd FORWARD KL `KL(P_router || P_teacher)` direction, because PyTorch's `F.kl_div(input, target)` computes `KL(target || input)` per its pointwise `target * (log target − input)` formula.
+
+### Teacher-logits sidecar extraction
+
+Re-ran `generate_oracle_table.py --save-teacher-logits` on the same TinyLlama-1.1B teacher + 99M-token FineWeb-Edu-POC corpus that ZEB-136 used. Wall time **5.8 hours** at 4,771 tok/s sustained on the 5080 (~5% slower than ZEB-136's no-sidecar 5,017 tok/s baseline; the GPU-resident `[10K, 32K] f64` sum table + bf16→fp32 cast per chunk accounts for the gap).
+
+Sanity check: `pca_explained_variance_ratio_total = 0.9338690864205668`, **bit-identical to ZEB-136's stored value**.
+Confirms the GPU-side `SumAccumulatorTable`/`GpuSumAccumulatorTable` math is equivalent to the original CPU `WelfordTable` for the hidden-state path (proves the perf-optimization didn't change numerics).
+
+Shuffled artifacts were produced by a single `torch.randperm` permutation (seed 0) applied to BOTH the oracle (`engram.weight`) and the sidecar (`teacher_logits.weight`). Using the same permutation across both files means cell 4's per-position teacher target is scrambled consistently in both the engram-emb path AND the KL target path.
+
+### 4-cell matrix configuration
+
+Identical to ZEB-136's `run_4cell_matrix.sh` for cells 1+2 (router-off baselines), with `--engram-skip-to-logit --engram-skip-alpha-init 0.1 --kl-lambda 0.5 --oracle-teacher-logits …` added to cells 3+4.
+
+| Cell | Router | Oracle | Teacher-logits sidecar | KL term |
+| --- | --- | --- | --- | --- |
+| 1 | off | real | — | off |
+| 2 | off | shuf (seed=0) | — | off |
+| 3 | on | real | real | λ=0.5 |
+| 4 | on | shuf (seed=0) | shuf (seed=0, same perm) | λ=0.5 |
+
+Each cell inits from `zeta_ctrl_2048/checkpoint.pt` (the same backbone-frozen baseline ZEB-136 used) with `--allow-partial-init`, so the new `engram_skip_router.W_align` (zero) and `engram_skip_router.log_alpha` (= log(0.1)) start from their constructor's safe-init values. 2000 steps each, batch=4, seq=2048, bf16 mixed precision, with the `--engram-vcontrast` + `--engram-qdiv` aux losses still active per spec §4.2.
+
+---
+
+## Results
+
+### val_loss matrix (final, step 2000)
+
+| | Real oracle | Shuffled oracle | Δ-diff (real − shuf) |
+| --- | --- | --- | --- |
+| **Router off, KL off** (cells 1, 2) | 4.5546 | 4.5546 | **+0.0000** |
+| **Router on, KL on** (cells 3, 4) | 4.5636 | 4.5637 | **−0.0001** |
+| Δ vs no-router baseline | +0.0090 | +0.0091 | — |
+
+**Two observations from the matrix alone**:
+
+1. The router-off baseline reproduces ZEB-136's cells 1+2 to 4 decimal places (4.5546 vs ZEB-136's 4.5546 / 4.5544).
+   Sanity check passes — the data path and frozen-backbone init are unchanged.
+2. The router-on KL+CE cells got val_loss ~0.009 nats *worse* than the no-router baseline, with cell 3 vs cell 4 essentially identical. This is the inverse of what would constitute a positive ZEB-139 result.
+
+### Cell 3 vs Cell 4 fingerprint (the discriminator)
+
+Per spec §11's intra-experiment discriminator table, cell 3 vs cell 4 separates "KL signal is content-dependent" (real teacher info actually helping, the clean positive result) from "KL signal is content-independent" (KL forcing sharp output regardless of input — the cheap-win confound).
+
+Pulled from `forensics/router_on_kl.txt` (probe: `scripts.probe_skip_to_logit`):
+
+```text
+real: log_alpha=-1.7360 alpha=exp=0.1762 ||W_align||_F=1.3461
+shuf: log_alpha=-1.7342 alpha=exp=0.1765 ||W_align||_F=1.3471
+
+cross_run_cos engram_logits = +0.9999
+max LM-head row |cos| = 0.7779
+engram_logit_entropy (nats) = 10.3467 (log(vocab) = 10.3735)
+```
+
+| Fingerprint metric | Spec §7 threshold (broken if…) | ZEB-136 (no KL) | ZEB-139 (KL=0.5, real) | Verdict |
+| --- | --- | --- | --- | --- |
+| `engram_logit_entropy` | < log(V) − 0.1 = 10.27 | 10.3735 (= log V) | **10.3467** | **HOLDS** (only 0.027 below log V, well short of the 0.1 drop that would count as "broken") |
+| `α` | outside [0.14, 0.20] | 0.1644 | **0.1762** | **HOLDS** (still inside attractor band) |
+| Cross-run cosine (real vs shuf, router) | < 0.7 | +0.7979 | **+0.9999** | **HOLDS** + WORSE — KL drove the two routers to converge |
+| Δ-diff (real − shuf val_loss) | ≥ +0.001 nats | +0.0002 | **−0.0001** | **HOLDS** + slight reverse |
+| `W_align` Frobenius drift | > 2× init (init = 0) | 1.91 | **1.35** | **HOLDS** + smaller — KL kept W_align contained |
+
+**All five thresholds say the attractor HOLDS.** And `cross_run_cos = +0.9999` is the dispositive result for the cheap-win discriminator: when two router models trained on completely different per-position teacher targets (real vs
+shuffled-via-permutation) end up producing essentially the same output distribution to 4-decimal cosine, the model is matching SOMETHING content-independent — almost certainly the corpus-wide token-frequency average that the Welford-mean teacher logits encode after enough position averaging.
+
+That same `cross_run_cos` jumping from 0.80 (no KL) to 1.00 (with KL=0.5) points to the mechanism: KL pressure pulls both routers to the SAME target distribution. The "real" and "shuf" sidecars contain the same set of per-row teacher distributions, just at different row indices, so the KL signal averages out to "match the corpus distribution somehow", which is identical regardless of how rows are permuted.
+
+### KL trajectory
+
+From the per-step CSV logs (`run3_router_on_real_kl.csv`):
+
+```text
+step 0 loss=2.9147 kl_loss=1.2697 alpha=0.10 W_align=0 (init)
+step 300 loss=3.1555 kl_loss=1.2705 (essentially flat — W_align still ~0, gradient through alpha is zero by construction)
+step 600 loss=2.9575 kl_loss=1.2705 (alpha=0.1, W_align starting to grow under small alpha gradient)
+step 900 loss=2.9254 kl_loss=1.2678 (KL begins moving)
+step 1200 loss=2.9449 kl_loss=1.2587
+step 1500 loss=2.9853 kl_loss=1.2453
+step 1800 loss=2.9885 kl_loss=1.2212
+final loss=4.5636 (val) KL trajectory: 1.27 → 1.22 nats over 2000 steps (Δ = −0.05 nats)
+```
+
+After an essentially flat first ~600 steps, the KL decreased monotonically: the router IS learning to better match the teacher distribution. But the magnitude is small (~4% relative drop) and the destination is content-independent: cell 4 (shuffled sidecar) shows a near-IDENTICAL trajectory (`kl_loss` 1.2709 → 1.2209 over the same steps). Both runs are converging toward the same "average TinyLlama distribution" target, which is not what the experiment hoped to find.
+
+### Pair A baseline forensic (router-off cells)
+
+Reproduces ZEB-136's standard η-B capgap battery on cells 1+2 (full output in `forensics/router_off_no_kl.txt`).
+All ten ZEB-130 probes (D/P/E/M/C/W/A/X/Q-overlap/V-rank) are within noise of ZEB-136's prior values; cross-run cos at L2 = +0.87 / L5 = +0.80 (matching ZEB-136's known content-poor baseline). This confirms the no-router data path didn't drift between ZEB-136 and ZEB-139.
+
+---
+
+## Verdict matrix (this experiment × ZEB-138)
+
+Per spec §11:
+
+| This (KL-retrofit) | ZEB-138 (same-arch teacher, CE-only) | Combined interpretation |
+| --- | --- | --- |
+| Holds | Holds | **Structural ceiling confirmed** at 40M (Gemini §7 steelman) — neither objective shift nor teacher-arch shift escapes the attractor; the 40M frozen-backbone linear pipeline is the binding constraint. Multi-layer non-linear `W_align` OR end-to-end retraining without freezing is the recommended next axis. |
+| Holds | Breaks | **Teacher-arch dominates, objective insufficient** — ZEB-138's same-arch decode break is the load-bearing axis; pursue same-arch teacher + capgap as the substrate, deprioritize KL+CE. |
+
+ZEB-138's verdict is pending KRILE's Harmony-474M handoff and the corresponding 4-cell run on AVALON. **ZEB-139's contribution to the matrix is now locked in as "Holds".**
+
+---
+
+## Open questions and next-step recommendations
+
+### 1. λ-sweep (spec §12 question 1)
+
+The spec says "If no break at 0.5, try 0.9 once before concluding." This is worth doing for completeness, but the `cross_run_cos = +0.9999` result strongly suggests a higher λ would just intensify the convergence to the average distribution — it cranks up the same lever that's already saturating. **Recommendation: run a single λ=0.9 cell-3 + cell-4 pair (~30 min on AVALON) to nail down the λ-sensitivity signal, then close the door on the KL-only axis at 40M.**
+
+### 2. Same-arch teacher + KL (spec §10 follow-up)
+
+The 2×2's fourth cell (same-arch teacher AND KL term) is contingent on either ZEB-139 or ZEB-138 yielding signal. Since ZEB-139 didn't, and ZEB-138 is pending, this remains "wait for ZEB-138."
+If ZEB-138 also holds, the 2×2 is closed (structural ceiling) and same-arch+KL becomes redundant. If ZEB-138 breaks, same-arch+KL becomes the natural follow-up to test whether KL adds to the same-arch signal.
+
+### 3. The Gemini §7 steelman — multi-layer non-linear `W_align`
+
+If ZEB-138 also holds, the Gemini Deep Research findings (§7) recommend abandoning the single-layer-linear `W_align` in favor of either a multi-layer non-linear projection (more capacity in the alignment path) or unfreezing the backbone (which addresses the "frozen 40M can't decode high-dim teacher features" steelman). Both are substantially more invasive than ZEB-139 was. Multi-layer `W_align` is probably the cheaper one to try first.
+
+### 4. Diagnostic bonus: KL trajectory IS learning, just not usefully
+
+Worth flagging that KL did monotonically decrease (1.27 → 1.22) and `max LM-head row |cos|` jumped from 0.22 (ZEB-136) to 0.78 (ZEB-139). The router IS aligning with vocab directions — it's just aligning ALL positions with the SAME average direction (cross_run_cos ≈ 1.0). A future variant could try a temperature on the router-side softmax (spec §12 question 2) or a per-token KL mask that down-weights frequent-token positions — both might force the router to pay attention to per-position content rather than averaging it out. These are speculative; the cleaner next move is the λ-sweep + ZEB-138 result.
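The two speculative variants in question 4 (a temperature on the router-side softmax and a per-token weight that down-weights frequent-token positions) amount to small changes to the per-token KL. Below is a minimal pure-Python sketch, not the `train.py` implementation. `kl_router_to_teacher` mirrors the pointwise formula PyTorch documents for `F.kl_div(input, target, log_target=True)`, namely `exp(target) * (target - input)`, so with `input = log_p_teacher` and `target = log_p_router` it sums to `KL(P_router || P_teacher)`. The `temperature` argument, the 1/sqrt(token count) weighting, and all function names are hypothetical choices, not anything the spec fixes.

```python
import math


def log_softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def kl_router_to_teacher(log_p_router, log_p_teacher):
    """KL(P_router || P_teacher) via the pointwise kl_div formula with log_target=True."""
    # input = log_p_teacher, target = log_p_router: sum of exp(target) * (target - input)
    return sum(math.exp(t) * (t - i) for i, t in zip(log_p_teacher, log_p_router))


def weighted_kl_loss(router_logits, teacher_logits, token_counts, temperature=1.0):
    """Per-token KL averaged with hypothetical 1/sqrt(frequency) weights."""
    total = norm = 0.0
    for r, t, count in zip(router_logits, teacher_logits, token_counts):
        weight = 1.0 / math.sqrt(count)  # frequent tokens count for less
        kl = kl_router_to_teacher(log_softmax(r, temperature), log_softmax(t))
        total += weight * kl
        norm += weight
    return total / norm


# Two positions over a 4-token toy vocabulary; the first position's token is very frequent.
router = [[0.0, 0.0, 0.0, 0.0], [2.0, 0.5, 0.1, -1.0]]
teacher = [[3.0, 0.0, 0.0, 0.0], [2.0, 0.5, 0.1, -1.0]]
counts = [10_000, 25]
print(weighted_kl_loss(router, teacher, counts))
print(weighted_kl_loss(router, teacher, counts, temperature=2.0))
```

With equal counts the weighting reduces to a plain per-token mean; a temperature above 1 flattens the router distribution before it is compared with the teacher, so even a position where router and teacher logits match picks up a nonzero KL.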
+
+---
+
+## Artifacts
+
+All under `/home/zebli/work/LOCAL/zeb139/`:
+
+- **Oracle**: `artifacts/oracle_tinyllama_10k.safetensors` (4.9 MB, [10K, 128] f32) and `_shuffled_seed0.safetensors`
+- **Teacher-logits sidecar**: `artifacts/oracle_tinyllama_10k_teacher_logits.safetensors` (611 MB, [10K, 32K] bf16) and `_shuffled_seed0_teacher_logits.safetensors`
+- **Stats**: `artifacts/oracle_tinyllama_10k.safetensors.stats.json` (PCA explained variance, populated rows, hash seeds)
+- **Per-cell training logs (CSV)**: `logs/run{1..4}_*.csv` (200 rows × 36 columns each, including the new `kl_loss` column)
+- **Per-cell checkpoints**: `checkpoints/zeb139_router_{off,on}_{real,shuf}{,_kl}/checkpoint.pt`
+- **Forensic outputs**: `forensics/router_off_no_kl.txt` (full 10-probe battery, pair A) and `forensics/router_on_kl.txt` (skip-to-logit diagnostics, pair B)
+- **Scripts**: `scripts/shuffle_oracle_and_sidecar.py`, `scripts/run_4cell_matrix.sh` (cells 1-4), `scripts/run_cells_3_and_4.sh` (re-run after stale-checkout fix), `scripts/run_forensics.sh`
+
+ZEB-136's prior forensics (`/home/zebli/work/LOCAL/zeb136/forensics/router_on.txt`) are the direct comparison point for the ZEB-139 (KL+CE) vs ZEB-136 (CE-only) contrast.
+
+---
+
+## Operational notes
+
+- The 4-cell matrix's first attempt failed at cell 3 because the local main repo dir was checked out on `zeblith/zeb-138-same-arch-teacher` (stale, predates PR #257). The venv's `ct87` editable install therefore imported a `train.py` without the `--engram-skip-to-logit` / `--kl-lambda` flags. Cells 1+2 succeeded incidentally (no-router code path is identical across branches). Resolved by `git checkout main && git pull` and re-running cells 3+4 only (each cell inits independently from `zeta_ctrl_2048`, so no chaining was lost).
+- Total wall time for the experiment: ~6h oracle extraction + ~30 min cells 1+2 + ~30 min cells 3+4 + a few min for forensics.
+  The spec §8 estimate of "4-6h end-to-end" was off by ~3× on the oracle extraction (the new logits-Welford accumulator is the dominant cost); the matrix + forensics matched spec.
diff --git a/docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md b/docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md
index bca83f3f..d0981357 100644
--- a/docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md
+++ b/docs/superpowers/specs/2026-04-18-zeb-139-kl-retrofit-design.md
@@ -1,6 +1,6 @@
 # ZEB-139 — KL-Retrofit Objective-Axis Experiment (Design Spec)
 
-> **STATUS: DRAFT — blocked on PR #254 merge + teacher-logits extension.** This spec is written during the ZEB-137/138 wait so the experiment can launch immediately once the prereq PRs land on main. AVALON can execute end-to-end in ~4-6h once unblocked.
+> **STATUS: COMPLETE — see findings at `docs/findings/2026-04-19-zeb-139-kl-retrofit.md`.** Verdict: **attractor HOLDS** under λ=0.5 KL+CE on the cross-arch TinyLlama setup. `cross_run_cos = +0.9999` between the real-oracle and shuffled-oracle KL+router cells confirms the cheap-win confound (KL forces both routers to the same content-independent average distribution). Combined with ZEB-138's pending verdict, this fills the "KL-retrofit Holds" cell of the spec §11 outer matrix.
 **Linear:** [ZEB-139](https://linear.app/zeblith/issue/ZEB-139/kl-retrofit-experiment-objective-axis-diagnostic-for-engram-attractor)
 **Parent:** [ZEB-102](https://linear.app/zeblith/issue/ZEB-102)

From b4a371b9d9f11dd9c3c7402ce26148388c77b237 Mon Sep 17 00:00:00 2001
From: Jake Englund
Date: Sun, 19 Apr 2026 23:05:10 -0700
Subject: [PATCH 2/2] docs(zeb-139): λ=0.9 closure run results — KL-only axis definitively closed
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per spec §12 q1, ran cells 3+4 again at --kl-lambda 0.9 to nail down the λ-sensitivity signal before retiring the KL-only experimental axis at 40M. Same setup as λ=0.5 except for the λ value and output paths; ~30 min wall time.

Headline numbers:

  λ         Cell 3 (real)  Cell 4 (shuf)  Δ-diff   vs ZEB-136
  0 (none)  4.5545         4.5543         +0.0002  —
  0.5       4.5636         4.5637         −0.0001  +0.009
  0.9       4.5907         4.5912         −0.0005  +0.036

Two clean monotonic patterns:
1. Higher λ → val_loss strictly worse. KL pressure increasingly hurts the LM objective.
2. Δ-diff stays at noise across all λ values. No content-dependence emerges no matter how hard we crank KL.

Forensic fingerprint (skip-to-logit probe at λ=0.9):
  cross_run_cos engram_logits = +1.0000 (was +0.9999 at λ=0.5)
  max LM-head row |cos| = 0.9257 (was 0.78 at λ=0.5)
  ||W_align||_F = 0.58 (was 1.35 at λ=0.5)
  engram_logit_entropy = 10.3039 (was 10.3467; still well above the 10.27 break threshold)
  alpha = 0.1762 (saturated, λ-independent above 0.5)

cross_run_cos = +1.0000 between the real-oracle and shuffled-oracle cells at λ=0.9 is the dispositive cheap-win signature. Higher KL pressure intensifies the lever rather than escaping the attractor.

Curious side observation: g5 (L5 engram gate alpha) flipped sign between λ=0.5 (+0.40) and λ=0.9 (-0.41).
Different optimization regime, same content-blind destination, which suggests the "match the corpus average" attractor is robust across optimizer trajectories.

Doc updates:
- TL;DR mentions both λ values now; net verdict unchanged
- Open question §1 (λ-sweep) marked DONE, points at the new section
- New section "λ=0.9 closure run" with full λ-sweep matrix, fingerprint comparison, and the optimization-regime observation
- Artifacts section lists the new λ=0.9 checkpoints, CSVs, forensic output, and run script

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../2026-04-19-zeb-139-kl-retrofit.md | 81 +++++++++++++++++--
 1 file changed, 75 insertions(+), 6 deletions(-)

diff --git a/docs/findings/2026-04-19-zeb-139-kl-retrofit.md b/docs/findings/2026-04-19-zeb-139-kl-retrofit.md
index f474608a..e5646139 100644
--- a/docs/findings/2026-04-19-zeb-139-kl-retrofit.md
+++ b/docs/findings/2026-04-19-zeb-139-kl-retrofit.md
@@ -12,6 +12,8 @@
 
 **Adding a Memory-Decoder-style `KL(P_router || P_teacher)` term at λ=0.5 did NOT escape the maximum-entropy attractor on the cross-arch TinyLlama setup.** Both the real-oracle and shuffled-oracle KL+router cells converged to essentially identical val_loss (4.5636 vs 4.5637, Δ-diff = -0.0001 nats) and produced a router output with `cross_run_cos = +0.9999` between the two cells — the smoking gun for the cheap-win confound. KL forced both routers to the same content-independent average distribution rather than learning per-position content routing.
 
+**The λ=0.9 closure run (spec §12 q1) confirmed the verdict** with a stronger signature: `cross_run_cos = +1.0000` (identical to four decimal places), val_loss got *worse* (+0.027 nats over λ=0.5; +0.036 nats over the no-KL baseline), and the optimization shifted regime (g5 alpha goes negative under λ=0.9 vs positive under λ=0.5) but produced the same content-blind result. Higher KL pressure intensifies the cheap-win lever rather than escaping the attractor.
+**The KL-only axis at 40M is closed.**
+
 **Per spec §11 outer matrix this is the "KL-retrofit attractor HOLDS" outcome.** Combined with whatever ZEB-138 produces on the orthogonal teacher-architecture axis, it points toward either teacher-arch dominance (if ZEB-138 breaks) or a structural ceiling at 40M (if ZEB-138 also holds — Gemini §7 steelman).
 
 ---
@@ -126,9 +128,12 @@ ZEB-138's verdict is pending KRILE's Harmony-474M handoff and the corresponding
 
 ## Open questions and next-step recommendations
 
-### 1. λ-sweep (spec §12 question 1)
+### 1. λ-sweep (spec §12 question 1) — DONE, see "λ=0.9 closure run" section below
 
-The spec says "If no break at 0.5, try 0.9 once before concluding." This is worth doing for completeness, but the `cross_run_cos = +0.9999` result strongly suggests a higher λ would just intensify the convergence to the average distribution — it cranks up the same lever that's already saturating. **Recommendation: run a single λ=0.9 cell-3 + cell-4 pair (~30 min on AVALON) to nail down the λ-sensitivity signal, then close the door on the KL-only axis at 40M.**
+The spec said "If no break at 0.5, try 0.9 once before concluding." Completed
+2026-04-19; results are in the "λ=0.9 closure run" section below. Net: the
+higher λ intensified the cheap-win signature exactly as predicted
+(`cross_run_cos` = +1.0000, val_loss worse). The KL-only axis at 40M is closed.
 
 ### 2. Same-arch teacher + KL (spec §10 follow-up)
 
@@ -144,6 +149,70 @@ Worth flagging that KL did monotonically decrease (1.27 → 1.22) and `max LM-he
 
 ---
 
+## λ=0.9 closure run
+
+Per spec §12 q1 ("if no break at 0.5, try 0.9 once before concluding"), reran
+cells 3+4 with `--kl-lambda 0.9`. Same setup otherwise (same data, same init
+checkpoint, same seeds, same code). Wall time ~30 min.
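The λ-sweep for this closure run reduces to two differences of final val_loss values: real minus shuffled (Δ-diff) and each cell-3 value minus the λ=0 cell-3 value. A short sketch recomputing both from the numbers reported for these runs (λ=0 is ZEB-136's CE-only router-on pair) and checking the two patterns against the spec §7 Δ-diff break threshold of +0.001 nats; the variable names are illustrative:

```python
# Final val_loss in nats per λ: (cell 3 real oracle, cell 4 shuffled oracle).
VAL_LOSS = {
    0.0: (4.5545, 4.5543),  # ZEB-136 router-on, CE-only (no KL term)
    0.5: (4.5636, 4.5637),
    0.9: (4.5907, 4.5912),
}
BREAK_THRESHOLD = 0.001  # a Δ-diff of at least +0.001 nats would indicate content dependence

baseline_real = VAL_LOSS[0.0][0]
delta_diffs, vs_baseline = [], []
for lam, (real, shuf) in sorted(VAL_LOSS.items()):
    delta_diffs.append(real - shuf)
    vs_baseline.append(real - baseline_real)
    print(f"lambda={lam:.1f}  delta_diff={real - shuf:+.4f}  vs_lambda0={real - baseline_real:+.3f}")

# Pattern 1: val_loss gets strictly worse as λ grows.
print(all(a < b for a, b in zip(vs_baseline, vs_baseline[1:])))  # True
# Pattern 2: Δ-diff never reaches the break threshold.
print(all(d < BREAK_THRESHOLD for d in delta_diffs))  # True
```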
+
+### Full λ-sweep matrix (cells 3+4 only — cells 1+2 are router-off, λ-independent)
+
+| λ | Cell 3 (real) | Cell 4 (shuf) | Δ-diff (real − shuf) | Δ vs no-KL baseline (cell 3) |
+| --- | --- | --- | --- | --- |
+| 0 (ZEB-136 router-on) | 4.5545 | 4.5543 | +0.0002 | — |
+| 0.5 | 4.5636 | 4.5637 | −0.0001 | +0.009 |
+| **0.9** | **4.5907** | **4.5912** | **−0.0005** | **+0.036** |
+
+**Two clean monotonic patterns**:
+1. Higher λ → val_loss strictly worse. KL pressure increasingly hurts the LM
+   objective at every step up the λ ladder.
+2. Δ-diff stays at noise across all λ values. **No content-dependence emerges
+   regardless of how hard we crank the KL lever.** This is the dispositive
+   answer to spec §12 q1.
+
+### Forensic fingerprint comparison (skip-to-logit probe)
+
+Pulled from `forensics/router_on_kl09.txt`:
+
+```text
+real: log_alpha=-1.7362 alpha=exp=0.1762 ||W_align||_F=0.5786
+shuf: log_alpha=-1.7377 alpha=exp=0.1759 ||W_align||_F=0.5709
+
+cross_run_cos engram_logits = +1.0000
+max LM-head row |cos| = 0.9257
+engram_logit_entropy (nats) = 10.3039 (log(vocab) = 10.3735)
+```
+
+| Metric | ZEB-136 (no KL) | λ=0.5 | **λ=0.9** | Trend |
+| --- | --- | --- | --- | --- |
+| `α` (real) | 0.1644 | 0.1762 | **0.1762** | Saturated in attractor band, λ-independent above 0.5 |
+| `‖W_align‖_F` (real) | 1.91 | 1.35 | **0.58** | Monotonically smaller — KL keeps the projection more contained at higher λ |
+| `cross_run_cos` | +0.7979 | +0.9999 | **+1.0000** | Higher λ → more perfect collapse to identical content-blind output |
+| `max LM-head row \|cos\|` | 0.22 | 0.78 | **0.93** | Router aligns with one average LM-head direction more strongly |
+| `engram_logit_entropy` (Δ from log V) | 0.0000 | 0.027 | **0.070** | Slowly moving away from log V, but still well short of the 0.1 drop that would count as broken |
+
+### Optimization-regime shift, same destination
+
+One curiosity: the L5 engram gate's behavior changes sign across λ.
+At λ=0.5, g5 (the post-tanh L5 gate alpha) grew positive (+0.40 at step 1800). At
+λ=0.9, g5 went negative (-0.41 at step 1500). Under λ=0.9 the model is
+*subtracting* the L5 engram contribution from the hidden state instead of
+adding it: a different optimization regime entirely. Yet both regimes land
+at the same content-independent average distribution at the router's output
+(`cross_run_cos` = +1.0000). This further suggests the destination ("match
+the corpus average teacher distribution") is robust across optimizer
+trajectories, and that varying λ just changes HOW the model gets to the
+same useless attractor, not WHETHER.
+
+### Verdict
+
+KL-only axis at 40M is **definitively closed**. Higher λ intensifies the
+cheap-win lever but does not unlock content routing. ZEB-139's row of the
+spec §11 outer matrix is locked in as "Holds". Next move depends on
+ZEB-138's verdict (see open question §2 above).
+
+---
+
 ## Artifacts
 
 All under `/home/zebli/work/LOCAL/zeb139/`:
@@ -151,10 +220,10 @@ All under `/home/zebli/work/LOCAL/zeb139/`:
 
 - **Oracle**: `artifacts/oracle_tinyllama_10k.safetensors` (4.9 MB, [10K, 128] f32) and `_shuffled_seed0.safetensors`
 - **Teacher-logits sidecar**: `artifacts/oracle_tinyllama_10k_teacher_logits.safetensors` (611 MB, [10K, 32K] bf16) and `_shuffled_seed0_teacher_logits.safetensors`
 - **Stats**: `artifacts/oracle_tinyllama_10k.safetensors.stats.json` (PCA explained variance, populated rows, hash seeds)
-- **Per-cell training logs (CSV)**: `logs/run{1..4}_*.csv` (200 rows × 36 columns each, including the new `kl_loss` column)
-- **Per-cell checkpoints**: `checkpoints/zeb139_router_{off,on}_{real,shuf}{,_kl}/checkpoint.pt`
-- **Forensic outputs**: `forensics/router_off_no_kl.txt` (full 10-probe battery, pair A) and `forensics/router_on_kl.txt` (skip-to-logit diagnostics, pair B)
-- **Scripts**: `scripts/shuffle_oracle_and_sidecar.py`, `scripts/run_4cell_matrix.sh` (cells 1-4), `scripts/run_cells_3_and_4.sh` (re-run after stale-checkout fix), `scripts/run_forensics.sh`
+- **Per-cell training logs (CSV)**: `logs/run{1..4}_*.csv` (λ=0.5) and `logs/run{3,4}_router_on_{real,shuf}_kl09.csv` (λ=0.9 closure), 200 rows × 36 columns each, including the `kl_loss` column
+- **Per-cell checkpoints**: `checkpoints/zeb139_router_{off,on}_{real,shuf}{,_kl,_kl09}/checkpoint.pt`
+- **Forensic outputs**: `forensics/router_off_no_kl.txt` (full 10-probe battery, pair A), `forensics/router_on_kl.txt` (skip-to-logit diagnostics, λ=0.5), `forensics/router_on_kl09.txt` (skip-to-logit diagnostics, λ=0.9 closure)
+- **Scripts**: `scripts/shuffle_oracle_and_sidecar.py`, `scripts/run_4cell_matrix.sh` (cells 1-4), `scripts/run_cells_3_and_4.sh` (re-run after stale-checkout fix), `scripts/run_cells_3_and_4_lambda09.sh` (λ=0.9 closure), `scripts/run_forensics.sh`
 
 ZEB-136's prior forensics (`/home/zebli/work/LOCAL/zeb136/forensics/router_on.txt`) are the direct comparison point for the ZEB-139 (KL+CE) vs ZEB-136 (CE-only) contrast.
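The artifact sizes follow from shape × bytes per element. A quick check, assuming the listed file sizes (4.9 MB, 611 MB) are binary megabytes (MiB) while the ≈ 640 MB figure quoted for the sidecar is decimal; the safetensors header adds only a little on top of the raw tensor bytes, which this sketch ignores. The helper name is hypothetical.

```python
MB, MIB, GB = 10**6, 2**20, 10**9
BYTES_PER_ELEM = {"f32": 4, "bf16": 2, "f64": 8}


def table_bytes(rows, cols, dtype):
    """Raw payload of a dense [rows, cols] table, excluding any file header."""
    return rows * cols * BYTES_PER_ELEM[dtype]


oracle = table_bytes(10_000, 128, "f32")        # engram oracle table
sidecar = table_bytes(10_000, 32_000, "bf16")   # teacher-logits sidecar
f64_sums = table_bytes(10_000, 32_000, "f64")   # f64 sum table used during extraction

print(f"oracle:   {oracle / MB:.2f} MB = {oracle / MIB:.2f} MiB")    # 5.12 MB = 4.88 MiB
print(f"sidecar:  {sidecar / MB:.0f} MB = {sidecar / MIB:.1f} MiB")  # 640 MB = 610.4 MiB
print(f"f64 sums: {f64_sums / GB:.2f} GB")                           # 2.56 GB
```

Read this way, the listed 4.9 MB and 611 MB sit just above the 4.88 MiB and 610.4 MiB payloads, and the 2.56 GB f64 sum table is what the extraction-time accumulator has to hold.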