
fix(soundness): bind ShardRam y-sign to is_global_write#1344

Open
spherel wants to merge 3 commits into master from fix/issue-1338-shard-ram-y-sign

@spherel spherel commented May 25, 2026

Problem

Issue #1338 reproduces a soundness break on master. For the same RISC-V
execution, the base verifier and the recursion verifier both accept two
distinct proof batches whose public per-shard shard_rw_sum values differ
on all 17 shards. The attacker takes an honest witness, replaces every
cross-shard EC accumulator leaf (x, y) with its inverse (x, -y),
updates shard_rw_sum, and reproves.

Root cause: ceno_zkvm/src/tables/shard_ram.rs:276-281 was a TODO. The
host code in ShardRamRecord::to_ec_point encodes read vs write in the
sign of y[6], but the circuit only constrained the curve equation and
the EC sum — never tying y[6]'s half-of-field to is_global_write.
Both (x, y) and (x, -y) satisfied every existing check, so the public
summary of cross-shard RAM flow was unbound.

The defect survives recursion (the reporter's PoC verifies through the
recursion verifier program).

Design Rationale

Approach borrows the idea from SP1's
crates/core/machine/src/operations/global_interaction.rs:210-236,
not its column layout. Three pieces:

  1. Offset by +1. Express y[6] in terms of a fresh witness y6_lo
    so y[6] = 0 is never valid in either branch (0 is invariant under negation, which would make it impossible to distinguish read from write).
  2. Safe band + prover retry. Restrict y6_lo to [0, (p-1)/2). For
    the rare exception y[6] = 0 (probability ~1/p ≈ 2^-31 per record)
    the host rejects and retries with a new nonce.
  3. Byte-decomposition range check. y6_lo decomposed into four byte
    limbs b0..b3 (assert_byte for b0..b2, lookup_ltu_byte(b3, 60, 1)
    for b3). For BabyBear, (p-1)/2 = 60·2^24 exactly, so b3 < 60
    gives the tightest no-overlap band.

In-circuit branch equality via condition_require_equal:

  • read (is_global_write = 0): y[6] = y6_lo + 1 ⇒ y[6] ∈ [1, (p-1)/2]
  • write (is_global_write = 1): y[6] = p - 1 - y6_lo ⇒ y[6] ∈ [(p+1)/2, p-1]

Union covers [1, p-1] with no overlap; y[6] = 0 is excluded.
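
The two branches can be checked host-side with plain integer arithmetic. The sketch below is an illustration only: `y6_lo_value` reuses the helper name mentioned in this PR, but its signature and the `Option` return are assumptions, and `y6_lo_bytes` is a hypothetical stand-in for the limb assignment.

```rust
/// BabyBear modulus p = 2^31 - 2^27 + 1.
const P: u64 = 0x7800_0001;
/// Exclusive upper bound of the safe band: (p - 1) / 2 = 60 * 2^24.
const HALF: u64 = (P - 1) / 2;

/// Assumed shape of the `y6_lo_value` helper: recover the band offset from
/// y[6] and the write flag. Returns `None` when y[6] is outside the half
/// assigned to that direction, which includes y[6] = 0.
fn y6_lo_value(y6: u64, is_global_write: bool) -> Option<u64> {
    if y6 == 0 || y6 >= P {
        return None;
    }
    // read:  y[6] = y6_lo + 1      =>  y6_lo = y[6] - 1
    // write: y[6] = p - 1 - y6_lo  =>  y6_lo = p - 1 - y[6]
    let lo = if is_global_write { P - 1 - y6 } else { y6 - 1 };
    (lo < HALF).then_some(lo)
}

/// Hypothetical limb split: little-endian bytes b0..b3 of y6_lo.
/// For any y6_lo < 60 * 2^24 the top byte b3 is below 60.
fn y6_lo_bytes(lo: u64) -> [u8; 4] {
    (lo as u32).to_le_bytes()
}
```

Negating an honest read value such as y[6] = 5 gives p - 5, for which `y6_lo_value(P - 5, false)` is `None`: no in-band `y6_lo` exists, which is the property the lookups enforce in-circuit.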

Why not a single AssertLtConfig(y6_lo, (p-1)/2, max_bits=30)?
On BabyBear (p = 0x78000001, 31-bit) the AssertLt gadget only
constrains lhs - rhs ≡ diff - 2^max_bits (mod p) with diff ∈ [0, 2^30)
— it does not pre-bound lhs to be canonical-small. A malicious
y6_lo ∈ [0x74000001, p-1] (≈ 2^26 values) produces a field-wrap diff
that still fits in 30 bits, so the constraint accepts upper-half values
and the exploit survives. Byte-decomposing first kills the wrap. Ceno's
DynamicRangeTableCircuit<E, 18> also does not carry 30-bit lookup
entries, so a direct assert_const_range(_, 30) is not available
anyway.
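
The wrap can be reproduced with integer arithmetic. This is a sketch that models only the relation quoted above (diff ≡ lhs - rhs + 2^max_bits mod p, accepted when diff < 2^max_bits); it is not the real AssertLtConfig, and both function names are hypothetical.

```rust
/// BabyBear modulus.
const P: u64 = 0x7800_0001;
/// Intended exclusive bound for y6_lo: (p - 1) / 2.
const RHS: u64 = (P - 1) / 2;
const MAX_BITS: u32 = 30;

/// Toy model of the less-than relation without a canonical-size bound on
/// `lhs`: accept whenever the field-reduced diff fits in MAX_BITS bits.
fn naive_lt_accepts(lhs: u64) -> bool {
    let diff = (lhs + P - RHS + (1u64 << MAX_BITS)) % P;
    diff < (1u64 << MAX_BITS)
}

/// Toy model of the byte decomposition: `lhs` must fit in four bytes and
/// its top byte must be below 60, so lhs < 60 * 2^24 as an integer.
fn bytes_accept(lhs: u64) -> bool {
    lhs <= u32::MAX as u64 && (lhs >> 24) < 60
}
```

Under this model `naive_lt_accepts` holds for every value in [0x74000001, p-1], while `bytes_accept` rejects all of them because their top byte is at least 0x74.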

Why M = 60 (vs SP1's 63). SP1 targets KoalaBear; its (p-1)/2 = 0x3f800000, so 63 leaves a small safety band. For BabyBear,
(p-1)/2 = 60·2^24 exactly — 63 would let y[6] straddle p/2 and
reintroduce the ambiguity.

Also corrects the stale comment that previously had the convention
reversed (claimed write ⇒ lower half, opposite of what the host code
does).

Change Highlights

ceno_zkvm/src/tables/shard_ram.rs — chip-level y-sign binding

  • ShardRamRecord::to_ec_point: reject y6 == 0 and try the next
    nonce. Classify with strict y6 > prime / 2 so the boundary
    (p-1)/2 correctly stays in the read region (a previous draft used
    >= which misclassified that single boundary value and would have
    produced an out-of-range y6_lo for both branches).
  • ShardRamConfig: new field y6_lo_bytes: [WitIn; 4].
  • ShardRamConfig::configure: replace the TODO with the byte
    decomposition, byte-range / LTU lookups, and the
    condition_require_equal branch equality.
  • ShardRamCircuit::assign_instance: compute y6_lo from y[6] and
    is_to_write_set via a small y6_lo_value helper, assign byte
    limbs, register byte and LTU multiplicities.
  • New test test_shard_ram_y_sign_circuit_rejects_negation drives
    assign_instances_with_lk_multiplicities + MockProver over one
    honest row and one sign-flipped row, asserting lookup_Ltu rejects
    the tampered witness. A concrete challenge is supplied so the
    no-challenge run path doesn't drop structural_witin.

Lookup-multiplicity plumbing for ShardRam

ShardRam's per-row y6_lo byte / LTU lookups must reach
combined_lk_mlt so the U8 / LTU table mlt columns balance.
ShardRam runs after opcode + dummy circuits, before
finalize_lk_multiplicities. To surface mlt without burdening every
other table circuit:

  • ceno_zkvm/src/tables/mod.rs: TableCircuit trait gains a second
    default-unimplemented method
    assign_instances_with_lk_multiplicities alongside the existing
    assign_instances. ShardRam overrides the former; every other
    table keeps overriding the latter.
  • ceno_zkvm/src/structs.rs: ZKVMWitnesses::assign_shared_circuit
    threads a LkMultiplicity::default() through ShardRam's
    parallel-chunk witgen and inserts
    lk_multiplicity.into_finalize_result() into
    lk_mlts["ShardRamCircuit"] before finalize. Asserts swap from
    combined_lk_mlt.is_some() to is_none() to lock the ordering.
    assign_table_circuit tolerates combined_lk_mlt = None by
    passing an empty multiplicity slice, so LocalFinalCircuit (which
    ignores the argument anyway) can also run before finalize.
  • ceno_zkvm/src/e2e.rs: move
    MmuConfig::assign_continuation_circuit (LocalFinal + ShardRam) to
    just before finalize_lk_multiplicities. Mirror the move inside
    the GPU debug-compare block so combined_lk_mlt diff stays
    meaningful.
  • ceno_zkvm/src/instructions/riscv/rv32im/mmu.rs: docstring updated
    to describe the new ordering invariant.

Device-resident GPU shortcut for ShardRam (mlt mirror)

ZKVMWitnesses::try_assign_shared_circuit_gpu dispatches into
instructions::gpu::chips::shard_ram::try_gpu_assign_shared_circuit
to keep the continuation EC computation device-resident
(gpu_batch_continuation_ec_on_device + merge_and_partition_records)
when is_gpu_witgen_enabled(). The GPU kernels never enter the CPU
assign_instance per-row push, so the y6_lo lookup multiplicity is
derived host-side:

  • After step 6 of try_gpu_assign_shared_circuit (merge+partition),
    D2H partitioned_buf once to Vec<u32> and walk it with stride
    record_u32s = 26 (GpuShardRamRecord #[repr(C)] layout).
    Per record extract is_to_write_set (u32 offset 10) and
    point_y[6] (u32 offset 25), compute y6_lo, push the same
    4 lookup queries the CPU path emits per row, then
    into_finalize_result() and return alongside the chunked
    Vec<ChipInput<E>>. debug_assert_eq!(record_u32s, 26) guards
    against ceno_gpu layout drift.
  • try_assign_shared_circuit_gpu inserts both ChipInput and the
    derived multiplicity into self.witnesses /
    self.lk_mlts["ShardRamCircuit"] so finalize folds the GPU-path
    contribution into combined_lk_mlt the same way the CPU shortcut
    does.
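
A minimal sketch of the host-side walk, assuming the layout stated above (26 u32 per record, `is_to_write_set` at offset 10, `point_y[6]` at offset 25). The `Y6Multiplicity` struct and `derive_y6_multiplicity` are hypothetical stand-ins for `LkMultiplicity` and the real code path, and the LTU key shape `(b3, bound)` is an assumption.

```rust
use std::collections::HashMap;

const P: u32 = 0x7800_0001;
const RECORD_U32S: usize = 26; // stride of one record in the buffer
const IS_TO_WRITE_SET_OFFSET: usize = 10;
const POINT_Y6_OFFSET: usize = 25;
const TOP_BYTE_BOUND: u8 = 60;

/// Hypothetical multiplicity container (not the real LkMultiplicity).
#[derive(Default, Debug)]
struct Y6Multiplicity {
    byte: HashMap<u8, usize>,      // one entry per assert_byte query
    ltu: HashMap<(u8, u8), usize>, // one entry per (b3, 60) LTU query
}

/// Walk the copied-back buffer record by record and count the four
/// lookup queries each row would emit on the CPU path.
fn derive_y6_multiplicity(buf: &[u32]) -> Y6Multiplicity {
    assert_eq!(buf.len() % RECORD_U32S, 0, "partial record in buffer");
    let mut mlt = Y6Multiplicity::default();
    for rec in buf.chunks_exact(RECORD_U32S) {
        let is_write = rec[IS_TO_WRITE_SET_OFFSET] != 0;
        let y6 = rec[POINT_Y6_OFFSET];
        // Honest records never carry y6 = 0 (the host retries the nonce).
        debug_assert!(y6 != 0 && y6 < P);
        let y6_lo = if is_write { P - 1 - y6 } else { y6 - 1 };
        let [b0, b1, b2, b3] = y6_lo.to_le_bytes();
        for b in [b0, b1, b2] {
            *mlt.byte.entry(b).or_insert(0) += 1;
        }
        *mlt.ltu.entry((b3, TOP_BYTE_BOUND)).or_insert(0) += 1;
    }
    mlt
}
```

Keeping the offsets as named constants next to the stride makes a layout change in `ceno_gpu` a one-place update, which is the drift the `debug_assert_eq!` is meant to flag.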

Verifier: account for has_ecc_ops row doubling

ShardRamCircuit::has_ecc_ops() adds an extra hypercube variable;
the chip matrix has 2 * next_pow2(num_instance) rows where the
back half is EC-tree internal nodes with selector_zero = 0. Before
this fix the chip had num_lks = 0, so the verifier's
dummy_table_item_multiplicity correction never had to consider it.
With the new byte/LTU queries the correction under-counted dummy
lookups by a factor of 2 and shard verification failed with
logup_sum != 0.

  • ceno_zkvm/src/scheme/verifier.rs: multiply next_pow2_instance
    by 2 when circuit_vk.get_cs().has_ecc_ops().
  • ceno_recursion/src/zkvm_verifier/verifier.rs: mirror the same
    adjustment in the recursive verifier (lockstep per CLAUDE.md).
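
The row-count adjustment amounts to the following. This is a sketch under assumptions: `padded_rows` and `dummy_lookup_count` are hypothetical names, and the dummy-count formula (every padded row that is not a real instance issues `num_lks` dummy lookups) is a simplification of the verifier's `dummy_table_item_multiplicity` logic, not a copy of it.

```rust
/// Rows in the chip matrix: next power of two of the instance count,
/// doubled when the chip carries the extra EC-tree half.
fn padded_rows(num_instances: usize, has_ecc_ops: bool) -> usize {
    let next_pow2 = num_instances.next_power_of_two();
    if has_ecc_ops { 2 * next_pow2 } else { next_pow2 }
}

/// Assumed model of the correction term: lookups issued by padding rows.
fn dummy_lookup_count(num_instances: usize, num_lks: usize, has_ecc_ops: bool) -> usize {
    (padded_rows(num_instances, has_ecc_ops) - num_instances) * num_lks
}
```

With `num_lks = 0` both variants of `dummy_lookup_count` return 0, which is consistent with the mismatch staying hidden until ShardRam gained its byte/LTU lookups.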

Benchmark / Performance Impact

Per ShardRam row this PR adds 4 byte WitIn columns plus 3 byte-range
and 1 LTU lookup multiplicities. ShardRam rows scale with cross-shard
RAM events, not with cycles, so the absolute cost is sub-percent on the
prover. No full prover bench was rerun (no hot-loop arithmetic changed).

Existing test_shard_ram_circuit (170k reads + 1420 writes, full chip
proof) runtime is unchanged within noise:

master   : ~5.0 s
this PR  : ~5.0 s

Testing

cargo fmt --all --check
cargo check --workspace --all-targets
cargo check --workspace --all-targets --release
cargo make clippy
cargo clippy --workspace --all-targets --release -- -D warnings
RUST_MIN_STACK=33554432 cargo test --workspace --lib --release
cargo run --release --package ceno_zkvm --features sanity-check --bin e2e -- \
  --platform=ceno --max-cycle-per-shard=20000 --hints=10 --public-io=4191 \
  examples/target/riscv32im-ceno-zkvm-elf/release/examples/fibonacci

All pass locally on BabyBear. test_shard_ram_circuit and
test_shard_ram_y_sign_circuit_rejects_negation are green. End-to-end
multi-shard fibonacci verifies ShardRamCircuit and LocalRAMTableFinal
on every shard with exit code 0. Success.

cargo make tests / cargo make tests_goldilock should be re-run by
CI; the change is gated to BabyBear via a debug_assert_eq! on
MODULUS_U64 and goldilocks does not exercise shard_ram (per
integration.yml commented-out lines and CLAUDE.md).

Risks and Rollout

  • Soundness. Closes #1338 (ShardRam read/write direction is not bound to the public shard_rw_sum). The new constraint only adds local byte
    arithmetic and existing lookups — no change to transcript, sumcheck,
    PCS, or EC accumulation. Recursive and native verifiers move in
    lockstep (the has_ecc_ops row-factor fix lands in both).
  • GPU. The device-resident GPU shortcut now derives the y6_lo
    lookup multiplicity host-side from the merged partitioned device
    buffer (single D2H of ~26 u32 × records). Layout assumption is
    guarded by debug_assert_eq!(record_u32s, 26) against
    ceno_gpu::GpuShardRamRecord. CPU + GPU paths converge on the same
    combined_lk_mlt contribution; runtime verification with
    CENO_GPU_ENABLE_WITGEN=1 --features gpu on a CUDA host is
    recommended before tag.
  • Recursion. The recursive verifier mirrors the native verifier's
    has_ecc_ops × 2 row adjustment; no separate constraint-system
    change is needed for the y-sign binding itself.
  • Field support. Hardcodes the BabyBear constant M = 60. A
    debug_assert_eq!(MODULUS_U64, 0x78000001, ...) guards against
    accidental use on a different field; shard_ram is BabyBear-only
    today per CLAUDE.md.

Follow-ups

  • The remaining TODOs tracked in #1340 (finish the TODOs in shard_ram.rs), namely the local read ⇄ global write pairing on
    shard_ram.rs:235-236 and the shard == shard_id binding on line 244, are
    intentionally out of scope here.

Fixes #1338.
Partially addresses #1340.

Copilot Reviewer Directive (keep this section)

When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.

Copilot AI left a comment

Pull request overview

Findings (sorted by severity)

  • Blocker | ceno_zkvm/src/tables/shard_ram.rs (new assert_byte / lookup_ltu_byte constraints): LK multiplicity aggregation/order looks inconsistent with new lookups.
    This PR introduces new LK interactions in ShardRamConfig::configure (assert_byte + lookup_ltu_byte). However, the global LK multiplicities used to assign the DynamicRange/LTU table circuits are finalized and those table circuits are assigned before ShardRamCircuit is assigned in the shard pipeline (see ceno_zkvm/src/e2e.rs:1500-1589). As written, ShardRam’s new lookup usage does not appear to contribute to combined_lk_mlt prior to table-circuit assignment, which is expected to break the logup multiset check (or otherwise leave these lookups unaccounted).
    Suggested fix: update the witness/LK aggregation flow so ShardRam’s byte/LTU lookups contribute to the global multiplicity before Rv32imConfig::assign_table_circuit runs (e.g., collect a per-chip multiplicity for ShardRam and include it in ZKVMWitnesses.lk_mlts prior to finalize_lk_multiplicities, or reorder assignment so ShardRam is assigned before lookup-table circuits).

  • Major | ShardRamRecord::to_ec_point: half-of-field boundary is off by one vs the new convention.
    is_y_in_2nd_half currently uses y6 >= prime/2. For odd primes, prime/2 == (p-1)/2, so the boundary value y6 == (p-1)/2 is classified as “second half”, causing the new convention (“read => [1,(p-1)/2]”) to be violated and potentially making otherwise-valid witnesses fail the new in-circuit banding.
    Suggested fix: compare against (prime + 1)/2 (or use a strict y6 > prime/2) to match the stated ranges.

  • Major | BabyBear-only guard uses debug_assert_eq! in circuit configuration.
    The constraint relies on BabyBear’s (p-1)/2 = 60·2^24, but debug_assert_eq! is compiled out in release. If instantiated over a different field, the circuit would silently become incorrect.
    Suggested fix: enforce at runtime (e.g., assert_eq! or return Err(CircuitBuilderError::CircuitError(..))).

  • Minor | Comment accuracy in to_ec_point.
    The “2-torsion case where (x,y)==(x,-y)” phrasing is misleading: y6 == 0 doesn’t imply the full y-coordinate is zero; it only means that limb is fixed under negation, which is what makes the chosen encoding ambiguous/unsatisfiable.
    Suggested fix: reword the comment to reflect the actual reason for rejection.

  • Minor (testing) | New test does not assert constraint/prover rejection.
    test_shard_ram_y_sign_circuit_rejects_negation currently checks derived limb properties (b3 < 60 vs >= 60) but doesn’t actually run a constraint satisfiability check / mock prover / proof attempt that must fail for the tampered row. This can pass even if the lookup constraint is missing or if LK-table population is broken.
    Suggested fix: make it a true regression by asserting the tampered witness fails constraint satisfaction (e.g., via MockProver with the necessary public inputs / table chips, or by attempting proof generation and asserting it errors).
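
The field-guard finding above can be illustrated with a check that survives release builds. This is a sketch only: `UnsupportedField` and `require_babybear` are hypothetical names, and the real fix would return the repository's own error type from `configure`. `Y6_LO_TOP_BYTE_LT_BOUND` is the constant name introduced later in this thread.

```rust
const BABYBEAR_MODULUS: u64 = 0x7800_0001;
const Y6_LO_TOP_BYTE_LT_BOUND: u64 = 60;

/// Hypothetical error type for an unsupported base field.
#[derive(Debug, PartialEq)]
struct UnsupportedField {
    modulus: u64,
}

/// Runtime guard: unlike `debug_assert_eq!`, this check is not compiled
/// out when debug assertions are disabled.
fn require_babybear(modulus: u64) -> Result<(), UnsupportedField> {
    // The band constraint relies on (p - 1) / 2 == 60 * 2^24.
    if modulus == BABYBEAR_MODULUS && (modulus - 1) / 2 == Y6_LO_TOP_BYTE_LT_BOUND << 24 {
        Ok(())
    } else {
        Err(UnsupportedField { modulus })
    }
}
```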

Open questions / assumptions

  • Is ShardRam always assigned after lookup-table circuits in all proving entrypoints (CPU + GPU + recursion pipelines)? If yes, the LK multiplicity/order blocker needs a design-level fix (not just a local change).
  • Is it acceptable to hard-fail (non-debug) when BaseField != BabyBear, or is there a preferred feature-gate pattern for BabyBear-only chips in this repo?

Changes:

  • Host-side to_ec_point now rejects y6 == 0 and documents the read/write y-half convention.
  • Circuit-side: adds byte-decomposition + lookup constraints and a conditional equality binding y6 to is_global_write.
  • Adds a targeted unit test around the y-sign binding logic for honest vs sign-flipped points.

Comment thread ceno_zkvm/src/tables/shard_ram.rs Outdated
Comment thread ceno_zkvm/src/tables/shard_ram.rs Outdated
Comment thread ceno_zkvm/src/tables/shard_ram.rs Outdated
Comment thread ceno_zkvm/src/tables/shard_ram.rs
Comment thread ceno_zkvm/src/tables/shard_ram.rs Outdated
Comment thread ceno_zkvm/src/tables/shard_ram.rs
@spherel spherel marked this pull request as draft May 25, 2026 13:39
@dreamATD dreamATD force-pushed the fix/issue-1338-shard-ram-y-sign branch from 15c96f8 to c666a53 Compare May 25, 2026 22:33
@spherel spherel requested a review from Copilot May 25, 2026 22:40
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Comment thread ceno_zkvm/src/tables/shard_ram.rs
Comment thread ceno_zkvm/src/tables/mod.rs
Comment thread ceno_zkvm/src/structs.rs
Comment thread ceno_zkvm/src/instructions/gpu/chips/shard_ram.rs Outdated
Comment thread ceno_zkvm/src/tables/shard_ram.rs
@spherel spherel force-pushed the fix/issue-1338-shard-ram-y-sign branch from c666a53 to fe8cd5e Compare May 25, 2026 22:54
## Problem

`ShardRamCircuit` differentiates a global *read* from a global *write*
by writing one of (x, y) or (x, -y) into the witness. Before this fix
nothing constrained which y was chosen, so an attacker could flip
is_global_write and migrate a record between the read/write sets
without changing anything else in the witness. The y-sign was the
entire signal — a soundness break.

## Design Rationale

Bind the sign of `y6 = y[SEPTIC_EXTENSION_DEGREE - 1]` to
is_global_write via a half-of-field convention:

- read  (is_global_write = 0): y6 in [1, (p-1)/2]
- write (is_global_write = 1): y6 in [(p+1)/2, p-1]

For BabyBear `(p-1)/2 = 60 * 2^24` exactly, so a witnessed
`y6_lo in [0, (p-1)/2)` decomposes into four bytes with top byte
`b3 < 60`. Three U8 `assert_byte` queries plus one
`lookup_ltu_byte(b3, 60, 1)` bound y6_lo, then a single
`condition_require_equal` ties y6 to either `y6_lo + 1` (read) or
`(p-1) - y6_lo` (write) under the is_global_write selector. y6 = 0 is
the unique fixed point not covered by either branch; `to_ec_point`
skips it so the prover doesn't generate an unprovable record.

Mirror the partition on the prover side: `to_ec_point` uses
`y6 > prime / 2` (strict; `(p-1)/2` belongs to the read region) to
decide whether to negate the natural sqrt, and bumps the nonce when
y6 = 0.
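
The prover-side decision can be sketched as a three-way classification. This is an illustration under assumptions: `Y6Choice`, `classify`, and `negate` are hypothetical names, and only the last limb is modeled, not the full septic-extension point.

```rust
const P: u32 = 0x7800_0001;

/// What the host does with the natural square root's last limb.
#[derive(Debug, PartialEq)]
enum Y6Choice {
    Keep,       // natural sqrt already lies in the right half
    Negate,     // use (x, -y) instead
    RetryNonce, // y6 = 0: unprovable, bump the nonce
}

/// Strict comparison: P / 2 == (p - 1) / 2 belongs to the read region.
fn classify(y6: u32, is_global_write: bool) -> Y6Choice {
    if y6 == 0 {
        return Y6Choice::RetryNonce;
    }
    let in_upper_half = y6 > P / 2;
    if in_upper_half == is_global_write {
        Y6Choice::Keep
    } else {
        Y6Choice::Negate
    }
}

/// Field negation of a single limb.
fn negate(y6: u32) -> u32 {
    if y6 == 0 { 0 } else { P - y6 }
}
```

With `>=` in place of `>`, the boundary value `P / 2` would be treated as upper half and a read record landing on it would be negated into the write region.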

## Change Highlights

### `ceno_zkvm/src/tables/shard_ram.rs` — chip-level y-sign binding

- `ShardRamConfig`: add `y6_lo_bytes: [WitIn; 4]`; in `configure` emit
  3 x `assert_byte` + 1 x `lookup_ltu_byte(_, 60, 1)` and one
  `condition_require_equal` tying y6 to `y6_lo + 1` (read) or
  `(p-1) - y6_lo` (write) under the is_global_write selector.
- `to_ec_point`: skip the `y6 = 0` case; classify
  `y6 > prime / 2` (strict, so the boundary `(p-1)/2` stays read) to
  decide whether to negate the natural sqrt.
- `assign_instance`: write the four `y6_lo` byte limbs via the new
  `y6_lo_value` helper. mlt is surfaced via the new
  `assign_instances_with_lk_multiplicities` entry below — no per-row
  push left dangling.

### Lookup-multiplicity plumbing for ShardRam

ShardRam's per-row y6_lo byte / LTU lookups must reach
`combined_lk_mlt` so the U8 / LTU table `mlt` columns balance.
ShardRam runs after opcode + dummy circuits, before
`finalize_lk_multiplicities`. To surface mlt without burdening every
other table circuit:

- `ceno_zkvm/src/tables/mod.rs`: `TableCircuit` trait gains a second
  default-unimplemented method
  `assign_instances_with_lk_multiplicities` alongside the existing
  `assign_instances`. ShardRam overrides the former; every other
  table keeps overriding the latter.
- `ceno_zkvm/src/structs.rs`: `ZKVMWitnesses::assign_shared_circuit`
  threads a `LkMultiplicity::default()` through ShardRam's
  parallel-chunk witgen and inserts
  `lk_multiplicity.into_finalize_result()` into
  `lk_mlts["ShardRamCircuit"]` before finalize. Asserts swap from
  `combined_lk_mlt.is_some()` to `is_none()` to lock the ordering.
  `assign_table_circuit` tolerates `combined_lk_mlt = None` by
  passing an empty multiplicity slice, so LocalFinalCircuit (which
  ignores the argument anyway) can also run before finalize.
- `ceno_zkvm/src/e2e.rs`: move
  `MmuConfig::assign_continuation_circuit` (LocalFinal + ShardRam) to
  just before `finalize_lk_multiplicities`. Mirror the move inside
  the GPU debug-compare block so `combined_lk_mlt` diff stays
  meaningful.
- `ceno_zkvm/src/instructions/riscv/rv32im/mmu.rs`: docstring updated
  to describe the new ordering invariant.

### Device-resident GPU shortcut for ShardRam (mlt mirror)

`ZKVMWitnesses::try_assign_shared_circuit_gpu` dispatches into
`instructions::gpu::chips::shard_ram::try_gpu_assign_shared_circuit`
to keep the continuation EC computation device-resident
(`gpu_batch_continuation_ec_on_device` + `merge_and_partition_records`)
when `is_gpu_witgen_enabled()`. The GPU kernels never enter the CPU
`assign_instance` per-row push, so the y6_lo lookup multiplicity is
derived host-side:

- After step 6 of `try_gpu_assign_shared_circuit` (merge+partition),
  D2H `partitioned_buf` once to `Vec<u32>` and walk it with stride
  `record_u32s = 26` (`GpuShardRamRecord` `#[repr(C)]` layout).
  Per record extract `is_to_write_set` (u32 offset 10) and
  `point_y[6]` (u32 offset 25), compute `y6_lo`, push the same
  4 lookup queries the CPU path emits per row, then
  `into_finalize_result()` and return alongside the chunked
  `Vec<ChipInput<E>>`. `debug_assert_eq!(record_u32s, 26)` guards
  against `ceno_gpu` layout drift.
- `try_assign_shared_circuit_gpu` inserts both `ChipInput` and the
  derived multiplicity into `self.witnesses` /
  `self.lk_mlts["ShardRamCircuit"]` so finalize folds the GPU-path
  contribution into `combined_lk_mlt` the same way the CPU shortcut
  does.

### Verifier: account for `has_ecc_ops` row doubling

`ShardRamCircuit::has_ecc_ops()` adds an extra hypercube variable;
the chip matrix has `2 * next_pow2(num_instance)` rows where the
back half is EC-tree internal nodes with `selector_zero = 0`. Before
this fix the chip had `num_lks = 0`, so the verifier's
`dummy_table_item_multiplicity` correction never had to consider it.
With the new byte/LTU queries the correction under-counted dummy
lookups by a factor of 2 and shard verification failed with
`logup_sum != 0`.

- `ceno_zkvm/src/scheme/verifier.rs`: multiply `next_pow2_instance`
  by 2 when `circuit_vk.get_cs().has_ecc_ops()`.
- `ceno_recursion/src/zkvm_verifier/verifier.rs`: mirror the same
  adjustment in the recursive verifier (lockstep per CLAUDE.md).

### Tests

- `tables::shard_ram::tests::test_shard_ram_y_sign_circuit_rejects_negation`
  drives `assign_instances_with_lk_multiplicities` + `MockProver`.
  The honest row satisfies every constraint; the tampered row (same
  record, negated EC point) trips `lookup_Ltu` on the wrong-sign b3.
  A concrete challenge is supplied so the no-challenge `run` path
  doesn't drop `structural_witin`.
- `test_shard_ram_circuit` updated to call
  `assign_instances_with_lk_multiplicities`.

## Testing

```
cargo fmt --all --check
cargo make clippy                                  # -D warnings, dev profile
cargo clippy --workspace --all-targets --release
cargo test --workspace --lib --release
cargo run --release -p ceno_zkvm --features sanity-check --bin e2e -- \
  --platform=ceno --max-cycle-per-shard=20000 \
  --hints=10 --public-io=4191 \
  examples/target/riscv32im-ceno-zkvm-elf/release/examples/fibonacci
```

End-to-end fibonacci across 6 shards verifies `ShardRamCircuit` and
`LocalRAMTableFinal` on every shard with `exit code 0. Success.`
GPU shortcut (`--features gpu` + `CENO_GPU_ENABLE_WITGEN=1`) needs a
CUDA host to verify at runtime; static structure mirrors the CPU
shortcut and CPU path remains identical.

## Risks and Rollout

- Soundness boundary moved: the chip now constrains the EC y-sign
  that was previously unconstrained. Mirrored on native and
  recursive verifiers; protocol/transcript order is unchanged so the
  two stay in lockstep.
- The `has_ecc_ops` row-factor verifier fix only manifests once any
  `has_ecc_ops` chip has `num_lks > 0`. ShardRam is the only such
  chip today; lookup balance failures elsewhere would be unrelated.
- GPU mlt offsets are read from `shard_ram_record_to_gpu` (offsets 10
  and 25 in 26 u32s). `debug_assert_eq!(record_u32s, 26)` trips if
  `ceno_gpu` reshuffles `GpuShardRamRecord` so silent drift is
  caught.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@spherel spherel force-pushed the fix/issue-1338-shard-ram-y-sign branch from 6e52439 to a4879ab Compare May 25, 2026 23:21
@spherel spherel marked this pull request as ready for review May 25, 2026 23:44
@spherel spherel requested a review from Copilot May 25, 2026 23:44
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comment on lines +115 to +119
_multiplicity: &[FxHashMap<u64, usize>],
_input: &Self::WitnessInput<'_>,
) -> Result<RMMCollections<E::BaseField>, ZKVMError> {
unimplemented!("assign_instances is not implemented for this table circuit")
}
Comment thread ceno_zkvm/src/structs.rs
Comment on lines 479 to +483
  let witness = TC::assign_instances(
      config,
      cs.zkvm_v1_css.num_witin as usize,
      cs.zkvm_v1_css.num_structural_witin as usize,
-     self.combined_lk_mlt.as_ref().unwrap(),
+     self.combined_lk_mlt.as_ref().unwrap_or(&empty_mlt),
Comment thread ceno_zkvm/src/tables/shard_ram.rs
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread ceno_zkvm/src/tables/shard_ram.rs Outdated
// `lookup_ltu_byte(a, b, 1)` asserts `a, b` are bytes and `a < b`.
cb.lookup_ltu_byte(
y6_lo_bytes[3].expr(),
E::BaseField::from_canonical_u64(60).expr(),
@copilot please make this 60 a constant so that we can refer to it here and in witness generation.

Done in d2c12a4. I introduced Y6_LO_TOP_BYTE_LT_BOUND and replaced the duplicated 60 in the circuit-side check plus both CPU/GPU witness-generation lookup multiplicity paths so they all reference the same constant.

Copilot AI commented May 26, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback.

Warning

Firewall rules blocked me from connecting to the following address:

  • https://api.github.com/repos/scroll-tech/ceno-patch/commits/main
    • Triggering command: cargo metadata --format-version 1 --all-features --manifest-path /home/REDACTED/work/ceno/ceno/ceno_cli/example/Cargo.toml … (http block)
    • Triggering command: cargo check --quiet --workspace --message-format=json --manifest-path /home/REDACTED/work/ceno/ceno/ceno_cli/example/Cargo.toml … (http block)

Copilot AI requested a review from kunxian-xia May 26, 2026 04:22
Successfully merging this pull request may close these issues.

ShardRam read/write direction is not bound to the public shard_rw_sum
