worker: decode partition records as v2 envelopes with v1 fallback #4696

Merged: muhamadazmy merged 1 commit into main from pr4696 on May 11, 2026

@muhamadazmy muhamadazmy commented May 5, 2026

worker: decode partition records as v2 envelopes with v1 fallback

Summary

Switch the partition processor's read path to v2 envelopes. Records
are decoded as v2::Envelope<Raw> first; if the record on disk is
still flexbuffers-encoded v1, we downcast through the new
StorageDecodeError::TypedValueMismatch variant (which now carries
the original Arc) and convert via the v1→v2 compatibility shim.
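
A minimal sketch of the fallback described above, assuming the cached value is held behind an `Arc<dyn Any + Send + Sync>`. `EnvelopeV1`, `EnvelopeV2`, `DecodeError`, and `decode_with_fallback` are hypothetical stand-ins for illustration, not the types in this PR; only the shape (try v2, recover the original `Arc` from the mismatch error, convert from v1) follows the summary.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-ins for the real envelope types.
#[derive(Debug, Clone, PartialEq)]
pub struct EnvelopeV1 {
    pub payload: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct EnvelopeV2 {
    pub payload: String,
    pub converted_from_v1: bool,
}

// Stand-in for the v1 to v2 compatibility shim.
impl From<EnvelopeV1> for EnvelopeV2 {
    fn from(v1: EnvelopeV1) -> Self {
        EnvelopeV2 {
            payload: v1.payload,
            converted_from_v1: true,
        }
    }
}

// Simplified error: the mismatch variant hands the original value back
// to the caller instead of dropping it.
#[derive(Debug)]
pub enum DecodeError {
    TypedValueMismatch(Arc<dyn Any + Send + Sync>),
}

// Attempt the v2 decode. On a type mismatch, `Arc::downcast` returns the
// untouched Arc, which is wrapped into the error.
pub fn decode_typed_v2(
    value: Arc<dyn Any + Send + Sync>,
) -> Result<Arc<EnvelopeV2>, DecodeError> {
    value
        .downcast::<EnvelopeV2>()
        .map_err(DecodeError::TypedValueMismatch)
}

// v2 first; if the typed value turns out to be a v1 envelope, convert it.
// Returns None when the value is neither v1 nor v2.
pub fn decode_with_fallback(value: Arc<dyn Any + Send + Sync>) -> Option<EnvelopeV2> {
    match decode_typed_v2(value) {
        Ok(v2) => Some(Arc::unwrap_or_clone(v2)),
        Err(DecodeError::TypedValueMismatch(original)) => {
            let v1 = original.downcast::<EnvelopeV1>().ok()?;
            Some(EnvelopeV2::from(Arc::unwrap_or_clone(v1)))
        }
    }
}
```

Carrying the `Arc` in the error avoids a second lookup or clone of the cached record when the fallback is needed.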

The decode path no longer relies on Header::dest to decide whether
a record targets this partition. Instead, the processor stores a
KeyFilter derived from its partition key range and checks each
record's Keys via MatchKeyQuery. Deduplication still runs, but
now reads Dedup off the v2 header (converted to the existing
DedupInformation for the dedup table — TODO left to consume Dedup
directly).
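
To illustrate the key-based check, here is a rough sketch assuming partition keys are `u64` values and a record carries either a single key or an inclusive key range. The `Keys` and `KeyFilter` definitions below are simplified, hypothetical versions; the real types and the `MatchKeyQuery` trait may look different.

```rust
use std::ops::RangeInclusive;

// Hypothetical simplification of the keys attached to a record.
#[derive(Debug, Clone, PartialEq)]
pub enum Keys {
    Single(u64),
    Range(RangeInclusive<u64>),
}

// Hypothetical filter built once from the processor's partition key range.
#[derive(Debug, Clone)]
pub struct KeyFilter {
    range: RangeInclusive<u64>,
}

impl KeyFilter {
    pub fn within(range: RangeInclusive<u64>) -> Self {
        Self { range }
    }

    // A record targets this partition if its key falls inside the
    // partition range, or if its key range overlaps the partition range.
    pub fn matches(&self, keys: &Keys) -> bool {
        match keys {
            Keys::Single(key) => self.range.contains(key),
            Keys::Range(r) => {
                r.start() <= self.range.end() && self.range.start() <= r.end()
            }
        }
    }
}
```

With a filter for `100..=199`, a record keyed at `150` matches and one keyed at `200` does not, without consulting any destination field in the header.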

apply_record is rewritten around RecordKind: AnnounceLeader and
UpdatePartitionDurability are handled inline; everything else is
forwarded to the state machine as a typed v2 envelope.
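
The dispatch can be pictured roughly as below. This is a sketch only: the `RecordKind` variants other than AnnounceLeader and UpdatePartitionDurability, the `Applied` enum, and the function signature are assumptions made for illustration and do not mirror the real `apply_record`.

```rust
// Hypothetical subset of record kinds; `Invoke` stands in for
// "everything else".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecordKind {
    AnnounceLeader,
    UpdatePartitionDurability,
    Invoke,
}

// Hypothetical outcome type, used here only to make the routing visible.
#[derive(Debug, PartialEq, Eq)]
pub enum Applied {
    HandledInline(RecordKind),
    ForwardedToStateMachine(RecordKind),
}

// Two kinds are handled by the processor itself; all others are passed
// on to the state machine.
pub fn apply_record(kind: RecordKind) -> Applied {
    match kind {
        RecordKind::AnnounceLeader | RecordKind::UpdatePartitionDurability => {
            Applied::HandledInline(kind)
        }
        other => Applied::ForwardedToStateMachine(other),
    }
}
```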

Tests under state_machine/ and pp-bench are updated to construct
records via records::<Kind>::new_test(...) instead of the v1
Command::<Variant>(...) enum.

Writers still emit v1 envelopes; only reads go through v2.


Stack created with Sapling. Best reviewed with ReviewStack.

@muhamadazmy muhamadazmy changed the title from "Read and use Envelope v2" to "[wip] read and use envelope v2" May 6, 2026
@muhamadazmy muhamadazmy force-pushed the pr4696 branch 2 times, most recently from 983f7e3 to 4a99ecf May 6, 2026 16:25
github-actions Bot commented May 6, 2026

Test Results

  8 files  ±0    8 suites  ±0   4m 36s ⏱️ -27s
 50 tests ±0   50 ✅ ±0  0 💤 ±0  0 ❌ ±0 
218 runs  ±0  218 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 4bdc261. ± Comparison against base commit d71638a.

♻️ This comment has been updated with latest results.

@muhamadazmy muhamadazmy force-pushed the pr4696 branch 2 times, most recently from 5c3811d to e49a760 May 7, 2026 08:41
@muhamadazmy muhamadazmy changed the title from "[wip] read and use envelope v2" to "worker: decode partition records as v2 envelopes with v1 fallback" May 7, 2026
@muhamadazmy muhamadazmy force-pushed the pr4696 branch 4 times, most recently from 9bac5c5 to c83fbaf May 7, 2026 13:24
@muhamadazmy muhamadazmy marked this pull request as ready for review May 7, 2026 13:48
@muhamadazmy muhamadazmy requested a review from AhmedSoliman May 7, 2026 13:48
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c83fbaf695

Comment on lines +197 to +199
/// partition key range
#[bilrost(tag(4))]
pub partition_key_range: Keys,

P1: Preserve decoding of old durability records

When a new processor replays or follows a WAL segment containing Command::UpdatePartitionDurability written by a pre-change node, the flexbuffers/serde V1 payload will not contain this newly added partition_key_range field. Because the field is non-optional and has no #[serde(default)], v1::Envelope::decode fails before the V1-to-V2 fallback can convert the record, so rolling upgrades or replay of untrimmed old durability records can stop the partition processor. Please add a backward-compatible default/compat path for this field.
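
The compatibility concern can be sketched without any serialization crate. Assume, purely for illustration, that a decoded v1 payload is available as a map from field name to integers; `decode_v1`, the `durable_point` field, and this `Keys` shape are hypothetical. With serde, the equivalent would be marking the field `#[serde(default)]`, as the comment suggests.

```rust
use std::collections::BTreeMap;

// Hypothetical key description with a default for records that predate
// the new field.
#[derive(Debug, Clone, PartialEq, Default)]
pub enum Keys {
    #[default]
    None,
    Range(u64, u64),
}

// Hypothetical, trimmed-down durability record.
#[derive(Debug, PartialEq)]
pub struct UpdatePartitionDurability {
    pub durable_point: u64,
    pub partition_key_range: Keys,
}

// Decode from a field map. A missing `partition_key_range` falls back to
// the default instead of failing, so records written by older nodes still
// decode. A present but malformed field is still rejected.
pub fn decode_v1(fields: &BTreeMap<&str, Vec<u64>>) -> Option<UpdatePartitionDurability> {
    let durable_point = *fields.get("durable_point")?.first()?;
    let partition_key_range = match fields.get("partition_key_range").map(Vec::as_slice) {
        None => Keys::default(),
        Some([start, end]) => Keys::Range(*start, *end),
        Some(_) => return None,
    };
    Some(UpdatePartitionDurability {
        durable_point,
        partition_key_range,
    })
}
```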

@muhamadazmy muhamadazmy force-pushed the pr4696 branch 13 times, most recently from b3e0c8a to c528d5a May 8, 2026 11:24
Comment thread crates/platform/src/storage.rs Outdated
UnsupportedCodecKind(StorageCodecKind),
#[error("Type mismatch. Original value in PolyBytes::Typed({}) does not match requested type", _0.type_name())]
#[debug("TypedValueMismatch({})", _0.type_name())]
TypedValueMismatch(Arc<dyn StorageEncode>),
Member

Not a huge fan of passing the entire value into the error. Is this really necessary? I suppose one can formulate the error message and return it in a ReString instead?

Contributor Author

It's not only for the name. This is actually used to fall back to v1 decoding in some cases.

The problem is that, because of the record cache, we might try to decode a v1 envelope as v2 while the PolyBytes is typed. To make this work, I pass the inner value along in the error so it can be handled here: https://github.com/restatedev/restate/pull/4696/changes#diff-09e9d092973d97a3155dc5ca9f928dee76e6b85bbea659659c3083da645f7d09R458

Comment thread crates/wal-protocol/src/v2.rs Outdated
keys: payload.record_keys(),
payload,
fn from(command: C) -> PartialEnvelope {
// let payload = payload.into();
Member

remove?

let record_lsn = record.lsn;
let envelope = Arc::unwrap_or_clone(record.envelope);
// todo(azmy): use dedup() directly without first converting to DedupInformation
let dedup_information: Option<DedupInformation> = record.envelope.dedup().clone().into();
Member

I'd love it if we could address this one pretty soon.

Member

@AhmedSoliman AhmedSoliman left a comment

Thank you for working on this 🚢

Comment thread crates/platform/src/storage.rs Outdated
Comment thread crates/worker/src/partition/state_machine/mod.rs
Comment thread crates/worker/src/partition/state_machine/mod.rs
Comment thread tools/pp-bench/src/extract.rs Outdated
@muhamadazmy muhamadazmy merged commit c53757e into main May 11, 2026
73 of 74 checks passed
@muhamadazmy muhamadazmy deleted the pr4696 branch May 11, 2026 11:03
@github-actions github-actions Bot locked and limited conversation to collaborators May 11, 2026