activate: preserve journal and shard splits across dataflow reset #2928

Open
jshearer wants to merge 3 commits into master from jshearer/propagate_split_journals_across_collection_reset

@jshearer
Contributor

@jshearer jshearer commented May 7, 2026

Summary

  • Preserve shard split count across dataflow resets by using the old-generation shard count (instead of the default 1) when creating new-generation shards in apply_initial_splits
  • Preserve partition journal splits by mirroring old-generation partition topology (key ranges and logical partition field labels) into new-generation partitions via a new apply_initial_partition_splits function
  • Both changes are no-ops for truly new tasks/collections (empty old-gen) and for tasks where the new-gen already exists
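The shard half of the summary can be sketched as below. This is only an illustration under stated assumptions, not the PR's code: `initial_shard_count` and `even_key_ranges` are hypothetical helpers, and dividing the 32-bit key space evenly is an assumption about how a given shard count might be turned into key ranges.

```rust
/// Hypothetical helper: pick how many shards to create for the new generation.
/// Returns `None` when the new generation already has shards (a no-op).
pub fn initial_shard_count(old_gen_shards: usize, new_gen_shards: usize) -> Option<usize> {
    if new_gen_shards > 0 {
        // The new generation already exists, so there is nothing to create.
        return None;
    }
    // An empty old generation means a truly new task: keep the default of 1.
    // Otherwise carry the old generation's shard count forward.
    Some(old_gen_shards.max(1))
}

/// Hypothetical helper: divide the u32 key space into `n` contiguous,
/// inclusive ranges of (nearly) equal size. Returns an empty Vec for n == 0.
pub fn even_key_ranges(n: usize) -> Vec<(u32, u32)> {
    let span: u64 = 1u64 << 32;
    let n = n as u64;
    (0..n)
        .map(|i| {
            let begin = i * span / n;
            let end = (i + 1) * span / n - 1;
            (begin as u32, end as u32)
        })
        .collect()
}
```

With these helpers, a task whose old generation had four shards would be given four new-generation shards, while a brand-new task would still get one shard covering the whole key space.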

Note on dekaf::e2e::not_ready

The not_ready test reached the no-journals state by resetting a populated collection, then racing to observe LeaderNotAvailable before journals were recreated. Now that apply_initial_partition_splits recreates partition journals at activation time, that state is no longer reachable via reset, and the test was bailing on every run with "Journals were created before we could test NotReady state".

The relevant dekaf branch (Collection::new returning NotReady when partitions.is_empty()) is still load-bearing for fresh-published collections where the capture hasn't committed any data yet. Partition journals are created lazily by the runtime mapper on first commit, so dropping the reset cycle and asserting against a fresh setup (no inject) exercises the same codepath.
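The branch described above can be modeled roughly as follows. The `CollectionState` enum and `collection_state` function are hypothetical stand-ins for illustration; the real dekaf `Collection::new` has a different signature and return type.

```rust
/// Hypothetical model of the two outcomes discussed above.
#[derive(Debug, PartialEq)]
pub enum CollectionState {
    /// No partition journals exist yet, e.g. a fresh-published collection
    /// whose capture has not committed any data.
    NotReady,
    /// At least one partition journal exists.
    Ready { partitions: usize },
}

/// Hypothetical stand-in for the `partitions.is_empty()` check.
pub fn collection_state(partition_journals: &[String]) -> CollectionState {
    if partition_journals.is_empty() {
        CollectionState::NotReady
    } else {
        CollectionState::Ready {
            partitions: partition_journals.len(),
        }
    }
}
```

Under this model, a test can reach `NotReady` simply by not injecting any documents, with no reset and no race involved.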

Fixes #2881

Not much more to say: we used to create just one partition; now we create as many partitions as the pre-reset generation had.

@jshearer jshearer force-pushed the jshearer/propagate_split_journals_across_collection_reset branch from b1644ea to 65473d2 on May 7, 2026 21:34
The test previously reached the no-journals state via reset and raced to observe `LeaderNotAvailable` before journals were recreated. With partition splits now preserved across reset, the post-reset no-journals window doesn't exist and the test bails every run.

Drive the same dekaf branch by skipping the document inject before assertions: a fresh-published collection has no partition journals until the runtime mapper creates them on first commit. Removes the race and exercises the production scenario the `LeaderNotAvailable` response covers.
@jshearer jshearer force-pushed the jshearer/propagate_split_journals_across_collection_reset branch from 569c730 to bef6ae5 on May 7, 2026 23:35
@jshearer jshearer self-assigned this May 8, 2026
@jgraettinger
Member

I think this needs to attempt only to preserve splits of an unpartitioned collection that continues to be unpartitioned. We should not attempt to preserve logical partitions across collection resets, for a couple of reasons:

  1. Collection reset is how one changes the logical partitioning being applied to a collection. The former splits may not even be logically partitioned fields of the post-reset spec.
  2. Reset is also how one cleans up partition instances which may no longer be relevant or desired, and we provide no other means for removing partition instances today (aside from reset).

So, if a collection is or was partitioned, we should use initial splits of 1 (and not attempt to pre-create any logical partitions). Otherwise, in the common case of a collection without logical partitioning, preserving splits makes plenty of practical sense.

@jshearer
Contributor Author

jshearer commented May 8, 2026

Ahhhh, didn't even think about logical partitioning. Good point, I'll carve that out 👍

@jshearer jshearer force-pushed the jshearer/propagate_split_journals_across_collection_reset branch from c542fd8 to 05db6fa on May 11, 2026 18:22
@jshearer
Contributor Author

jshearer commented May 11, 2026

Kk, updated so that apply_initial_partition_splits now preserves splits only when both old and new generations are unpartitioned. If either is logically partitioned, the function returns its input unchanged, partition_changes deletes the old generation's journals, and the actual partitions are created at runtime.
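A rough sketch of that gating, under stated assumptions: the `Partition` and `Generation` types and the `preserved_splits` function are hypothetical and simplified, and "returns its input unchanged" is modeled here as returning the new generation's existing partitions.

```rust
/// Hypothetical, simplified partition: just an inclusive key range.
#[derive(Clone, Debug, PartialEq)]
pub struct Partition {
    pub key_begin: u32,
    pub key_end: u32,
}

/// Hypothetical view of one generation of a collection.
pub struct Generation {
    /// Logically partitioned fields of the spec; empty means unpartitioned.
    pub logical_fields: Vec<String>,
    /// Partition journals that currently exist for this generation.
    pub partitions: Vec<Partition>,
}

/// Mirror the old generation's key-range splits into the new generation only
/// when both generations are unpartitioned. In every other case the new
/// generation's partitions are returned unchanged.
pub fn preserved_splits(old: &Generation, new: &Generation) -> Vec<Partition> {
    let either_partitioned =
        !old.logical_fields.is_empty() || !new.logical_fields.is_empty();

    if either_partitioned || !new.partitions.is_empty() || old.partitions.is_empty() {
        // Logically partitioned, new generation already exists, or a truly
        // new collection: leave things as they are and let the runtime
        // create partitions on demand.
        return new.partitions.clone();
    }
    old.partitions.clone()
}
```

The early return keeps reset usable as the way to change logical partitioning or to clean up stale partition instances, which is the concern raised in review.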

Will squash before merging

@jshearer jshearer requested a review from jgraettinger May 11, 2026 18:52
Member

@jgraettinger jgraettinger left a comment

LGTM

Development

Successfully merging this pull request may close these issues.

Dataflow reset drops journal splitting, painful for large captures