activate: preserve journal and shard splits across dataflow reset #2928
jshearer wants to merge 3 commits into
Not much more to say: we used to create just one partition; now we create the same number of partitions as the pre-reset version had. Fixes #2881
b1644ea to 65473d2
The test previously reached the no-journals state via reset and raced to observe `LeaderNotAvailable` before journals were recreated. With partition splits now preserved across reset, the post-reset no-journals window doesn't exist and the test bails every run. Drive the same dekaf branch by skipping the document inject before assertions: a fresh-published collection has no partition journals until the runtime mapper creates them on first commit. Removes the race and exercises the production scenario the `LeaderNotAvailable` response covers.
569c730 to bef6ae5
I think this needs to attempt only to preserve splits of an unpartitioned collection that continues to be unpartitioned. We should not attempt to preserve logical partitions across collection resets, for a couple of reasons:
So, if a collection is or was partitioned, we should use initial splits of 1 (and not attempt to pre-create any logical partitions). Otherwise, in the common case of a collection without logical partitioning, preserving splits makes plenty of practical sense.
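The rule proposed above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `ResetContext`, its fields, and `initial_splits` are hypothetical names, and clamping the preserved count to at least one is an assumption.

```rust
/// Hypothetical summary of a collection before and after a reset.
/// The type and its field names are illustrative, not taken from the PR.
#[derive(Debug, Clone, Copy)]
struct ResetContext {
    /// Whether the pre-reset collection declared logical partitions.
    was_partitioned: bool,
    /// Whether the post-reset collection declares logical partitions.
    is_partitioned: bool,
    /// Number of partition splits the pre-reset collection had.
    prior_splits: usize,
}

/// Returns how many initial splits to create when activating after a reset.
fn initial_splits(ctx: ResetContext) -> usize {
    if ctx.was_partitioned || ctx.is_partitioned {
        // Logical partitioning is or was involved: use a single initial
        // split and do not try to pre-create logical partitions.
        1
    } else {
        // Unpartitioned before and after: preserve the prior split count.
        // Assumption: never return fewer than one split.
        ctx.prior_splits.max(1)
    }
}
```

With this shape, only the collection that was and remains unpartitioned keeps its prior split count; every other combination falls back to a single split.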
Ahhhh, didn't even think about logical partitioning. Good point, I'll carve that out 👍
c542fd8 to 05db6fa
Kk, updated so that Will squash before merging
Summary
`apply_initial_splits` `apply_initial_partition_splits` function
Note on `dekaf::e2e::not_ready`
The `not_ready` test reached the no-journals state by resetting a populated collection, then racing to observe `LeaderNotAvailable` before journals were recreated. Now that `apply_initial_partition_splits` recreates partition journals at activation time, that state is no longer reachable via reset and the test was bailing every run with `Journals were created before we could test NotReady state`.
The relevant dekaf branch (`Collection::new` returning `NotReady` when `partitions.is_empty()`) is still load-bearing for fresh-published collections where the capture hasn't committed any data yet. Partition journals are created lazily by the runtime mapper on first commit, so dropping the reset cycle and asserting against a fresh `setup` (no inject) exercises the same codepath.
Fixes #2881