br/pkg/streamhelper: stabilize flaky TestSubBasic#67859
br/pkg/streamhelper: stabilize flaky TestSubBasic#67859flaky-claw wants to merge 2 commits intopingcap:masterfrom
Conversation
|
@flaky-claw I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @flaky-claw. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
📝 WalkthroughWalkthroughReplaces a timing-based stabilization wait in TestSubBasic with a new test helper Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
br/pkg/streamhelper/subscription_test.go (1)
47-51: Optional: include an informative failure message.If the subscriber stalls below
expected,require.Eventuallywill fail with a generic "condition never satisfied" message and no visibility into how many events actually arrived, which makes future flake diagnosis harder. Consider capturing the last observed count and passing amsgAndArgsto aid debugging.♻️ Proposed refactor
func waitAtLeastEvents(t *testing.T, sub *streamhelper.FlushSubscriber, expected int) { + var last int require.Eventually(t, func() bool { - return len(sub.Events()) >= expected - }, 30*time.Second, 10*time.Millisecond) + last = len(sub.Events()) + return last >= expected + }, 30*time.Second, 10*time.Millisecond, "got %d events, expected at least %d", last, expected) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@br/pkg/streamhelper/subscription_test.go` around lines 47 - 51, The helper waitAtLeastEvents currently calls require.Eventually with a bare predicate so failures only show "condition never satisfied"; modify waitAtLeastEvents to record the last observed count inside the predicate (calling sub.Events()) and pass a descriptive msgAndArgs to require.Eventually that includes that lastObserved value and the expected value so test failures show how many events actually arrived; update references in this function (waitAtLeastEvents, FlushSubscriber, Events()) only—no other API changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@br/pkg/streamhelper/subscription_test.go`:
- Around line 47-51: The helper waitAtLeastEvents currently calls
require.Eventually with a bare predicate so failures only show "condition never
satisfied"; modify waitAtLeastEvents to record the last observed count inside
the predicate (calling sub.Events()) and pass a descriptive msgAndArgs to
require.Eventually that includes that lastObserved value and the expected value
so test failures show how many events actually arrived; update references in
this function (waitAtLeastEvents, FlushSubscriber, Events()) only—no other API
changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8fde9afe-1b3e-43f4-9813-48f92c709a5b
📒 Files selected for processing (1)
br/pkg/streamhelper/subscription_test.go
Codecov Report✅ All modified and coverable lines are covered by tests. Please upload reports for the commit 27b50b3 to get more accurate results. Additional details and impacted files@@ Coverage Diff @@
## master #67859 +/- ##
================================================
+ Coverage 77.5862% 78.6346% +1.0484%
================================================
Files 1982 1984 +2
Lines 548966 549132 +166
================================================
+ Hits 425922 431808 +5886
+ Misses 122239 116308 -5931
- Partials 805 1016 +211
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
/retest |
|
@yinsustart: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
YuJuncen
left a comment
There was a problem hiding this comment.
Better to extend expected checkpoint range instead of dropping events.
[LGTM Timeline notifier]Timeline:
|
| } | ||
| }() | ||
|
|
||
| req.Equal(cp, s.MinValue(), "%d vs %d", cp, s.MinValue()) |
There was a problem hiding this comment.
ASSERT: cp ≥ s.MinValue()
If you noticed cp < s.MinValue() in a "flaky" test, there must be other problems. Upload full log and ask for help.
|
FlakyFixer PR update summary Fix
Verification Spec:
Observed result:
Gate checklist:
Commands:
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
br/pkg/streamhelper/subscription_test.go (1)
77-89: Consider hardening failpoint cleanup witht.Cleanup.In the happy path the spawned goroutine reliably disables
subscription.listenOver.aboutToSend(it survivest.FailNow()on the test goroutine sinceGoexitonly terminates the calling goroutine, andreleaseErris buffered). However, iffailpoint.Disableitself returns an error,req.NoError(<-releaseErr)fails the test while leaving the failpoint enabled and leaking it into subsequent tests in the package. At.Cleanupregistered immediately afterEnablewould make disable idempotent and guaranteed.♻️ Suggested hardening
fp := "github.com/pingcap/tidb/br/pkg/streamhelper/subscription.listenOver.aboutToSend" req.NoError(failpoint.Enable(fp, "pause")) +t.Cleanup(func() { _ = failpoint.Disable(fp) }) releaseErr := make(chan error, 1) cp = c.advanceCheckpoints() c.flushAll() go func() { time.Sleep(10 * time.Millisecond) releaseErr <- failpoint.Disable(fp) }()As per coding guidelines: "Unit tests in a package that uses failpoints: MUST enable failpoints before tests and disable afterward."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@br/pkg/streamhelper/subscription_test.go` around lines 77 - 89, After calling failpoint.Enable (fp := "github.com/pingcap/tidb/br/pkg/streamhelper/subscription.listenOver.aboutToSend" and req.NoError(failpoint.Enable(fp, "pause"))), register a t.Cleanup that disables the failpoint (call failpoint.Disable(fp) and ignore or log its error) so disable is idempotent and always run even if the test fails; keep the existing goroutine that writes to releaseErr but ensure the cleanup is the guaranteed fallback to avoid leaking the failpoint.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@br/pkg/streamhelper/subscription_test.go`:
- Around line 77-89: After calling failpoint.Enable (fp :=
"github.com/pingcap/tidb/br/pkg/streamhelper/subscription.listenOver.aboutToSend"
and req.NoError(failpoint.Enable(fp, "pause"))), register a t.Cleanup that
disables the failpoint (call failpoint.Disable(fp) and ignore or log its error)
so disable is idempotent and always run even if the test fails; keep the
existing goroutine that writes to releaseErr but ensure the cleanup is the
guaranteed fallback to avoid leaking the failpoint.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a612be71-da5e-4a3f-a676-f50ecc46e0b2
📒 Files selected for processing (1)
br/pkg/streamhelper/subscription_test.go
|
@flaky-claw: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: close #67839
Problem Summary:
Flaky test
TestSubBasicinbr/pkg/streamhelperintermittently fails, so this PR stabilizes that path.What changed and how does it work?
Root Cause
TEST_ISSUE in
TestSubBasicwaiting logic, where a single quiet poll could falsely conclude event drain completion before all queued flush events arrived.Fix
TestSubBasicmust deterministically wait for the full expected flush-event set beforeDrop()to avoid truncating event delivery and producing false checkpoint mismatches.Verification
Spec:
br/pkg/streamhelper :: TestSubBasictidb.go_flaky.defaultBASELINE_ONLYObserved result:
Gate checklist:
Commands:
go test -json ./br/pkg/streamhelper -run '^TestSubBasic$' -count=1go test -json ./br/pkg/streamhelper -count=1make buildCheck List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Fixes #67839
Summary by CodeRabbit