br/pkg/stream/crr/internal/checkpoint: stabilize flaky TestCheckpointCalculatorRandomizedCRRSimulation#67826
Conversation
|
@flaky-claw I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
Hi @flaky-claw. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
📝 WalkthroughWalkthroughReplaced unconditional per-round logging with an environment-variable guard ( Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #67826 +/- ##
================================================
+ Coverage 77.5964% 78.5801% +0.9837%
================================================
Files 1982 1993 +11
Lines 548885 562697 +13812
================================================
+ Hits 425915 442168 +16253
+ Misses 122165 118887 -3278
- Partials 805 1642 +837
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
YuJuncen
left a comment
There was a problem hiding this comment.
Better to also reduce test rounds.
|
@YuJuncen, I reduced Iterations to 300 because this test still needs enough rounds to exercise at least one full catch-up cycle. With the current config, CatchUpEvery is 299, so lowering iterations to 100 can skip catch-up entirely and leave SyncedTS at 0, causing the randomized simulation to fail for some seeds. 300 is the minimum value assuming no other changes to the logic. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
br/pkg/stream/crr/internal/checkpoint/randomized_integration_test.go (1)
37-38: Avoid weakening the randomized coverage unless the logging fix is insufficient.With
CatchUpEvery: 299,Iterations: 300exercises the catch-up block only once. Since the reported root cause is per-round logging, consider keeping the original iteration count so this PR stays focused on removing the timing overhead without reducing simulation depth.Proposed adjustment
cfg := randomizedCRRSimulationConfig{ - Iterations: 300, + Iterations: 1000,As per coding guidelines, keep test changes minimal and deterministic; avoid broad test churn unless required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@br/pkg/stream/crr/internal/checkpoint/randomized_integration_test.go` around lines 37 - 38, The test weakens randomized coverage by setting CatchUpEvery: 299 while keeping Iterations: 300 so the catch-up path runs only once; update the randomizedCRRSimulationConfig so Iterations is restored to its original/higher value (or at least substantially > CatchUpEvery, e.g., >= CatchUpEvery*3) to ensure multiple catch-up rounds are exercised, keeping the rest of the logging/timing change minimal and deterministic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@br/pkg/stream/crr/internal/checkpoint/randomized_integration_test.go`:
- Around line 37-38: The test weakens randomized coverage by setting
CatchUpEvery: 299 while keeping Iterations: 300 so the catch-up path runs only
once; update the randomizedCRRSimulationConfig so Iterations is restored to its
original/higher value (or at least substantially > CatchUpEvery, e.g., >=
CatchUpEvery*3) to ensure multiple catch-up rounds are exercised, keeping the
rest of the logging/timing change minimal and deterministic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: cb609bdd-0095-4666-8a72-da0fed8b2ead
📒 Files selected for processing (1)
br/pkg/stream/crr/internal/checkpoint/randomized_integration_test.go
|
/check-issue-triage-complete |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Leavrth, YuJuncen The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
@yinsustart: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
5 similar comments
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
/retest |
1 similar comment
|
/retest |
|
@flaky-claw: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: close #67699
Problem Summary:
Flaky test
TestCheckpointCalculatorRandomizedCRRSimulationinbr/pkg/stream/crr/internal/checkpointintermittently fails, so this PR stabilizes that path.What changed and how does it work?
Root Cause
TestCheckpointCalculatorRandomizedCRRSimulationwas flaky because unconditional per-round info logging made the randomized test exceed CI slow-test thresholds under scheduler pressure; this was a test-side timing issue, not a product bug.Fix
The patch removes only the non-essential round trace from normal runs by making it opt-in through
TIDB_RANDOMIZED_CRR_SIM_ROUND_LOG, which cuts timing variance without weakening any assertions.Verification
Spec:
br/pkg/stream/crr/internal/checkpoint :: TestCheckpointCalculatorRandomizedCRRSimulationtidb.go_flaky.defaultBASELINE_ONLYObserved result:
Gate checklist:
Commands:
go test -json ./br/pkg/stream/crr/internal/checkpoint -run '^TestCheckpointCalculatorRandomizedCRRSimulation$' -count=1go test -json ./br/pkg/stream/crr/internal/checkpoint -count=1make buildCheck List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Fixes #67699
Summary by CodeRabbit