executor: reduce TestDistSQLSharedKVRequestRace iterations to fix CI timeout#67675
executor: reduce TestDistSQLSharedKVRequestRace iterations to fix CI timeout#67675joechenrh wants to merge 3 commits intopingcap:masterfrom
Conversation
…timeout The test runs 5 replica-read modes × 20 iterations × 2 queries = 200 queries with the race detector enabled. On CI this takes ~278s, barely under the 5-minute Bazel "moderate" timeout, causing flaky timeouts. Reduce iterations from 20 to 5. The race detector catches data races deterministically on first occurrence, and the RequestBuilder.used safety check catches any reuse regression even with a single iteration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Review Complete Findings: 0 issues ℹ️ Learn more details on Pantheon AI. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughReduced the iteration count in TestDistSQLSharedKVRequestRace from 20 to 5; test SQL statements and assertions remain unchanged. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @joechenrh. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #67675 +/- ##
================================================
- Coverage 77.8173% 77.5996% -0.2178%
================================================
Files 2023 1965 -58
Lines 556183 557000 +817
================================================
- Hits 432807 432230 -577
- Misses 121632 124739 +3107
+ Partials 1744 31 -1713
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
🔍 Starting code review for this PR... |
ingress-bot
left a comment
There was a problem hiding this comment.
This review was generated by AI and should be verified by a human reviewer.
Manual follow-up is recommended before merge.
Summary
- Total findings: 3
- Inline comments: 3
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
🟡 [Minor] (3)
- Race regression test lost replay coverage after loop-count reduction (pkg/executor/test/distsqltest/distsql_test.go:138)
- Race reproducer coverage is reduced by cutting loop repetitions (pkg/executor/test/distsqltest/distsql_test.go:138)
- Race-check loop bound changed without intent documentation (pkg/executor/test/distsqltest/distsql_test.go:138)
| for _, mode := range replicaReadModes { | ||
| tk.MustExec(fmt.Sprintf("set session tidb_replica_read = '%s'", mode)) | ||
| for i := 0; i < 20; i++ { | ||
| for i := 0; i < 5; i++ { |
There was a problem hiding this comment.
🟡 [Minor] Race regression test lost replay coverage after loop-count reduction
Impact
TestDistSQLSharedKVRequestRace now runs each replica-read mode 5 times instead of 20, reducing stress executions from 200 to 50 across the two query paths.
This reduces replay/retry sampling for a schedule-sensitive race regression, so repeated runs no longer provide the prior detection confidence.
Scope
pkg/executor/test/distsqltest/distsql_test.go:138—TestDistSQLSharedKVRequestRace
Evidence
The changed loop bound is for i := 0; i < 5; i++, replacing the previous 20-iteration stress loop in TestDistSQLSharedKVRequestRace.
That loop wraps both force index(ic) and index-merge query checks for every tidb_replica_read mode, so each mode now executes only one quarter of the previous repetition count.
Change request
Restore the stress loop count to the previous level, or introduce an explicit deterministic stress knob with documented rationale for lower coverage.
Keep per-mode repeated executions high enough that race detection remains stable across reruns and scheduler variance.
| for _, mode := range replicaReadModes { | ||
| tk.MustExec(fmt.Sprintf("set session tidb_replica_read = '%s'", mode)) | ||
| for i := 0; i < 20; i++ { | ||
| for i := 0; i < 5; i++ { |
There was a problem hiding this comment.
🟡 [Minor] Race reproducer coverage is reduced by cutting loop repetitions
Impact
TestDistSQLSharedKVRequestRace is the regression guard for shared kv.Request race behavior from issue 60175, and this patch cuts repeated executions from 20 to 5 per replica-read mode.
The reduced repetition lowers scheduler interleaving coverage, allowing low-frequency race regressions to pass this guard.
Scope
pkg/executor/test/distsqltest/distsql_test.go:138—TestDistSQLSharedKVRequestRace
Evidence
In TestDistSQLSharedKVRequestRace, the inner loop now runs 5 iterations instead of 20 while executing the same two query paths each round.
The function comment marks this test as the regression check for https://github.com/pingcap/tidb/issues/60175, so reducing only the repetition count removes stress coverage without adding a deterministic trigger.
Change request
Restore the previous repetition budget or replace it with a deterministic concurrency trigger that guarantees the race window is exercised on every run.
If test time is the concern, keep a fast-path count here only with an additional stress variant that preserves equivalent race-detection strength in CI.
| for _, mode := range replicaReadModes { | ||
| tk.MustExec(fmt.Sprintf("set session tidb_replica_read = '%s'", mode)) | ||
| for i := 0; i < 20; i++ { | ||
| for i := 0; i < 5; i++ { |
There was a problem hiding this comment.
🟡 [Minor] Race-check loop bound changed without intent documentation
Impact
TestDistSQLSharedKVRequestRace is explicitly tied to issue 60175, but the new 5 iteration bound is an unexplained magic number in a stress-style test.
Without rationale for why this bound is sufficient, later edits can keep shrinking or reshaping the probe while the test name still implies strong race-coverage intent.
Scope
pkg/executor/test/distsqltest/distsql_test.go:138—TestDistSQLSharedKVRequestRace
Evidence
The diff changes the inner loop from for i := 0; i < 20; i++ to for i := 0; i < 5; i++ at line 138.
The nearby comments only label query forms (index lookup and index merge) and do not document the expected repetition invariant or the tradeoff behind the new bound.
Change request
Introduce an intent-revealing constant name for this loop bound and add a short comment explaining why the chosen count is sufficient for the 60175 regression guard.
Document the accepted coverage/performance tradeoff at this line so future maintainers can adjust it without guesswork.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
|
@joechenrh: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: ref xxx
Problem Summary:
TestDistSQLSharedKVRequestRacefrequently times out in CI (pull_unit_test_next_gen). With the race detector enabled, the test runs 5 replica-read modes × 20 iterations × 2 queries = 200 queries on a partitioned table, taking ~278s on CI — dangerously close to the 5-minute Bazel "moderate" timeout. This causes flaky timeouts (example).What changed and how does it work?
Reduce the inner loop iterations from 20 to 5 (total queries: 200 → 50). The race detector catches data races deterministically on first occurrence, and the
RequestBuilder.usedsafety check (added in #61376) catches any builder-reuse regression even with a single iteration. 5 iterations is more than sufficient for confidence.Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit