
pkg/util/topsql/reporter: stabilize flaky TestTopRUPipelineInProcessIntegration#67600

Closed
flaky-claw wants to merge 4 commits into pingcap:master from flaky-claw:flakyfixer/case_ee66a3d888fd-a2

Conversation

@flaky-claw
Contributor

@flaky-claw flaky-claw commented Apr 8, 2026

What problem does this PR solve?

Issue Number: close #67578

Problem Summary:
The flaky test TestTopRUPipelineInProcessIntegration in pkg/util/topsql/reporter fails intermittently; this PR stabilizes it.

What changed and how does it work?

Root Cause

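A previous change made reportWorker treat the taken SQL/plan meta maps as immutable. That assumption does not hold: an in-flight RegisterSQL/RegisterPlan call that loaded the map pointer before the swap can still write into the old map after it has been enqueued for reporting, so the report may omit that meta.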

Fix

Restored the reportWorker settle delay and replaced the previous regression test with a deterministic stale-pointer timing repro that exercises the real in-flight meta registration race.

Verification

Before the fix, the native flaky test reproduced the failure only weakly, while the new timing-only repro failed. After the fix, both the repro and TestTopRUPipelineInProcessIntegration passed, and make lint succeeded.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Fixes #67578

Summary by CodeRabbit

  • Tests
    • Added a concurrency test ensuring report assembly waits for in-flight SQL/plan registrations so payloads include concurrently-registered entries.
  • Bug Fixes
    • Improved reporting consistency and concurrency handling to avoid omitting metadata when registrations overlap with report construction.
  • Chores
    • Restored a backward-compatible make target alias for legacy automation.

@ti-chi-bot ti-chi-bot Bot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 8, 2026
@pantheon-ai

pantheon-ai Bot commented Apr 8, 2026

Review Complete

Findings: 0 issues
Posted: 0
Duplicates/Skipped: 0


@ti-chi-bot ti-chi-bot Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 8, 2026
@ti-chi-bot

ti-chi-bot Bot commented Apr 8, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign charlescheung96 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pingcap-cla-assistant

pingcap-cla-assistant Bot commented Apr 8, 2026

CLA assistant check
All committers have signed the CLA.

@tiprow

tiprow Bot commented Apr 8, 2026

Hi @flaky-claw. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai Bot commented Apr 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Added RWMutex synchronization and package-level test hooks to TopSQL datamodel and reporter to coordinate timing between in-flight SQL/plan meta registrations and the report worker; added a concurrency test to verify racing registrations are included in emitted payloads; added a legacy Makefile alias.

Changes

Cohort / File(s) Summary
Datamodel hooks & sync
pkg/util/topsql/reporter/datamodel.go
Added var normalizedMetaRegisterAfterLoadHook func(); introduced sync.RWMutex around normalizedSQLMap/normalizedPlanMap; register uses RLock and invokes the hook after loading the map pointer and before LoadOrStore; take uses Lock while swapping maps and resetting length.
Reporter hook
pkg/util/topsql/reporter/reporter.go
Added reportWorkerBeforeBuildReportDataHook func() and invoke it in RemoteTopSQLReporter.reportWorker() immediately after dequeuing a payload and before converting/packing data for the report.
Tests
pkg/util/topsql/reporter/reporter_test.go
Added TestReportWorkerWaitsForInFlightSQLMetaRegistration that uses the new hooks to orchestrate a race and assert both seed and racing metas appear in the payload; tightened readiness check in TestTopRUPipelineInProcessIntegration.
Build target alias
Makefile
Added .PHONY target nogo as a backward-compatible alias that depends on lint.
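The register/take locking described for datamodel.go can be sketched as follows. This is a simplified, hypothetical type (`guardedMetaMap`, `registerAfterLoadHook`, and `takeWhileRegisterInFlight` are illustrative names, not the PR's code); it assumes the map pointer is read only under RLock and replaced only under Lock, so take() cannot finish while a register that already loaded the old map is still in flight.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// registerAfterLoadHook mirrors the role of normalizedMetaRegisterAfterLoadHook:
// a test-only hook run after the map pointer is loaded and before LoadOrStore.
var registerAfterLoadHook func()

// guardedMetaMap is a hypothetical sketch, not the actual TiDB type.
type guardedMetaMap struct {
	mu     sync.RWMutex
	data   *sync.Map    // read under RLock, replaced under Lock
	length atomic.Int64 // concurrent registers add while holding RLock
}

func newGuardedMetaMap() *guardedMetaMap { return &guardedMetaMap{data: &sync.Map{}} }

// register holds the read lock across "load pointer" and "store", so take
// cannot swap the map between the two steps.
func (m *guardedMetaMap) register(digest, text string) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	data := m.data
	if registerAfterLoadHook != nil {
		registerAfterLoadHook()
	}
	if _, loaded := data.LoadOrStore(digest, text); !loaded {
		m.length.Add(1)
	}
}

// take waits for in-flight registers (write lock), then swaps the map and
// resets the counter together.
func (m *guardedMetaMap) take() (*sync.Map, int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	old := m.data
	m.data = &sync.Map{}
	return old, m.length.Swap(0)
}

// takeWhileRegisterInFlight parks a register inside the hook, requests take,
// then releases the register. take cannot acquire the write lock until the
// parked register returns, so the taken map holds both entries.
func takeWhileRegisterInFlight() ([]string, int64) {
	m := newGuardedMetaMap()
	m.register("seed", "select 1")

	loaded, release := make(chan struct{}), make(chan struct{})
	registerAfterLoadHook = func() { close(loaded); <-release }
	registered := make(chan struct{})
	go func() { defer close(registered); m.register("racing", "select 2") }()
	<-loaded // the register now holds the read lock and the current map

	type result struct {
		data *sync.Map
		n    int64
	}
	took := make(chan result)
	go func() { d, n := m.take(); took <- result{d, n} }() // blocks on Lock
	close(release)
	<-registered
	registerAfterLoadHook = nil
	r := <-took

	var keys []string
	r.data.Range(func(k, _ any) bool { keys = append(keys, k.(string)); return true })
	sort.Strings(keys)
	return keys, r.n
}

func main() {
	keys, n := takeWhileRegisterInFlight()
	fmt.Println(keys, n) // [racing seed] 2
}
```

Holding the read lock for the whole register call keeps the common path cheap (registers do not block each other) while giving take() a deterministic wait point instead of a sleep.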

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Registrar as RegisterSQL / RegisterPlan
    participant Hook as normalizedMetaRegisterAfterLoadHook
    participant Reporter as RemoteTopSQLReporter
    participant Map as normalizedSQLMap (sync.Map)

    Client->>Registrar: call RegisterSQL (loads map pointer)
    Registrar->>Hook: invoke hook after load (may block)
    Hook-->>Registrar: unblock to continue
    Registrar->>Map: LoadOrStore metadata

    Reporter->>Reporter: dequeues payload from channel
    Reporter->>reportWorkerBeforeBuildReportDataHook: invoke hook (signals worker started)
    reportWorkerBeforeBuildReportDataHook-->>Reporter: unblocks worker
    Reporter->>Map: read metas to build ReportData
    Reporter->>Client: send ReportData (contains seed + racing metas)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

ok-to-test, approved, lgtm

Suggested reviewers

  • XuHuaiyu
  • yibin87
  • qw4990

Poem

🐰
I hold a hook with gentle squeeze,
So racing metas find their ease.
The worker waits beneath the moon,
Two digests meet — then out they zoom,
A hopping payload — snug as cheese! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: stabilizing a flaky test in pkg/util/topsql/reporter by fixing a race condition in reportWorker.
Description check ✅ Passed The PR description follows the template with all required sections: Issue Number linking to #67578, Problem Summary, What Changed explanation, and Check List with unit test verified.
Linked Issues check ✅ Passed The PR addresses the root cause of the flaky test (#67578) by restoring the reportWorker settle delay and adding a deterministic timing-based test that validates the in-flight meta registration race is properly handled.
Out of Scope Changes check ✅ Passed All changes directly support fixing the flaky test: mutex/hooks for race synchronization, test improvements with precise bucket inspection, Makefile alias for legacy tooling, and no unrelated modifications present.



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Command failed



@codecov

codecov Bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 28.57143% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.9877%. Comparing base (72f9da0) to head (5d764c4).
⚠️ Report is 61 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67600        +/-   ##
================================================
- Coverage   77.5455%   76.9877%   -0.5578%     
================================================
  Files          1963       1946        -17     
  Lines        544697     548513      +3816     
================================================
- Hits         422388     422288       -100     
- Misses       121499     125931      +4432     
+ Partials        810        294       -516     
Flag Coverage Δ
integration 40.9103% <28.5714%> (+6.5706%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
dumpling 61.5065% <ø> (ø)
parser ∅ <ø> (∅)
br 48.0815% <ø> (-11.3094%) ⬇️


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/util/topsql/reporter/datamodel.go (1)

640-655: ⚠️ Potential issue | 🟠 Major

Post-take() stale registrations increment the wrong generation's count.

Both register() paths snapshot data before take() can swap the backing map, but m.length.Add(1) always updates the receiver's current counter afterward. That lets a stale insert land in the old map while inflating the new generation's length, which can make the next interval hit MaxCollect early and drop fresh metas. The worker delay only fixes old-map visibility; it does not fix this counter drift.

Please swap the map and its count as one generation object, or otherwise wait for pre-take() registrations before resetting the counter.

Also applies to: 708-723

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/util/topsql/reporter/datamodel.go` around lines 640 - 655, The register()
path can increment the wrong generation counter because it snapshots m.data but
always calls m.length.Add(1) on the current receiver; change the generation
model so map+counter are swapped together: introduce a generation struct (e.g.,
gen { data *sync.Map; length *atomic.Int64 }) and have m.data.Load()/Store()
return that generation object; in normalizedSQLMap.register() load the gen
atomically, call gen.data.LoadOrStore(...), and if not loaded call
gen.length.Add(1) so the increment applies to the same generation whose map you
inserted into; apply the same change to the other similar block referenced
(lines ~708-723) so all registrations update the correct generation counter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/util/topsql/reporter/reporter_test.go`:
- Around line 197-208: The test must ensure reportWorker() exits before
restoring package hooks; change the cleanup order so tsr.Close() runs first,
then wait for the worker goroutine to finish via a done channel, and only after
that restore reportWorkerBeforeBuildReportDataHook and
normalizedMetaRegisterAfterLoadHook and close releaseRegister; specifically,
create a done chan (e.g. done := make(chan struct{}), have the worker or test
signal close(done) when reportWorker returns, call tsr.Close() in cleanup, then
block on <-done before resetting the package globals and closing releaseRegister
to avoid races and leaks.

In `@pkg/util/topsql/reporter/reporter.go`:
- Around line 377-383: The 100ms time.Sleep in reportWorker is a brittle timing
workaround; replace it with an explicit generation/in-flight handoff so
RegisterSQL/RegisterPlan cannot race with reportWorker serialization. Implement
a generation token or in-flight counter (e.g., an atomic uint64 or
sync.WaitGroup) that RegisterSQL/RegisterPlan bump or register with when
mutating the maps and that reportWorker reads before building payload (see
reportWorker and reportWorkerBeforeBuildReportDataHook); then have reportWorker
wait deterministically for that generation/in-flight set to quiesce (or perform
a safe map swap under a mutex) instead of sleeping so serialization only
proceeds after all LoadOrStore operations for that generation have completed.
Ensure users of Load/LoadOrStore observe the same generation protocol.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c98bdfbb-c48a-4bfd-bced-902c85203f5b

📥 Commits

Reviewing files that changed from the base of the PR and between 72f9da0 and 2d64e36.

📒 Files selected for processing (3)
  • pkg/util/topsql/reporter/datamodel.go
  • pkg/util/topsql/reporter/reporter.go
  • pkg/util/topsql/reporter/reporter_test.go

Comment on lines +197 to +208
tsr := NewRemoteTopSQLReporter(mockPlanBinaryDecoderFunc, mockPlanBinaryCompressFunc)
tsr.BindKeyspaceName([]byte("ks-race"))
t.Cleanup(tsr.Close)
t.Cleanup(func() {
select {
case <-releaseRegister:
default:
close(releaseRegister)
}
normalizedMetaRegisterAfterLoadHook = origRegisterHook
reportWorkerBeforeBuildReportDataHook = origReportHook
})

⚠️ Potential issue | 🟠 Major

Wait for reportWorker() to exit before restoring the package hooks.

t.Cleanup runs LIFO, so this currently restores reportWorkerBeforeBuildReportDataHook / normalizedMetaRegisterAfterLoadHook before tsr.Close(). Also, Close() does not join reportWorker(). On an early-failing path, that leaves a real background worker racing with hook restoration and can leak hook state into later tests.

Track the worker with a done channel, close the reporter first, wait for the goroutine to exit, and only then restore the package globals.

Also applies to: 210-212


Comment thread pkg/util/topsql/reporter/reporter.go

@pantheon-ai pantheon-ai Bot left a comment


✅ Code looks good. No issues found.

@flaky-claw flaky-claw force-pushed the flakyfixer/case_ee66a3d888fd-a2 branch from 2d64e36 to 7a687fc Compare April 8, 2026 08:24
@flaky-claw
Contributor Author

/retest-required

@tiprow

tiprow Bot commented Apr 10, 2026

@flaky-claw: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest-required


@flaky-claw
Contributor Author

/retest-required

@tiprow

tiprow Bot commented Apr 10, 2026

@flaky-claw: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest-required


@flaky-claw
Contributor Author

/retest-required

@tiprow

tiprow Bot commented Apr 10, 2026

@flaky-claw: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest-required


@flaky-claw
Contributor Author

/retest-required

@tiprow

tiprow Bot commented Apr 11, 2026

@flaky-claw: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest-required


@flaky-claw
Contributor Author

FlakyFixer PR update summary

Fix

  • Synchronizing register and take with an RWMutex closes the race deterministically instead of relying on a timing sleep.

Verification

Spec:

  • target: pkg/util/topsql/reporter :: TestTopRUPipelineInProcessIntegration
  • strategy: tidb.go_flaky.default
  • requirements: required case must execute; no skip; repeat count = 1

Observed result:

  • status: passed
  • required case executed: yes
  • note: Validation shows that the native path stays green, the deterministic race repro fails before the product fix, and both that repro and the original flaky case pass after it (including failpoint-enabled runs and lint).

Commands:

  • go test -json ./pkg/util/topsql/reporter -run '^TestTopRUPipelineInProcessIntegration$' -count=1


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
pkg/util/topsql/reporter/reporter_test.go (1)

198-213: ⚠️ Potential issue | 🟠 Major

Restore the package hooks only after the worker is gone.

Because t.Cleanup is LIFO, the cleanup on Lines 201-209 runs before the tsr.Close registered on Line 200. If this test fails early, the goroutine started on Line 213 can still run while reportWorkerBeforeBuildReportDataHook / normalizedMetaRegisterAfterLoadHook are being reset, which can leak package-global state into later tests.

Suggested fix
 tsr := NewRemoteTopSQLReporter(mockPlanBinaryDecoderFunc, mockPlanBinaryCompressFunc)
 tsr.BindKeyspaceName([]byte("ks-race"))
- t.Cleanup(tsr.Close)
+ reportWorkerDone := make(chan struct{})
+ close(reportWorkerDone) // cleanup-safe before the worker starts
  t.Cleanup(func() {
+     tsr.Close()
+     <-reportWorkerDone
+     normalizedMetaRegisterAfterLoadHook = origRegisterHook
+     reportWorkerBeforeBuildReportDataHook = origReportHook
      select {
      case <-releaseRegister:
      default:
          close(releaseRegister)
      }
-     normalizedMetaRegisterAfterLoadHook = origRegisterHook
-     reportWorkerBeforeBuildReportDataHook = origReportHook
  })

  ch := make(chan *ReportData, 1)
  require.NoError(t, tsr.Register(newMockDataSink(ch)))
- go tsr.reportWorker()
+ reportWorkerDone = make(chan struct{})
+ go func() {
+     defer close(reportWorkerDone)
+     tsr.reportWorker()
+ }()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/util/topsql/reporter/reporter_test.go` around lines 198 - 213, The
package-level hooks (reportWorkerBeforeBuildReportDataHook,
normalizedMetaRegisterAfterLoadHook) are being restored before tsr.Close runs
due to t.Cleanup LIFO ordering; register the cleanup that restores those hooks
(and closes releaseRegister) before registering t.Cleanup(tsr.Close) so the
tsr.Close (and the goroutine shutdown) runs first, then the hooks are reset
afterward; reference the existing tsr.Close call, the hook vars
reportWorkerBeforeBuildReportDataHook and normalizedMetaRegisterAfterLoadHook,
and the releaseRegister channel to locate where to reorder the t.Cleanup
registrations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ec12e1a0-238f-456b-9874-c31eeac12feb

📥 Commits

Reviewing files that changed from the base of the PR and between 7a687fc and 289b6cb.

📒 Files selected for processing (2)
  • pkg/util/topsql/reporter/datamodel.go
  • pkg/util/topsql/reporter/reporter_test.go

@flaky-claw
Contributor Author

FlakyFixer PR update summary

Fix

  • Adding a backward-compatible nogo alias is the minimal self-contained fix that makes the required validation command executable and prevents the same MUST_FIX finding from recurring.

Verification

Spec:

  • target: pkg/util/topsql/reporter :: TestTopRUPipelineInProcessIntegration
  • strategy: tidb.go_flaky.default
  • requirements: required case must execute; no skip; repeat count = 1

Observed result:

  • status: passed
  • required case executed: yes
  • submission decision: ALLOWED
  • note: Required flaky case executed during validation.
    Required flaky case was not skipped.
    Required flaky gate passed.
    Package regression gate passed.
    Repo pre-push gate passed.
    Feedback specific gate passed.

Gate checklist:

  • Required flaky gate: PASS
  • Package regression gate: PASS
  • Repo pre-push gate: PASS
  • Feedback specific gate: PASS

Commands:

  • go test -json ./pkg/util/topsql/reporter -run '^TestTopRUPipelineInProcessIntegration$' -count=1
  • go test -json ./pkg/util/topsql/reporter -count=1
  • make build
  • make lint
  • make nogo

@flaky-claw
Contributor Author

FlakyFixer PR update summary

Fix

  • Reordering the two struct fields removes the CI-blocking analyzer failure, and reordering t.Cleanup closes the hook-leak race window without altering product logic.

Verification

Spec:

  • target: pkg/util/topsql/reporter :: TestTopRUPipelineInProcessIntegration
  • strategy: tidb.go_flaky.default
  • requirements: required case must execute; no skip; repeat count = 1

Observed result:

  • status: passed
  • required case executed: yes
  • submission decision: ALLOWED
  • note: Required flaky case executed during validation.
    Required flaky case was not skipped.
    Required flaky gate passed.
    Package regression gate passed.
    Repo pre-push gate passed.
    Feedback specific gate passed.

Gate checklist:

  • Required flaky gate: PASS
  • Package regression gate: PASS
  • Repo pre-push gate: PASS
  • Feedback specific gate: PASS

Commands:

  • go test -json ./pkg/util/topsql/reporter -run '^TestTopRUPipelineInProcessIntegration$' -count=1
  • go test -json ./pkg/util/topsql/reporter -count=1
  • make build
  • make lint
  • make nogo

@ti-chi-bot

ti-chi-bot Bot commented Apr 12, 2026

@flaky-claw: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/build 5d764c4 link true /test build
pull-build-next-gen 5d764c4 link true /test pull-build-next-gen
idc-jenkins-ci-tidb/unit-test 5d764c4 link true /test unit-test
pull-unit-test-next-gen 5d764c4 link true /test pull-unit-test-next-gen
idc-jenkins-ci-tidb/mysql-test 5d764c4 link true /test mysql-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
pkg/util/topsql/reporter/reporter_test.go (1)

200-213: ⚠️ Potential issue | 🟠 Major

Join the manual reportWorker before restoring package hooks.

Line 213 starts a background worker, but the cleanup on Lines 200-209 still only relies on tsr.Close() before the globals are reset. The rest of this file already treats Close() as non-joining by waiting on reportWorkerDone, so an early-failing path here can still leave this worker alive while reportWorkerBeforeBuildReportDataHook / normalizedMetaRegisterAfterLoadHook are being restored.

🔧 Suggested cleanup pattern
 	tsr := NewRemoteTopSQLReporter(mockPlanBinaryDecoderFunc, mockPlanBinaryCompressFunc)
 	tsr.BindKeyspaceName([]byte("ks-race"))
+	reportWorkerDone := make(chan struct{})
 	t.Cleanup(func() {
+		tsr.Close()
+		<-reportWorkerDone
 		select {
 		case <-releaseRegister:
 		default:
 			close(releaseRegister)
 		}
 		normalizedMetaRegisterAfterLoadHook = origRegisterHook
 		reportWorkerBeforeBuildReportDataHook = origReportHook
 	})
-	t.Cleanup(tsr.Close)
@@
-	go tsr.reportWorker()
+	go func() {
+		defer close(reportWorkerDone)
+		tsr.reportWorker()
+	}()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/util/topsql/reporter/reporter_test.go` around lines 200 - 213, The test
starts a background goroutine via reportWorker but the t.Cleanup restores global
hooks and calls tsr.Close without ensuring the worker has exited; change the
cleanup to call tsr.Close() then wait for the worker to finish (e.g. wait on
reportWorkerDone or otherwise join the reportWorker) before restoring
normalizedMetaRegisterAfterLoadHook and reportWorkerBeforeBuildReportDataHook
and closing releaseRegister so the background goroutine cannot run while globals
are reset.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c191df9a-c506-45bc-8e7e-ca3b0d26f5d0

📥 Commits

Reviewing files that changed from the base of the PR and between 09e5356 and 5d764c4.

📒 Files selected for processing (2)
  • pkg/util/topsql/reporter/datamodel.go
  • pkg/util/topsql/reporter/reporter_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/util/topsql/reporter/datamodel.go

@yinsustart yinsustart closed this Apr 21, 2026

Labels

release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: TestTopRUPipelineInProcessIntegration in pkg/util/topsql/reporter

2 participants