pkg/lightning, tests/realtikvtest: stabilize next-gen add-index tests #67831

Merged
ti-chi-bot[bot] merged 3 commits into pingcap:master from D3Hunter:speedup-add-index
Apr 17, 2026

Conversation

@D3Hunter
Contributor

@D3Hunter D3Hunter commented Apr 16, 2026

What problem does this PR solve?

Issue Number: close #67830

Problem Summary:

Next-gen tests/realtikvtest/addindextest cases can spend most of their runtime on split/ingest retry churn and hit the shard timeout budget. The tests need a reliable way to force split-before-ingest and to tune the region-job retry backoff used in the retry path.

What changed and how does it work?

  • add an adjustNeedSplit failpoint hook in the local backend so tests can force split-before-ingest
  • change adjustRegionJobRetryBackoff to override a time.Duration directly, so tests can also use sub-second backoff values when needed
  • add a shared alwaysSplitAndReduceBackoff helper in tests/realtikvtest/addindextest and apply it to the next-gen add-index suites
  • include the corresponding Bazel metadata updates

Measured result on TestCreateNonUniqueIndex:

  • original path without the test helper: 276.38s test time / 279.570s total
  • after this change with the test helper enabled: 64.19s test time / 67.432s total
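
For a sense of scale, the two measurements above work out to roughly a 4.3x reduction in test time and about 4.1x in total time. The snippet below only redoes that arithmetic from the reported numbers; `speedup` is an illustrative helper, not something from the PR.

```go
package main

import "fmt"

// speedup returns how many times faster the "after" run is than the "before" run.
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	// Durations in seconds, copied from the measured result above.
	fmt.Printf("test time: %.2fx faster\n", speedup(276.38, 64.19))
	fmt.Printf("total:     %.2fx faster\n", speedup(279.570, 67.432))
}
```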

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Manual test steps:

make failpoint-enable
go test -run TestCreateNonUniqueIndex -tags=intest,deadlock,nextgen -count=1 -v ./tests/realtikvtest/addindextest
make failpoint-disable

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Tests
    • Broadened and unified test instrumentation for index-creation scenarios (concurrent DDL, multi-schema, PiTR and failpoint-driven cases).
    • Added a shared test helper to enable faster, deterministic test paths and to adjust retry/backoff behavior for more reliable test execution.
  • Chores
    • Tweaked test build configuration to improve test parallelism and ensure required test dependencies are included.

@ti-chi-bot

ti-chi-bot Bot commented Apr 16, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added the release-note-none (denotes a PR that doesn't merit a release note) and do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) labels Apr 16, 2026
@coderabbitai

coderabbitai Bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d3dceda0-07f7-46d9-8e85-f1c3842a765d

📥 Commits

Reviewing files that changed from the base of the PR and between 14bd09f and 64ab343.

📒 Files selected for processing (7)
  • lightning/tests/lightning_local_backend/run.sh
  • pkg/lightning/backend/local/local.go
  • tests/realtikvtest/addindextest/add_index_test.go
  • tests/realtikvtest/addindextest/concurrent_ddl_test.go
  • tests/realtikvtest/addindextest/failpoints_test.go
  • tests/realtikvtest/addindextest/multi_schema_change_test.go
  • tests/realtikvtest/addindextest/pitr_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • tests/realtikvtest/addindextest/concurrent_ddl_test.go
  • tests/realtikvtest/addindextest/pitr_test.go
  • pkg/lightning/backend/local/local.go
  • tests/realtikvtest/addindextest/multi_schema_change_test.go
  • tests/realtikvtest/addindextest/failpoints_test.go
  • tests/realtikvtest/addindextest/add_index_test.go

📝 Walkthrough

Walkthrough

Adds failpoint hooks to force split-before-ingest and to let tests override region-job retry backoff; applies these hooks in multiple add-index realtikv tests and updates two Bazel test targets.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Build files: br/pkg/metautil/BUILD.bazel, pkg/importsdk/BUILD.bazel | Increased metautil_test shard_count (13→15); added //pkg/parser/ast to importsdk_test deps. |
| Lightning local backend: pkg/lightning/backend/local/local.go, pkg/lightning/backend/local/region_job.go | Renamed failpoint to forceSplitRegion in prepareAndSendJob; introduced adjustRegionJobRetryBackoff failpoint hook to let tests modify region-job retry backoff before requeueing. |
| Add-index realtikv tests: tests/realtikvtest/addindextest/add_index_test.go, .../concurrent_ddl_test.go, .../failpoints_test.go, .../multi_schema_change_test.go, .../pitr_test.go | Added enableFastAddIndexFailpoints(t *testing.T) helper and invoked it at the start of many add-index tests to force splitting and set a shorter retry backoff via failpoints. |
| Lightning local backend test script: lightning/tests/lightning_local_backend/run.sh | Updated injected failpoint name used in script from failToSplit to forceSplitRegion for the "not leader error" test. |

Sequence Diagram(s)

sequenceDiagram
  participant Dispatcher as Dispatcher
  participant Failpoint as Failpoint
  participant Time as Time
  participant Retryer as Retryer

  Dispatcher->>Failpoint: InjectCall("adjustRegionJobRetryBackoff", &backoff)
  Failpoint-->>Dispatcher: (maybe modify) backoff
  Dispatcher->>Time: Now().Add(backoff)
  Time-->>Dispatcher: waitUntil timestamp
  Dispatcher->>Retryer: push(job) when retryable
  Retryer-->>Dispatcher: job requeued

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

ok-to-test, approved

Suggested reviewers

  • joechenrh
  • wjhuang2016
  • YangKeao

Poem

🐰 I nudged the splits to leap ahead,
Tweaked backoffs so retries rest instead,
Tests hop lighter, no timeout fright,
Night-long churn turned short and bright! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: stabilizing next-gen add-index tests through failpoint and retry backoff improvements. |
| Description check | ✅ Passed | The description includes required sections: issue reference (close #67830), problem summary, detailed explanation of changes, test verification, and release notes. |
| Linked Issues check | ✅ Passed | All changes directly address issue #67830: failpoint hooks for forcing split, retry backoff tuning, shared helpers for tests, and Bazel updates match the stated requirements. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: backend failpoint additions, retry backoff modifications, test helper functions, and build file updates align with stabilizing next-gen add-index tests. |

@ti-chi-bot ti-chi-bot Bot added the size/M (denotes a PR that changes 30-99 lines, ignoring generated files) label Apr 16, 2026
@tiprow

tiprow Bot commented Apr 16, 2026

Hi @D3Hunter. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter D3Hunter marked this pull request as ready for review April 16, 2026 16:45
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) label Apr 16, 2026
@pantheon-ai

pantheon-ai Bot commented Apr 16, 2026

Review Complete

Findings: 0 issues
Posted: 0
Duplicates/Skipped: 0

ℹ️ Learn more details on Pantheon AI.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/realtikvtest/addindextest/failpoints_test.go (1)

24-27: Move failpoint setup after FullMode guards for skipped paths.

At Line 24, Line 38, and Line 70, alwaysSplitAndReduceBackoff(t) runs even when the test immediately skips. For consistency with Line 55 and Line 64 (and to keep skipped paths minimal), place the helper after the if !*FullMode { t.Skip() } guard.

♻️ Suggested refactor
 func TestFailpointsCreateNonUniqueIndex(t *testing.T) {
-	alwaysSplitAndReduceBackoff(t)
 	if !*FullMode {
 		t.Skip()
 	}
+	alwaysSplitAndReduceBackoff(t)
 	var colIDs = [][]int{
@@
 func TestFailpointsCreateUniqueIndex(t *testing.T) {
-	alwaysSplitAndReduceBackoff(t)
 	if !*FullMode {
 		t.Skip()
 	}
+	alwaysSplitAndReduceBackoff(t)
 	var colIDs = [][]int{
@@
 func TestFailpointsCreateMultiColsIndex(t *testing.T) {
-	alwaysSplitAndReduceBackoff(t)
 	if !*FullMode {
 		t.Skip()
 	}
+	alwaysSplitAndReduceBackoff(t)
 	var coliIDs = [][]int{

As per coding guidelines: "Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required."

Also applies to: 38-41, 70-73

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/realtikvtest/addindextest/failpoints_test.go` around lines 24 - 27, The
call to alwaysSplitAndReduceBackoff(t) is being executed before the FullMode
guard, causing unnecessary failpoint setup even when the test immediately calls
t.Skip(); move the alwaysSplitAndReduceBackoff(t) invocation to after the if
!*FullMode { t.Skip() } check in each affected test so the failpoint/setup runs
only for FullMode runs; update the three occurrences that currently precede the
FullMode guard so they follow the guard, leaving other behavior unchanged
(references: alwaysSplitAndReduceBackoff, FullMode, t.Skip).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/realtikvtest/addindextest/failpoints_test.go`:
- Around line 24-27: The call to alwaysSplitAndReduceBackoff(t) is being
executed before the FullMode guard, causing unnecessary failpoint setup even
when the test immediately calls t.Skip(); move the
alwaysSplitAndReduceBackoff(t) invocation to after the if !*FullMode { t.Skip()
} check in each affected test so the failpoint/setup runs only for FullMode
runs; update the three occurrences that currently precede the FullMode guard so
they follow the guard, leaving other behavior unchanged (references:
alwaysSplitAndReduceBackoff, FullMode, t.Skip).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7fdffcd5-aba1-46b1-8fcb-9e4c2fa315e0

📥 Commits

Reviewing files that changed from the base of the PR and between 644ebc6 and 14bd09f.

📒 Files selected for processing (9)
  • br/pkg/metautil/BUILD.bazel
  • pkg/importsdk/BUILD.bazel
  • pkg/lightning/backend/local/local.go
  • pkg/lightning/backend/local/region_job.go
  • tests/realtikvtest/addindextest/add_index_test.go
  • tests/realtikvtest/addindextest/concurrent_ddl_test.go
  • tests/realtikvtest/addindextest/failpoints_test.go
  • tests/realtikvtest/addindextest/multi_schema_change_test.go
  • tests/realtikvtest/addindextest/pitr_test.go

@codecov

codecov Bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.3266%. Comparing base (65d9fb6) to head (64ab343).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67831        +/-   ##
================================================
+ Coverage   77.5964%   79.3266%   +1.7302%     
================================================
  Files          1982       1993        +11     
  Lines        548885     549339       +454     
================================================
+ Hits         425915     435772      +9857     
+ Misses       122165     112121     -10044     
- Partials        805       1446       +641     
| Flag | Coverage Δ |
| --- | --- |
| integration | 46.8450% <100.0000%> (+12.5049%) ⬆️ |
| unit | 76.6443% <100.0000%> (+0.3038%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ |
| --- | --- |
| dumpling | 61.5065% <ø> (+0.0901%) ⬆️ |
| parser | ∅ <ø> (∅) |
| br | 66.0289% <ø> (+5.5045%) ⬆️ |

@D3Hunter
Contributor Author

/retest

@tiprow

tiprow Bot commented Apr 16, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ingress-bot

🔍 Starting code review for this PR...

@ingress-bot ingress-bot left a comment

This review was generated by AI and should be verified by a human reviewer.
Manual follow-up is recommended before merge.

Summary

  • Total findings: 2
  • Inline comments: 2
  • Summary-only findings (no inline anchor): 0
Findings (highest risk first)

🟡 [Minor] (1)

  1. Workaround helper name/comment do not state the altered test-path contract (tests/realtikvtest/addindextest/add_index_test.go:39, tests/realtikvtest/addindextest/concurrent_ddl_test.go:26, tests/realtikvtest/addindextest/failpoints_test.go:24, tests/realtikvtest/addindextest/multi_schema_change_test.go:24, tests/realtikvtest/addindextest/pitr_test.go:24)

🧹 [Nit] (1)

  1. Failpoint callback parameter name hides time.Duration semantics (tests/realtikvtest/addindextest/add_index_test.go:50, pkg/lightning/backend/local/region_job.go:1039)

Comment thread tests/realtikvtest/addindextest/add_index_test.go Outdated
Comment thread tests/realtikvtest/addindextest/add_index_test.go Outdated
@D3Hunter
Contributor Author

/retest

@tiprow

tiprow Bot commented Apr 17, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter D3Hunter requested a review from lance6716 April 17, 2026 03:00
Comment thread pkg/lightning/backend/local/local.go Outdated
Comment thread tests/realtikvtest/addindextest/add_index_test.go
@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm (indicates a PR needs 1 more LGTM) label Apr 17, 2026
@ti-chi-bot ti-chi-bot Bot added the lgtm label and removed the needs-1-more-lgtm (indicates a PR needs 1 more LGTM) label Apr 17, 2026
@ti-chi-bot

ti-chi-bot Bot commented Apr 17, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-17 03:04:38.398819543 +0000 UTC m=+1703083.604179600: ☑️ agreed by lance6716.
  • 2026-04-17 03:08:31.71732494 +0000 UTC m=+1703316.922684998: ☑️ agreed by wjhuang2016.

@D3Hunter
Contributor Author

/retest

@tiprow

tiprow Bot commented Apr 17, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter
Contributor Author

/approve

@ti-chi-bot

ti-chi-bot Bot commented Apr 17, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: D3Hunter, lance6716, Leavrth, wjhuang2016

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the approved label Apr 17, 2026
@D3Hunter
Contributor Author

/retest

@tiprow

tiprow Bot commented Apr 17, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hawkingrei
Member

/retest

1 similar comment
@D3Hunter
Contributor Author

/retest

@tiprow

tiprow Bot commented Apr 17, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@D3Hunter
Contributor Author

/retest

@tiprow

tiprow Bot commented Apr 17, 2026

@D3Hunter: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot merged commit a5545f5 into pingcap:master Apr 17, 2026
41 checks passed
@D3Hunter D3Hunter deleted the speedup-add-index branch April 17, 2026 09:48
@pantheon-ai pantheon-ai Bot left a comment

✅ Code looks good. No issues found.

Labels

approved, lgtm, release-note-none (denotes a PR that doesn't merit a release note), size/M (denotes a PR that changes 30-99 lines, ignoring generated files)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tests/realtikvtest: next-gen add-index cases can time out on split/ingest retries

6 participants