
[DNM] *: test#67613

Draft
joechenrh wants to merge 27 commits into pingcap:master from
joechenrh:feature/global-sort-segment-compression

Conversation

@joechenrh
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

joechenrh and others added 24 commits March 27, 2026 21:27
…eadRangeFromProps

- Replace seekPropsOffsets with getReadRangeFromProps that returns
  [2][]uint64 (start+end offsets) per key per file
- Remove getFilesReadConcurrency from engine.go, move concurrency
  logic into cachedReader.open
- Replace readOneFile with cachedReader pattern (sequentialReader +
  concurrentReader) for better file read reuse
- Add rangeProperty.totalSize() helper

Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
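The offset-lookup shape described above can be sketched as follows. This is a minimal illustration of mapping a key range onto per-file read offsets via sorted range properties; the `prop` type, field names, and `readRangeForKey` are hypothetical stand-ins, not the actual TiDB external-engine code.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// prop mirrors the shape of a range property: the first key in a
// block and the block's physical offset within the data file.
type prop struct {
	firstKey []byte
	offset   uint64
}

// readRangeForKey returns the [start, end) byte offsets within one
// file that could contain keys in [startKey, endKey). Props must be
// sorted by firstKey.
func readRangeForKey(props []prop, startKey, endKey []byte, fileSize uint64) (uint64, uint64) {
	if len(props) == 0 {
		return 0, 0
	}
	// First block whose firstKey is > startKey; the block before it
	// may still contain startKey, so step back one.
	i := sort.Search(len(props), func(i int) bool {
		return bytes.Compare(props[i].firstKey, startKey) > 0
	})
	if i > 0 {
		i--
	}
	// First block whose firstKey is >= endKey: nothing at or after it
	// can hold keys below endKey.
	j := sort.Search(len(props), func(j int) bool {
		return bytes.Compare(props[j].firstKey, endKey) >= 0
	})
	end := fileSize
	if j < len(props) {
		end = props[j].offset
	}
	return props[i].offset, end
}

func main() {
	props := []prop{
		{[]byte("a"), 0}, {[]byte("f"), 100}, {[]byte("m"), 200},
	}
	s, e := readRangeForKey(props, []byte("g"), []byte("n"), 300)
	fmt.Println(s, e) // 100 300
}
```

Collecting one such (start, end) pair per requested key per file yields the `[2][]uint64` shape the commit describes.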
… probe

Tasks 1-14 added opt-in zstd v1 compression on the encode-and-sort writer
side, but the production readers were hardcoded to v0 because (a) the merge
step's task meta carries only data files, not stat files, and (b) the
ingest-path readers type-asserted the result of openKVStream back to
*KVReader to access byteReader concurrent-prefetch internals. Turning the
sysvar on would deterministically break IMPORT INTO global-sort pipelines
in the merge and ingest stages.

Fix without widening task-meta schemas:

  * detectDataFileFormat: probe the first 6 bytes of a data file for the
    TGSC magic. v0 data files begin with an 8-byte big-endian key length
    whose high bytes can never collide with the magic, so this is safe to
    call on either format.
  * StreamingV1KVReader: new reader that opens a v1 data file, skips the
    6-byte header, and wraps the rest in zstd.NewReader. zstd frames are
    self-delimiting, so a concatenation of segment frames decodes back to
    the same byte sequence as reading each segment individually. No stat
    file or props required.
  * kvReaderProxy: extended to hold either *KVReader (v0) or
    *StreamingV1KVReader (v1). switchConcurrentMode is a no-op for v1
    because streaming zstd has no concurrent-prefetch mode.
  * NewMergeKVIter, readOneKVFile2Ch, readOneFile: per-file
    detectDataFileFormat dispatch. v0 keeps the existing KVReader +
    byteReader concurrent prefetch. v1 uses StreamingV1KVReader. For
    readOneFile the v0 startOffset hint is ignored in v1 mode; the
    existing key-range filter in the loop drops out-of-range keys.

Correctness trade-off: v1 ingest reads the whole data file even when the
job range is smaller than the file. The worst case is O(filesize),
same order of magnitude as the v0 path. Optimization via SegmentKVReader
with filtered props can land in a follow-up once the task-meta schemas
carry the props slice.

Tests (reader_test.go, all end-to-end):
  * TestNewMergeKVIter_V1Files: merge 2+ v1 files, assert merged KV order.
  * TestNewMergeKVIter_MixedV0V1: merge v0 + v1 files in one iterator,
    assert the dispatch picks the right reader per file.
  * TestReadAllData_V1Files: readAllData over v1 files, assert KV set.
  * TestReadKVFilesAsync_V1Files: ReadKVFilesAsync path (used by conflict
    resolution) over v1 files.

Validation: whole-package short with failpoints passes (30.6s); scoped
golangci-lint reports zero new issues in the modified code.
Before this commit the v1 ingest path read the entire data file whenever
any part of it was requested, because cachedReader.open (from PR pingcap#67382)
only knew how to build v0 sequentialReader / concurrentReader. The
streaming v1 fallback worked but dropped most of the bytes via the
in-loop key-range filter — O(filesize) wasted work per batch.

This commit teaches cachedReader to open v1 files at their exact
[startOffset, endOffset) physical byte range:

- getReadRangeFromProps: return per-file fileVersions alongside
  ReadRanges. Extracted from the statsReader.version() that the function
  already reads — no extra opens. For v1 props the end offset is now
  p.offset + p.compressedSize (physical frame boundary), not
  p.offset + p.totalSize() (uncompressed record bytes), because
  rangeProperty.offset is physical in v1.

- cachedReader: new setFormat helper that callers invoke after
  getReadRangeFromProps to record the v0/v1 format. open() uses the
  cached format to dispatch between sequentialReader/concurrentReader
  (v0) and the new rangedV1Reader (v1). Callers that don't setFormat
  fall back to a detectDataFileFormat probe — correct but costs one
  extra store.Open on the data file. All callers in this tree
  (engine.go, merge_v2.go, testutil.go, TestReadAllData_V1Files,
  TestReadAllDataReuseSequentialReaderAcrossBatches, TestReadLargeFile)
  now call setFormat.

- rangedV1Reader: new fileReader implementation that opens
  [startOffset, endOffset) on the data file and wraps the bytes in a
  streaming zstd.NewReader. Clamps startOffset to at least fileHeaderLen
  so getReadRangeFromProps's zero-initialized "start key not in this
  file" offset doesn't land inside the 6-byte file header. An empty
  range returns an EOF-only reader without touching storage.
  Implements reserve() for carry-over across batches; v1 readers are
  still not reused across batches (cachedReader.canReuse returns false
  for v1) because each batch is bound to a specific zstd-frame range.

- TestReadAllData_V1Files now uses readRanges[0].Start /
  readRanges[1].End and calls setFormat on the cachedReaders, the same
  way production callers do. All 4 v1 tests pass (merge, mixed merge,
  readAllData ingest, ReadKVFilesAsync) and the PR pingcap#67382 reuse test
  passes under isolated/targeted runs.
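The clamping behavior described for rangedV1Reader can be sketched as a small pure function; the name `clampV1Range` and the exact return shape are illustrative, assuming only the 6-byte header length stated above.

```go
package main

import "fmt"

const fileHeaderLen = 6 // v1 data files start with a 6-byte header

// clampV1Range clamps a physical read range for a v1 file: a
// zero-initialized "start key not in this file" offset must not land
// inside the file header, and an empty range means the caller can
// return an EOF-only reader without touching storage.
func clampV1Range(start, end uint64) (uint64, uint64, bool) {
	if start < fileHeaderLen {
		start = fileHeaderLen
	}
	if start >= end {
		return 0, 0, false // empty range: no storage read needed
	}
	return start, end, true
}

func main() {
	s, e, ok := clampV1Range(0, 500)
	fmt.Println(s, e, ok) // 6 500 true
	_, _, ok = clampV1Range(0, 4)
	fmt.Println(ok) // false: range ends inside the header
}
```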

Known pre-existing test flakiness (NOT caused by this commit):
TestReadAllDataReuseSequentialReaderAcrossBatches fails under the
whole-package short run with failpoints enabled because
byte_reader_test.go:251 mutates the global ConcurrentReaderBufferSizePerConc
without restoring it. Verified by checking out the pristine PR pingcap#67382
branch (mine/fix-global-sort-read-reuse) and observing the same failure.
The PR owner should fix the byte_reader_test.go leak separately.

Validation: targeted runs pass for TestNewMergeKVIter_V1Files,
TestNewMergeKVIter_MixedV0V1, TestReadAllData_V1Files,
TestReadKVFilesAsync_V1Files, TestReadAllDataReuseSequentialReaderAcrossBatches,
TestReadLargeFile, TestGetReadRangeFromProps, TestReadAllDataBasic.
Two hardening fixes from an adversarial review pass:

1. getReadRangeFromProps (util.go): once the stat file is detected as v1,
   require p.compressedSize > 0 for every prop. Previously the code used
   compressedSize == 0 as a dual-purpose sentinel: "no compressed frame,
   fall back to v0 totalSize() math". That is fine for legitimate v0 files
   but fails-open on a truncated or otherwise corrupt v1 prop — the
   downstream reader would open an arbitrary byte range inside compressed
   frames and surface as a cryptic zstd decode error later, after possibly
   skipping rows. Now the v1 branch errors out immediately with an explicit
   "zero compressedSize" corruption message while the v0 branch keeps the
   existing totalSize() math.

2. NewMergeKVIter (iter.go): reject non-zero pathsStartOffset on v1 files
   with an explicit error instead of silently ignoring the hint. v0 offsets
   are uncompressed-byte offsets which have no meaning inside a zstd stream,
   and the merge step's task meta does not carry segment metadata today, so
   a v1 reader cannot honor a mid-stream resume. Today every production
   caller (MergeOverlappingFiles, ddl backfill merge) passes zeroOffsets,
   but the API surface allowed a non-zero value through which would produce
   duplicate or out-of-range KVs. Fail fast on that shape instead.
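The fail-fast check in point 2 can be sketched as follows. The function name and parameter shapes are hypothetical; the point is that a v0 start-offset hint (an uncompressed-byte offset) has no meaning inside a zstd stream, so a non-zero hint on a v1 file is an error rather than a silently wrong resume point.

```go
package main

import "fmt"

// validateV1StartOffsets rejects non-zero start-offset hints on v1
// files, mirroring the NewMergeKVIter guard described above.
func validateV1StartOffsets(isV1 []bool, startOffsets []uint64) error {
	for i, off := range startOffsets {
		if isV1[i] && off != 0 {
			return fmt.Errorf(
				"non-zero pathsStartOffset %d on v1 file index %d", off, i)
		}
	}
	return nil
}

func main() {
	// A non-zero hint on a v0 file is fine; v0 readers can seek.
	err := validateV1StartOffsets([]bool{false, true}, []uint64{128, 0})
	fmt.Println(err) // <nil>
	// The same hint on a v1 file fails fast.
	err = validateV1StartOffsets([]bool{true}, []uint64{128})
	fmt.Println(err != nil) // true
}
```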

Tests:

- TestGetReadRangeFromProps_V1ZeroCompressedSizeRejected: hand-crafts a v1
  stat file with a prop whose compressedSize is zero and asserts
  getReadRangeFromProps rejects it. The search key is chosen to land on
  the corrupt prop so the validation branch fires.
- TestNewMergeKVIter_V1RejectsNonZeroStartOffset: writes a v1 file via
  Writer with SetCompression(CompressionZstd), then calls NewMergeKVIter
  with offsets[0]=128 and asserts the returned error mentions "non-zero
  pathsStartOffset". The test intentionally does NOT call iter.Close() on
  the error path: NewMergeKVIter returns a non-nil *MergeKVIter whose
  internal mergeIter is nil when a reader-opener errors, and Close()
  would panic on the nil — that is a separate pre-existing quirk of
  NewMergeKVIter's constructor, not in scope for this change.

Validation: all 10 targeted tests pass (4 v1 tests + 2 hardening tests +
4 PR pingcap#67382/getReadRangeFromProps tests).
Flips the default from "none" to "zstd" so the next image built from
this branch runs every global-sort encode-and-sort writer in v1 mode by
default. Use case: let the user produce a test image without having to
set the session/global sysvar manually on each cluster.

The setting remains a ScopeGlobal|ScopeSession enum with allowed values
["none", "zstd"], so operators can still fall back to "none" at runtime
if compression ever causes a regression. The reader path auto-detects
v0 vs v1 from the file header and does NOT consult this sysvar, so a
rolling upgrade between binaries with different defaults is safe in
either direction — workers running on "zstd" happily read v0 files
produced before the upgrade, and workers rolled back to "none" still
read v1 files produced during the upgrade window.
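The writer-side translation of the sysvar can be sketched like this; the type and constant names mirror the `CompressionNone`/`CompressionZstd` values mentioned elsewhere in this PR but are otherwise assumptions about the real code.

```go
package main

import "fmt"

// CompressionAlgo is the writer-side dispatch value; readers never
// consult it, they probe the file header instead.
type CompressionAlgo int

const (
	CompressionNone CompressionAlgo = iota // 0: write v0 raw files
	CompressionZstd                        // 1: write v1 zstd files
)

// algoFromSysvar translates the {"none", "zstd"} string enum into
// the writer dispatch value; unknown strings fall back to v0 so a
// bad value can never produce unreadable files.
func algoFromSysvar(v string) CompressionAlgo {
	if v == "zstd" {
		return CompressionZstd
	}
	return CompressionNone
}

func main() {
	fmt.Println(algoFromSysvar("zstd"), algoFromSysvar("none")) // 1 0
}
```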

Test: TestTiDBGlobalSortCompressionRegistered updated to expect "zstd"
as the default value instead of "none". Passes.
@joechenrh joechenrh marked this pull request as draft April 8, 2026 08:52
@pantheon-ai

pantheon-ai Bot commented Apr 8, 2026

Review Complete

Findings: 0 issues
Posted: 0
Duplicates/Skipped: 0

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 8, 2026
@coderabbitai

coderabbitai Bot commented Apr 8, 2026

Important

Review skipped

Draft detected.


@ti-chi-bot

ti-chi-bot Bot commented Apr 8, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign benjamin2037, terry1purcell for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 8, 2026
@tiprow

tiprow Bot commented Apr 8, 2026

Hi @joechenrh. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot

ti-chi-bot Bot commented Apr 8, 2026

@joechenrh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
pull-error-log-review | 03fbe88 | link | false | /test pull-error-log-review

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


@pantheon-ai pantheon-ai Bot left a comment


✅ Code looks good. No issues found.

Signed-off-by: Ruihao Chen <joechenrh@gmail.com>
Diagnostic logs at three points so the next test image makes compression
state unambiguous from job logs alone:

1. encode_and_sort_operator.newChunkWorker: logs the Plan's
   GlobalSortCompression string and the translated CompressionAlgo when
   the IMPORT INTO worker is created. If this line shows
   compression-algo=1 (CompressionZstd) the operator wired the dispatch
   correctly; if it shows 0 the Plan field was not populated.

2. Writer.flushSortedKVs: logs once per flushed (data, stat) file pair
   whether a v1 zstd header is being written or a v0 raw file. If the
   encode-and-sort worker is wired for zstd, you will see
   "writing v1 zstd file header" per file. If you see
   "writing v0 raw file" instead, the dispatch never reached the writer
   (e.g. stale binary, or a code path that builds Writer with
   CompressionNone).

3. OneFileWriter.lazyInitWriter: same pair of log lines for the
   single-file variant (used by the merge step's output writer).

This is temporary debugging output. Safe to remove once the user
confirms the image is picking up compression as expected.
Codex/user question: "I don't see zstd related functions in tidb-worker
profile. Is the config passed to the worker node correctly?"

Two logging points + one round-trip test to make the answer unambiguous
the next time the test image is rebuilt:

1. importer.NewImportPlan (submitter side): logs
   plan-global-sort-compression right after initOptions completes, so
   the node that built the plan prints what it shipped into TaskMeta.

2. importinto.getTableImporter (executor side): logs
   plan-global-sort-compression right after TaskMeta has been
   unmarshalled, before building the LoadDataController. If the
   submitter logged "zstd" but this line logs "" the JSON round-trip
   dropped the field; if both show "zstd" the operator's own log line
   (added in the previous commit) will tell us whether the dispatch
   actually fired.

3. TestPlan_GlobalSortCompression_JSONRoundTrip: a unit test that
   json.Marshal+json.Unmarshals a Plan with GlobalSortCompression set
   and asserts the value round-trips. This test PASSES on this branch,
   proving the default serialization path preserves the field. Any
   future json:"-" tag or MarshalJSON change that regresses the path
   will break the test.

So the config IS propagating at the JSON layer. If the user's image
still doesn't show zstd activity after rebuilding from this commit,
the problem is elsewhere (likely: stale binary, different profiled
node than the encode-sort worker, or an IMPORT INTO small enough that
the global-sort step never fires).
@ti-chi-bot

ti-chi-bot Bot commented Apr 8, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.


Notice: To remove the do-not-merge/needs-tests-checked label, please finish the tests, then check the finished items in the description.

For example:

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

📖 For more info, you can check the "Contribute Code" section in the development guide.


Labels

do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
