[BugFix] Relax DB lock to intensive path in qe/ read-only paths#73067

Merged
huailiu1122 merged 1 commit into StarRocks:main from kevincai:bugfix/db-lock-relax-connect-show-executor on May 13, 2026

Conversation

@kevincai
Contributor

Switch four single-table read paths in qe/ from full DB READ to IS+table-READ via lockTableWithIntensiveDbLock so they no longer block concurrent DDL on other tables in the same database.

  • ShowExecutor.showCreateInternalCatalogTable: move lookup outside the lock; revalidate the resolved Table reference once the intensive lock is held so concurrent DROP/RENAME between lookup and lock acquisition is reported as ERR_BAD_TABLE_ERROR rather than serving DDL from a stale Table.
  • ShowExecutor.visitShowTabletStatement (both branches): use the known/resolved tableId for table-READ; the table-by-name branch also revalidates after locking. The single-tablet branch already re-fetches under the lock and needs no extra check.
  • ConnectProcessor.handleFieldList: move lookup outside the lock; collapse the nested lock-try inside the connector-exception try into a single try-catch-finally; revalidate after locking for internal-catalog DBs (external catalog tables are not tracked by LocalMetastore).
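
The restructuring described for handleFieldList (lookup first, then one try-catch-finally that owns the lock) might look roughly like the sketch below. This is not the StarRocks code: `SingleTrySketch`, `fieldList`, the shared `LOCK`, and the use of `IllegalStateException` as a stand-in for a connector exception are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from StarRocks.
final class SingleTrySketch {
    static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    static String fieldList(Supplier<String> lookup) {
        boolean locked = false;
        try {
            // The lookup runs before any lock is taken and may throw.
            String table = lookup.get();
            if (table == null) {
                return "ERR: unknown table";
            }
            LOCK.readLock().lock();
            locked = true;
            return "fields of " + table;
        } catch (IllegalStateException e) {
            // Stand-in for the connector exception handled by the real code.
            return "ERR: " + e.getMessage();
        } finally {
            // Unlock only if the lock was actually acquired.
            if (locked) {
                LOCK.readLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldList(() -> "t1"));
        System.out.println(fieldList(() -> { throw new IllegalStateException("catalog down"); }));
    }
}
```

The `locked` flag is what lets a single `finally` cover both the case where the lookup fails before locking and the case where the lock is held.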

The revalidation closes a TOCTOU window: under the old DB READ, DROP/RENAME could not interleave between lookup and use; under IS+table-READ the lookup is unprotected so the resolved Table can be removed from the DB before the lock is taken. Re-fetching by id after locking and comparing references catches that window and reports the table as not-found, matching the old semantics.
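
A minimal sketch of the lookup, lock, revalidate sequence described above, written against plain `java.util.concurrent` rather than the StarRocks lock manager. `Table`, `Database`, `describe`, and the `betweenLookupAndLock` hook are hypothetical stand-ins, and the DB-level IS lock is left out to keep the example short; only the table-level read lock and the by-id reference check are modeled.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-ins; none of these types are StarRocks classes.
final class RevalidateSketch {
    static final class Table {
        final long id;
        final String name;
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        Table(long id, String name) { this.id = id; this.name = name; }
    }

    static final class Database {
        private final Map<String, Table> byName = new ConcurrentHashMap<>();
        private final Map<Long, Table> byId = new ConcurrentHashMap<>();

        void create(Table t) { byName.put(t.name, t); byId.put(t.id, t); }

        // DROP takes the table write lock, so it cannot run while a reader holds the lock.
        void drop(Table t) {
            t.lock.writeLock().lock();
            try {
                byName.remove(t.name, t);
                byId.remove(t.id, t);
            } finally {
                t.lock.writeLock().unlock();
            }
        }

        Table getByName(String name) { return byName.get(name); }
        Table getById(long id) { return byId.get(id); }
    }

    // betweenLookupAndLock simulates concurrent DDL landing in the unprotected window.
    static Optional<String> describe(Database db, String name, Runnable betweenLookupAndLock) {
        Table resolved = db.getByName(name); // unprotected lookup
        if (resolved == null) {
            return Optional.empty();
        }
        betweenLookupAndLock.run();
        resolved.lock.readLock().lock();
        try {
            // Revalidate: re-fetch by id and compare references.
            if (db.getById(resolved.id) != resolved) {
                return Optional.empty(); // reported as not found
            }
            return Optional.of("TABLE " + resolved.name);
        } finally {
            resolved.lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        Database db = new Database();
        Table t = new Table(1L, "t1");
        db.create(t);
        System.out.println(describe(db, "t1", () -> { }).isPresent());
        System.out.println(describe(db, "t1", () -> db.drop(t)).isPresent());
    }
}
```

Comparing references rather than names is what makes the check catch a drop followed by a re-create under the same name: the new table is a different object with a different id.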


Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
@CelerData-Reviewer

@codex review

@github-actions github-actions Bot requested a review from HangyuanLiu May 11, 2026 01:51
@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Nice work!

@github-actions
Contributor

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Contributor

[FE Incremental Coverage Report]

fail : 25 / 56 (44.64%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
| --- | --- | --- | --- | --- |
| 🔵 com/starrocks/qe/ShowExecutor.java | 16 | 43 | 37.21% | [868, 869, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 885, 886, 890, 891, 892, 893, 894, 895, 896, 901, 913, 1992, 2002, 2003] |
| 🔵 com/starrocks/qe/ConnectProcessor.java | 9 | 13 | 69.23% | [787, 788, 789, 790] |

@github-actions
Contributor

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

Contributor

Copilot AI left a comment

Pull request overview

This PR reduces lock contention in FE qe/ read-only code paths by replacing full DB READ locks with “intensive” locking (IS + table-READ) for single-table operations, so concurrent DDL on other tables in the same DB is less likely to be blocked.
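
To make the "IS + table-READ" idea concrete, here is a sketch of the textbook multi-granularity lock compatibility matrix. It assumes the standard four modes (IS, IX, S, X) and assumes that a DDL on another table takes an intention-exclusive lock at the DB level; the actual StarRocks lock manager may define its modes differently, and `IntentionLockSketch` is a hypothetical name.

```java
// Hypothetical sketch of the classic intention-lock compatibility matrix.
final class IntentionLockSketch {
    enum Mode { IS, IX, S, X }

    // True if a lock in mode `requested` can be granted while `held` is held by another owner.
    static boolean compatible(Mode held, Mode requested) {
        switch (held) {
            case IS:
                return requested != Mode.X;
            case IX:
                return requested == Mode.IS || requested == Mode.IX;
            case S:
                return requested == Mode.IS || requested == Mode.S;
            default: // X conflicts with everything
                return false;
        }
    }

    public static void main(String[] args) {
        // Old path: DB READ (S) held by the reader, DDL on another table requests IX on the DB.
        System.out.println("S vs IX:  " + compatible(Mode.S, Mode.IX));
        // New path: the reader holds only IS on the DB, so the same IX request is granted.
        System.out.println("IS vs IX: " + compatible(Mode.IS, Mode.IX));
    }
}
```

Under these assumptions the reader still conflicts with DDL on its own table, because that conflict is resolved at the table level (table READ versus table WRITE) rather than at the DB level.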

Changes:

  • Updated SHOW CREATE TABLE (internal catalog) to lock via lockTableWithIntensiveDbLock, with a post-lock revalidation step intended to detect DROP/RENAME races.
  • Updated SHOW TABLET to use per-table intensive locks (both single-tablet and table-by-name branches), with post-lock revalidation in the table-by-name branch.
  • Updated COM_FIELD_LIST handling to perform metadata lookup outside the lock and then lock via lockTableWithIntensiveDbLock, with internal-catalog revalidation after locking.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| fe/fe-core/src/main/java/com/starrocks/qe/ShowExecutor.java | Switches select SHOW read paths from DB READ to IS+table-READ and adds post-lock revalidation for TOCTOU safety. |
| fe/fe-core/src/main/java/com/starrocks/qe/ConnectProcessor.java | Adjusts COM_FIELD_LIST to use table-level intensive locking and restructures lookup/exception handling with revalidation for internal catalog. |

Comment thread fe/fe-core/src/main/java/com/starrocks/qe/ShowExecutor.java
Comment thread fe/fe-core/src/main/java/com/starrocks/qe/ShowExecutor.java
@kevincai kevincai requested a review from starrocks-xupeng May 12, 2026 02:54
@gengjun-git gengjun-git self-assigned this May 12, 2026
@huailiu1122 huailiu1122 merged commit c15972b into StarRocks:main May 13, 2026
111 of 116 checks passed
@github-actions
Contributor

@Mergifyio backport branch-4.0

@github-actions github-actions Bot removed the 4.0 label May 13, 2026
@kevincai kevincai deleted the bugfix/db-lock-relax-connect-show-executor branch May 13, 2026 03:09
@github-actions
Contributor

@Mergifyio backport branch-4.1

@github-actions github-actions Bot removed the 4.1 label May 13, 2026
@mergify
Contributor

mergify Bot commented May 13, 2026

backport branch-4.0

✅ Backports have been created

Details

Cherry-pick of c15972b has failed:

On branch mergify/bp/branch-4.0/pr-73067
Your branch is up to date with 'origin/branch-4.0'.

You are currently cherry-picking commit c15972b5ad.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   fe/fe-core/src/main/java/com/starrocks/qe/ConnectProcessor.java

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   fe/fe-core/src/main/java/com/starrocks/qe/ShowExecutor.java

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@mergify
Contributor

mergify Bot commented May 13, 2026

backport branch-4.1

✅ Backports have been created

Details

wanpengfei-git pushed a commit that referenced this pull request May 13, 2026
…port #73067) (#73189)

Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Co-authored-by: Kevin Cai <kevin.cai@celerdata.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kevincai <771299+kevincai@users.noreply.github.com>
wanpengfei-git pushed a commit that referenced this pull request May 13, 2026
…port #73067) (#73190)

Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
Co-authored-by: Kevin Cai <kevin.cai@celerdata.com>

6 participants