[BugFix] Relax DB lock to intensive path in qe/ read-only paths#73067
Conversation
Switch four single-table read paths in qe/ from full DB READ to IS+table-READ via lockTableWithIntensiveDbLock so they no longer block concurrent DDL on other tables in the same database. - ShowExecutor.showCreateInternalCatalogTable: move lookup outside the lock; revalidate the resolved Table reference once the intensive lock is held so concurrent DROP/RENAME between lookup and lock acquisition is reported as ERR_BAD_TABLE_ERROR rather than serving DDL from a stale Table. - ShowExecutor.visitShowTabletStatement (both branches): use the known/resolved tableId for table-READ; the table-by-name branch also revalidates after locking. The single-tablet branch already re-fetches under the lock and needs no extra check. - ConnectProcessor.handleFieldList: move lookup outside the lock; collapse the nested lock-try inside the connector-exception try into a single try-catch-finally; revalidate after locking for internal-catalog DBs (external catalog tables are not tracked by LocalMetastore). The revalidation closes a TOCTOU window: under the old DB READ, DROP/RENAME could not interleave between lookup and use; under IS+table-READ the lookup is unprotected so the resolved Table can be removed from the DB before the lock is taken. Re-fetching by id after locking and comparing references catches that window and reports the table as not-found, matching the old semantics. Signed-off-by: Kevin Cai <kevin.cai@celerdata.com>
|
@codex review |
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]❌ fail : 25 / 56 (44.64%) file detail
|
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
There was a problem hiding this comment.
Pull request overview
This PR reduces lock contention in FE qe/ read-only code paths by replacing full DB READ locks with “intensive” locking (IS + table-READ) for single-table operations, so concurrent DDL on other tables in the same DB is less likely to be blocked.
Changes:
- Updated
SHOW CREATE TABLE(internal catalog) to lock vialockTableWithIntensiveDbLock, with a post-lock revalidation step intended to detect DROP/RENAME races. - Updated
SHOW TABLETto use per-table intensive locks (both single-tablet and table-by-name branches), with post-lock revalidation in the table-by-name branch. - Updated
COM_FIELD_LISThandling to perform metadata lookup outside the lock and then lock vialockTableWithIntensiveDbLock, with internal-catalog revalidation after locking.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/com/starrocks/qe/ShowExecutor.java | Switches select SHOW read paths from DB READ to IS+table-READ and adds post-lock revalidation for TOCTOU safety. |
| fe/fe-core/src/main/java/com/starrocks/qe/ConnectProcessor.java | Adjusts COM_FIELD_LIST to use table-level intensive locking and restructures lookup/exception handling with revalidation for internal catalog. |
|
@Mergifyio backport branch-4.0 |
|
@Mergifyio backport branch-4.1 |
✅ Backports have been createdDetails
Cherry-pick of c15972b has failed: To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally |
✅ Backports have been createdDetails
|
Switch four single-table read paths in qe/ from full DB READ to IS+table-READ via lockTableWithIntensiveDbLock so they no longer block concurrent DDL on other tables in the same database.
The revalidation closes a TOCTOU window: under the old DB READ, DROP/RENAME could not interleave between lookup and use; under IS+table-READ the lookup is unprotected so the resolved Table can be removed from the DB before the lock is taken. Re-fetching by id after locking and comparing references catches that window and reports the table as not-found, matching the old semantics.
Why I'm doing:
What I'm doing:
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: