
*: support column masking#65603

Open
tiancaiamao wants to merge 102 commits intopingcap:masterfrom
tiancaiamao:demo


@tiancaiamao
Contributor

@tiancaiamao tiancaiamao commented Jan 16, 2026

What problem does this PR solve?

Issue Number: ref #65744

Problem Summary:

This is the implementation of the server-side column masking feature. I'll finish a demo first in this PR,
and later split it into small commits.

What changed and how does it work?

Currently most of the code was written by AI; I reviewed the code and verified the results.
I kept a commit for each phase, so that it is easier to break into small PRs later.

Phase 1: Parser and AST

  • Add AST nodes:
    • CreateMaskingPolicyStmt
    • AlterTable specs for ADD/ENABLE/DISABLE/DROP MASKING POLICY
    • ShowStmtType for SHOW MASKING POLICIES
  • Update pkg/parser/parser.y for new grammar.
  • Update AST restore output in pkg/parser/ast/ddl.go and pkg/parser/ast/dml.go.
  • Add parser tests covering all syntax variants and their mutual exclusivity.

Phase 2: Metadata Model and DDL Job Types

  • Add MaskingPolicyInfo in pkg/meta/model (new file) with IDs, names, expression, status, audit fields.
  • Add new DDL job types and args in pkg/meta/model/job.go and pkg/meta/model/job_args.go.
  • Add meta.Mutator APIs for create/update/drop/list policies with new meta keys (avoid colliding with placement policy keys).
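A minimal sketch of the metadata side, assuming each policy is stored as JSON under its own key prefix. The field set follows the Phase 2 bullets, but the field names, JSON tags, the "Policy"/"MaskingPolicy" prefixes and the map-backed memMutator are hypothetical stand-ins, not TiDB's actual meta.Mutator API. The point is only that a separate prefix keeps a masking policy and a placement policy with the same numeric ID from landing on the same key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MaskingPolicyInfo: fields follow the PR description (IDs, names, expression,
// status, audit fields); exact names and JSON tags are assumptions.
type MaskingPolicyInfo struct {
	ID         int64  `json:"id"`
	Name       string `json:"name"`
	TableID    int64  `json:"table_id"`
	ColumnID   int64  `json:"column_id"`
	Expression string `json:"expression"`
	Status     string `json:"status"`
	CreatedBy  string `json:"created_by"`
}

// Hypothetical key prefixes: distinct prefixes mean the same numeric ID
// never maps to the same key for both kinds of policy.
const (
	placementPolicyPrefix = "Policy"
	maskingPolicyPrefix   = "MaskingPolicy"
)

func placementPolicyKey(id int64) string { return fmt.Sprintf("%s:%d", placementPolicyPrefix, id) }
func maskingPolicyKey(id int64) string   { return fmt.Sprintf("%s:%d", maskingPolicyPrefix, id) }

// memMutator is a toy, map-backed stand-in for meta.Mutator (hypothetical).
type memMutator struct{ kv map[string][]byte }

// CreateMaskingPolicy stores the JSON-encoded policy and rejects duplicates.
func (m *memMutator) CreateMaskingPolicy(p *MaskingPolicyInfo) error {
	key := maskingPolicyKey(p.ID)
	if _, ok := m.kv[key]; ok {
		return fmt.Errorf("masking policy %d already exists", p.ID)
	}
	data, err := json.Marshal(p)
	if err != nil {
		return err
	}
	m.kv[key] = data
	return nil
}

// GetMaskingPolicy decodes the policy stored under its masking-policy key.
func (m *memMutator) GetMaskingPolicy(id int64) (*MaskingPolicyInfo, error) {
	data, ok := m.kv[maskingPolicyKey(id)]
	if !ok {
		return nil, fmt.Errorf("masking policy %d not found", id)
	}
	p := &MaskingPolicyInfo{}
	if err := json.Unmarshal(data, p); err != nil {
		return nil, err
	}
	return p, nil
}

func main() {
	m := &memMutator{kv: map[string][]byte{}}
	m.kv[placementPolicyKey(1)] = []byte(`{"name":"placement"}`) // same ID, different key
	if err := m.CreateMaskingPolicy(&MaskingPolicyInfo{ID: 1, Name: "p_phone", Status: "ENABLE"}); err != nil {
		panic(err)
	}
	p, _ := m.GetMaskingPolicy(1)
	fmt.Println(p.Name, len(m.kv)) // p_phone 2
}
```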

Phase 3: System Table

  • Add CreateTiDBMaskingPolicyTable to pkg/meta/metadef/system_tables_def.go.
  • Add bootstrap and upgrade steps (pkg/session/bootstrap.go, pkg/session/upgrade_def.go).
  • Table definition (per PDF):
    • policy_id BIGINT PRIMARY KEY
    • policy_name VARCHAR(64) NOT NULL
    • db_name VARCHAR(64) NOT NULL
    • table_name VARCHAR(64) NOT NULL
    • table_id BIGINT NOT NULL
    • column_name VARCHAR(64) NOT NULL
    • column_id BIGINT NOT NULL
    • expression TEXT NOT NULL
    • status ENUM('ENABLE','DISABLE') DEFAULT 'ENABLE'
    • function_type VARCHAR(32) (FULL/PARTIAL/NULL/CUSTOM)
    • created_at TIMESTAMP DEFAULT NOW()
    • updated_at TIMESTAMP DEFAULT NOW()
    • created_by VARCHAR(128)
    • UNIQUE KEY (db_name, policy_name)
    • INDEX (table_id, column_id)

Phase 4: InfoSchema Integration

  • Extend infoSchemaMisc with a maskingPolicyMap keyed by table_id/column_id and by policy name.
  • Add accessors on InfoSchema and infoschema/context:
    • MaskingPolicyByName / MaskingPolicyByTableColumn / AllMaskingPolicies.
  • Update infoschema.Builder apply-diff paths for create/alter/drop policy jobs.
  • Ensure plan cache invalidation via schema version change.
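A minimal sketch of a map kept under two indexes, (table_id, column_id) and name, that stay consistent on replace and drop. The accessor names follow the PR; the struct layout and the set/drop helpers are assumptions. The name index is keyed by (db_name, policy_name) to match the system table's unique key.

```go
package main

import "fmt"

type MaskingPolicyInfo struct {
	ID       int64
	DBName   string
	Name     string
	TableID  int64
	ColumnID int64
}

type columnKey struct{ tableID, columnID int64 }
type nameKey struct{ db, name string }

// maskingPolicyMap indexes the same policies two ways (layout is hypothetical).
type maskingPolicyMap struct {
	byColumn map[columnKey]*MaskingPolicyInfo
	byName   map[nameKey]*MaskingPolicyInfo
}

func newMaskingPolicyMap() *maskingPolicyMap {
	return &maskingPolicyMap{
		byColumn: map[columnKey]*MaskingPolicyInfo{},
		byName:   map[nameKey]*MaskingPolicyInfo{},
	}
}

// set adds or replaces a policy and removes stale entries from both indexes.
func (m *maskingPolicyMap) set(p *MaskingPolicyInfo) {
	nk, ck := nameKey{p.DBName, p.Name}, columnKey{p.TableID, p.ColumnID}
	if old, ok := m.byName[nk]; ok { // same policy moved, e.g. new table_id
		delete(m.byColumn, columnKey{old.TableID, old.ColumnID})
	}
	if old, ok := m.byColumn[ck]; ok { // one policy per column
		delete(m.byName, nameKey{old.DBName, old.Name})
	}
	m.byName[nk] = p
	m.byColumn[ck] = p
}

// drop removes a policy from both indexes.
func (m *maskingPolicyMap) drop(db, name string) {
	if p, ok := m.byName[nameKey{db, name}]; ok {
		delete(m.byName, nameKey{db, name})
		delete(m.byColumn, columnKey{p.TableID, p.ColumnID})
	}
}

func (m *maskingPolicyMap) MaskingPolicyByName(db, name string) (*MaskingPolicyInfo, bool) {
	p, ok := m.byName[nameKey{db, name}]
	return p, ok
}

func (m *maskingPolicyMap) MaskingPolicyByTableColumn(tableID, columnID int64) (*MaskingPolicyInfo, bool) {
	p, ok := m.byColumn[columnKey{tableID, columnID}]
	return p, ok
}

func main() {
	m := newMaskingPolicyMap()
	m.set(&MaskingPolicyInfo{ID: 1, DBName: "test", Name: "p_phone", TableID: 10, ColumnID: 2})
	// e.g. after TRUNCATE TABLE assigns a new table ID
	m.set(&MaskingPolicyInfo{ID: 1, DBName: "test", Name: "p_phone", TableID: 11, ColumnID: 2})
	_, stale := m.MaskingPolicyByTableColumn(10, 2)
	p, ok := m.MaskingPolicyByTableColumn(11, 2)
	fmt.Println(stale, ok, p.Name) // false true p_phone
}
```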

Phase 5: DDL Execution and Validation

  • Implement DDL job handlers in pkg/ddl/masking_policy.go:
    • create/add/enable/disable/drop
  • Wire into pkg/ddl/job_worker.go and DDL executor.
  • Validation rules:
    • One policy per column.
    • No policy on temp/system/view/generated columns.
    • Expression resolves only the target column; no aliases.
    • Output type/length must match column type.
    • Enforce IF NOT EXISTS / OR REPLACE error rules.
  • DDL Guard:
    • Block modifying type/length/precision of masked columns in GetModifiableColumnJob.
  • Cascade:
    • Drop column/table removes its policies.
    • TRUNCATE TABLE updates table_id in policy entries.
    • RENAME TABLE / RENAME COLUMN updates names in policy entries.
  • Persist both meta and mysql.tidb_masking_policy in job handlers.

Phase 6: SHOW Statements (moved forward)

  • Implement SHOW MASKING POLICIES FOR t [WHERE column='c'] in pkg/executor/show.go.
  • Update SHOW CREATE TABLE output to append:
    • /* MASKING POLICY name ENABLE */ per column (policy name + status only).
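A small sketch of the SHOW CREATE TABLE annotation: only the policy name and status are emitted, never the expression. The helper name and the placement of the comment before the trailing comma are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

type columnPolicy struct {
	Name    string
	Enabled bool
}

// appendMaskingComment (hypothetical helper) appends the per-column
// annotation when the column has a policy.
func appendMaskingComment(colDef string, p *columnPolicy) string {
	if p == nil {
		return colDef
	}
	status := "DISABLE"
	if p.Enabled {
		status = "ENABLE"
	}
	return fmt.Sprintf("%s /* MASKING POLICY %s %s */", colDef, p.Name, status)
}

func main() {
	defs := []string{
		appendMaskingComment("  `id` int(11) NOT NULL", nil),
		appendMaskingComment("  `phone` varchar(20) DEFAULT NULL", &columnPolicy{Name: "p_phone", Enabled: true}),
	}
	fmt.Println(strings.Join(defs, ",\n"))
}
```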

Phase 7: Expression Parsing and Caching

  • Parse policy expressions via expression.ParseSimpleExpr with schema containing only the target column.
  • Cache compiled expressions in session vars; invalidate on schema version change.
  • Guard against recursion when substituting masked expressions.
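A sketch of a per-session cache of compiled policy expressions that is dropped whenever the schema version changes. The real code parses via expression.ParseSimpleExpr; here parsing is a placeholder, and the type and method names are hypothetical. The reset method reflects a later commit in this PR that clears the cache on SET ROLE.

```go
package main

import "fmt"

// compiledExpr stands in for a parsed expression (placeholder only).
type compiledExpr struct{ text string }

// policyExprCache holds compiled expressions keyed by policy ID for the
// schema version they were compiled under.
type policyExprCache struct {
	schemaVersion int64
	entries       map[int64]*compiledExpr
	parses        int // number of (placeholder) parses, for illustration
}

// get returns a cached expression, invalidating everything on a new schema version.
func (c *policyExprCache) get(schemaVersion, policyID int64, text string) *compiledExpr {
	if c.entries == nil || c.schemaVersion != schemaVersion {
		c.entries = make(map[int64]*compiledExpr)
		c.schemaVersion = schemaVersion
	}
	if e, ok := c.entries[policyID]; ok {
		return e
	}
	c.parses++
	e := &compiledExpr{text: text} // a real implementation would parse here
	c.entries[policyID] = e
	return e
}

// reset clears the cache regardless of schema version (e.g. on SET ROLE).
func (c *policyExprCache) reset() { c.entries = nil }

func main() {
	c := &policyExprCache{}
	c.get(100, 1, "MASK_FULL(phone)")
	c.get(100, 1, "MASK_FULL(phone)") // cache hit
	c.get(101, 1, "MASK_FULL(phone)") // schema version changed: re-parse
	fmt.Println(c.parses)             // 2
}
```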

Phase 8: Planner Rewrite (Apply Masking)

  • In buildProjection, substitute column references with the policy expression using expression.ColumnSubstitute.
  • Apply only to select list/projection outputs; predicates remain untouched.
  • Ensure expression-based outputs use masked inputs.
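A toy illustration of the rewrite rule: column references in the select list are replaced by the policy expression, expressions built on a masked column therefore see the masked input, and the WHERE predicate is left untouched. It uses a hand-rolled expression tree rather than TiDB's expression.Expression and expression.ColumnSubstitute. It also shows one simple recursion guard: the substituted policy expression is not walked again.

```go
package main

import (
	"fmt"
	"strings"
)

// Toy expression tree (hypothetical; not TiDB's expression package).
type Expr interface{ String() string }

type Column struct{ Name string }
type Const struct{ Val string }
type Func struct {
	Name string
	Args []Expr
}

func (c *Column) String() string { return c.Name }
func (c *Const) String() string  { return "'" + c.Val + "'" }
func (f *Func) String() string {
	parts := make([]string, len(f.Args))
	for i, a := range f.Args {
		parts[i] = a.String()
	}
	return f.Name + "(" + strings.Join(parts, ", ") + ")"
}

// substitute replaces masked column references. The returned policy
// expression is not visited again, so mask_full(phone) cannot expand forever.
func substitute(e Expr, policies map[string]func(*Column) Expr) Expr {
	switch x := e.(type) {
	case *Column:
		if mask, ok := policies[x.Name]; ok {
			return mask(x)
		}
		return x
	case *Func:
		args := make([]Expr, len(x.Args))
		for i, a := range x.Args {
			args[i] = substitute(a, policies)
		}
		return &Func{Name: x.Name, Args: args}
	default:
		return e
	}
}

type Query struct {
	SelectList []Expr
	Where      Expr
}

// applyMasking rewrites only the projection; predicates see original values.
func applyMasking(q Query, policies map[string]func(*Column) Expr) Query {
	out := Query{Where: q.Where}
	for _, e := range q.SelectList {
		out.SelectList = append(out.SelectList, substitute(e, policies))
	}
	return out
}

func main() {
	policies := map[string]func(*Column) Expr{
		"phone": func(c *Column) Expr { return &Func{Name: "mask_full", Args: []Expr{c}} },
	}
	q := Query{
		SelectList: []Expr{&Func{Name: "concat", Args: []Expr{&Column{Name: "phone"}, &Const{Val: "!"}}}},
		Where:      &Func{Name: "eq", Args: []Expr{&Column{Name: "phone"}, &Const{Val: "123"}}},
	}
	m := applyMasking(q, policies)
	fmt.Println(m.SelectList[0]) // concat(mask_full(phone), '!')
	fmt.Println(m.Where)         // eq(phone, '123')
}
```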

Phase 9: Masking Builtins

  • Add MASK_PARTIAL, MASK_FULL, MASK_NULL, MASK_DATE in pkg/expression and register in builtin.go.
  • Implement type-aware behavior with collation-aware string slicing.
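A minimal sketch of MASK_PARTIAL and MASK_FULL on strings. The argument order (str, preserveLeft, preserveRight, pad) follows the later commit message in this PR that fixed the mask_partial signature. Slicing works on runes, so multi-byte UTF-8 characters are never split; this is only an approximation of "collation-aware" slicing. Returning the input unchanged when the preserved margins cover the whole string is an assumption, not documented behavior of the real builtin.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// maskPartial keeps preserveLeft runes at the start and preserveRight at the
// end, replacing everything in between with pad.
func maskPartial(s string, preserveLeft, preserveRight int, pad rune) string {
	r := []rune(s)
	n := len(r)
	if preserveLeft < 0 {
		preserveLeft = 0
	}
	if preserveRight < 0 {
		preserveRight = 0
	}
	if preserveLeft+preserveRight >= n {
		return s // assumption: nothing left to mask
	}
	for i := preserveLeft; i < n-preserveRight; i++ {
		r[i] = pad
	}
	return string(r)
}

// maskFull replaces every character with pad, keeping the character count.
func maskFull(s string, pad rune) string {
	return strings.Repeat(string(pad), utf8.RuneCountInString(s))
}

func main() {
	fmt.Println(maskPartial("13812345678", 3, 4, '*')) // 138****5678
	fmt.Println(maskPartial("张三丰", 1, 0, '*'))         // 张**
	fmt.Println(maskFull("héllo", 'X'))                 // XXXXX
}
```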

Phase 10: Privileges and Errors

  • Add dynamic privileges:
    • CREATE MASKING POLICY
    • ALTER MASKING POLICY
    • DROP MASKING POLICY
  • Enforce privilege checks in plan builder / executor.
  • Add error codes/messages in pkg/errno and pkg/util/dbterror (patterned after placement policy errors).

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • New Features

    • Column-level data masking: create/alter/enable/disable/drop policies, SHOW MASKING POLICIES, SHOW CREATE TABLE annotations, restrict-on enforcement, new masking privileges, and error for access-denied to masked columns
    • Built-in masking functions: MASK_FULL, MASK_PARTIAL, MASK_NULL, MASK_DATE
    • Automated validation tooling that runs validation passes and generates human-readable reports
  • Documentation

    • Design spec, comprehensive test plan, scenario matrices, and report/template for masking validation
  • Tests

    • Extensive unit and integration tests covering lifecycle, enforcement, privileges, edge cases, and migrations

@ti-chi-bot

ti-chi-bot bot commented Jan 16, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 16, 2026
@tiprow

tiprow bot commented Jan 16, 2026

Hi @tiancaiamao. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@tiancaiamao tiancaiamao changed the title from "*: support column masking (WIP)" to "*: support column masking" on Feb 4, 2026
@tiancaiamao
Contributor Author

/retest

@tiprow

tiprow bot commented Mar 31, 2026

@tiancaiamao: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.


In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot

ti-chi-bot bot commented Mar 31, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fzzf678
Once this PR has been reviewed and has the lgtm label, please assign d3hunter, terry1purcell, windtalker for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tiancaiamao
Contributor Author

/test pull-br-integration-test

@tiprow

tiprow bot commented Mar 31, 2026

@tiancaiamao: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.


In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

tiancaiamao and others added 11 commits April 1, 2026 16:28
…WHERE/HAVING/GROUP BY

This commit fixes a critical bug where masking policies in CTEs were incorrectly
applied during CTE definition building, causing WHERE/HAVING/GROUP BY/ORDER BY
clauses to see masked values instead of original values.

Changes:
- buildSelect/buildSetOpr: Skip masking when building CTE definitions by checking
  !b.isCTE && !b.buildingCTE
- buildDataSourceFromCTEMerge: Set both b.isCTE and b.buildingCTE flags to ensure
  CTE context is maintained during inline merge
- Preserve OrigTblName and OrigColName for masking policy lookup

AT RESULT semantics: Masking is now correctly applied only at final output,
not during intermediate operations.

Issue: pingcap#67341

Co-Authored-By: Claude <noreply@anthropic.com>
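The guard this commit message describes (mask only when !b.isCTE && !b.buildingCTE, with both flags set while an inlined CTE is merged) can be sketched on a toy builder as follows. Only the two flags come from the commit; the maskedAt bookkeeping and restoring the previous flag values with defer are assumptions made for illustration.

```go
package main

import "fmt"

// planBuilder is a toy stand-in modeling only the two flags from the commit.
type planBuilder struct {
	isCTE       bool
	buildingCTE bool
	maskedAt    []string // SELECT blocks where masking was applied
}

// shouldApplyMasking is the guard from the commit message.
func (b *planBuilder) shouldApplyMasking() bool { return !b.isCTE && !b.buildingCTE }

// buildSelect records whether masking would be applied to this block.
func (b *planBuilder) buildSelect(block string) {
	if b.shouldApplyMasking() {
		b.maskedAt = append(b.maskedAt, block)
	}
}

// buildDataSourceFromCTEMerge sets both flags while building the inlined CTE
// body, then restores the previous values (restoring is an assumption).
func (b *planBuilder) buildDataSourceFromCTEMerge(cteBody string) {
	oldIsCTE, oldBuilding := b.isCTE, b.buildingCTE
	b.isCTE, b.buildingCTE = true, true
	defer func() { b.isCTE, b.buildingCTE = oldIsCTE, oldBuilding }()
	b.buildSelect(cteBody)
}

func main() {
	b := &planBuilder{}
	b.buildDataSourceFromCTEMerge("cte body") // WHERE/HAVING inside see original values
	b.buildSelect("outer query")              // final output is masked
	fmt.Println(b.maskedAt)                   // [outer query]
}
```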
This commit updates the column masking auto-validation skill to include
the new CTE masking test case.

Changes:
- Add tests/integrationtest/t/privilege/column_masking_cte.test to test plan
- Add P0-CORE-03 scenario for AT RESULT semantics with CTE
- Add it_column_masking_cte step to validation script

Co-Authored-By: Claude <noreply@anthropic.com>
The buildingCTE flag causes buildDistinct to access b.outerCTEs[len-1]
which fails when b.outerCTEs has been truncated (as it is before calling
buildDataSourceFromCTEMerge). Only setting b.isCTE is sufficient for the
masking fix and avoids the index out of range panic.

This fixes the collation_misc test failure with CTE UNION queries.
The original truncation of b.outerCTEs in tryBuildCTE caused buildDistinct
to fail with "index out of range [-1]" when accessing b.outerCTEs[len-1].
The fix is to NOT truncate b.outerCTEs when calling buildDataSourceFromCTEMerge,
which allows buildDistinct to access it safely when b.buildingCTE is true.

This fixes both:
1. collation_misc test failure with CTE UNION queries
2. column_masking_cte test behavior with HAVING/ORDER BY using original values

The masking fix (checking !b.isCTE && !b.buildingCTE) still works correctly
because b.isCTE is set to true in buildDataSourceFromCTEMerge, which skips
masking during CTE definition building.
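The failure mode in these two commit messages is indexing b.outerCTEs[len(b.outerCTEs)-1] on an empty slice, which panics with "index out of range [-1]". The sketch below shows a defensive accessor that checks both conditions first. It is an alternative for illustration, not the fix the commits chose (they change when outerCTEs is truncated and when buildingCTE is set), and the types and method name are hypothetical.

```go
package main

import "fmt"

type cteInfo struct{ name string }

type planBuilder struct {
	outerCTEs   []*cteInfo
	buildingCTE bool
}

// innermostCTE returns the CTE currently being built, or nil when not
// building a CTE or when the stack is empty, instead of indexing [len-1].
func (b *planBuilder) innermostCTE() *cteInfo {
	if !b.buildingCTE || len(b.outerCTEs) == 0 {
		return nil
	}
	return b.outerCTEs[len(b.outerCTEs)-1]
}

func main() {
	b := &planBuilder{buildingCTE: true} // outerCTEs already truncated
	fmt.Println(b.innermostCTE() == nil) // true, instead of a panic
	b.outerCTEs = append(b.outerCTEs, &cteInfo{name: "cte1"})
	fmt.Println(b.innermostCTE().name) // cte1
}
```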
… skip ID check for CTE columns, set buildingCTE in buildDataSourceFromCTEMerge
- Fix buildDataSourceFromCTEMerge to set both b.isCTE and b.buildingCTE
  to prevent masking during CTE definition building (buildTableRefs
  overwrites b.isCTE via buildResultSetNode(false), so b.buildingCTE
  serves as backup)
- Fix adjustCTEPlanOutputName to preserve OrigTblName before overwriting
  with CTE name, so findMaskingPolicy can look up the original table
- Fix findMaskingPolicy to skip column ID check for CTE-derived columns
  (OrigTblName != TblName) since getResultCTESchema reallocates IDs
- Fix tryBuildCTE inline merge to truncate b.outerCTEs and set
  b.buildingCTE=false before calling buildDataSourceFromCTEMerge
- Fix mask_partial function signature: change from (str, pad, start, length)
  to (str, preserveLeft, preserveRight, pad) to match policy definition
  MASK_PARTIAL(col, prefix_keep, suffix_keep, mask_char)
- Clear masking policy expression cache on SET ROLE (current_role()
  is session-dependent but cache only invalidated on schema version)
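The findMaskingPolicy change in this commit can be sketched as: look up by the original table and column names, and skip the column-ID comparison when the output comes from a CTE (OrigTblName != TblName), because getResultCTESchema reallocates IDs. The outputName fields mirror the names in the commit message; keying policies by (db, table, column) names and the helper types are assumptions of this sketch.

```go
package main

import "fmt"

// outputName carries the name fields mentioned in the commit (stand-in type).
type outputName struct {
	DBName, TblName, OrigTblName, ColName, OrigColName string
}

type policy struct {
	Name     string
	ColumnID int64
}

type policyKey struct{ db, table, column string }

// findMaskingPolicy resolves by original names and only checks the column ID
// for non-CTE outputs, whose IDs still match the base table.
func findMaskingPolicy(policies map[policyKey]*policy, n outputName, colID int64) *policy {
	tbl := n.OrigTblName
	if tbl == "" {
		tbl = n.TblName
	}
	col := n.OrigColName
	if col == "" {
		col = n.ColName
	}
	p, ok := policies[policyKey{n.DBName, tbl, col}]
	if !ok {
		return nil
	}
	fromCTE := tbl != n.TblName
	if !fromCTE && p.ColumnID != colID {
		return nil
	}
	return p
}

func main() {
	policies := map[policyKey]*policy{{"test", "t", "phone"}: {Name: "p_phone", ColumnID: 2}}
	base := outputName{DBName: "test", TblName: "t", OrigTblName: "t", ColName: "phone", OrigColName: "phone"}
	cte := outputName{DBName: "test", TblName: "cte", OrigTblName: "t", ColName: "phone", OrigColName: "phone"}
	fmt.Println(findMaskingPolicy(policies, base, 2) != nil) // true
	fmt.Println(findMaskingPolicy(policies, cte, 57) != nil) // true: ID check skipped
}
```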
@ti-chi-bot

ti-chi-bot bot commented Apr 10, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-tests-checked label, please finish the tests, then check the finished items in the description.

For example:

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

📖 For more info, you can check the "Contribute Code" section in the development guide.

@ti-chi-bot

ti-chi-bot bot commented Apr 10, 2026

@tiancaiamao: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • idc-jenkins-ci-tidb/check_dev (commit c5d72e7, required): rerun with /test check-dev
  • idc-jenkins-ci-tidb/check_dev_2 (commit c5d72e7, required): rerun with /test check-dev2
  • idc-jenkins-ci-tidb/build (commit c5d72e7, required): rerun with /test build
  • pull-build-next-gen (commit c5d72e7, required): rerun with /test pull-build-next-gen
  • idc-jenkins-ci-tidb/mysql-test (commit c5d72e7, required): rerun with /test mysql-test
  • idc-jenkins-ci-tidb/unit-test (commit c5d72e7, required): rerun with /test unit-test
  • pull-integration-realcluster-test-next-gen (commit c5d72e7, required): rerun with /test pull-integration-realcluster-test-next-gen
  • pull-unit-test-ddlv1 (commit c5d72e7, required): rerun with /test pull-unit-test-ddlv1

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

component/statistics do-not-merge/needs-tests-checked needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
