[SPARK-56520][SQL] Persist SQL PATH in views and SQL functions, expose in DESCRIBE #55383

Closed
srielau wants to merge 1 commit into apache:master from srielau:SPARK-56520-persist-path

@srielau srielau commented Apr 17, 2026

What changes were proposed in this pull request?

When spark.sql.path.enabled is true, persist the effective resolution path at creation time for views and SQL functions, and expose it in DESCRIBE output.

View persistence (views.scala):

  • Store the frozen path as JSON in view.resolutionPath property at CREATE VIEW time.
  • Persisted views strip system.session from the stored path since persistent views cannot reference temporary objects (session scope). Temporary views keep it because temp objects can reference other temp objects.
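
The stripping rule above can be sketched as a small filter. This is only an illustration under stated assumptions, not the Spark implementation: `path_entries_for_persistence` is a hypothetical Python stand-in, and the exact-match comparison against `["system", "session"]` (no case folding) is an assumption of the sketch.

```python
# Hypothetical stand-in for the persistence rule described above.
# Assumption: a path entry is a list of name parts, e.g. ["system", "session"].
SYSTEM_SESSION = ["system", "session"]


def path_entries_for_persistence(entries, is_temporary):
    """Return the entries to store for a view or SQL function.

    Temporary objects keep system.session; persistent objects drop it,
    because they cannot reference session-scoped (temporary) objects.
    """
    if is_temporary:
        return [list(entry) for entry in entries]
    # Assumption: exact, case-sensitive match on the two name parts.
    return [list(entry) for entry in entries if list(entry) != SYSTEM_SESSION]


session_path = [
    ["system", "session"],
    ["spark_catalog", "default"],
    ["system", "builtin"],
]
persistent_path = path_entries_for_persistence(session_path, is_temporary=False)
temporary_path = path_entries_for_persistence(session_path, is_temporary=True)
```

With this input, `persistent_path` no longer contains the `system.session` entry, while `temporary_path` is an unchanged copy of the session path.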

SQL function persistence (CreateSQLFunctionCommand.scala, SQLFunction.scala):

  • Store the frozen path as JSON in function.resolutionPath property at CREATE FUNCTION time.
  • Same stripping rules as views.

Storage format:

  • Path entries are serialized as JSON arrays: [["spark_catalog","default"],["system","builtin"]]
  • This naturally supports multi-level namespaces (nested schemas), e.g. [["catalog","ns1","ns2"]].
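
As a rough illustration of the storage format, the following sketch round-trips path entries through the JSON array-of-arrays shape shown above. The function names are hypothetical Python stand-ins, and the compact separators are chosen only so the output matches the example string in this description.

```python
import json


def serialize_path_entries(entries):
    """Encode path entries as a compact JSON array of arrays."""
    return json.dumps(entries, separators=(",", ":"))


def parse_path_entries(stored):
    """Decode a stored path, checking the array-of-string-arrays shape."""
    parsed = json.loads(stored)
    if not isinstance(parsed, list):
        raise ValueError("stored path must be a JSON array")
    for entry in parsed:
        if not isinstance(entry, list) or not all(isinstance(p, str) for p in entry):
            raise ValueError("each path entry must be an array of strings")
    return parsed


stored = serialize_path_entries([["spark_catalog", "default"], ["system", "builtin"]])
nested = serialize_path_entries([["catalog", "ns1", "ns2"]])
```

Because each entry is itself an array, a multi-level namespace needs no escaping or delimiter convention: the nested example simply has three parts instead of two.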

DESCRIBE output for views (interface.scala):

  • DESCRIBE EXTENDED / DESCRIBE FORMATTED shows SQL Path with backtick-quoted entries (e.g. `spark_catalog`.`default`, `system`.`builtin`).
  • DESCRIBE ... AS JSON exposes sql_path as an array of objects, consistent with describeIdentifier:
    "sql_path": [
      {"catalog_name": "spark_catalog", "namespace": ["default"]},
      {"catalog_name": "system", "namespace": ["builtin"]}
    ]
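
The two DESCRIBE renderings can be derived from the same stored JSON. The sketch below is a hypothetical Python illustration, not the code in interface.scala. It assumes that an embedded backtick is escaped by doubling it, that the first part of each entry is the catalog name, and that malformed or empty entries are skipped.

```python
import json


def quote_identifier(part):
    # Assumption: embedded backticks are escaped by doubling them.
    return "`" + part.replace("`", "``") + "`"


def _valid_entries(stored):
    """Yield only non-empty list entries from the stored JSON path."""
    for entry in json.loads(stored):
        if isinstance(entry, list) and entry:
            yield entry


def to_describe_string(stored):
    """Render the text form shown as SQL Path in DESCRIBE EXTENDED."""
    return ", ".join(
        ".".join(quote_identifier(part) for part in entry)
        for entry in _valid_entries(stored)
    )


def to_describe_json(stored):
    """Render the sql_path array of objects for DESCRIBE ... AS JSON."""
    return [
        {"catalog_name": entry[0], "namespace": entry[1:]}
        for entry in _valid_entries(stored)
    ]


stored = '[["spark_catalog","default"],["system","builtin"]]'
text_form = to_describe_string(stored)
json_form = to_describe_json(stored)
```

Sharing one validation helper between both renderers keeps the text and JSON outputs consistent for the same stored value.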

DESCRIBE FUNCTION EXTENDED for SQL UDFs (functions.scala):

  • Shows SQL Path line for SQL UDFs when PATH is enabled and a stored path exists.
  • For persistent functions, reads from catalog metadata; for temp SQL UDFs, extracts from the usage JSON blob.

Backward compatibility:

  • Old views/functions created without PATH enabled have no resolutionPath property. They fall back to the default resolution path (resolutionSearchPath with session function resolution order).
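
A minimal sketch of the lookup side of this rule, assuming object properties are available as a plain string-to-string mapping. `stored_path_or_none` is a hypothetical helper; it only reports whether a frozen path exists, since this PR does not yet use the stored path during analysis.

```python
import json

# Property name taken from the description above.
VIEW_RESOLUTION_PATH = "view.resolutionPath"


def stored_path_or_none(properties, key=VIEW_RESOLUTION_PATH):
    """Return the frozen path entries, or None for objects created without one."""
    raw = properties.get(key)
    if raw is None:
        # Old view or function: the caller falls back to the default path.
        return None
    return json.loads(raw)


old_view_properties = {"comment": "created before PATH was enabled"}
new_view_properties = {VIEW_RESOLUTION_PATH: '[["spark_catalog","default"]]'}

old_path = stored_path_or_none(old_view_properties)
new_path = stored_path_or_none(new_view_properties)
```

Returning `None` rather than an empty list keeps "no path was stored" distinct from "an empty path was stored".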

The stored path is not yet used during analysis -- frozen path resolution comes in a follow-up PR. This PR only persists and displays the metadata.

Why are the changes needed?

Views and SQL functions need to capture the resolution path at creation time so that their body can later resolve with the same path, independent of the caller's session path. This is the SQL-standard behavior for view and routine body resolution.

Part of SPARK-54810. Depends on SPARK-56501 (SET PATH syntax).

Does this PR introduce any user-facing change?

Yes.

  • DESCRIBE EXTENDED on views shows SQL Path when PATH is enabled.
  • DESCRIBE FUNCTION EXTENDED on SQL UDFs shows SQL Path when PATH is enabled.
  • DESCRIBE ... AS JSON includes sql_path field for views when PATH is enabled.
  • View and SQL function metadata now includes the frozen resolution path.

How was this patch tested?

  • DescribeTableSuite: new test verifying DESCRIBE EXTENDED view AS JSON includes sql_path with correct structure, and regular DESCRIBE EXTENDED shows SQL Path (both V1 catalog V1 and V2 command variants).
  • DescribeTableSuiteBase: added SqlPathEntry case class and sql_path field to DescribeTableJson for JSON deserialization.
  • SetPathSuite: all 22 existing tests pass (no regressions).
  • Compiled and tested locally with SBT.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.6

@srielau srielau force-pushed the SPARK-56520-persist-path branch 16 times, most recently from f802878 to ba25924 Compare April 22, 2026 15:07

@dtenedor dtenedor left a comment

Thanks for working on this!

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala Outdated
Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala Outdated
Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala Outdated
Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala Outdated
Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala Outdated
@srielau srielau force-pushed the SPARK-56520-persist-path branch 5 times, most recently from a994a5c to bd3ec19 Compare April 23, 2026 00:10

@cloud-fan cloud-fan left a comment

Nice refactor since the first round — the extracted SqlPathFormat, DescribeFunctionCommandUtils, and shared CatalogManager.pathEntriesForPersistence / serializePathEntries helpers address the earlier dedup, narrow-catch, and file-health feedback cleanly.

Remaining items below. The main themes are:

  • The scaladoc for both VIEW_RESOLUTION_PATH and FUNCTION_RESOLUTION_PATH describes the stored value as comma-separated, but it's actually a JSON array-of-arrays produced by CatalogManager.serializePathEntries.
  • View DESCRIBE and function DESCRIBE each have their own display formatter for the same stored JSON shape — these should share one renderer.
  • The DESCRIBE output is gated on the current session's SQLConf.get.pathEnabled, which hides the persisted property whenever a user later disables the flag.
  • Test coverage for DESCRIBE FUNCTION EXTENDED on SQL UDFs is missing — the whole DescribeFunctionCommandUtils codepath is currently untested end-to-end, as is the temp-vs-persistent system.session stripping rule.

Additional notes that didn't fit inline:

  • Please add a test that creates a SQL UDF (both persistent and TEMPORARY) with an explicit SET PATH, runs DESCRIBE FUNCTION EXTENDED, and asserts the SQL Path: … row. That covers DescribeFunctionCommandUtils (persistent catalog branch and temp UDF usage-blob fallback), which has no test right now.
  • Tests for the persistence semantics themselves (persistent view/function strips system.session, temp view keeps it, SET PATH round-trip, PATH_ENABLED toggle) would probably sit more naturally in SetPathSuite alongside the rest of the PATH-feature tests than in DescribeTableSuite.

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala Outdated
Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala Outdated
@srielau srielau force-pushed the SPARK-56520-persist-path branch 2 times, most recently from cb3169f to 106c994 Compare April 23, 2026 13:57

@srielau srielau left a comment

Overall the PR is well-structured and the serialization/display separation is clean. One correctness issue and three doc/access-modifier nits below.

Comment thread sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala Outdated
@srielau srielau force-pushed the SPARK-56520-persist-path branch from 106c994 to 8ccfd95 Compare April 23, 2026 15:56

@cloud-fan cloud-fan left a comment

Round 3 re-review. Status: 9 addressed, 2 remaining, 1 new.

  • Addressed: the VIEW_RESOLUTION_PATH / FUNCTION_RESOLUTION_PATH scaladoc now describes the JSON array format; formatStoredPath delegates to SqlPathFormat; removeResolutionPath helper mirrors the other remove* helpers; empty-parts .quoted guard added; non-JArray entries in SqlPathFormat.toDescribeJson now drop instead of pass through; the exact List(SqlPathEntry(...), ...) assertion is pinned; and generateViewProperties now takes captureNewPath, so AlterViewSchemaBindingCommand preserves the frozen path instead of overwriting it.
  • Remaining: DESCRIBE FUNCTION EXTENDED on SQL UDFs still has no test for the DescribeFunctionCommandUtils code path (persistent catalog branch and temp UDF usage-blob fallback). And the PATH persistence tests (persistent view/function strips system.session, temp view keeps it, SET PATH round-trip) are still under DescribeTableSuite rather than alongside the other PATH-feature tests in SetPathSuite. On the pathEnabled DESCRIBE gate I am accepting your "kill switch" rationale as a deliberate design choice; not re-raising.
  • New: the catch on storedResolutionPathString was widened from NoSuchFunctionException all the way to AnalysisException. Since UserDefinedFunction.fromCatalogFunction wraps any parse failure into AnalysisException("CORRUPTED_CATALOG_FUNCTION"), this silently masks real catalog corruption on DESCRIBE FUNCTION EXTENDED. My earlier ask was specifically to include NoSuchDatabaseException; please narrow back to NoSuchFunctionException | NoSuchDatabaseException so CORRUPTED_CATALOG_FUNCTION and any other genuine analysis error propagates to the user. Inline suggestion below.
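
The difference between the narrow and the widened catch can be illustrated with stand-in exception classes. This is a sketch only: the classes below are local Python stand-ins that mirror the names in the review, `lookup` is a hypothetical callable, and modelling both "not found" errors as subclasses of the analysis error is an assumption of the sketch.

```python
# Local stand-ins for the exception names mentioned in the review.
class AnalysisException(Exception):
    pass


class NoSuchFunctionException(AnalysisException):
    pass


class NoSuchDatabaseException(AnalysisException):
    pass


def stored_resolution_path_string(lookup, name):
    """Return the stored path string, or None if the function does not exist.

    Only the two "not found" errors are treated as "no path stored".
    Any other AnalysisException (such as corrupted catalog metadata)
    propagates to the caller instead of being swallowed.
    """
    try:
        properties = lookup(name)
    except (NoSuchFunctionException, NoSuchDatabaseException):
        return None
    return properties.get("function.resolutionPath")


def missing(name):
    raise NoSuchFunctionException(name)


def corrupted(name):
    raise AnalysisException("CORRUPTED_CATALOG_FUNCTION")


missing_result = stored_resolution_path_string(missing, "f")
try:
    stored_resolution_path_string(corrupted, "f")
    corruption_propagated = False
except AnalysisException:
    corruption_propagated = True
```

Had the `except` clause named `AnalysisException` itself, the `corrupted` lookup would also have returned `None`, which is the masking the review objects to.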

Note: the re-review diff could not sub-classify new findings (newly introduced vs. late catch) because history between the prior review commit and the current HEAD was rewritten by a force-push.

…e in DESCRIBE

When PATH is enabled, persist the effective resolution path at creation
time for views and SQL functions:

- Views: store frozen path in view.resolutionPath property; strip
  system.session for persisted views, keep for temp views.
- SQL functions: store frozen path in function.resolutionPath property;
  same stripping rules as views.
- DESCRIBE EXTENDED/JSON for views shows "SQL Path" when PATH is enabled
  and a stored path exists.
- DESCRIBE FUNCTION EXTENDED for SQL UDFs shows "SQL Path" when PATH
  is enabled and a stored path exists.
- CapturesConfig denies spark.sql.session.path from the sqlConfig blob
  (path is stored separately in the dedicated property).
- Old views/functions without a stored path use the default resolution
  path (resolutionSearchPath with session function resolution order).

The stored path is not yet used during analysis (frozen path resolution
comes in a follow-up). This PR only persists and displays the metadata.

Part of SPARK-54810.
@srielau srielau force-pushed the SPARK-56520-persist-path branch from 8ccfd95 to 95da8aa Compare April 23, 2026 16:55

@cloud-fan cloud-fan left a comment

Round 4 re-review. Status: 1 addressed, 2 remaining, 0 new.

  • Addressed: the catch on storedResolutionPathString is narrowed back to NoSuchFunctionException | NoSuchDatabaseException. CORRUPTED_CATALOG_FUNCTION and other AnalysisExceptions now propagate to the user instead of being silently masked as "no path stored." Thanks.
  • Remaining (still valid, not re-posting inline): (1) DESCRIBE FUNCTION EXTENDED on SQL UDFs still has no automated coverage for the DescribeFunctionCommandUtils code path (persistent catalog branch and temp-UDF usage-blob fallback). (2) The PATH persistence tests (persistent view/function strips system.session, temp view keeps it, SET PATH round-trip) still live under DescribeTableSuite rather than alongside the other PATH-feature tests in SetPathSuite.
  • New: none. The only diff since the prior review is the catch narrowing.

Note: history between the prior review commit and the current HEAD was rewritten by force-push, so new-vs-late-catch sub-classification is not applicable this round (no new findings anyway).


srielau commented Apr 23, 2026

DESCRIBE for SQL UDFs (which is where PATH matters) is not really implemented. There is no formatted output of the function. Adding a proper DESCRIBE (and QA for it) should be a separate PR.


@cloud-fan cloud-fan left a comment

Round 5 re-review. Status: 0 addressed, 0 remaining, 0 new.

  • No code changes since the Round 4 HEAD (95da8aa); the only new input is your pushback on DESCRIBE FUNCTION test coverage.
  • Remaining finding #1 (DESCRIBE FUNCTION EXTENDED SQL UDF test coverage): your scope argument is accepted. Reading functions.scala confirms that DescribeFunctionCommand.run() for SQL UDFs only emits Function/Class/Usage/Extended Usage plus the new SQL Path row; there is no fleshed-out DESCRIBE FUNCTION output, so bundling QA for it with a follow-up PR that implements proper DESCRIBE for SQL UDFs is reasonable.
  • Remaining finding #2 (persistence tests under DescribeTableSuite vs. SetPathSuite): accepting your implicit decision. The added test is genuinely DESCRIBE-focused (it asserts the JSON shape and the backtick-quoted display string), so DescribeTableSuite is a defensible home. Dedicated persistence-semantics tests (temp keeps system.session, persistent strips it, round-trip) don't exist yet, but that's a coverage gap to track separately — not something to keep re-raising here.

No further AI concerns. Closing the re-review loop.

@cloud-fan

thanks, merging to master!

@cloud-fan cloud-fan closed this in 7d17224 Apr 23, 2026