feat: Replace information schema with describe calls#1432
Conversation
Add alternate code path for metadata fetching in _describe_relation methods using DESCRIBE TABLE EXTENDED AS JSON (DBR 17.3+). Replaces 4 information_schema queries in IncrementalTableAPI and get_view_description in MaterializedViewAPI/ViewAPI. Falls back to legacy queries when capability is unavailable. PECOBLR-2546
Wrap long docstrings and expressions to satisfy E501, and add the missing return annotation on DatabricksRelation.is_foreign_table for mypy.
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
||||||||||||||||||||||||||||||||||||
Fix primary key and foreign key constraint parser to handle non-delimited identifiers within backticks.
|
/integration-test |
|
Integration tests dispatched for PR #1432 by @sd-db. Track progress in the Actions tab. |
|
Integration results for PR #1432 — UC cluster ✅ success · SQL warehouse ❌ failure · All-purpose cluster ✅ success · Shard coverage ✅ success |
|
On the failure, this is known flaky test so can be ignored (I will look to solve for this separately) |
sd-db
left a comment
There was a problem hiding this comment.
PR Review 1/x. Added a few comments on structure
| f"Current connection does not meet this requirement." | ||
| ) | ||
|
|
||
| def is_describe_as_json_supported(self, relation: DatabricksRelation) -> bool: |
There was a problem hiding this comment.
Again I feel we should rename this, this check is specifically for using DECRIBE TABLE EXTENDED AS JSON for getting metadata we should mark it as such as there can be other places where we are using DESCRIBE... AS JSON that don't go through this check. Some suggestions is_describe_as_json_relation_metadata_supported() or maybe can_use_describe_json_for_relation_metadata()
There was a problem hiding this comment.
This method specifically is to check whether "DESCRIBE TABLE EXTENDED AS JSON" is supported. The method should not take responsibility of how the call response is consumed.
There was a problem hiding this comment.
The reason for raising this was actually for the behaviour flag check --> bool(self.behavior.use_describe_as_json_for_relation_metadata.no_warn); since this is dependent on the flag it is a more specific version/flavour, we can look to either move this out or rename
| ) | ||
|
|
||
| @classmethod | ||
| def parse_column_masks(cls, json_metadata: dict[str, Any]) -> "Table": |
There was a problem hiding this comment.
For all the parse methods we are heavily relying on the structure of the reponse from the DESCRIBE...JSON call. While this works. I feel it might have been much better to define our out class/data-model and load the results from the DESCRIBE...JSON call into the data-model. Basically a DAO layer. Here we are at the mercy of the server and whatever validations we might/might not add on correctness and there can still be edge-cases.
sd-db
left a comment
There was a problem hiding this comment.
Overall looks good, made some minor comments
|
/integration-test |
|
Integration tests dispatched for PR #1432 by @tejassp-db. Track progress in the Actions tab. |
|
Integration results for PR #1432 — UC cluster ✅ success · SQL warehouse ✅ success · All-purpose cluster ✅ success · Shard coverage ✅ success |
|
/integration-test |
|
Integration tests dispatched for PR #1432 by @tejassp-db. Track progress in the Actions tab. |
|
/integration-test |
|
Integration tests dispatched for PR #1432 by @tejassp-db. Track progress in the Actions tab. |
|
Integration results for PR #1432 — UC cluster ❌ cancelled · SQL warehouse ❌ failure · All-purpose cluster ✅ success · Shard coverage ❌ failure |
|
Integration results for PR #1432 — UC cluster ✅ success · SQL warehouse ❌ failure · All-purpose cluster ✅ success · Shard coverage ✅ success |
Flaky integration test, this code path does not touch those tests or their code path. |
Description
Replace expensive
information_schemaqueries with a singleDESCRIBE TABLE EXTENDED ... AS JSONcall for metadata fetching in_describe_relationmethods. Gated behindDBRCapability.DESCRIBE_TABLE_EXTENDED_AS_JSON(DBR 17.3+), falls back to existing info_schema queries on older runtimes.Changes:
IncrementalTableAPI._describe_relation: replaces 4 info_schema queries (PK, FK, non-null constraints, column masks) with AS JSON parsingMaterializedViewAPI._describe_relation: replacesget_view_description(info_schema.views) with AS JSON parsingViewAPI._describe_relation: same as MVDatabricksDescribeJsonMetadataparser class with composite PK/FK supportis_describe_as_json_supported()gating method (checks HMS, foreign table, capability)is_foreign_tableproperty onDatabricksRelationdescribe_table_extended_as_jsonJinja macroTesting:
is_describe_as_json_supportedunit tests (UC, HMS, foreign table, no capability)Checklist
CHANGELOG.mdand added information about my change to the "dbt-databricks next" section.PECOBLR-2546