Expand testEnumUnionString comment with upstream-bug context #605

Open

YogeshKothari26 wants to merge 1 commit into linkedin:master from YogeshKothari26:yokothar/testenumunionstring-doc

YogeshKothari26 (Contributor) commented Apr 28, 2026

What changes are proposed in this pull request, and why are they necessary?

Documents context for the disabled testEnumUnionString test (disabled in #585):

  • The failure is an upstream Hive 2.3.9 parser bug: SemanticAnalyzer throws an AssertionError in UnparseTranslator.addTranslation while parsing a CREATE VIEW whose body is a UNION ALL between an Avro enum column and a string column.
  • Coral's mergeUnionSchema (added in #282) is unaffected and still produces a correct STRING-typed schema for already-created views at translation time.
  • Because the bug fires at CREATE VIEW parse time on Hive 2.3.9, the case stays unsupported in Coral CI and i-tests, which embed Hive 2.3.9.
  • The test will be re-enabled once the upstream Hive parser issue is resolved.
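The schema-side behaviour in the second bullet can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not Coral's mergeUnionSchema: the FieldType enum and the mergeUnionFieldType method are invented for illustration and only cover the enum-versus-string case described above, where the merged column comes out as STRING.

```java
/**
 * Hypothetical, simplified model of resolving one column's type across the
 * two branches of a UNION ALL. Not Coral's or Avro's actual API.
 */
final class UnionTypeSketch {

    // Invented type tags standing in for real Avro schema types.
    enum FieldType { ENUM, STRING, INT }

    // Returns the merged type for a column that is `left` in one branch and
    // `right` in the other. Only the enum/string case from the PR is modelled;
    // any other mismatch is rejected in this sketch.
    static FieldType mergeUnionFieldType(FieldType left, FieldType right) {
        if (left == right) {
            return left;
        }
        boolean enumAndString =
            (left == FieldType.ENUM && right == FieldType.STRING)
                || (left == FieldType.STRING && right == FieldType.ENUM);
        if (enumAndString) {
            // An enum value can always be represented as its symbol string.
            return FieldType.STRING;
        }
        throw new IllegalArgumentException("Unsupported union: " + left + " vs " + right);
    }

    public static void main(String[] args) {
        System.out.println(mergeUnionFieldType(FieldType.ENUM, FieldType.STRING)); // prints STRING
    }
}
```

The branch order does not matter in this model: STRING with ENUM resolves the same way as ENUM with STRING.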

Comment-only change.

How was this patch tested?

Ran ./gradlew :coral-schema:spotlessCheck: PASS. No code change.

Documents that the disabled test reflects an upstream Hive 2.3.9
parser bug (CREATE VIEW UNION ALL with Avro enum vs string column),
not a Coral schema-merge regression. Notes that the case stays
unsupported in Coral CI but does not surface against HMS servers on
Hive 1.1.0.
YogeshKothari26 force-pushed the yokothar/testenumunionstring-doc branch from 883013c to 2681c80 on April 29, 2026 05:31
aastha25 (Contributor)

If it's a parser bug, does that mean it's a Hive 2.3 client-side bug? We have internal view creation pipelines that use Spark (and Spark internally uses the Hive 2.3 client libs). Are those jobs impacted?

YogeshKothari26 (Contributor, Author)

@aastha25: verified that the Spark CREATE VIEW path is not impacted by this bug.

Test: Created a Hive-backed Avro table with an enum column, ran the equivalent CREATE VIEW v AS SELECT enum_field FROM t UNION ALL SELECT 'literal_string' FROM t on Spark 3.5 with spark.sql.hive.metastore.version = 2.3.9.1. View created successfully, no AssertionError.

Why Spark is safe: Spark parses CREATE VIEW DDL with its own SqlBaseParser and stores the view body verbatim via HiveExternalCatalog.createTable → Hive.createTable (metadata API only). Hive's SemanticAnalyzer and UnparseTranslator, where this bug lives, are never invoked in Spark's flow.

DESCRIBE EXTENDED confirms this empirically: View Text and View Original Text are byte-identical, meaning Spark stored the SQL as-is without re-canonicalizing it. If UnparseTranslator had run, View Text would contain the re-serialized form instead.

The bug fires only when Hive's parser is directly invoked (e.g. Hive CLI, embedded Hive in Coral's HiveTester). Spark-based view creation pipelines remain unaffected.
