Upgrade Hive 1.2.2 → 2.3.9 #585

Merged: aastha25 merged 1 commit into linkedin:master from YogeshKothari26:pr2-hive-239 on Apr 23, 2026

@YogeshKothari26 commented Mar 17, 2026

What changes are proposed in this pull request, and why are they necessary?

This PR upgrades Coral's Hive dependency from 1.2.2 to 2.3.9, and bumps the version from 2.3.x to 2.4.x to reflect the scope of this change.

Part 2 of 3: #580 Gradle → this PR (Hive) → #596 Java 17. Stays on Java 8.

Dependencies

  • Hive 1.2.2 → 2.3.9
  • DataNucleus/Derby/javax.jdo test deps consolidated in root build.gradle
  • Pentaho exclusion (not in Maven Central)
  • Hive CBO disabled via systemProperty (CalcitePlanner incompatible with Calcite 1.21.0.265)
  • Excluded problematic transitives from published POM to protect downstream consumers

Source fixes

  • MetastoreProvider: getProxy(conf) → getProxy(conf, true) (single-arg overload removed in 2.3.9). The new boolean allowEmbedded parameter controls whether an in-process HMS is permitted when hive.metastore.uris is unset. Passing true preserves 1.2.2 behavior, which coral-service relies on for local/test flows.

coral-spark-catalog compatibility

After merging master (which added coral-spark-catalog via #584), the new module's tests needed Hive 2.3.9 compatibility fixes:

  • Added DataNucleus/Derby/hive-serde test deps (same as other modules)
  • Excluded Jackson from hive-metastore and hive-exec-core (Hive 2.3.9 brings Jackson 2.6.x which conflicts with Spark 3.5's Jackson 2.15)
  • Disabled Hive CBO (same Calcite conflict as other modules)
  • Test-only changes — no impact on published artifacts

Version bump

  • version.properties: 2.3.* → 2.4.* (minor bump for Hive dependency upgrade)

Tests disabled

  • testEnumUnionString: Hive 2.3.9 SemanticAnalyzer throws AssertionError in UnparseTranslator.addTranslation during CREATE VIEW with UNION ALL between Avro enum and string columns

HMS API Changelog: 1.2.2 → 2.3.9

Core read APIs — UNCHANGED:

getTable, getDatabase, getTables, getAllDatabases, getAllTables, getFields, getSchema, listPartitions, getPartitionsByNames, listPartitionNames, listPartitionsByFilter, tableExists — all identical signatures.

Breaking changes (4, all handled within Coral):

  • RetryingMetaStoreClient.getProxy(HiveConf) — removed, replaced by getProxy(HiveConf, boolean)
  • getProxy() Map param → ConcurrentHashMap
  • dropPartitions() ignoreProtection param removed
  • getPartition() param order swapped (tblName, dbName → dbName, tblName)

New methods: 25 additive (constraints, bulk metadata, partition values, etc.)

Thrift: 0.9.2 → 0.9.3 (wire compatible)

Full source comparison

Transitive dependency analysis

Compared coral-common compile classpath: master (146 artifacts) → this PR (221 artifacts).

+102 new (from Hive 2.3.9 dependency tree), -27 removed (from Hive 1.2.2).
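
The counts above are consistent with a plain set difference over full coordinates (146 + 102 − 27 = 221). A minimal sketch of that kind of comparison follows, assuming each classpath is available as a set of "group:artifact:version" strings. `ClasspathDiff` and the sample coordinates in `main` are illustrative only, and the PR does not state which tool produced the real numbers. Note that under this definition a version bump counts once as added and once as removed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: diffs two resolved classpaths given as "group:artifact:version" strings. */
final class ClasspathDiff {
  /** Coordinates present in {@code after} but absent from {@code before}, sorted. */
  static Set<String> added(Set<String> before, Set<String> after) {
    Set<String> result = new TreeSet<>(after);
    result.removeAll(before);
    return result;
  }

  /** Coordinates present in {@code before} but absent from {@code after}, sorted. */
  static Set<String> removed(Set<String> before, Set<String> after) {
    return added(after, before);
  }

  public static void main(String[] args) {
    // Illustrative inputs only; real lists would come from the resolved classpath of each branch.
    Set<String> master = new HashSet<>(Arrays.asList(
        "org.apache.hive:hive-metastore:1.2.2", "com.example:unchanged:1.0"));
    Set<String> branch = new HashSet<>(Arrays.asList(
        "org.apache.hive:hive-metastore:2.3.9", "com.example:unchanged:1.0"));
    System.out.println("+ " + added(master, branch));
    System.out.println("- " + removed(master, branch));
  }
}
```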

Excluded transitives — verified NOT leaking in published POM:

| Artifact | Why excluded |
| --- | --- |
| org.apache.logging.log4j:log4j-core | CVE (Log4Shell) |
| org.eclipse.jetty.orbit:javax.servlet | Conflicts with consumer servlet APIs |
| org.slf4j:slf4j-log4j12 | Conflicts with consumer SLF4J bindings |
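
One way such a check could be scripted, as a minimal sketch: parse the published POM and flag any `<dependency>` whose group:artifact appears in the list above. `PomLeakCheck` and its methods are hypothetical names, not part of Coral, and the PR does not say how the verification was actually performed. The sketch inspects only dependencies declared in the POM itself, so it says nothing about what those dependencies pull in transitively.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: finds excluded artifacts declared as dependencies in a POM. */
final class PomLeakCheck {
  // Coordinates taken from the exclusion list in this PR description.
  static final Set<String> EXCLUDED = new HashSet<>(Arrays.asList(
      "org.apache.logging.log4j:log4j-core",
      "org.eclipse.jetty.orbit:javax.servlet",
      "org.slf4j:slf4j-log4j12"));

  /** Returns every excluded "group:artifact" that appears as a dependency element. */
  static List<String> leaked(String pomXml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("not a parseable POM", e);
    }
    List<String> hits = new ArrayList<>();
    // Matches <dependency> elements only; <exclusion> elements are not counted as leaks.
    NodeList deps = doc.getElementsByTagName("dependency");
    for (int i = 0; i < deps.getLength(); i++) {
      Element dep = (Element) deps.item(i);
      String ga = childText(dep, "groupId") + ":" + childText(dep, "artifactId");
      if (EXCLUDED.contains(ga)) {
        hits.add(ga);
      }
    }
    return hits;
  }

  /** Text of the first direct child with the given tag, ignoring nested exclusions. */
  static String childText(Element parent, String tag) {
    for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
      if (n instanceof Element && tag.equals(((Element) n).getTagName())) {
        return n.getTextContent().trim();
      }
    }
    return "";
  }

  public static void main(String[] args) {
    // Illustrative POM fragment: the excluded artifact appears only as an exclusion.
    String pom = "<project><dependencies><dependency>"
        + "<groupId>org.apache.hive</groupId><artifactId>hive-metastore</artifactId>"
        + "<exclusions><exclusion><groupId>org.slf4j</groupId>"
        + "<artifactId>slf4j-log4j12</artifactId></exclusion></exclusions>"
        + "</dependency></dependencies></project>";
    System.out.println("leaked: " + leaked(pom));
  }
}
```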

Shadow JAR: coral-trino-parser verified identical to master — 3,450 entries, zero differences.
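
For anyone wanting to repeat an entry-level comparison like this, here is a minimal sketch. It fingerprints each non-directory entry by name and CRC-32 of its uncompressed bytes, then compares the two maps. `JarContentDiff` and its methods are hypothetical names, and the PR does not state which tool produced the "zero differences" result. Unlike a whole-file byte comparison, this ignores entry order, timestamps, and compression settings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: compares two JARs by entry name and content checksum. */
final class JarContentDiff {
  /** Maps each non-directory entry name to the CRC-32 of its uncompressed bytes. */
  static Map<String, Long> fingerprint(InputStream jar) {
    Map<String, Long> result = new TreeMap<>();
    byte[] buffer = new byte[8192];
    try (ZipInputStream zip = new ZipInputStream(jar)) {
      ZipEntry entry;
      while ((entry = zip.getNextEntry()) != null) {
        if (entry.isDirectory()) {
          continue;
        }
        CRC32 crc = new CRC32();
        int n;
        while ((n = zip.read(buffer)) != -1) {
          crc.update(buffer, 0, n);
        }
        result.put(entry.getName(), crc.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  /** Demo helper: builds an in-memory archive holding a single entry. */
  static byte[] singleEntryJar(String name, byte[] content) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(out)) {
      zip.putNextEntry(new ZipEntry(name));
      zip.write(content);
      zip.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    // Stand-ins for the two shadow JARs; real use would open the files from each build.
    byte[] master = singleEntryJar("Demo.class", new byte[] {1, 2, 3});
    byte[] branch = singleEntryJar("Demo.class", new byte[] {1, 2, 3});
    boolean same = fingerprint(new ByteArrayInputStream(master))
        .equals(fingerprint(new ByteArrayInputStream(branch)));
    System.out.println("identical contents: " + same);
  }
}
```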

How was this patch tested?

  • ./gradlew clean build — all unit tests pass
  • Regression tested Trino SQL and Spark SQL translation against a large corpus of Hive views — zero regressions, zero nullability changes
  • SDK integration verified end-to-end (HMS and OpenHouse catalog datasets)
  • Shadow JAR (coral-trino-parser) byte-identical to master
  • Excluded transitives verified not leaking in published POM

@YogeshKothari26 marked this pull request as ready for review March 17, 2026 12:27
@YogeshKothari26 marked this pull request as draft April 2, 2026 04:36
@YogeshKothari26 marked this pull request as ready for review April 3, 2026 12:41
@aastha25
Contributor

Please squash the commits to clean up the history. Also, do a minor version bump given the size and impact of this change.

@YogeshKothari26
Contributor Author

Please squash the commits to clean up the history. Also, do a minor version bump given the size and impact of this change.

Done

- @Test
+ // Disabled: Hive 2.3.9 SemanticAnalyzer throws AssertionError in UnparseTranslator.addTranslation
+ // during CREATE VIEW with UNION ALL between Avro enum and string columns (HIVE-specific bug)
+ @Test(enabled = false)
Contributor

Thanks for finding this gap. This requires more visibility and tracking: we need to understand whether it will break any existing production views. Can you please create a follow-up ticket to identify the blast radius and a mitigation strategy?

Contributor

If you look at the git blame history for this unit test, you might be able to find out why this feature was needed in the first place.

Contributor Author

Good points — addressing both.

Origin (per git blame): testEnumUnionString was added in PR #282 (commit 6d6f10e, July 2022) alongside the SchemaUtilities.mergeUnionSchema enum∪string merge logic.

Root cause: The failure is in Hive 2.3.9's SemanticAnalyzer.UnparseTranslator.addTranslation during CREATE VIEW parsing — upstream of Coral. mergeUnionSchema itself is unchanged; queries against already-created views using this pattern still translate correctly.

Follow-up ticket: Filing one to track:

  • How many such views exist in prod today (enum∪string UNION pattern)
  • Mitigation options (e.g., explicit CAST(enum AS string) at view-author level, upstream Hive parser fix)
  • Re-enabling path for testEnumUnionString once a fix lands

Should this be a GitHub issue on this repo, or our internal tracker? Happy either way — let me know the preference.

  UserGroupInformation.loginUserFromKeytab(clientPrincipal, clientKeytab);
  }
- return RetryingMetaStoreClient.getProxy(conf);
+ return RetryingMetaStoreClient.getProxy(conf, true);
Contributor

What is the new parameter in this API, and how does it behave?

Contributor Author

Good question. The new second arg is boolean allowEmbedded.

Hive 2.3.9 removed the single-arg RetryingMetaStoreClient.getProxy(HiveConf). The closest replacement is getProxy(HiveConf, boolean allowEmbedded):

  • true → when hive.metastore.uris is unset (or points to localhost), the client brings up an in-process (embedded) HMS. This matches the Hive 1.2.2 behavior exactly — the old single-arg form always permitted the embedded path.
  • false → throws MetaException in that situation.

coral-service relies on the embedded path for local/test flows where no real metastore is configured, so true is the behavior-preserving 1:1 swap.

I've expanded the "Source fixes" section in the PR summary to include this semantic note, so future readers have the context inline.
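
For readers skimming the thread, the decision described above can be modelled in a few lines. This is a toy stand-in, not Hive code: `EmbeddedMetastoreModel`, `Mode`, and `resolve` are hypothetical names, the model covers only the unset or blank hive.metastore.uris case, and it throws a JDK exception where Hive is described above as throwing MetaException.

```java
/** Hypothetical model of the allowEmbedded decision described in this thread; not Hive code. */
final class EmbeddedMetastoreModel {
  enum Mode { EMBEDDED, REMOTE }

  /**
   * @param metastoreUris stands in for hive.metastore.uris; null or blank means unset
   * @param allowEmbedded stands in for the second argument of getProxy(HiveConf, boolean)
   */
  static Mode resolve(String metastoreUris, boolean allowEmbedded) {
    boolean unset = metastoreUris == null || metastoreUris.trim().isEmpty();
    if (!unset) {
      // A configured URI means a remote metastore, regardless of the flag.
      return Mode.REMOTE;
    }
    if (!allowEmbedded) {
      // The thread says Hive throws MetaException here; a JDK exception stands in for it.
      throw new IllegalStateException("no metastore URIs configured and embedded mode not allowed");
    }
    return Mode.EMBEDDED;
  }

  public static void main(String[] args) {
    // The local/test flow described for coral-service: no URIs set, embedded allowed.
    System.out.println(resolve(null, true));
    // A configured remote metastore (illustrative URI).
    System.out.println(resolve("thrift://metastore.example:9083", true));
  }
}
```

In this model, passing false with no URIs configured fails fast instead of starting an in-process metastore, which is why the thread describes true as the behavior-preserving choice.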

Dependencies:
- Hive 1.2.2 → 2.3.9
- DataNucleus/Derby/javax.jdo test deps consolidated in root build.gradle
- Pentaho exclusion (not in Maven Central)
- Excluded problematic transitives (log4j-core, javax.servlet, slf4j-log4j12)

Source fixes:
- MetastoreProvider: getProxy(conf) → getProxy(conf, true) (single-arg removed in 2.3.9)

Build config:
- Hive CBO disabled via systemProperty (CalcitePlanner incompatible with Calcite 1.21.0.265)
- coral-spark-catalog: Jackson exclusion + test deps for Hive 2.3.9 compatibility

Tests disabled:
- testEnumUnionString: Hive 2.3.9 AssertionError in UnparseTranslator.addTranslation
  during CREATE VIEW with UNION ALL between Avro enum and string columns
@aastha25 merged commit cacdbfb into linkedin:master on Apr 23, 2026
1 check passed