
Upgrade to Hive 4.0.1 #14

Merged
tdcmeehan merged 1 commit into prestodb:master from imjalpreet:hiveUpgrade on Apr 22, 2026


@imjalpreet (Member) commented Apr 20, 2026

Required for prestodb/presto#24571

The changes have already been tested with CI in the above-linked PR.

Depends on #13

@sourcery-ai (bot) commented Apr 20, 2026

Reviewer's Guide

Upgrades the project to Hive 4.0.1 and the Apache Hadoop 3.x stack, updating ORC SerDe APIs and timestamp handling to match the newer Hive interfaces and types, along with minor type/collection cleanups.

Sequence diagram for timestamp write/read with Hive 4 TimestampWritableV2

sequenceDiagram
  participant Client
  participant OrcSerde
  participant WriterImpl_TimestampTreeWriter as TimestampTreeWriter
  participant Storage as ORCFile
  participant LazyTimestampTreeReader as TimestampTreeReader
  participant OrcLazyTimestampOI as OrcLazyTimestampObjectInspector

  Client->>OrcSerde: serialize(row)
  OrcSerde->>WriterImpl_TimestampTreeWriter: write(timestamp)
  activate WriterImpl_TimestampTreeWriter
  WriterImpl_TimestampTreeWriter->>WriterImpl_TimestampTreeWriter: val = TimestampObjectInspector.getPrimitiveJavaObject(obj)
  WriterImpl_TimestampTreeWriter->>WriterImpl_TimestampTreeWriter: seconds = val.toEpochMilli() / MILLIS_PER_SECOND - BASE_TIMESTAMP
  WriterImpl_TimestampTreeWriter->>Storage: write seconds and nanos
  deactivate WriterImpl_TimestampTreeWriter

  Client->>OrcSerde: deserialize(row)
  OrcSerde->>LazyTimestampTreeReader: next(previous TimestampWritableV2)
  activate LazyTimestampTreeReader
  LazyTimestampTreeReader->>LazyTimestampTreeReader: millis = (data.next() + BASE_TIMESTAMP) * MILLIS_PER_SECOND
  LazyTimestampTreeReader->>LazyTimestampTreeReader: adjust millis by nanos
  LazyTimestampTreeReader->>LazyTimestampTreeReader: timestamp.setTimeInMillis(millis)
  LazyTimestampTreeReader->>LazyTimestampTreeReader: timestamp.setNanos(newNanos)
  LazyTimestampTreeReader-->>OrcSerde: TimestampWritableV2
  deactivate LazyTimestampTreeReader
  OrcSerde-->>Client: row with Timestamp via OrcLazyTimestampObjectInspector
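The write and read arithmetic in the sequence diagram can be sketched in plain Java. This is a simplified, hypothetical illustration rather than the code in `WriterImpl` or `LazyTimestampTreeReader`: `java.time.Instant` stands in for Hive's `Timestamp`/`TimestampWritableV2`, the `BASE_TIMESTAMP` value (2015-01-01T00:00:00Z in epoch seconds) is an assumption, and the seconds are floored via `Instant.getEpochSecond()` instead of using the truncating `/` shown in the diagram, so the nanos value is never negative and can always be added back on read.

```java
import java.time.Instant;

// Hypothetical sketch of the seconds/nanos split shown in the sequence diagram.
// java.time.Instant stands in for Hive's Timestamp; BASE_TIMESTAMP is an assumed value.
final class TimestampSplitSketch {
    static final long MILLIS_PER_SECOND = 1000L;
    static final int NANOS_PER_MILLI = 1_000_000;
    // Assumption: 2015-01-01T00:00:00Z expressed in epoch seconds.
    static final long BASE_TIMESTAMP = 1_420_070_400L;

    private TimestampSplitSketch() {}

    // Write path, seconds column: whole seconds relative to the base.
    // Instant.getEpochSecond() floors, so pre-epoch values round toward negative infinity.
    static long encodeSeconds(Instant ts) {
        return ts.getEpochSecond() - BASE_TIMESTAMP;
    }

    // Write path, nanos column: nanosecond-of-second, always in [0, 999_999_999].
    static int encodeNanos(Instant ts) {
        return ts.getNano();
    }

    // Read path: rebuild epoch millis as (seconds + BASE_TIMESTAMP) * MILLIS_PER_SECOND,
    // then add the millisecond part carried in the nanos column.
    static long decodeMillis(long storedSeconds, int nanos) {
        long millis = (storedSeconds + BASE_TIMESTAMP) * MILLIS_PER_SECOND;
        return millis + nanos / NANOS_PER_MILLI;
    }

    // Read path at full nanosecond precision.
    static Instant decode(long storedSeconds, int nanos) {
        return Instant.ofEpochSecond(storedSeconds + BASE_TIMESTAMP, nanos);
    }
}
```

With this floored split, `Instant.ofEpochMilli(-500)` is stored as seconds `-1 - BASE_TIMESTAMP` and nanos `500_000_000`, and `decodeMillis` rebuilds `-500`.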

Class diagram for updated ORC SerDe and inspectors

classDiagram
  class OrcSerde {
    +OrcSerde()
    -OrcSerdeRow row
    -ObjectInspector inspector
    +void initialize(Configuration conf, Properties table, Properties partition)
    +Object deserialize(Writable blob)
    +ObjectInspector getObjectInspector()
    +SerDeStats getSerDeStats()
    +Class getSerializedClass()
    +Writable serialize(Object obj, ObjectInspector objInspector)
  }

  class AbstractSerDe {
    <<abstract>>
    +void initialize(Configuration conf, Properties table, Properties partition)
    +Object deserialize(Writable blob)
    +ObjectInspector getObjectInspector()
    +SerDeStats getSerDeStats()
    +Class getSerializedClass()
    +Writable serialize(Object obj, ObjectInspector objInspector)
  }

  class SerDe {
    <<interface>>
  }

  OrcSerde --|> AbstractSerDe
  AbstractSerDe ..|> SerDe

  class OrcStruct {
    +List~String~ fieldNames
    +List~Object~ fields
    +List~String~ getFieldNames()
    +void setFieldNames(List~String~ fieldNames)
  }

  class OrcStruct_Field {
    +String fieldName
    +int offset
    +String getFieldName()
    +ObjectInspector getFieldObjectInspector()
    +String getFieldComment()
    +int getFieldID()
  }

  class StructField {
    <<interface>>
    +String getFieldName()
    +ObjectInspector getFieldObjectInspector()
    +String getFieldComment()
    +int getFieldID()
  }

  OrcStruct_Field ..|> StructField
  OrcStruct_Field --> OrcStruct

  class OrcStructInspector {
    +List~StructField~ fields
    +OrcStructInspector(StructTypeInfo info)
  }

  OrcStructInspector --> OrcStruct_Field

  class OrcLazyStructObjectInspector {
    +List~StructField~ fields
    +OrcLazyStructObjectInspector(StructTypeInfo info)
    +Object getStructFieldData(Object data, StructField fieldRef)
  }

  class OrcLazyRowObjectInspector {
    +OrcLazyRowObjectInspector(StructTypeInfo info)
    +Object getStructFieldData(Object data, StructField fieldRef)
  }

  OrcLazyRowObjectInspector --|> OrcLazyStructObjectInspector
  OrcLazyStructObjectInspector --> OrcStruct_Field

  class OrcLazyTimestampObjectInspector {
    +OrcLazyTimestampObjectInspector()
    +Timestamp getPrimitiveJavaObject(Object o)
  }

  class OrcLazyPrimitiveObjectInspector~T,W~ {
    <<abstract>>
    +W getPrimitiveWritableObject(Object o)
  }

  class TimestampWritableV2
  class Timestamp

  OrcLazyTimestampObjectInspector --|> OrcLazyPrimitiveObjectInspector~OrcLazyTimestamp,TimestampWritableV2~
  OrcLazyTimestampObjectInspector --> TimestampWritableV2
  OrcLazyTimestampObjectInspector --> Timestamp

  class OrcLazyTimestamp {
    +Object previous
    +OrcLazyTimestamp(LazyTimestampTreeReader treeReader)
    +OrcLazyTimestamp(OrcLazyTimestamp copy)
  }

  class LazyTimestampTreeReader {
    +Object next(Object previous)
  }

  OrcLazyTimestamp --> LazyTimestampTreeReader

  class OrcInputFormat {
    +boolean validateInput(FileSystem fs, HiveConf conf, List~FileStatus~ files)
  }

  class WriterImpl {
    +static int MILLIS_PER_SECOND
    +static long BASE_TIMESTAMP
  }

  class TimestampTreeWriter {
    +void write(Object obj)
  }

  WriterImpl o-- TimestampTreeWriter
  TimestampTreeWriter --> Timestamp

  class LazyTimestampTreeReader_TimestampWritableV2 {
    +Object next(Object previous)
  }

  LazyTimestampTreeReader_TimestampWritableV2 --> TimestampWritableV2
  LazyTimestampTreeReader_TimestampWritableV2 --> Timestamp
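The `OrcStruct_Field ..|> StructField` relationship in the class diagram means consumers can resolve a field's position through the interface's `getFieldID()` instead of a class-specific `getOffset()`. A minimal sketch of that lookup follows; `SimpleStructField`, `FieldImpl`, and `StructAccessSketch` are hypothetical stand-ins that model only the two methods used here, not Hive's `StructField` or the ORC classes, and the helpers accept `List` rather than `ArrayList` in the spirit of this PR's collection cleanup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hive's StructField interface (only the methods used here).
interface SimpleStructField {
    String getFieldName();
    int getFieldID();
}

// Hypothetical field implementation: its position in the struct is exposed via getFieldID().
final class FieldImpl implements SimpleStructField {
    private final String name;
    private final int offset;

    FieldImpl(String name, int offset) {
        this.name = name;
        this.offset = offset;
    }

    @Override public String getFieldName() { return name; }

    // Plays the role of the former custom getOffset(): the interface method carries the position.
    @Override public int getFieldID() { return offset; }
}

final class StructAccessSketch {
    private StructAccessSketch() {}

    // Builds one field per name, numbering them in order; accepts any List implementation.
    static List<SimpleStructField> fieldsFor(List<String> names) {
        List<SimpleStructField> fields = new ArrayList<>(names.size());
        for (int i = 0; i < names.size(); i++) {
            fields.add(new FieldImpl(names.get(i), i));
        }
        return fields;
    }

    // Resolves a field's value through the interface, so callers never downcast to FieldImpl.
    static Object getStructFieldData(List<Object> row, SimpleStructField fieldRef) {
        return row.get(fieldRef.getFieldID());
    }
}
```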

File-Level Changes

Change Details Files
Switch build dependencies from CDH Hadoop and Hive 0.8 to Apache Hadoop 3.4.1 and Hive 4.0.1, adding missing provided APIs required by newer Hive.
  • Replace com.facebook.presto.hadoop:hadoop-cdh4 with hadoop-apache at version 3.4.1-1 in the Maven POM with provided scope
  • Bump com.facebook.presto.hive:hive-apache from 0.8 to 4.0.1-1 in the Maven POM with provided scope
  • Add javax.annotation:javax.annotation-api and org.slf4j:slf4j-api as provided-scope dependencies to satisfy Hive 4 and Hadoop 3 compile-time requirements
pom.xml
Adapt OrcSerde to the Hive 4 AbstractSerDe API and update tests to use the new initialize signature.
  • Change OrcSerde class to extend AbstractSerDe instead of implementing SerDe directly
  • Update OrcSerde.initialize to the new three-argument signature (Configuration, table Properties, partition Properties)
  • Adjust all test usages of OrcSerde to use the concrete OrcSerde type and call initialize with the new third parameter (null for partition properties)
src/main/java/com/facebook/hive/orc/OrcSerde.java
src/test/java/com/facebook/hive/orc/TestInputOutputFormat.java
Migrate timestamp handling to Hive 4 Timestamp/TimestampWritableV2 APIs and corresponding epoch-millis semantics.
  • Change OrcLazyTimestampObjectInspector to use org.apache.hadoop.hive.common.type.Timestamp and TimestampWritableV2 and propagate those types through getPrimitiveJavaObject
  • Update WriterImpl timestamp base constant and write path to use Timestamp.toEpochMilli instead of getTime
  • Update LazyTimestampTreeReader to use TimestampWritableV2, adjust object creation/casting, and use Timestamp.setTimeInMillis
  • Change OrcLazyTimestamp to use TimestampWritableV2 and adjust its copy constructor
  • Update TestOrcFile timestamp tests to expect TimestampWritableV2 from OrcLazyTimestamp.materialize
src/main/java/com/facebook/hive/orc/lazy/OrcLazyTimestampObjectInspector.java
src/main/java/com/facebook/hive/orc/WriterImpl.java
src/main/java/com/facebook/hive/orc/lazy/LazyTimestampTreeReader.java
src/main/java/com/facebook/hive/orc/lazy/OrcLazyTimestamp.java
src/test/java/com/facebook/hive/orc/TestOrcFile.java
Align struct and lazy struct object inspectors with newer Hive interfaces, using generic lists and StructField.getFieldID.
  • Update OrcStruct.OrcStructInspector and OrcLazyRowObjectInspector/OrcLazyStructObjectInspector to use List-typed collections instead of ArrayList-specific types when reading from StructTypeInfo
  • Change internal Field implementation in OrcStruct to override getFieldID() instead of a custom getOffset() method, and update consumers to call StructField.getFieldID() when resolving offsets
src/main/java/com/facebook/hive/orc/OrcStruct.java
src/main/java/com/facebook/hive/orc/lazy/OrcLazyRowObjectInspector.java
src/main/java/com/facebook/hive/orc/lazy/OrcLazyStructObjectInspector.java
General API alignment with Hive 4 and Hadoop 3 collection types.
  • Change OrcInputFormat.validateInput signature to accept a generic List instead of ArrayList for compatibility with newer interfaces
src/main/java/com/facebook/hive/orc/OrcInputFormat.java
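The timestamp change above replaces `getTime` with `Timestamp.toEpochMilli` and also updates the base constant. One plausible reason the two change together is that the APIs can yield different epoch values for the same wall-clock time. The sketch below illustrates that with JDK types only, under an assumption not stated in this PR: that Hive 4's `Timestamp` interprets its wall-clock value as UTC in `toEpochMilli()`, whereas `java.sql.Timestamp.getTime()` interprets it in the JVM default time zone. Class and method names are hypothetical.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical comparison of two epoch-millis conventions, using JDK types only.
final class EpochMillisSketch {
    private EpochMillisSketch() {}

    // Assumed Hive 4 convention: the wall-clock value is interpreted as UTC.
    static long utcEpochMillis(LocalDateTime wallClock) {
        return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // java.sql.Timestamp convention: the wall clock is interpreted in the JVM default zone,
    // so the result depends on TimeZone.getDefault() at the time of the call.
    static long legacyEpochMillis(LocalDateTime wallClock) {
        return Timestamp.valueOf(wallClock).getTime();
    }
}
```

For a wall clock of 2015-01-01 00:00, `utcEpochMillis` returns `1420070400000` regardless of zone, while `legacyEpochMillis` returns `1420050600000` when the JVM default zone is `Asia/Kolkata`.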


@sourcery-ai (bot) left a comment


Hey - I've found 2 issues and left some high-level feedback:

  • In TestMemoryManager, the new lambda-based matcher in the verify call makes the closeTo helper unused; consider removing closeTo (and any now-unused imports) to keep the tests lean.
  • The POM now globally skips FindBugs, PMD, and JaCoCo; if these were disabled only to get the Hive 4 / Java 17 upgrade building, consider re-enabling them or scoping the skips more narrowly to avoid losing static analysis across the whole module.

## Individual Comments

### Comment 1
<location path="pom.xml" line_range="52-54" />
<code_context>

     <properties>
+        <!-- Java 17 Configuration -->
+        <maven.compiler.source>17</maven.compiler.source>
+        <maven.compiler.target>17</maven.compiler.target>
+        <maven.compiler.release>17</maven.compiler.release>
+        <project.build.targetJdk>17</project.build.targetJdk>
+        
</code_context>
<issue_to_address>
**suggestion:** Using `source`/`target` together with `release` is redundant and can be confusing; prefer one or the other.

For Java 9+, `<maven.compiler.release>` is sufficient, as it sets both language level and target API. To avoid redundant configuration and potential mismatches, remove `source` and `target` and rely solely on `<maven.compiler.release>17</maven.compiler.release>`.
</issue_to_address>

### Comment 2
<location path="src/test/java/com/facebook/hive/orc/TestMemoryManager.java" line_range="96-105" />
<code_context>
-
-  private static DoubleMatcher closeTo(double value, double error) {
-    return new DoubleMatcher(value, error);
+  private static ArgumentMatcher<Double> closeTo(double expected, double error) {
+    return new ArgumentMatcher<Double>() {
+      @Override
+      public boolean matches(Double val) {
+        return val != null && Math.abs(val - expected) <= error;
+      }
+      
+      @Override
+      public String toString() {
+        return "close to " + expected + " (±" + error + ")";
+      }
+    };
   }

</code_context>
<issue_to_address>
**suggestion:** Helper matcher `closeTo` is no longer used in the test suite

This new helper looks like a good replacement for `DoubleMatcher`, but after switching `testCallback` to a lambda it’s now unused. Please either adopt `closeTo` in the `verify(...).checkMemory(...)` calls or remove it to avoid dead code and keep the tests simple.

Suggested implementation: delete the `closeTo` helper entirely.

If there are any remaining references to `closeTo` elsewhere in `TestMemoryManager.java` (e.g. in `verify(...).checkMemory(doubleThat(closeTo(...)))`), those will now fail to compile and should be replaced with either `anyDouble()`, `eq(...)`, or an inline matcher consistent with the existing tests’ intent.
</issue_to_address>
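The anonymous `ArgumentMatcher<Double>` in the diff can be collapsed into a lambda, because Mockito's `ArgumentMatcher<T>` has a single abstract method, `matches`. To keep the sketch runnable without Mockito, `java.util.function.Predicate<Double>` stands in for `ArgumentMatcher<Double>`; `CloseToSketch` is a hypothetical name.

```java
import java.util.function.Predicate;

// Hypothetical sketch: Predicate<Double> stands in for Mockito's ArgumentMatcher<Double>,
// which likewise has one abstract method and can therefore be written as a lambda.
final class CloseToSketch {
    private CloseToSketch() {}

    // Lambda form of the anonymous matcher from the diff: true when val is non-null
    // and lies within +/- error of expected.
    static Predicate<Double> closeTo(double expected, double error) {
        return val -> val != null && Math.abs(val - expected) <= error;
    }
}
```

In the test itself, the same lambda could be passed inline, for example as `doubleThat(v -> Math.abs(v - expected) <= error)`, which is what makes a named helper unnecessary. One trade-off: the anonymous class's `toString()` supplies a readable "close to" description for failure messages, which a bare lambda does not.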



@imjalpreet marked this pull request as ready for review on April 22, 2026, 18:21

@sourcery-ai (bot) left a comment


Hey - I've left some high-level feedback:

  • Now that OrcSerde extends AbstractSerDe and only overrides initialize(Configuration, Properties, Properties), consider also adding a delegating initialize(Configuration, Properties) overload so existing callers typed as SerDe can still use the two-arg signature without needing to downcast to OrcSerde (and the tests can keep using the interface type).
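The suggested overload is a one-line delegation to the three-argument method, passing `null` for the partition properties just as the updated tests do. In this hypothetical sketch, `BaseSerDeSketch` stands in for Hive's `AbstractSerDe` and a `Map<String, String>` stands in for Hadoop's `Configuration`, so it shows only the delegation pattern, not Hive's real signatures. Letting partition properties override table properties is an illustrative choice, not necessarily what `OrcSerde` does.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for Hive's AbstractSerDe: only the three-argument initialize is abstract.
abstract class BaseSerDeSketch {
    abstract void initialize(Map<String, String> conf, Properties table, Properties partition);
}

// Hypothetical serde with the suggested two-argument overload delegating to the new signature.
final class OrcSerdeSketch extends BaseSerDeSketch {
    private Properties effective = new Properties();

    @Override
    void initialize(Map<String, String> conf, Properties table, Properties partition) {
        Properties merged = new Properties();
        if (table != null) {
            merged.putAll(table);
        }
        // Illustrative choice: partition-level properties, when present, override table-level ones.
        if (partition != null) {
            merged.putAll(partition);
        }
        effective = merged;
    }

    // Delegating overload: existing two-argument callers keep working, with no partition properties.
    void initialize(Map<String, String> conf, Properties table) {
        initialize(conf, table, null);
    }

    // Reads a property from the merged view built by initialize.
    String property(String key) {
        return effective.getProperty(key);
    }
}
```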


@imjalpreet (Member, Author)

Full build result, since there is no CI pipeline in this repository:

./mvnw clean install 

[INFO] Scanning for projects...
[INFO] 
[INFO] --------------------< com.facebook.hive:hive-dwrf >---------------------
[INFO] Building hive-dwrf 0.8.8-SNAPSHOT
[INFO]   from pom.xml
[INFO] --------------------------------[ jar ]---------------------------------

...
...

[INFO] --- maven-surefire-plugin:3.0.0-M7:test (default-test) @ hive-dwrf ---
[INFO] Tests will run in random order. To reproduce ordering use flag -Dsurefire.runOrder.random.seed=640093951642416
[INFO] Using auto detected provider org.apache.maven.surefire.junit4.JUnit4Provider
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------

...
...

[INFO] Results:
[INFO] 
[INFO] Tests run: 146, Failures: 0, Errors: 0, Skipped: 0

...
...

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  02:15 min
[INFO] Finished at: 2026-04-22T23:55:11+05:30
[INFO] ------------------------------------------------------------------------

@imjalpreet requested a review from tdcmeehan on April 22, 2026, 18:28
@imjalpreet (Member, Author)

@tdcmeehan, thank you for reviewing the other PRs. I have rebased the PR on the latest master. Please have a look whenever you get a chance. Thanks!

@tdcmeehan merged commit 6fc449b into prestodb:master on Apr 22, 2026
2 checks passed

Labels: None yet

Projects: None yet


2 participants