Upgrade to Hive 4.0.1#14
Conversation
Reviewer's GuideUpgrades the project to Hive 4.0.1 and the Apache Hadoop 3.x stack, updating ORC SerDe APIs and timestamp handling to match the newer Hive interfaces and types, along with minor type/collection cleanups. Sequence diagram for timestamp write/read with Hive 4 TimestampWritableV2sequenceDiagram
participant Client
participant OrcSerde
participant WriterImpl_TimestampTreeWriter as TimestampTreeWriter
participant Storage as ORCFile
participant LazyTimestampTreeReader as TimestampTreeReader
participant OrcLazyTimestampOI as OrcLazyTimestampObjectInspector
Client->>OrcSerde: serialize(row)
OrcSerde->>WriterImpl_TimestampTreeWriter: write(timestamp)
activate WriterImpl_TimestampTreeWriter
WriterImpl_TimestampTreeWriter->>WriterImpl_TimestampTreeWriter: val = TimestampObjectInspector.getPrimitiveJavaObject(obj)
WriterImpl_TimestampTreeWriter->>WriterImpl_TimestampTreeWriter: seconds = val.toEpochMilli() / MILLIS_PER_SECOND - BASE_TIMESTAMP
WriterImpl_TimestampTreeWriter->>Storage: write seconds and nanos
deactivate WriterImpl_TimestampTreeWriter
Client->>OrcSerde: deserialize(row)
OrcSerde->>LazyTimestampTreeReader: next(previous TimestampWritableV2)
activate LazyTimestampTreeReader
LazyTimestampTreeReader->>LazyTimestampTreeReader: millis = (data.next() + BASE_TIMESTAMP) * MILLIS_PER_SECOND
LazyTimestampTreeReader->>LazyTimestampTreeReader: adjust millis by nanos
LazyTimestampTreeReader->>LazyTimestampTreeReader: timestamp.setTimeInMillis(millis)
LazyTimestampTreeReader->>LazyTimestampTreeReader: timestamp.setNanos(newNanos)
LazyTimestampTreeReader-->>OrcSerde: TimestampWritableV2
deactivate LazyTimestampTreeReader
OrcSerde-->>Client: row with Timestamp via OrcLazyTimestampObjectInspector
Class diagram for updated ORC SerDe and inspectorsclassDiagram
class OrcSerde {
+OrcSerde()
-OrcSerdeRow row
-ObjectInspector inspector
+void initialize(Configuration conf, Properties table, Properties partition)
+Object deserialize(Writable blob)
+ObjectInspector getObjectInspector()
+SerDeStats getSerDeStats()
+Class getSerializedClass()
+Writable serialize(Object obj, ObjectInspector objInspector)
}
class AbstractSerDe {
<<abstract>>
+void initialize(Configuration conf, Properties table, Properties partition)
+Object deserialize(Writable blob)
+ObjectInspector getObjectInspector()
+SerDeStats getSerDeStats()
+Class getSerializedClass()
+Writable serialize(Object obj, ObjectInspector objInspector)
}
class SerDe {
<<interface>>
}
OrcSerde --|> AbstractSerDe
AbstractSerDe ..|> SerDe
class OrcStruct {
+List~String~ fieldNames
+List~Object~ fields
+List~String~ getFieldNames()
+void setFieldNames(List~String~ fieldNames)
}
class OrcStruct_Field {
+String fieldName
+int offset
+String getFieldName()
+ObjectInspector getFieldObjectInspector()
+String getFieldComment()
+int getFieldID()
}
class StructField {
<<interface>>
+String getFieldName()
+ObjectInspector getFieldObjectInspector()
+String getFieldComment()
+int getFieldID()
}
OrcStruct_Field ..|> StructField
OrcStruct_Field --> OrcStruct
class OrcStructInspector {
+List~StructField~ fields
+OrcStructInspector(StructTypeInfo info)
}
OrcStructInspector --> OrcStruct_Field
class OrcLazyStructObjectInspector {
+List~StructField~ fields
+OrcLazyStructObjectInspector(StructTypeInfo info)
+Object getStructFieldData(Object data, StructField fieldRef)
}
class OrcLazyRowObjectInspector {
+OrcLazyRowObjectInspector(StructTypeInfo info)
+Object getStructFieldData(Object data, StructField fieldRef)
}
OrcLazyRowObjectInspector --|> OrcLazyStructObjectInspector
OrcLazyStructObjectInspector --> OrcStruct_Field
class OrcLazyTimestampObjectInspector {
+OrcLazyTimestampObjectInspector()
+Timestamp getPrimitiveJavaObject(Object o)
}
class OrcLazyPrimitiveObjectInspector~T,W~ {
<<abstract>>
+W getPrimitiveWritableObject(Object o)
}
class TimestampWritableV2
class Timestamp
OrcLazyTimestampObjectInspector --|> OrcLazyPrimitiveObjectInspector~OrcLazyTimestamp,TimestampWritableV2~
OrcLazyTimestampObjectInspector --> TimestampWritableV2
OrcLazyTimestampObjectInspector --> Timestamp
class OrcLazyTimestamp {
+Object previous
+OrcLazyTimestamp(LazyTimestampTreeReader treeReader)
+OrcLazyTimestamp(OrcLazyTimestamp copy)
}
class LazyTimestampTreeReader {
+Object next(Object previous)
}
OrcLazyTimestamp --> LazyTimestampTreeReader
class OrcInputFormat {
+boolean validateInput(FileSystem fs, HiveConf conf, List~FileStatus~ files)
}
class WriterImpl {
+static int MILLIS_PER_SECOND
+static long BASE_TIMESTAMP
}
class TimestampTreeWriter {
+void write(Object obj)
}
WriterImpl o-- TimestampTreeWriter
TimestampTreeWriter --> Timestamp
class LazyTimestampTreeReader_TimestampWritableV2 {
+Object next(Object previous)
}
LazyTimestampTreeReader_TimestampWritableV2 --> TimestampWritableV2
LazyTimestampTreeReader_TimestampWritableV2 --> Timestamp
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In TestMemoryManager, the new lambda-based matcher in the verify call makes the closeTo helper unused; consider removing closeTo (and any now-unused imports) to keep the tests lean.
- The POM now globally skips FindBugs, PMD, and JaCoCo; if these were disabled only to get the Hive 4 / Java 17 upgrade building, consider re-enabling them or scoping the skips more narrowly to avoid losing static analysis across the whole module.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In TestMemoryManager, the new lambda-based matcher in the verify call makes the closeTo helper unused; consider removing closeTo (and any now-unused imports) to keep the tests lean.
- The POM now globally skips FindBugs, PMD, and JaCoCo; if these were disabled only to get the Hive 4 / Java 17 upgrade building, consider re-enabling them or scoping the skips more narrowly to avoid losing static analysis across the whole module.
## Individual Comments
### Comment 1
<location path="pom.xml" line_range="52-54" />
<code_context>
<properties>
+ <!-- Java 17 Configuration -->
+ <maven.compiler.source>17</maven.compiler.source>
+ <maven.compiler.target>17</maven.compiler.target>
+ <maven.compiler.release>17</maven.compiler.release>
+ <project.build.targetJdk>17</project.build.targetJdk>
+
</code_context>
<issue_to_address>
**suggestion:** Using `source`/`target` together with `release` is redundant and can be confusing; prefer one or the other.
For Java 9+, `<maven.compiler.release>` is sufficient, as it sets both language level and target API. To avoid redundant configuration and potential mismatches, remove `source` and `target` and rely solely on `<maven.compiler.release>17</maven.compiler.release>`.
</issue_to_address>
### Comment 2
<location path="src/test/java/com/facebook/hive/orc/TestMemoryManager.java" line_range="96-105" />
<code_context>
-
- private static DoubleMatcher closeTo(double value, double error) {
- return new DoubleMatcher(value, error);
+ private static ArgumentMatcher<Double> closeTo(double expected, double error) {
+ return new ArgumentMatcher<Double>() {
+ @Override
+ public boolean matches(Double val) {
+ return val != null && Math.abs(val - expected) <= error;
+ }
+
+ @Override
+ public String toString() {
+ return "close to " + expected + " (±" + error + ")";
+ }
+ };
}
</code_context>
<issue_to_address>
**suggestion:** Helper matcher `closeTo` is no longer used in the test suite
This new helper looks like a good replacement for `DoubleMatcher`, but after switching `testCallback` to a lambda it’s now unused. Please either adopt `closeTo` in the `verify(...).checkMemory(...)` calls or remove it to avoid dead code and keep the tests simple.
Suggested implementation:
```java
```
If there are any remaining references to `closeTo` elsewhere in `TestMemoryManager.java` (e.g. in `verify(...).checkMemory(doubleThat(closeTo(...)))`), those will now fail to compile and should be replaced with either `anyDouble()`, `eq(...)`, or an inline matcher consistent with the existing tests’ intent.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| <maven.compiler.source>17</maven.compiler.source> | ||
| <maven.compiler.target>17</maven.compiler.target> | ||
| <maven.compiler.release>17</maven.compiler.release> |
There was a problem hiding this comment.
suggestion: Using source/target together with release is redundant and can be confusing; prefer one or the other.
For Java 9+, <maven.compiler.release> is sufficient, as it sets both language level and target API. To avoid redundant configuration and potential mismatches, remove source and target and rely solely on <maven.compiler.release>17</maven.compiler.release>.
| private static ArgumentMatcher<Double> closeTo(double expected, double error) { | ||
| return new ArgumentMatcher<Double>() { | ||
| @Override | ||
| public boolean matches(Double val) { | ||
| return val != null && Math.abs(val - expected) <= error; | ||
| } | ||
|
|
||
| @Override | ||
| public String toString() { | ||
| return "close to " + expected + " (±" + error + ")"; |
There was a problem hiding this comment.
suggestion: Helper matcher closeTo is no longer used in the test suite
This new helper looks like a good replacement for DoubleMatcher, but after switching testCallback to a lambda it’s now unused. Please either adopt closeTo in the verify(...).checkMemory(...) calls or remove it to avoid dead code and keep the tests simple.
Suggested implementation:
If there are any remaining references to closeTo elsewhere in TestMemoryManager.java (e.g. in verify(...).checkMemory(doubleThat(closeTo(...)))), those will now fail to compile and should be replaced with either anyDouble(), eq(...), or an inline matcher consistent with the existing tests’ intent.
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Now that
OrcSerdeextendsAbstractSerDeand only overridesinitialize(Configuration, Properties, Properties), consider also adding a delegatinginitialize(Configuration, Properties)overload so existing callers typed asSerDecan still use the two-arg signature without needing to downcast toOrcSerde(and the tests can keep using the interface type).
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Now that `OrcSerde` extends `AbstractSerDe` and only overrides `initialize(Configuration, Properties, Properties)`, consider also adding a delegating `initialize(Configuration, Properties)` overload so existing callers typed as `SerDe` can still use the two-arg signature without needing to downcast to `OrcSerde` (and the tests can keep using the interface type).Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Full Build Result since we don't have a CI pipeline here: |
|
@tdcmeehan, thank you for reviewing the other PRs. I have rebased the PR on the latest master. Please have a look whenever you get a chance. Thanks! |
Required for prestodb/presto#24571
The changes have already been tested with CI in the above-linked PR.
Depends on #13