[Iceberg v3] VARIANT type #27753
Conversation
56b80e4 to 9336d65
Thanks for putting this up. We talked about it a few weeks ago over Slack, so I wanted to capture my thoughts here. SQL:2023 defines a rich SQL/JSON data model and a set of operations and semantics that map almost 1-to-1 to Iceberg Variant. Note that, importantly, SQL/JSON does not mandate a textual encoding, but only conversion functions to and from text that produce RFC 8259 JSON.
Instead of adding a new data type, we should:
This gives us:
@martint Thanks for writing this up — I agree that SQL:2023 defines a rich SQL/JSON data model, and at the abstract model level there is significant overlap with Iceberg Variant (null / scalar / array / object / sequence). However, I think it is important to separate data model similarity from type semantics, interoperability guarantees, and physical representation. This PR is intentionally focused on first-class Iceberg Variant support. That implies a number of constraints that are part of the Iceberg specification and cannot be relaxed or evolved independently:
Concretely, Iceberg Variant has a number of limitations that are part of the contract:
Because these constraints are fundamental to Variant, equating it with SQL/JSON is dangerous in both directions:
For these reasons, representing Variant as a distinct Trino type is deliberate. It makes the interoperability boundary explicit and avoids accidental promises about future extensibility or semantic compatibility. It is also important to note that JSON is already a trivial cast away. This PR supports casts in and out of Variant, including JSON ↔ VARIANT, and users who want JSON semantics can get them immediately via explicit casts. The hard problem here is not JSON manipulation — it is correctness and fidelity of Variant itself. More broadly, if Trino were to implement deeper SQL/JSON compliance, Variant is not a good internal encoding for that purpose. Variant values are expensive and difficult to construct:
These characteristics are acceptable and necessary for an interchange format, but they are not a good foundation for a general-purpose, extensible SQL/JSON implementation. Refactoring Trino’s JSON type to adopt a new internal encoding and updating the full SQL/JSON function surface would be a valuable effort, but it is a large, orthogonal project with broad impact. This PR is intentionally scoped to deliver correct, interoperable Iceberg Variant support without blocking on that work. In short, this PR is about delivering real Variant semantics with strong interoperability guarantees. SQL/JSON compliance and JSON internals can evolve independently without conflating the two or weakening either.
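To make the "trivial cast away" point concrete, here is a small illustrative sketch of the explicit JSON ↔ VARIANT casts described above; the literals are invented for this example and are not taken from the PR's tests:

```sql
-- Illustrative only: relies on the JSON <-> VARIANT casts this PR describes.
SELECT CAST(JSON '{"a": 1, "b": [true, null]}' AS variant);  -- JSON literal into a variant value
SELECT CAST(CAST(42 AS variant) AS json);                    -- variant value back out to JSON
```

Users who want full SQL/JSON semantics can apply them after the first cast; the variant side only needs to guarantee fidelity of the cast itself.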
In that case, we should keep the type and functions confined to the Iceberg plugin.
That's fair. There would need to be some conversion/checking on write to ensure the contents have the expected precision that the connector needs. Alternatively, we could have an envelope with metadata about the contained data types so that the connector can decide whether to store as is or convert.
SQL/JSON doesn't have those either, so that's not a problem.
This could be a problem, yes. But in what ways? Can you elaborate?
Not really. It doesn't encourage extensions -- that's a choice we would make.
I could get behind that as long as we don't then go and try to replicate all the operations we have for JSON to work for variant. It's a slippery slope.
Not sure I understand this. Variant is not just an interchange format. If that were the purpose, we might as well use varbinary instead.
There are only a handful of functions that expect JSON type at this point:
Databases implement variant semantics in different ways, with no interoperable semantics. In particular, Databricks', Snowflake's, and Iceberg's variants differ from each other in important ways. Other databases use different types to achieve those semantics:
So, if by interoperable we mean interoperable with other engines that read/write Iceberg, this should be an Iceberg connector feature, not an engine-level feature.
As much as I don't like duplicated logic, especially if it's complex, I also don't want Trino to be less performant than it could be (and less performant than other systems). For example, having to convert a whole variant document to the JSON type in order to perform a path lookup sounds wasteful. My point is not really to decide on the direction here. Rather, I only want us not to make promises that we may regret in the future, and to keep the door open for future improvements around Trino's JSON and VARIANT story.
The type is required by Delta Lake. I also think Hive supports this. The problem is the stack type is a Java object, and for these systems to interop you need the type to be available to both, and that means the SPI. This is a type that is defined by a specification that is well established. IMO we should just have this in the SPI, so any connector can use it. I expect that this is something we might want to use in the Snowflake connector. I don't want to get into a debate about types, but IMO the isolation of types into plugins does not work, and I think we should back this feature out of Trino. Referencing a type by name and trying to do anything non-trivial with the type is a total mess. The complexity isn't worth the effort to keep types out. If you insist I'll do it, because I want this feature in more than I want to debate this point, but I feel extremely strongly that this is the wrong direction (and I feel the same about the two other isolated types in Trino).
My point is there is no freedom. I think we are much better off having connectors expose their native type, and then they can trivially be cast to our internal JSON to use the JSON syntax and functions (this would be super trivial if we landed the Postgres-style casts). So Postgres can expose raw jsonb, Redshift can expose its custom thing, and so on. You can manipulate them directly without losing any fidelity. If you want to move to a different type, you just cast. IMO this is the best of both worlds. You can work directly with your native type, and then you can move to something else if you want a different feature or if you want to interoperate with a different system. JSON IMO isn't always the best.
This is bad in both directions. What if Iceberg adds interval (there is a proposal)? Then what do you do? Does SQL/JSON support UUID? Iceberg Variant already has that. There will always be an impedance mismatch here.
The Iceberg spec defines equivalence classes, which tell you how you are supposed to extract and compare types. One strange thing is that exact number, float, and decimal are separate types, and therefore not equal.
Again, this cuts both ways. The main point is we do not control the variant spec. Expanding it is bad, not fully supporting it is bad, and not having choices in Trino would suck.
I don't think we should be reimplementing the json operations. We have JSON to do that. I think the function set we already have (casts+dereference) is what we need.
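As a rough sketch of the "casts + dereference" workflow being described here, assuming a hypothetical events table with a variant column payload, and using Trino's existing json_extract_scalar for the path step rather than any new variant-specific function:

```sql
-- Hypothetical table/column names; the variant is cast to JSON first,
-- then the existing JSON functions handle path navigation.
SELECT json_extract_scalar(CAST(payload AS json), '$.user.name')
FROM events;
```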
Variant is a good format for Iceberg and Delta to store data. It does not make a good format for in-memory Trino manipulation. If we want a good format for dynamic structures like this, I can suggest a better design.
@ebyhr @raunaqmorarka @findepi, I split out the code to enable v3 and the commit to reject tables/commands for unsupported v3 features into PR #27786. That PR is designed to handle and test all cases you have mentioned and anything else I could think of. The variant branch is not based on that PR.
That would work, but it would have the undesired effect that a direct field lookup is significantly faster on a large variant value than an equivalent JSON function call. I envision a future where the Trino JSON type can simply be backed by an Iceberg VARIANT value directly, without any data conversion, and JSON functions can operate on those documents. I acknowledge that Trino JSON may want to be more type-rich than Iceberg VARIANT, so that runtime representation doesn't have to be the only one allowed. Why? Because of the intersection of two important assumptions
My point is not really to decide on the direction here. Rather, I only want us not to make promises that we may regret in the future, and to keep the door open for future improvements around Trino's JSON and VARIANT story.
75a8a6b to ee3ffd3
I updated this PR with support for Iceberg Variant in the ORC and Avro formats, so all file formats are now supported.
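A minimal sketch of exercising the newly supported formats, assuming the Iceberg connector's format and format_version table properties and a hypothetical iceberg.tpch schema:

```sql
-- Hypothetical catalog/schema/table names; format and format_version are assumed
-- to be the Iceberg connector's standard table properties.
CREATE TABLE iceberg.tpch.variant_orc (v variant)
WITH (format = 'ORC', format_version = 3);

INSERT INTO iceberg.tpch.variant_orc VALUES (CAST('hello' AS variant));

SELECT CAST(v AS varchar) FROM iceberg.tpch.variant_orc;
```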
@CodeRabbit full review |
✅ Actions performed
Full review triggered.
📝 Walkthrough

This pull request implements comprehensive support for the VARIANT data type in Trino according to the Apache Iceberg specification. It introduces the core VARIANT type with block encoding, operators, and cast functions; defines variant encoding/decoding utilities; adds ORC and Parquet read/write support; integrates with the Iceberg plugin; and includes client-side protocol support. The implementation spans the SPI layer (VariantType, VariantBlock), core Trino server functions, connector-specific readers/writers, and documentation updates.

Assessment against linked issues

Assessment against linked issues: Out-of-scope changes
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergAvroDataConversion.java (1)
313-318: ⚠️ Potential issue | 🔴 Critical

Missing return after handling FIXED type causes fall-through to heap buffer wrapping.

When icebergType.typeId().equals(FIXED) is true, the method writes the byte array but then falls through to line 317, which attempts to cast object to ByteBuffer. This will throw a ClassCastException since object is a byte[] for FIXED types.

🐛 Proposed fix

```diff
 if (type instanceof VarbinaryType) {
     if (icebergType.typeId().equals(FIXED)) {
         VARBINARY.writeSlice(builder, Slices.wrappedBuffer((byte[]) object));
+        return;
     }
     VARBINARY.writeSlice(builder, wrappedHeapBuffer((ByteBuffer) object));
     return;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergAvroDataConversion.java` around lines 313 - 318, In IcebergAvroDataConversion, the VarbinaryType branch incorrectly falls through after handling FIXED: inside the conditional that checks icebergType.typeId().equals(FIXED) (within the VarbinaryType handling) you must write the byte[] to the builder and then immediately return to avoid the subsequent cast to ByteBuffer; update the method (the VarbinaryType handling block in IcebergAvroDataConversion) to add a return after VARBINARY.writeSlice(builder, Slices.wrappedBuffer((byte[]) object)) so FIXED byte[] cases don't reach the wrappedHeapBuffer((ByteBuffer) object) call.
🧹 Nitpick comments (14)
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java (1)
5039-5060: Use TestTable (try-with-resources) to avoid cleanup leaks on assertion failure.

If an assertion fails before line 5059, test_nested_variant may remain and cascade into follow-up failures.

♻️ Suggested cleanup hardening

```diff
 @Test
 public void testNestedVariant()
 {
     // Tests variant nested inside array and row types
-    assertUpdate("CREATE TABLE test_nested_variant (" +
-            " variant_array array(variant), " +
-            " variant_row row(v variant, i integer)) " +
-            "WITH (FORMAT_VERSION = 3)");
-
-    assertUpdate("INSERT INTO test_nested_variant VALUES (" +
-            "ARRAY[CAST(1 AS VARIANT), CAST('hello' AS VARIANT), CAST(NULL AS VARIANT)], " +
-            "CAST(ROW(42, 123) AS ROW(v variant, i integer)))", 1);
-    assertUpdate("INSERT INTO test_nested_variant VALUES (NULL, NULL)", 1);
-
-    assertThat(query("SELECT * FROM test_nested_variant"))
+    try (TestTable table = newTrinoTable(
+            "test_nested_variant",
+            "(variant_array array(variant), variant_row row(v variant, i integer)) WITH (FORMAT_VERSION = 3)")) {
+        assertUpdate("INSERT INTO " + table.getName() + " VALUES (" +
+                "ARRAY[CAST(1 AS VARIANT), CAST('hello' AS VARIANT), CAST(NULL AS VARIANT)], " +
+                "CAST(ROW(42, 123) AS ROW(v variant, i integer)))", 1);
+        assertUpdate("INSERT INTO " + table.getName() + " VALUES (NULL, NULL)", 1);
+
+        assertThat(query("SELECT * FROM " + table.getName()))
             .matches("VALUES (" +
                     "ARRAY[CAST(1 AS VARIANT), CAST('hello' AS VARIANT), CAST(NULL AS VARIANT)], " +
                     "CAST(ROW(42, 123) AS ROW(v variant, i integer))), " +
                     "(NULL, NULL)");
-
-    assertUpdate("DROP TABLE test_nested_variant");
+    }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java` around lines 5039 - 5060, The test method testNestedVariant should use a TestTable in a try-with-resources block instead of manually creating and dropping "test_nested_variant" so the table is always cleaned up on assertion failure; replace the explicit assertUpdate("CREATE TABLE ...") and final assertUpdate("DROP TABLE ...") with creating a TestTable (new TestTable(this::queryRunner, "test_nested_variant", "variant_array array(variant), variant_row row(v variant, i integer) WITH (FORMAT_VERSION = 3)")) in a try-with-resources and use table.getName() in the subsequent INSERT/SELECT assertions so the resource is auto-closed and the table removed even if assertions fail.core/trino-spi/src/main/java/io/trino/spi/variant/Int2IntOpenHashMap.java (1)
22-23: Clarify or guard DEFAULT_RETURN_VALUE sentinel ambiguity.

get()/putIfAbsent() return -1 for “not found”, but -1 is also a legal stored value. Consider either rejecting -1 values or documenting that callers must pair reads with containsKey() to disambiguate.

Optional guard-based fix

```diff
 public int putIfAbsent(final int k, final int v)
 {
+    if (v == DEFAULT_RETURN_VALUE) {
+        throw new IllegalArgumentException("Value -1 is reserved as DEFAULT_RETURN_VALUE");
+    }
     final int pos = find(k);
     if (pos >= 0) {
         return value[pos];
     }
     insert(-pos - 1, k, v);
     return DEFAULT_RETURN_VALUE;
 }
```

Also applies to: 97-105
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-spi/src/main/java/io/trino/spi/variant/Int2IntOpenHashMap.java` around lines 22 - 23, The sentinel DEFAULT_RETURN_VALUE = -1 conflicts with legal stored values; update Int2IntOpenHashMap to guard against storing -1 by validating in put(...) and putIfAbsent(...) (and any insert helpers) and throw IllegalArgumentException if value == DEFAULT_RETURN_VALUE, and add a short javadoc to get(), putIfAbsent(), DEFAULT_RETURN_VALUE stating callers must not store -1 and to use containsKey() to disambiguate if they relied on the old behavior; this makes get()/putIfAbsent() safe to keep returning DEFAULT_RETURN_VALUE while preventing ambiguous stored values.testing/trino-testing/src/main/java/io/trino/testing/TestingTrinoClient.java (1)
337-343: Add defensive payload type checks in VARIANT decoding.

The conversion works for valid data, but explicit checks will fail faster with clearer errors than ClassCastException if payload shape changes.

♻️ Suggested hardening diff

```diff
 if (type == VARIANT) {
-    List<?> variantList = (List<?>) value;
+    checkArgument(value instanceof List<?>, "Expected variant value to be a list");
+    List<?> variantList = (List<?>) value;
     checkArgument(variantList.size() == 2, "Expected variant value to be a list of size 2");
+    Object metadataObject = variantList.getFirst();
+    Object dataObject = variantList.getLast();
+    checkArgument(metadataObject instanceof byte[], "Expected variant metadata to be binary");
+    checkArgument(dataObject instanceof byte[], "Expected variant value to be binary");
     return Variant.from(
-            Metadata.from(Slices.wrappedBuffer((byte[]) variantList.getFirst())),
-            Slices.wrappedBuffer((byte[]) variantList.getLast()));
+            Metadata.from(Slices.wrappedBuffer((byte[]) metadataObject)),
+            Slices.wrappedBuffer((byte[]) dataObject));
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testing/trino-testing/src/main/java/io/trino/testing/TestingTrinoClient.java` around lines 337 - 343, In the VARIANT decoding branch in TestingTrinoClient (the block checking if (type == VARIANT)), add explicit defensive checks before casting: verify that value is a List, that the list size is exactly 2, and that both elements are byte[] (use instanceof checks) so you can throw a clear IllegalArgumentException with a descriptive message if any check fails; only then call Variant.from(Metadata.from(Slices.wrappedBuffer((byte[]) first)), Slices.wrappedBuffer((byte[]) second)). Ensure you reference the local variables value, variantList, Variant.from, Metadata.from, and Slices.wrappedBuffer in your checks and error messages.lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetTypeVisitor.java (1)
108-113: Address or remove TODO comments before merging.

Lines 109-110 contain development notes that should be resolved:

- "todo is there validation here" — The checkArgument on line 111 validates field count, but the comment suggests uncertainty about completeness
- "todo should I extract the metadata and values fields here" — Clarify whether additional field extraction is needed

If no additional validation or extraction is required, remove the TODOs to avoid confusion.

🧹 Suggested cleanup

```diff
 if (LogicalTypeAnnotation.variantType(Header.VERSION).equals(annotation)) {
-    // todo is there validation here
-    // todo should I extract the metadata and values fields here
     checkArgument(group.getFieldCount() == 2, "Invalid variant: expected 2 fields (metadata, value): %s", group);
     return visitor.variant(group);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetTypeVisitor.java` around lines 108 - 113, The two inline TODO comments inside the LogicalTypeAnnotation.variantType(Header.VERSION) branch are stale—either implement the missing validation/extraction or remove them; since checkArgument(group.getFieldCount() == 2, ...) already enforces the expected structure, remove the "todo is there validation here" comment and either remove or implement the "todo should I extract the metadata and values fields here": if additional checks are required, extract fields from group (use group.getFields()/group.getField(int) and validate types) and pass the extracted metadata/value to visitor.variant or validate before calling visitor.variant(group), otherwise delete both TODO lines to leave a clear code path around LogicalTypeAnnotation.variantType(Header.VERSION), group, and visitor.variant.lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/RepLevelWriterProviders.java (1)
153-209: Consider extracting shared logic between VariantRepLevelWriterProvider and RowRepLevelWriterProvider.

The writeRepetitionLevels implementation is nearly identical to RowRepLevelWriterProvider (lines 106-150). Both follow the same null-handling pattern: write parentLevel directly for nulls, batch consecutive non-nulls to the nested writer.

A shared base class or a parameterized helper could reduce duplication.

♻️ Potential refactor approach

```java
// Extract a common abstract class or helper method:
private static RepetitionLevelWriter createNestedBlockWriter(
        Block block,
        RepetitionLevelWriter nestedWriter,
        ColumnDescriptorValuesWriter encoder)
{
    return new RepetitionLevelWriter()
    {
        private int offset;

        @Override
        public void writeRepetitionLevels(int parentLevel)
        {
            writeRepetitionLevels(parentLevel, block.getPositionCount());
        }

        @Override
        public void writeRepetitionLevels(int parentLevel, int positionsCount)
        {
            // shared implementation
        }
    };
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/RepLevelWriterProviders.java` around lines 153 - 209, VariantRepLevelWriterProvider and RowRepLevelWriterProvider duplicate the same writeRepetitionLevels logic; extract the shared logic into a single helper or abstract factory and have both providers use it. Implement a private static factory (e.g., createNestedBlockWriter or an abstract base class) that accepts the Block, the nested RepetitionLevelWriter (from nestedWriterOptional.orElseThrow()), and the ColumnDescriptorValuesWriter encoder, and returns a RepetitionLevelWriter which implements writeRepetitionLevels (both overloads) using the existing null-handling + batching logic; then replace the anonymous RepetitionLevelWriter creation in VariantRepLevelWriterProvider and RowRepLevelWriterProvider to call that helper. Ensure you keep the checks (requireNonNull, checkArgument) in place and preserve offset state inside the returned writer.core/trino-spi/src/test/java/io/trino/spi/block/TestVariantBlockBuilder.java (1)
46-58: Clarify the -1 sentinel values in the positions array.

The positions array {-1, 3, 0, 2, -1} contains -1 values at indices 0 and 4. While the method call appendPositions(source, positions, 1, length) skips index 0 (starts at offset 1) and only processes length=3 positions, having -1 as array elements is unusual and potentially confusing.

If this is intentional test padding to verify offset handling, consider adding a comment explaining the intent. If positions are meant to be valid indices, -1 would cause an ArrayIndexOutOfBoundsException if accidentally accessed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-spi/src/test/java/io/trino/spi/block/TestVariantBlockBuilder.java` around lines 46 - 58, In testAppendPositionsUpdatesStatusWithOffsetSource, the positions array passed to VariantBlockBuilder.appendPositions contains -1 sentinel values which are only there to pad entries outside the processed range; update the test to either replace the -1s with valid indices or (preferable) add a brief inline comment next to the positions array declaration explaining that -1 is intentional padding and that appendPositions(source, positions, 1, length) starts at offset 1 and processes only the three middle entries, so the -1s will never be accessed; reference the test method name testAppendPositionsUpdatesStatusWithOffsetSource and the appendPositions call when making this change.core/trino-main/src/main/java/io/trino/util/variant/PlannedObjectValue.java (1)
43-64: Potential partial state corruption on exception during finalize().

If an exception occurs during the nested plannedValue.finalize() calls (lines 58-60), the fieldIds array may be partially remapped while size remains -1. A subsequent call to finalize() would pass the guard check but re-apply sortedFieldIdMapping to already-remapped IDs, producing incorrect results.

Consider either:

- Setting a separate finalized boolean flag at the start
- Computing remapped IDs into a local array first, then assigning after all nested finalizations succeed

♻️ Suggested approach using a flag

```diff
     private int size = -1;
+    private boolean finalizing;

     @Override
     public void finalize(IntUnaryOperator sortedFieldIdMapping)
     {
         if (size >= 0) {
             throw new IllegalStateException("finalize() already called");
         }
+        if (finalizing) {
+            throw new IllegalStateException("finalize() already in progress");
+        }
+        finalizing = true;
         int maxFieldId = -1;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-main/src/main/java/io/trino/util/variant/PlannedObjectValue.java` around lines 43 - 64, The finalize method currently mutates fieldIds before calling each VariantWriter.PlannedValue.finalize, risking partial remapping if a nested finalize throws; change finalize(IntUnaryOperator sortedFieldIdMapping) to first compute remapped IDs into a local int[] remappedFieldIds (loop over fieldIds and apply sortedFieldIdMapping), then run the loop that calls plannedValue.finalize(sortedFieldIdMapping) and accumulates totalElementLength without touching fieldIds, and only after all nested finalizations succeed assign fieldIds = remappedFieldIds and set size = encodedObjectSize(...); keep the existing guard on size so failed attempts leave the original state intact.core/trino-main/src/main/java/io/trino/operator/scalar/VariantToRowCast.java (1)
92-100: Unused connectorSession parameter.

The connectorSession parameter is declared but not used in the toRow method. If it's required for future use or to match a method handle signature convention, consider adding a comment. Otherwise, this creates unnecessary parameter passing overhead.

Is ConnectorSession commonly passed to Trino cast methods for future extensibility even when unused?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-main/src/main/java/io/trino/operator/scalar/VariantToRowCast.java` around lines 92 - 100, The connectorSession parameter in VariantToRowCast.toRow is declared but never used; either remove it from the signature (and update any generated method handles/call sites) or explicitly mark it as intentionally unused to avoid confusion and warnings. If the parameter must remain to satisfy a method-handle/signature convention, rename it to unusedConnectorSession or add a short comment and a `@SuppressWarnings`("unused") on the parameter (or method) to document intent; otherwise remove connectorSession from the toRow method and adjust callers/generation so the signature no longer includes it.core/trino-main/src/main/java/io/trino/metadata/SignatureBinder.java (1)
644-695: Cache the filtered set of recursive row cast functions or extract the CAST function lookup outside the loop.

Both isRecursiveCastFromRow and isRecursiveCastToRow iterate through all registered CAST functions via metadata.getFunctions(...) on each invocation. While the underlying function metadata is cached in immutable maps, new CatalogFunctionMetadata wrapper objects are created on each call (see MetadataManager.getBuiltinFunctions lines 2716-2718). Since these methods are called from canCast() during type constraint solving, they may be invoked multiple times per query, creating unnecessary wrapper objects and redundant iterations.

Consider:

- Pre-computing and caching the set of CAST functions with row type constraints during initialization
- Storing them as instance fields in SignatureBinder to avoid repeated lookups
- At minimum, cache the result within the constraint-solving phase to avoid re-filtering the same functions
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-main/src/main/java/io/trino/metadata/SignatureBinder.java` around lines 644 - 695, The methods isRecursiveCastFromRow and isRecursiveCastToRow repeatedly call metadata.getFunctions(...) and re-create wrapper objects on each invocation; cache the filtered CAST functions (or their Signatures) once and reuse them. Extract the CatalogFunctionMetadata lookup and filtering logic out of both methods (e.g., run once in the SignatureBinder constructor or via a lazily-initialized final field) to produce a List/Set of CAST signatures that have exactly one type variable constraint (no long constraints) and a row variadic bound (typeVariableConstraint.isRowType()); then have isRecursiveCastFromRow and isRecursiveCastToRow iterate that cached collection and only perform the lightweight checks (matching toType/fromType against signature.getReturnType()/getArgumentTypes()). Ensure the cached container uses the same identifying symbols: metadata.getFunctions(...), mangleOperatorName(CAST), Signature, TypeVariableConstraint, isRecursiveCastFromRow, isRecursiveCastToRow.lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/DefLevelWriterProviders.java (1)
345-410: Factor duplicated nullable-group def-level logic and add a type guard.

VariantDefLevelWriterProvider currently duplicates the same null-run batching logic used by RowDefLevelWriterProvider, which increases divergence risk. Also, unlike the row provider constructor, it does not fail fast on a wrong block kind.

Proposed hardening change

```diff
 VariantDefLevelWriterProvider(Block block, int maxDefinitionLevel)
 {
     this.block = requireNonNull(block, "block is null");
     this.maxDefinitionLevel = maxDefinitionLevel;
+    checkArgument(block.getUnderlyingValueBlock() instanceof VariantBlock, "block is not a variant block");
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/DefLevelWriterProviders.java` around lines 345 - 410, VariantDefLevelWriterProvider duplicates the nullable-group batching from RowDefLevelWriterProvider and lacks a constructor type-guard; refactor by extracting the null-run batching loop into a shared helper (e.g., a private static method in DefLevelWriterProviders like processNullableRuns or writeNullableRuns) that takes the Block, offset, positionsCount, nestedWriter and encoder and returns ValuesCount, then have both VariantDefLevelWriterProvider.getDefinitionLevelWriter and RowDefLevelWriterProvider call that helper instead of duplicating logic; also add the same fast-fail check in VariantDefLevelWriterProvider's constructor that RowDefLevelWriterProvider has (validate block.getBlockKind() == Block.Kind.VARIANT or the project’s equivalent) and throw an IllegalArgumentException if it doesn't match.lib/trino-orc/src/main/java/io/trino/orc/reader/VariantColumnReader.java (2)
98-106: Clarify variable reuse for readOffset.

The variable readOffset is reassigned to presentStream.countBitsSet(readOffset) on line 102, which can be confusing since it changes the meaning from "positions to skip" to "non-null values to skip in children". This works correctly but could benefit from a clarifying comment or using a separate variable.

♻️ Optional: Use a separate variable for clarity

```diff
 if (readOffset > 0) {
     if (presentStream != null) {
         // skip ahead the present bit reader, but count the set bits
         // and use this as the skip size for the field readers
-        readOffset = presentStream.countBitsSet(readOffset);
+        int nonNullsToSkip = presentStream.countBitsSet(readOffset);
+        metadataReader.prepareNextRead(nonNullsToSkip);
+        valueReader.prepareNextRead(nonNullsToSkip);
+    }
+    else {
+        metadataReader.prepareNextRead(readOffset);
+        valueReader.prepareNextRead(readOffset);
     }
-    metadataReader.prepareNextRead(readOffset);
-    valueReader.prepareNextRead(readOffset);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/trino-orc/src/main/java/io/trino/orc/reader/VariantColumnReader.java` around lines 98 - 106, The reuse of readOffset (reassigned from positions-to-skip to non-null child-count via presentStream.countBitsSet(readOffset)) is confusing; change this by introducing a new variable (e.g., nonNullToSkip) to hold presentStream.countBitsSet(readOffset) and pass that to metadataReader.prepareNextRead(...) and valueReader.prepareNextRead(...), or at minimum add a brief comment above the reassignment explaining that readOffset is being converted from total positions to non-null child count before calling metadataReader.prepareNextRead and valueReader.prepareNextRead; update references to use the new variable where appropriate.
132-136: Minor: Extract duplicate RLE null block creation.

The same pattern for creating an RLE null block is duplicated for both metadataBlock and valueBlock.

♻️ Optional: Extract to a helper or constant

```diff
+    private static final Block NULL_VARBINARY_BLOCK = VARBINARY.createBlockBuilder(null, 1).appendNull().build();
+
     else {
         // All values are null
-        metadataBlock = RunLengthEncodedBlock.create(VARBINARY.createBlockBuilder(null, 0).appendNull().build(), nextBatchSize);
-        valueBlock = RunLengthEncodedBlock.create(VARBINARY.createBlockBuilder(null, 0).appendNull().build(), nextBatchSize);
+        metadataBlock = RunLengthEncodedBlock.create(NULL_VARBINARY_BLOCK, nextBatchSize);
+        valueBlock = RunLengthEncodedBlock.create(NULL_VARBINARY_BLOCK, nextBatchSize);
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/trino-orc/src/main/java/io/trino/orc/reader/VariantColumnReader.java` around lines 132 - 136, The duplicated creation of an RLE null block for metadataBlock and valueBlock should be extracted into a shared helper/constant to avoid repetition; add a private static helper (e.g., createRleNullBlock(int runLength) or a constant RLE_NULL_BLOCK factory) that builds the VARBINARY.createBlockBuilder(...).appendNull().build() and wraps it with RunLengthEncodedBlock.create(..., runLength), then replace both assignments to metadataBlock and valueBlock in VariantColumnReader with calls to that helper/constant.core/trino-spi/src/main/java/io/trino/spi/block/VariantBlockBuilder.java (1)
389-397: Minor: expectedEntries parameter not passed to nested builder factories.

The newBlockBuilderLike method creates nested builders using newBlockBuilderLike(blockBuilderStatus) without passing expectedEntries. The nested builders will use their default sizing.

♻️ Optional: Pass expectedEntries to nested builders

```diff
 @Override
 public BlockBuilder newBlockBuilderLike(int expectedEntries, BlockBuilderStatus blockBuilderStatus)
 {
     return new VariantBlockBuilder(
             blockBuilderStatus,
-            (VariableWidthBlockBuilder) metadataBlockBuilder.newBlockBuilderLike(blockBuilderStatus),
-            (VariableWidthBlockBuilder) valuesBlockBuilder.newBlockBuilderLike(blockBuilderStatus),
+            (VariableWidthBlockBuilder) metadataBlockBuilder.newBlockBuilderLike(expectedEntries, blockBuilderStatus),
+            (VariableWidthBlockBuilder) valuesBlockBuilder.newBlockBuilderLike(expectedEntries, blockBuilderStatus),
             new boolean[expectedEntries]);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-spi/src/main/java/io/trino/spi/block/VariantBlockBuilder.java` around lines 389 - 397, The nested builders are created without forwarding expectedEntries, so change the two calls to metadataBlockBuilder.newBlockBuilderLike(...) and valuesBlockBuilder.newBlockBuilderLike(...) to pass expectedEntries as the first argument (i.e., call newBlockBuilderLike(expectedEntries, blockBuilderStatus)) when constructing a new VariantBlockBuilder in VariantBlockBuilder.newBlockBuilderLike; keep the VariantBlockBuilder constructor usage and the new boolean[expectedEntries] as-is.core/trino-spi/src/main/java/io/trino/spi/variant/VariantUtils.java (1)
99-112: Document the invariant for 3-byte offset reads more prominently.

The comment on lines 104-108 explains the assumption that reading 4 bytes is safe for 3-byte offsets, but this invariant is subtle. If future code changes violate this assumption, it could cause out-of-bounds reads.
Consider adding a validation or making the invariant more explicit in the encoding layer that guarantees this property.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/trino-spi/src/main/java/io/trino/spi/variant/VariantUtils.java` around lines 99 - 112, The readOffset method assumes a safe extra byte when handling 3-byte offsets (case 3 -> data.getInt(offset) & 0xFFFFFF), which is a subtle invariant that can lead to OOB reads if violated; update the code to validate this invariant or enforce it in the encoding layer by (a) adding an explicit bounds check/assert in readOffset (or a helper like validateOffsetBytes) to ensure data.length - offset >= 4 before reading 4 bytes, and/or (b) making the encoder guarantee a padding byte after every 3-byte offset and documenting that in the encoder’s write method so callers of readOffset (and the VariantUtils class) can rely on it. Ensure references to readOffset and any encoder method (the encoder write method that serializes field IDs/offsets) are used so reviewers can find and verify the added validation or encoding guarantee.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@core/trino-main/src/main/java/io/trino/type/VariantOperators.java`:
- Around line 42-68: The dereference operators currently call
Variant#getArrayLength() and Variant#getObjectField() directly and will let
wrong-kind cases throw IllegalStateException; update both dereference overloads
to validate the Variant kind first and throw a
TrinoException(INVALID_FUNCTION_ARGUMENT, ...) with a clear user-facing message
when the value is not the expected container: for the long-index overload,
verify the Variant is an array (e.g., Variant.isArray() or equivalent) before
calling getArrayLength() and then run checkArrayIndex; for the fieldName
overload, verify the Variant is an object (e.g., Variant.isObject() or
equivalent) before calling getObjectField() and return null or the field value
as before.
In `@core/trino-main/src/main/java/io/trino/util/variant/VariantUtil.java`:
- Around line 447-475: The STRING and SHORT_STRING branches in asJavaDecimal
currently call new BigDecimal(...) which can throw NumberFormatException and
bypass the intended INVALID_CAST_ARGUMENT handling; wrap the
string-to-BigDecimal conversion inside a try/catch for NumberFormatException in
both the STRING case (inside the PRIMITIVE -> STRING branch) and the
SHORT_STRING case and rethrow a TrinoException(INVALID_CAST_ARGUMENT, "Cannot
cast input variant to DECIMAL(" + precision + "," + scale + ")") (or similar
message) so malformed decimal strings are mapped to the proper
INVALID_CAST_ARGUMENT path instead of leaking NumberFormatException.
- Around line 203-239: The PR misses support for TIME_NTZ_MICROS in string/JSON
casts: add a case for TIME_NTZ_MICROS in the PRIMITIVE branch of
asVarchar()/asVarchar-like logic in VariantUtil (the switch handling
variant.primitiveType()) to format TIME_NTZ_MICROS into a VARCHAR using the same
DateTimes formatting utilities used for other temporal types, and update
toJsonValue()/toJsonValue-like method (and the similar handling around the other
occurrence at ~1248-1249) to return a JSON string for TIME_NTZ_MICROS instead of
throwing; reference the primitive name TIME_NTZ_MICROS and the methods
DateTimes.format...(), asVarchar()/asVarchar-like switch, and
toJsonValue()/toJsonValue-like method to locate the edits.
- Around line 519-529: The time-of-day extraction uses the % operator which
yields negative remainders for negative timestamps; update the two branches in
VariantUtil (cases TIMESTAMP_UTC_MICROS/TIMESTAMP_NTZ_MICROS and
TIMESTAMP_UTC_NANOS/TIMESTAMP_NTZ_NANOS) to use Math.floorMod when computing
micros and nanos (replace "variant.getTimestampMicros() % MICROSECONDS_PER_DAY"
and "variant.getTimestampNanos() % NANOSECONDS_PER_DAY") and also use
Math.floorMod for the final wrap after round (replace "% PICOSECONDS_PER_DAY")
so timePicos and the rounded picoseconds always fall into [0,
PICOSECONDS_PER_DAY). Ensure you reference the local variables micros, nanos,
timePicos, round(...) and constants MICROSECONDS_PER_DAY, NANOSECONDS_PER_DAY,
PICOSECONDS_PER_DAY.
- Around line 166-197: The predicate VariantUtil.canCastFromVariant currently
omits JsonType, preventing specialization for VARIANT -> ARRAY<JSON>,
MAP<VARCHAR, JSON>, and JSON row fields even though createBlockBuilderAppender
provides a JsonBlockBuilderAppender; add JsonType to the initial instanceof
whitelist in canCastFromVariant so it returns true for JsonType (and thus allows
ArrayType/MapType/RowType paths involving JSON to pass). Update the list of
allowed types in canCastFromVariant to include JsonType, keeping the existing
ArrayType, MapType (with Varchar key check) and RowType handling unchanged.
In `@core/trino-main/src/test/java/io/trino/type/TestVariantOperators.java`:
- Around line 1163-1176: The test rebuilds shared metadata by calling
Variant.ofArray to create leafArray, which sorts the ("b","a") ids and prevents
exercising the same-size array-header remap; replace the Variant.ofArray usage
and instead construct the array payload without rebuilding metadata so the
unsorted Metadata instance (Metadata.of(List.of(utf8Slice("b"),
utf8Slice("a")))) is preserved—either by adding a small helper that encodes the
array payload bytes directly or by constructing the Variant array node using the
same low-level constructor used by createObjectWithSortedFields so leafArray
retains the unsorted metadata and the remap path in the test (symbols: Metadata,
Variant.ofArray, leafArray, createObjectWithSortedFields).
In `@core/trino-spi/src/main/java/io/trino/spi/block/VariantBlock.java`:
- Around line 55-66: The factory currently ignores child nulls when
isNullOptional is empty; update create(...) so when isNullOptional.isEmpty() you
assert the child blocks contain no nulls (e.g., call Block.mayHaveNull() and
fail or iterate/verify positions if mayHaveNull() is true) for both metadata and
values before returning createInternal; keep existing checks (checkArrayRange,
verifyPositionsAreNull) when isNullOptional.isPresent() and use
createInternal(0, positionCount, isNullOptional.orElse(null), metadata, values)
as before.
- Around line 239-253: copyWithAppendedNull() currently appends the null to the
underlying full metadata/values and keeps the original startOffset, so sliced
VariantBlock instances end up referencing an existing element instead of the
appended null; change copyWithAppendedNull() to first take the current logical
slice of metadata and values (e.g. use metadata.getRegion(startOffset,
positionCount) and values.getRegion(startOffset, positionCount) or their
equivalent) when startOffset != 0 or the slice is not the whole block, then call
copyWithAppendedNull() on those sliced blocks, reset the new startOffset to 0,
build a new isNull sized for positionCount+1 (copying only the logical slice),
and return the new VariantBlock(startOffset=0, positionCount+1, newIsNull,
newMetadata, newValues) so the appended null is at the new logical last
position.
In `@core/trino-spi/src/main/java/io/trino/spi/variant/Header.java`:
- Around line 202-207: PrimitiveType.fromHeader currently computes
primitiveTypeBits and indexes values()[primitiveTypeBits] which can throw
ArrayIndexOutOfBoundsException for malformed headers; update fromHeader(byte
header) to validate that primitiveTypeBits is within
0..(PrimitiveType.values().length-1) and, if out of range, throw a clear
IllegalArgumentException (or a custom CorruptHeaderException) that includes the
header byte and the computed primitiveTypeBits; keep the existing behavior for
valid values and return values()[primitiveTypeBits] when in-range.
In `@core/trino-spi/src/main/java/io/trino/spi/variant/Metadata.java`:
- Around line 277-291: The strict increasing check in validateFully() rejects
zero-length field names because consecutive offsets can be equal; update the
validation in the loop that uses readOffset(metadata, position, offsetSize) and
the checkArgument(i == 0 || value > previous) so it permits non-decreasing
offsets (use >= instead of >) while keeping the first-offset check (i != 0 ||
value == 0) and the final equality check against dictionaryLength intact; modify
the checkArgument call in validateFully() accordingly.
- Around line 103-165: Metadata.of(Collection<Slice>) currently allows duplicate
field names which breaks assumptions in id(Slice) and lookups; add a
duplicate-name guard early in Metadata.of (before computing
dictionarySize/dictionaryLength or before building offsets) by iterating
fieldNames and tracking seen names (e.g., a HashSet of Slice or
Slice.toString/byte content comparison) and if a duplicate is detected throw an
IllegalArgumentException with a clear message ("duplicate field name: " + name);
keep the rest of the method unchanged so offsets/dictionary building still use
the original iteration order and preserve use of VariantUtils.isSorted and
metadataHeader.
In `@core/trino-spi/src/main/java/io/trino/spi/variant/VariantFieldRemapper.java`:
- Around line 117-130: The SAME_SIZE optimization is wrongly enabled whenever
originalFieldIdEncodedWidth equals getOffsetSize(maxFieldId); update the
condition that sets remapMode = RemapMode.SAME_SIZE to require both widths be 1
byte to avoid skipping recalculation for nested objects (check
originalFieldIdEncodedWidth and getOffsetSize(maxFieldId) explicitly equal 1).
Locate the branch that assigns remapMode (the identity/else-if/else block around
originalFieldIdEncodedWidth, getOffsetSize, and RemapMode.SAME_SIZE) and change
the predicate so SAME_SIZE is only chosen when both sizes are 1; otherwise fall
through to RemapMode.RESIZE. Ensure references to originalFieldIdEncodedWidth,
getOffsetSize(maxFieldId), remapMode, and RemapMode.SAME_SIZE/RESIZE are
preserved.
In `@core/trino-spi/src/test/java/io/trino/spi/variant/TestVariant.java`:
- Around line 103-122: The test arrays in testByte (method testByte) contain a
duplicate -42 where one should be positive 42; update the byte array in testByte
(and analogous arrays in testShort, testInt, and testLong methods) to replace
the duplicated negative value with the corresponding positive value (e.g.,
change one -42 to 42) so both positive and negative cases are exercised; locate
the arrays used with Variant.ofByte / ofShort / ofInt / ofLong in those test
methods and modify the literal values accordingly.
In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergAvroDataConversion.java`:
- Around line 254-268: The code directly casts block to VariantBlock which can
throw ClassCastException when block is wrapped (e.g., DictionaryBlock or
RunLengthEncodedBlock); before casting, unwrap to the underlying value block
(call block.getUnderlyingValueBlock()) and then cast that result to
VariantBlock, and continue using variantBlock.getRawMetadata(),
variantBlock.getRawValues(), and variantBlock.getRawOffset() as before so
wrapped encodings are handled safely.
In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.java`:
- Around line 51-52: The TODO comment in HiveSchemaUtil referencing "Support
variant type" is stale because the VARIANT case is already implemented (case
VARIANT -> "struct<metadata:binary,value:binary>"). Remove or update that TODO
so the code no longer claims VARIANT is unsupported; ensure the remaining
comment (if any) correctly reflects that VARIANT is handled in the VARIANT
branch of HiveSchemaUtil.
---
Outside diff comments:
In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergAvroDataConversion.java`:
- Around line 313-318: In IcebergAvroDataConversion, the VarbinaryType branch
incorrectly falls through after handling FIXED: inside the conditional that
checks icebergType.typeId().equals(FIXED) (within the VarbinaryType handling)
you must write the byte[] to the builder and then immediately return to avoid
the subsequent cast to ByteBuffer; update the method (the VarbinaryType handling
block in IcebergAvroDataConversion) to add a return after
VARBINARY.writeSlice(builder, Slices.wrappedBuffer((byte[]) object)) so FIXED
byte[] cases don't reach the wrappedHeapBuffer((ByteBuffer) object) call.
---
Nitpick comments:
In `@core/trino-main/src/main/java/io/trino/metadata/SignatureBinder.java`:
- Around line 644-695: The methods isRecursiveCastFromRow and
isRecursiveCastToRow repeatedly call metadata.getFunctions(...) and re-create
wrapper objects on each invocation; cache the filtered CAST functions (or their
Signatures) once and reuse them. Extract the CatalogFunctionMetadata lookup and
filtering logic out of both methods (e.g., run once in the SignatureBinder
constructor or via a lazily-initialized final field) to produce a List/Set of
CAST signatures that have exactly one type variable constraint (no long
constraints) and a row variadic bound (typeVariableConstraint.isRowType()); then
have isRecursiveCastFromRow and isRecursiveCastToRow iterate that cached
collection and only perform the lightweight checks (matching toType/fromType
against signature.getReturnType()/getArgumentTypes()). Ensure the cached
container uses the same identifying symbols: metadata.getFunctions(...),
mangleOperatorName(CAST), Signature, TypeVariableConstraint,
isRecursiveCastFromRow, isRecursiveCastToRow.
In
`@core/trino-main/src/main/java/io/trino/operator/scalar/VariantToRowCast.java`:
- Around line 92-100: The connectorSession parameter in VariantToRowCast.toRow
is declared but never used; either remove it from the signature (and update any
generated method handles/call sites) or explicitly mark it as intentionally
unused to avoid confusion and warnings. If the parameter must remain to satisfy
a method-handle/signature convention, rename it to unusedConnectorSession or add
a short comment and a `@SuppressWarnings`("unused") on the parameter (or method)
to document intent; otherwise remove connectorSession from the toRow method and
adjust callers/generation so the signature no longer includes it.
In `@core/trino-main/src/main/java/io/trino/util/variant/PlannedObjectValue.java`:
- Around line 43-64: The finalize method currently mutates fieldIds before
calling each VariantWriter.PlannedValue.finalize, risking partial remapping if a
nested finalize throws; change finalize(IntUnaryOperator sortedFieldIdMapping)
to first compute remapped IDs into a local int[] remappedFieldIds (loop over
fieldIds and apply sortedFieldIdMapping), then run the loop that calls
plannedValue.finalize(sortedFieldIdMapping) and accumulates totalElementLength
without touching fieldIds, and only after all nested finalizations succeed
assign fieldIds = remappedFieldIds and set size = encodedObjectSize(...); keep
the existing guard on size so failed attempts leave the original state intact.
In `@core/trino-spi/src/main/java/io/trino/spi/block/VariantBlockBuilder.java`:
- Around line 389-397: The nested builders are created without forwarding
expectedEntries, so change the two calls to
metadataBlockBuilder.newBlockBuilderLike(...) and
valuesBlockBuilder.newBlockBuilderLike(...) to pass expectedEntries as the first
argument (i.e., call newBlockBuilderLike(expectedEntries, blockBuilderStatus))
when constructing a new VariantBlockBuilder in
VariantBlockBuilder.newBlockBuilderLike; keep the VariantBlockBuilder
constructor usage and the new boolean[expectedEntries] as-is.
In `@core/trino-spi/src/main/java/io/trino/spi/variant/Int2IntOpenHashMap.java`:
- Around line 22-23: The sentinel DEFAULT_RETURN_VALUE = -1 conflicts with legal
stored values; update Int2IntOpenHashMap to guard against storing -1 by
validating in put(...) and putIfAbsent(...) (and any insert helpers) and throw
IllegalArgumentException if value == DEFAULT_RETURN_VALUE, and add a short
javadoc to get(), putIfAbsent(), DEFAULT_RETURN_VALUE stating callers must not
store -1 and to use containsKey() to disambiguate if they relied on the old
behavior; this makes get()/putIfAbsent() safe to keep returning
DEFAULT_RETURN_VALUE while preventing ambiguous stored values.
In `@core/trino-spi/src/main/java/io/trino/spi/variant/VariantUtils.java`:
- Around line 99-112: The readOffset method assumes a safe extra byte when
handling 3-byte offsets (case 3 -> data.getInt(offset) & 0xFFFFFF), which is a
subtle invariant that can lead to OOB reads if violated; update the code to
validate this invariant or enforce it in the encoding layer by (a) adding an
explicit bounds check/assert in readOffset (or a helper like
validateOffsetBytes) to ensure data.length - offset >= 4 before reading 4 bytes,
and/or (b) making the encoder guarantee a padding byte after every 3-byte offset
and documenting that in the encoder’s write method so callers of readOffset (and
the VariantUtils class) can rely on it. Ensure references to readOffset and any
encoder method (the encoder write method that serializes field IDs/offsets) are
used so reviewers can find and verify the added validation or encoding
guarantee.
In
`@core/trino-spi/src/test/java/io/trino/spi/block/TestVariantBlockBuilder.java`:
- Around line 46-58: In testAppendPositionsUpdatesStatusWithOffsetSource, the
positions array passed to VariantBlockBuilder.appendPositions contains -1
sentinel values which are only there to pad entries outside the processed range;
update the test to either replace the -1s with valid indices or (preferable) add
a brief inline comment next to the positions array declaration explaining that
-1 is intentional padding and that appendPositions(source, positions, 1, length)
starts at offset 1 and processes only the three middle entries, so the -1s will
never be accessed; reference the test method name
testAppendPositionsUpdatesStatusWithOffsetSource and the appendPositions call
when making this change.
In `@lib/trino-orc/src/main/java/io/trino/orc/reader/VariantColumnReader.java`:
- Around line 98-106: The reuse of readOffset (reassigned from positions-to-skip
to non-null child-count via presentStream.countBitsSet(readOffset)) is
confusing; change this by introducing a new variable (e.g., nonNullToSkip) to
hold presentStream.countBitsSet(readOffset) and pass that to
metadataReader.prepareNextRead(...) and valueReader.prepareNextRead(...), or at
minimum add a brief comment above the reassignment explaining that readOffset is
being converted from total positions to non-null child count before calling
metadataReader.prepareNextRead and valueReader.prepareNextRead; update
references to use the new variable where appropriate.
- Around line 132-136: The duplicated creation of an RLE null block for
metadataBlock and valueBlock should be extracted into a shared helper/constant
to avoid repetition; add a private static helper (e.g., createRleNullBlock(int
runLength) or a constant RLE_NULL_BLOCK factory) that builds the
VARBINARY.createBlockBuilder(...).appendNull().build() and wraps it with
RunLengthEncodedBlock.create(..., runLength), then replace both assignments to
metadataBlock and valueBlock in VariantColumnReader with calls to that
helper/constant.
In
`@lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetTypeVisitor.java`:
- Around line 108-113: The two inline TODO comments inside the
LogicalTypeAnnotation.variantType(Header.VERSION) branch are stale—either
implement the missing validation/extraction or remove them; since
checkArgument(group.getFieldCount() == 2, ...) already enforces the expected
structure, remove the "todo is there validation here" comment and either remove
or implement the "todo should I extract the metadata and values fields here": if
additional checks are required, extract fields from group (use
group.getFields()/group.getField(int) and validate types) and pass the extracted
metadata/value to visitor.variant or validate before calling
visitor.variant(group), otherwise delete both TODO lines to leave a clear code
path around LogicalTypeAnnotation.variantType(Header.VERSION), group, and
visitor.variant.
In
`@lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/DefLevelWriterProviders.java`:
- Around line 345-410: VariantDefLevelWriterProvider duplicates the
nullable-group batching from RowDefLevelWriterProvider and lacks a constructor
type-guard; refactor by extracting the null-run batching loop into a shared
helper (e.g., a private static method in DefLevelWriterProviders like
processNullableRuns or writeNullableRuns) that takes the Block, offset,
positionsCount, nestedWriter and encoder and returns ValuesCount, then have both
VariantDefLevelWriterProvider.getDefinitionLevelWriter and
RowDefLevelWriterProvider call that helper instead of duplicating logic; also
add the same fast-fail check in VariantDefLevelWriterProvider's constructor that
RowDefLevelWriterProvider has (validate block.getBlockKind() ==
Block.Kind.VARIANT or the project’s equivalent) and throw an
IllegalArgumentException if it doesn't match.
In
`@lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/RepLevelWriterProviders.java`:
- Around line 153-209: VariantRepLevelWriterProvider and
RowRepLevelWriterProvider duplicate the same writeRepetitionLevels logic;
extract the shared logic into a single helper or abstract factory and have both
providers use it. Implement a private static factory (e.g.,
createNestedBlockWriter or an abstract base class) that accepts the Block, the
nested RepetitionLevelWriter (from nestedWriterOptional.orElseThrow()), and the
ColumnDescriptorValuesWriter encoder, and returns a RepetitionLevelWriter which
implements writeRepetitionLevels (both overloads) using the existing
null-handling + batching logic; then replace the anonymous RepetitionLevelWriter
creation in VariantRepLevelWriterProvider and RowRepLevelWriterProvider to call
that helper. Ensure you keep the checks (requireNonNull, checkArgument) in place
and preserve offset state inside the returned writer.
In
`@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java`:
- Around line 5039-5060: The test method testNestedVariant should use a
TestTable in a try-with-resources block instead of manually creating and
dropping "test_nested_variant", so the table is always cleaned up on assertion
failure; replace the explicit assertUpdate("CREATE TABLE ...") and the final
assertUpdate("DROP TABLE ...") with a TestTable created in a try-with-resources
block (e.g., new TestTable(getQueryRunner()::execute, "test_nested_variant",
"(variant_array array(variant), variant_row row(v variant, i integer)) WITH
(format_version = 3)")), and use table.getName() in the subsequent
INSERT/SELECT assertions so the resource is auto-closed and the table is removed
even if assertions fail.
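A hedged sketch of the suggested pattern; the SqlExecutor form, inserted values, and assertions are illustrative rather than the test's actual contents:

```java
import io.trino.testing.sql.TestTable;

// Illustrative only; column definitions mirror the original test, values are assumptions.
try (TestTable table = new TestTable(
        getQueryRunner()::execute,
        "test_nested_variant",
        "(variant_array array(variant), variant_row row(v variant, i integer)) WITH (format_version = 3)")) {
    assertUpdate(
            "INSERT INTO " + table.getName() +
                    " VALUES (ARRAY[CAST(JSON '1' AS variant)], ROW(CAST(JSON '2' AS variant), 3))",
            1);
    assertQuerySucceeds("SELECT * FROM " + table.getName());
    // No explicit DROP TABLE: the table is dropped when the try-with-resources block exits,
    // even if an assertion above fails.
}
```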
In
`@testing/trino-testing/src/main/java/io/trino/testing/TestingTrinoClient.java`:
- Around line 337-343: In the VARIANT decoding branch in TestingTrinoClient (the
block checking if (type == VARIANT)), add explicit defensive checks before
casting: verify that value is a List, that the list size is exactly 2, and that
both elements are byte[] (use instanceof checks), so a clear
IllegalArgumentException with a descriptive message is thrown if any check fails;
only then call Variant.from(Metadata.from(Slices.wrappedBuffer((byte[]) first)),
Slices.wrappedBuffer((byte[]) second)). Reference the local value and variantList
variables in the checks and error messages, and construct the result with
Variant.from, Metadata.from, and Slices.wrappedBuffer.
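A minimal sketch of that branch, assuming value and type are the locals already in scope in the decoder; names beyond those mentioned above are illustrative:

```java
import io.airlift.slice.Slices;
import io.trino.spi.variant.Metadata;
import io.trino.spi.variant.Variant;

import java.util.List;

// Illustrative defensive decoding; local names and error messages are assumptions.
if (type == VARIANT) {
    if (!(value instanceof List<?> variantList) || variantList.size() != 2) {
        throw new IllegalArgumentException("VARIANT value must be a two-element list, got: " + value);
    }
    if (!(variantList.get(0) instanceof byte[] metadataBytes) || !(variantList.get(1) instanceof byte[] valueBytes)) {
        throw new IllegalArgumentException("VARIANT metadata and value must both be byte[], got: " + variantList);
    }
    return Variant.from(Metadata.from(Slices.wrappedBuffer(metadataBytes)), Slices.wrappedBuffer(valueBytes));
}
```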
7306fd5 to
18d8196
Compare
|
@coderabbitai check that this PR has sufficient test coverage, both for the variant code and for the Iceberg integration |
| /// This needs special handling because the cast is applied to each field of ROW individually.
| private boolean isRecursiveCastFromRow(Type toType)
| {
|     for (CatalogFunctionMetadata cast : metadata.getFunctions(null, new CatalogSchemaFunctionName(GlobalSystemConnector.NAME, BUILTIN_SCHEMA, mangleOperatorName(CAST)))) {
This can use metadata.getBuiltinFunctions(mangleOperatorName(CAST))
Unfortunately, getBuiltinFunctions is not public
| /// This needs special handling because the cast is applied to each field of ROW individually.
| private boolean isRecursiveCastToRow(Type fromType)
| {
|     for (CatalogFunctionMetadata cast : metadata.getFunctions(null, new CatalogSchemaFunctionName(GlobalSystemConnector.NAME, BUILTIN_SCHEMA, mangleOperatorName(CAST)))) {
This can use metadata.getBuiltinFunctions(mangleOperatorName(CAST))
Unfortunately, getBuiltinFunctions is not public
| if (parser.nextToken() != END_ARRAY) {
|     throw illegalToken(parser);
| }
| return List.of(metadata, value);
As a user, I expect the variant type to return a JSON-like result instead of a binary array. The metadata is an implementation detail, and I don't see much benefit in exposing it.
trino> SELECT CAST('{"abc": 123}' AS variant);
                        _col0
----------------------------------------------------
 [01 00 00, 31 7b 22 61 62 63 22 3a 20 31 32 33 7d]
I agree that we should return VARIANT as JSON text by default.
A client should be able to opt in (via client capabilities) to receiving VARIANT binary data losslessly.
The Trino CLI and JDBC can opt into this; JDBC can expose VARIANT via ResultSet.getObject.
Please see the comment above from David. This was always intended as temporary code, but for now we're just going to use JSON and add the pure VARIANT version back later, with opt-in controls.
| "(NULL, NULL)"); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
The test coverage in the Iceberg connector seems too limited. Please add more tests:
- All column types as value fields across the three file formats
- Product test to ensure that Spark can read tables using our variant type
This is the base test class that is used to test all iceberg file formats, so all formats should be covered.
I also added a new test that has variants with every kind in it. I do not think this is necessary, as if you look at the code in Iceberg, it never actually inspects the variants. It's binary in and binary out.
I added a spark product test covering all variant types.
The Variant implementation is the stack type for the variant type; it includes the code needed to manipulate variants for the casts Trino requires.
The existing Delta Lake variant to JSON mapping is untouched.
|
I talked to @martint and he has OKed merging this. |
|
When are we planning to release 481? |




This PR introduces support for the VARIANT type in Trino, based on the Apache Iceberg Variant specification.
Key points:
- Adds the VARIANT type to the SPI and engine
- Casts to and from VARIANT, including arrays, rows, maps, and JSON
- VARIANT functions, operators, casts, and Iceberg constraints
Iceberg notes:
- VARIANT is supported only for Iceberg tables using format version 3
This PR does not modify existing Delta Lake variant-to-JSON behavior.
Release notes
(x) Release notes are required, with the following suggested text: