
[Iceberg v3] VARIANT type #27753

Merged
dain merged 9 commits into trinodb:master from dain:variant on Mar 30, 2026

Conversation

@dain (Member) commented Dec 24, 2025

This PR introduces support for the VARIANT type in Trino, based on the
Apache Iceberg Variant specification.

Key points:

  • Adds a new VARIANT type to the SPI and engine
  • Implements casts to and from VARIANT, including arrays, rows, maps,
    and JSON
  • Supports dereferencing using the SQL subscript operator
  • Adds client protocol support for VARIANT
  • Adds Iceberg integration for VARIANT
  • Documents VARIANT functions, operators, casts, and Iceberg constraints

Iceberg notes:

  • VARIANT is supported only for Iceberg tables using format version 3
  • Variant values are currently transported to clients in the Variant spec binary format.
  • Variant is not yet integrated into clients; clients will either error or render values as raw binary.

This PR does not modify existing Delta Lake variant-to-JSON behavior.

Release notes

(x) Release notes are required, with the following suggested text:

# Iceberg
* Add experimental support for the `VARIANT` type for Iceberg tables using format version 3.

# SPI
* Add `VARIANT` type defined by the Iceberg specification.

@cla-bot cla-bot Bot added the cla-signed label Dec 24, 2025
@github-actions github-actions Bot added docs iceberg Iceberg connector labels Dec 24, 2025
@ebyhr ebyhr requested a review from martint December 24, 2025 05:53
@dain dain force-pushed the variant branch 3 times, most recently from 56b80e4 to 9336d65 Compare December 25, 2025 04:00
@dain dain marked this pull request as ready for review December 25, 2025 04:00
@dain dain requested a review from electrum December 25, 2025 04:01
@martint (Member) commented Dec 25, 2025

Thanks for putting this up. We talked about it a few weeks ago over Slack, so I wanted to capture my thoughts here.

SQL 2023 defines a rich SQL/JSON data model and set of operations and semantics that's almost a 1-to-1 mapping with Iceberg Variant. Note that, importantly, SQL/JSON does not mandate a textual encoding, but only conversion functions to and from text that produce RFC 8259 JSON.

[Four attached images]

Instead of adding a new data type, we should:

  • Change Trino's JSON data type to use a more efficient encoding internally (we can adopt Variant's encoding if that's suitable enough)
  • Add support for conversions to/from the additional data types (datetime types, etc)
  • Adjust the existing JSON functions (json_query, json_value, json_table, etc) to operate on the new encoding
  • Map Iceberg's and Delta Variant to Trino JSON. We already do that in Delta, but we'll need to adjust the implementation a bit to account for the change in format.

This gives us:

  • A single data type that has all the semantics we need. That means less confusion for users on when to use one vs the other.
  • Brings Trino's JSON type, which predates SQL's standardization of the JSON type, up to compliance with the specification
  • All the functions and syntax that work for JSON will work out of the box, without having to define a parallel set of functions to operate on Variant.

@dain (Member, Author) commented Dec 27, 2025

@martint Thanks for writing this up — I agree that SQL:2023 defines a rich SQL/JSON data model, and at the abstract model level there is significant overlap with Iceberg Variant (null / scalar / array / object / sequence).

However, I think it is important to separate data model similarity from type semantics, interoperability guarantees, and physical representation.

This PR is intentionally focused on first-class Iceberg Variant support. That implies a number of constraints that are part of the Iceberg specification and cannot be relaxed or evolved independently:

  • A closed and spec-defined scalar set
  • A fixed binary encoding that must round-trip exactly
  • Stable semantics across engines for read/write interoperability
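To make the "fixed binary encoding" point concrete, the published Variant encoding packs each value's basic type and type-specific info into a single header byte. The sketch below is illustrative only (it is not Trino's decoder, and the constants reflect my reading of the spec):

```java
// Sketch: decoding the one-byte value header used by the Variant binary
// encoding. The basic type lives in the low two bits and type-specific
// info in the upper six. Illustrative only; Trino's real decoder lives
// in the SPI variant package added by this PR.
public final class VariantHeader
{
    public static final int PRIMITIVE = 0;
    public static final int SHORT_STRING = 1;
    public static final int OBJECT = 2;
    public static final int ARRAY = 3;

    private VariantHeader() {}

    public static int basicType(byte header)
    {
        return header & 0b11;
    }

    public static int typeInfo(byte header)
    {
        return (header >> 2) & 0b11_1111;
    }
}
```

Because every engine must agree on this exact byte layout, Trino cannot evolve the scalar set or the framing independently of the spec.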

Concretely, Iceberg Variant has a number of limitations that are part of the contract:

  • Timestamps are limited to microsecond and nanosecond precision
  • There is no stored time zone information
  • There are no interval types
  • There are no IP address types
  • There are no geometry or geography types (including Iceberg’s own geo extensions)
  • Equality and comparison semantics are explicitly defined by the Variant spec and differ from SQL equality in important ways

Because these constraints are fundamental to Variant, equating it with SQL/JSON is dangerous in both directions:

  • SQL/JSON allows (and encourages) richer and potentially extensible scalar semantics that Variant cannot support without a spec revision
  • Treating JSON as backed by Variant would implicitly restrict JSON semantics and create surprising behavior or broken round-tripping guarantees

For these reasons, representing Variant as a distinct Trino type is deliberate. It makes the interoperability boundary explicit and avoids accidental promises about future extensibility or semantic compatibility.

It is also important to note that JSON is already a trivial cast away. This PR supports casts in and out of Variant, including JSON ↔ VARIANT, and users who want JSON semantics can get them immediately via explicit casts. The hard problem here is not JSON manipulation — it is correctness and fidelity of Variant itself.

More broadly, if Trino were to implement deeper SQL/JSON compliance, Variant is not a good internal encoding for that purpose. Variant values are expensive and difficult to construct:

  • Values must be sorted according to Variant rules
  • Sizes are encoded as prefixes, which prevents true streaming construction
  • Building a Variant often requires multiple passes over the data (this PR uses three passes)

These characteristics are acceptable and necessary for an interchange format, but they are not a good foundation for a general-purpose, extensible SQL/JSON implementation.
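The size-prefix constraint above can be sketched with a toy encoder (a hypothetical length-prefixed framing, not the real Variant layout): a container header carries the byte length of its children, so the children must be fully encoded before the parent's first byte can be emitted, which rules out single-pass streaming construction.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

// Toy illustration of length-prefixed framing (not the actual Variant
// layout): the container header stores the total size of its children,
// so the encoder must materialize every child (pass 1) before it can
// emit the header (pass 2). Emitting parent bytes before the children
// are known is impossible with this kind of encoding.
public final class PrefixEncoder
{
    private PrefixEncoder() {}

    public static byte[] encodeContainer(List<byte[]> children)
    {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (byte[] child : children) {
            body.writeBytes(child); // pass 1: encode all children
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(body.size()); // pass 2: header needs the final size
        out.writeBytes(body.toByteArray());
        return out.toByteArray();
    }
}
```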

Refactoring Trino’s JSON type to adopt a new internal encoding and updating the full SQL/JSON function surface would be a valuable effort, but it is a large, orthogonal project with broad impact. This PR is intentionally scoped to deliver correct, interoperable Iceberg Variant support without blocking on that work.

In short, this PR is about delivering real Variant semantics with strong interoperability guarantees. SQL/JSON compliance and JSON internals can evolve independently without conflating the two or weakening either.

ebyhr previously requested changes Dec 27, 2025
@martint (Member) commented Dec 29, 2025

This PR is intentionally focused on first-class Iceberg Variant support. That implies a number of constraints that are part of the Iceberg specification and cannot be relaxed or evolved independently:

In that case, we should keep the type and functions confined to the Iceberg plugin.

Concretely, Iceberg Variant has a number of limitations that are part of the contract:
Timestamps are limited to microsecond and nanosecond precision
There is no stored time zone information

That's fair. There would need to be some conversion/checking on write to ensure the contents have the expected precision that the connector needs. Alternatively, we could have an envelope with metadata about the contained data types so that the connector can decide whether to store as is or convert.
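A write-side check like the one described could look roughly like this (a hedged sketch of the idea, not the connector's actual code): truncate incoming timestamps to the precision the target format supports, or reject finer values.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Sketch of write-side precision coercion: Variant timestamps carry at
// most microsecond or nanosecond precision, so a connector could
// truncate (or reject) finer values before encoding. Hypothetical
// helper, not Trino's implementation.
public final class PrecisionCheck
{
    private PrecisionCheck() {}

    public static Instant truncateToMicros(Instant value)
    {
        // Drops sub-microsecond nanos, e.g. 1234 ns becomes 1000 ns.
        return value.truncatedTo(ChronoUnit.MICROS);
    }
}
```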

There are no interval types
There are no IP address types
There are no geometry or geography types (including Iceberg’s own geo extensions)

SQL/JSON doesn't have those either, so that's not a problem.

Equality and comparison semantics are explicitly defined by the Variant spec and differ from SQL equality in important ways

This could be a problem, yes. But in what ways? Can you elaborate?

SQL/JSON allows (and encourages) richer and potentially extensible scalar semantics that Variant cannot support without a spec revision

Not really. It doesn't encourage extensions -- that's a choice we would make.

It is also important to note that JSON is already a trivial cast away. This PR supports casts in and out of Variant, including JSON ↔ VARIANT, and users who want JSON semantics can get them immediately via explicit casts.

I could get behind that as long as we don't then go and try to replicate all the operations we have for JSON to work for variant. It's a slippery slope.

These characteristics are acceptable and necessary for an interchange format, but they are not a good foundation for a general-purpose, extensible SQL/JSON implementation.

Not sure I understand this. Variant is not just an interchange format. If that were the purpose, we might as well use varbinary instead.

Refactoring Trino’s JSON type to adopt a new internal encoding and updating the full SQL/JSON function surface would be a valuable effort, but it is a large, orthogonal project with broad impact.

There are only a handful of functions that expect JSON type at this point:

  • is_json_scalar, json_array_contains, json_array_length, which are pretty trivial
  • json_array_get, which is currently broken, anyway
  • json_extract, json_extract_scalar, json_format, json_parse, which could be implemented via an internal conversion to json first and then invoke the existing logic
  • casts, which you've already implemented

In short, this PR is about delivering real Variant semantics with strong interoperability guarantees. SQL/JSON compliance and JSON internals can evolve independently without conflating the two or weakening either.

Databases implement variant semantics in different ways, with no interoperability between them. In particular, Databricks's, Snowflake's, and Iceberg's variants differ from each other in important ways. Other databases use different types to achieve those semantics:

Database     Variant / Equivalent
Snowflake    VARIANT
SQL Server   sql_variant
Redshift     SUPER
Databricks   VARIANT
Oracle       ANYDATA, JSON
PostgreSQL   JSON, JSONB
MySQL        JSON
SQLite       Dynamic typing
MongoDB      BSON documents
BigQuery     JSON, STRUCT

So, if by interoperable we mean interoperable with other engines that read/write Iceberg, this should be an Iceberg connector feature, not an engine-level feature.

@findepi (Member) commented Dec 29, 2025

It is also important to note that JSON is already a trivial cast away. This PR supports casts in and out of Variant, including JSON ↔ VARIANT, and users who want JSON semantics can get them immediately via explicit casts.

I could get behind that as long as we don't then go and try to replicate all the operations we have for JSON to work for variant

As much as I don't like duplicated logic, especially when it's complex, I also don't want Trino to be less performant than it could be (and less performant than other systems).

For example, having to convert whole variant document to json type in order to perform a path lookup would sound wasteful.
FWIW Snowflake has JSON functions, but doesn't have a JSON type. All the JSON functions operate on their VARIANT type, which is quite similar to (but not same as) Parquet/Iceberg VARIANT.

My point is not really to decide on the direction here. Rather, I only want us not to make promises that we may regret in the future, and to keep the door open for future improvements around Trino's JSON and VARIANT story.

@dain (Member, Author) commented Dec 29, 2025

This PR is intentionally focused on first-class Iceberg Variant support. That implies a number of constraints that are part of the Iceberg specification and cannot be relaxed or evolved independently:

In that case, we should keep the type and functions confined to the Iceberg plugin.

The type is required by Delta Lake, and I believe Hive supports it as well. The problem is that the stack type is a Java object, and for these systems to interoperate the type must be available to both, which means it has to live in the SPI. This is a type defined by a well-established specification. IMO we should just have it in the SPI so any connector can use it. I expect this is something we might also want to use in the Snowflake connector.

I don't want to get into a debate about types, but IMO the isolation of types into plugins does not work, and I think we should back that feature out of Trino. Referencing a type by name and trying to do anything non-trivial with it is a total mess. The complexity isn't worth the effort of keeping types out of the SPI.

If you insist I'll do it, because I want this feature in more than I want to debate this point, but I feel extremely strongly that this is the wrong direction (and I feel the same about the two other isolated types in Trino).

Concretely, Iceberg Variant has a number of limitations that are part of the contract:
Timestamps are limited to microsecond and nanosecond precision
There is no stored time zone information

That's fair. There would need to be some conversion/checking on write to ensure the contents have the expected precision that the connector needs. Alternatively, we could have an envelope with metadata about the contained data types so that the connector can decide whether to store as is or convert.

My point is there is no freedom. I think we are much better off having connectors expose their native type, which can then be trivially cast to our internal JSON to use the JSON syntax and functions (this would be super trivial if we landed the Postgres-style casts). So Postgres can expose raw jsonb, Redshift can expose its custom thing, and so on. You can manipulate them directly without losing any fidelity. If you want to move to a different type, you just cast.

IMO this is the best of both worlds. You can work directly with your native type, and then you can move to something else if you want a different feature or if you want to interoperate with a different system. JSON IMO isn't always the best.

There are no interval types
There are no IP address types
There are no geometry or geography types (including Iceberg’s own geo extensions)

SQL/JSON doesn't have those either, so that's not a problem.

This is bad in both directions. What if Iceberg adds interval (there is a proposal)? Then what do you do? Does SQL/JSON support UUID? Iceberg Variant already has it. There will always be an impedance mismatch here.

Equality and comparison semantics are explicitly defined by the Variant spec and differ from SQL equality in important ways

This could be a problem, yes. But in what ways? Can you elaborate?

The Iceberg spec defines equivalence classes, which tell you how you are supposed to extract and compare types. One strange thing is that exact numbers, floats, and decimals are separate classes, and therefore never compare equal.
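The equality point can be made concrete with an illustrative sketch (class names here are hypothetical; the Variant spec defines the real taxonomy): comparison is only defined within an equivalence class, so values in different classes never compare equal even when numerically identical, unlike SQL, where 1 = 1e0 coerces and is true.

```java
// Illustrative sketch of equivalence-class comparison: values compare
// equal only within the same class. The class names below are
// hypothetical stand-ins for the Variant spec's taxonomy.
public final class VariantEquality
{
    public enum TypeClass { EXACT_NUMERIC, FLOAT, DECIMAL, STRING }

    public record TypedValue(TypeClass typeClass, Object value) {}

    private VariantEquality() {}

    public static boolean variantEquals(TypedValue left, TypedValue right)
    {
        // Cross-class comparison is simply "not equal" -- no implicit
        // numeric coercion as in SQL equality.
        if (left.typeClass() != right.typeClass()) {
            return false;
        }
        return left.value().equals(right.value());
    }
}
```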

SQL/JSON allows (and encourages) richer and potentially extensible scalar semantics that Variant cannot support without a spec revision

Not really. It doesn't encourage extensions -- that's a choice we would make.

Again, this cuts both ways. The main point is that we do not control the Variant spec. Expanding it is bad, not fully supporting it is bad, and not having choices in Trino would suck.

It is also important to note that JSON is already a trivial cast away. This PR supports casts in and out of Variant, including JSON ↔ VARIANT, and users who want JSON semantics can get them immediately via explicit casts.

I could get behind that as long as we don't then go and try to replicate all the operations we have for JSON to work for variant. It's a slippery slope.

I don't think we should be reimplementing the json operations. We have JSON to do that. I think the function set we already have (casts+dereference) is what we need.

These characteristics are acceptable and necessary for an interchange format, but they are not a good foundation for a general-purpose, extensible SQL/JSON implementation.

Not sure I understand this. Variant is not just an interchange format. If that were the purpose, we'd might as well use varbinary instead.

Variant is a good format for Iceberg and Delta to store data. It does not make a good format for in-memory manipulation in Trino. If we want a good format for dynamic structures like this, I can suggest a better design.

@dain (Member, Author) commented Dec 30, 2025

@ebyhr @raunaqmorarka @findepi, I split out the code to enable v3 and the commit to reject tables/commands for unsupported v3 features into PR #27786. That PR is designed to handle and test all the cases you have mentioned and anything else I could think of. The variant branch is not based on that PR.

@findepi (Member) commented Dec 30, 2025

I think we are much better off having connectors expose their native type and then they can trivially be cast to our internal JSON to use the JSON syntax and functions

That would work, but it has the undesired effect that a direct field lookup on a large variant value is significantly faster than the equivalent json function call.
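The cost difference can be sketched abstractly: Variant objects store their field names sorted, so a single-field lookup can binary-search the key directory and touch only the matched child, while a convert-to-JSON-first path materializes the whole document. The layout below is hypothetical, for illustration only.

```java
import java.util.Arrays;

// Sketch: with field names stored in sorted order (as Variant's object
// encoding requires), a single-field lookup is a binary search over the
// key directory -- sibling fields never need to be decoded, let alone
// the whole document converted to JSON. Hypothetical layout.
public final class FieldLookup
{
    private FieldLookup() {}

    public static int findField(String[] sortedKeys, String name)
    {
        int index = Arrays.binarySearch(sortedKeys, name);
        return index >= 0 ? index : -1; // -1 means the field is absent
    }
}
```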

I envision a future where Trino's JSON type can simply be backed by an Iceberg VARIANT value directly, without any data conversion, and json functions can operate on those documents. I acknowledge that Trino JSON may want to be more type-rich than Iceberg VARIANT, so that runtime representation doesn't have to be the only one allowed.

Why? Because of the intersection of two important assumptions:

  • Iceberg Variant is going to be a popular source of unstructured data that Trino will be processing in the future
    • it's not ubiquitous today, because it doesn't exist yet, but it's so cool that I bet it will be
  • user queries often involve more than single-level field lookup
    • in real-world queries dealing with Snowflake Variant values, complex lookups are not uncommon. We definitely won't want to express them as chained single-step lookups for efficiency reasons, which will naturally lead to more demand for JSON path-like traversal of variant values.

My point is not really to decide on the direction here. Rather, I only want us not to make promises that we may regret in the future, and to keep the door open for future improvements around Trino's JSON and VARIANT story.

@dain dain force-pushed the variant branch 2 times, most recently from 75a8a6b to ee3ffd3 Compare January 3, 2026 08:57
@dain (Member, Author) commented Jan 3, 2026

I updated this PR with support for Iceberg Variant in the ORC and Avro formats, so all file formats are now supported.

@dain dain changed the title Add VARIANT type with Iceberg Variant support (Parquet-only) Add VARIANT type with Iceberg Variant support Jan 9, 2026
@dain (Member, Author) commented Mar 21, 2026

@CodeRabbit full review

@coderabbitai (Bot) commented Mar 21, 2026

✅ Actions performed

Full review triggered.

@coderabbitai (Bot) commented Mar 21, 2026

📝 Walkthrough

Walkthrough

This pull request implements comprehensive support for the VARIANT data type in Trino according to the Apache Iceberg specification. It introduces the core VARIANT type with block encoding, operators, and cast functions; defines variant encoding/decoding utilities; adds ORC and Parquet read/write support; integrates with the Iceberg plugin; and includes client-side protocol support. The implementation spans SPI layer (VariantType, VariantBlock), core Trino server functions, connector-specific readers/writers, and documentation updates.

Assessment against linked issues

Objective Addressed Explanation
Implement VARIANT type in Trino type system [#24538]
Support VARIANT in Iceberg plugin (type conversion, schema mapping) [#24538]
Implement VARIANT operators and casts (SUBSCRIPT, CAST to/from scalar/container types) [#24538]
Add ORC/Parquet reader/writer support for VARIANT columns [#24538]
Implement variant encoding/decoding and metadata dictionary utilities [#24538]

Assessment against linked issues: Out-of-scope changes

Code Change Explanation
Client-side VariantDecoder implementation (client/trino-client/src/main/java/io/trino/client/JsonDecodingUtils.java) Adds client protocol support for VARIANT, which extends beyond server-side Iceberg variant support. This appears to enable JSON wire protocol handling for VARIANT values in client communications rather than direct Iceberg functionality.
Visibility changes for timestamp cast methods (core/trino-main/src/main/java/io/trino/operator/scalar/timestamp/VarcharToTimestampCast.java, VarcharToTimestampWithTimeZoneCast.java) These visibility changes appear to support VARIANT casting to/from timestamp types but do not directly implement Iceberg variant support. The connection is indirect through the broader VARIANT feature scope. While related to VARIANT implementation, these changes support conversion utilities rather than Iceberg-specific requirements.

@coderabbitai (Bot) left a comment

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergAvroDataConversion.java (1)

313-318: ⚠️ Potential issue | 🔴 Critical

Missing return after handling FIXED type causes fall-through to heap buffer wrapping.

When icebergType.typeId().equals(FIXED) is true, the method writes the byte array but then falls through to line 317 which attempts to cast object to ByteBuffer. This will throw a ClassCastException since object is a byte[] for FIXED types.

🐛 Proposed fix
         if (type instanceof VarbinaryType) {
             if (icebergType.typeId().equals(FIXED)) {
                 VARBINARY.writeSlice(builder, Slices.wrappedBuffer((byte[]) object));
+                return;
             }
             VARBINARY.writeSlice(builder, wrappedHeapBuffer((ByteBuffer) object));
             return;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergAvroDataConversion.java`
around lines 313 - 318, In IcebergAvroDataConversion, the VarbinaryType branch
incorrectly falls through after handling FIXED: inside the conditional that
checks icebergType.typeId().equals(FIXED) (within the VarbinaryType handling)
you must write the byte[] to the builder and then immediately return to avoid
the subsequent cast to ByteBuffer; update the method (the VarbinaryType handling
block in IcebergAvroDataConversion) to add a return after
VARBINARY.writeSlice(builder, Slices.wrappedBuffer((byte[]) object)) so FIXED
byte[] cases don't reach the wrappedHeapBuffer((ByteBuffer) object) call.
🧹 Nitpick comments (14)
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java (1)

5039-5060: Use TestTable (try-with-resources) to avoid cleanup leaks on assertion failure.

If an assertion fails before Line 5059, test_nested_variant may remain and cascade into follow-up failures.

♻️ Suggested cleanup hardening
 `@Test`
 public void testNestedVariant()
 {
     // Tests variant nested inside array and row types
-    assertUpdate("CREATE TABLE test_nested_variant (" +
-            "  variant_array array(variant), " +
-            "  variant_row row(v variant, i integer)) " +
-            "WITH (FORMAT_VERSION = 3)");
-
-    assertUpdate("INSERT INTO test_nested_variant VALUES (" +
-            "ARRAY[CAST(1 AS VARIANT), CAST('hello' AS VARIANT), CAST(NULL AS VARIANT)], " +
-            "CAST(ROW(42, 123) AS ROW(v variant, i integer)))", 1);
-    assertUpdate("INSERT INTO test_nested_variant VALUES (NULL, NULL)", 1);
-
-    assertThat(query("SELECT * FROM test_nested_variant"))
+    try (TestTable table = newTrinoTable(
+            "test_nested_variant",
+            "(variant_array array(variant), variant_row row(v variant, i integer)) WITH (FORMAT_VERSION = 3)")) {
+        assertUpdate("INSERT INTO " + table.getName() + " VALUES (" +
+                "ARRAY[CAST(1 AS VARIANT), CAST('hello' AS VARIANT), CAST(NULL AS VARIANT)], " +
+                "CAST(ROW(42, 123) AS ROW(v variant, i integer)))", 1);
+        assertUpdate("INSERT INTO " + table.getName() + " VALUES (NULL, NULL)", 1);
+
+        assertThat(query("SELECT * FROM " + table.getName()))
                 .matches("VALUES (" +
                         "ARRAY[CAST(1 AS VARIANT), CAST('hello' AS VARIANT), CAST(NULL AS VARIANT)], " +
                         "CAST(ROW(42, 123) AS ROW(v variant, i integer))), " +
                         "(NULL, NULL)");
-
-    assertUpdate("DROP TABLE test_nested_variant");
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java`
around lines 5039 - 5060, The test method testNestedVariant should use a
TestTable in a try-with-resources block instead of manually creating and
dropping "test_nested_variant" so the table is always cleaned up on assertion
failure; replace the explicit assertUpdate("CREATE TABLE ...") and final
assertUpdate("DROP TABLE ...") with creating a TestTable (new
TestTable(this::queryRunner, "test_nested_variant", "variant_array
array(variant), variant_row row(v variant, i integer) WITH (FORMAT_VERSION =
3)")) in a try-with-resources and use table.getName() in the subsequent
INSERT/SELECT assertions so the resource is auto-closed and the table removed
even if assertions fail.
core/trino-spi/src/main/java/io/trino/spi/variant/Int2IntOpenHashMap.java (1)

22-23: Clarify or guard DEFAULT_RETURN_VALUE sentinel ambiguity.

get()/putIfAbsent() return -1 for “not found”, but -1 is also a legal stored value. Consider either rejecting -1 values or documenting that callers must pair reads with containsKey() to disambiguate.

Optional guard-based fix
 public int putIfAbsent(final int k, final int v)
 {
+    if (v == DEFAULT_RETURN_VALUE) {
+        throw new IllegalArgumentException("Value -1 is reserved as DEFAULT_RETURN_VALUE");
+    }
     final int pos = find(k);
     if (pos >= 0) {
         return value[pos];
     }
     insert(-pos - 1, k, v);
     return DEFAULT_RETURN_VALUE;
 }

Also applies to: 97-105

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-spi/src/main/java/io/trino/spi/variant/Int2IntOpenHashMap.java`
around lines 22 - 23, The sentinel DEFAULT_RETURN_VALUE = -1 conflicts with
legal stored values; update Int2IntOpenHashMap to guard against storing -1 by
validating in put(...) and putIfAbsent(...) (and any insert helpers) and throw
IllegalArgumentException if value == DEFAULT_RETURN_VALUE, and add a short
javadoc to get(), putIfAbsent(), DEFAULT_RETURN_VALUE stating callers must not
store -1 and to use containsKey() to disambiguate if they relied on the old
behavior; this makes get()/putIfAbsent() safe to keep returning
DEFAULT_RETURN_VALUE while preventing ambiguous stored values.
testing/trino-testing/src/main/java/io/trino/testing/TestingTrinoClient.java (1)

337-343: Add defensive payload type checks in VARIANT decoding.

The conversion works for valid data, but explicit checks will fail faster with clearer errors than ClassCastException if payload shape changes.

♻️ Suggested hardening diff
         if (type == VARIANT) {
-            List<?> variantList = (List<?>) value;
+            checkArgument(value instanceof List<?>, "Expected variant value to be a list");
+            List<?> variantList = (List<?>) value;
             checkArgument(variantList.size() == 2, "Expected variant value to be a list of size 2");
+            Object metadataObject = variantList.getFirst();
+            Object dataObject = variantList.getLast();
+            checkArgument(metadataObject instanceof byte[], "Expected variant metadata to be binary");
+            checkArgument(dataObject instanceof byte[], "Expected variant value to be binary");
             return Variant.from(
-                    Metadata.from(Slices.wrappedBuffer((byte[]) variantList.getFirst())),
-                    Slices.wrappedBuffer((byte[]) variantList.getLast()));
+                    Metadata.from(Slices.wrappedBuffer((byte[]) metadataObject)),
+                    Slices.wrappedBuffer((byte[]) dataObject));
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@testing/trino-testing/src/main/java/io/trino/testing/TestingTrinoClient.java`
around lines 337 - 343, In the VARIANT decoding branch in TestingTrinoClient
(the block checking if (type == VARIANT)), add explicit defensive checks before
casting: verify that value is a List, that the list size is exactly 2, and that
both elements are byte[] (use instanceof checks) so you can throw a clear
IllegalArgumentException with a descriptive message if any check fails; only
then call Variant.from(Metadata.from(Slices.wrappedBuffer((byte[]) first)),
Slices.wrappedBuffer((byte[]) second)). Ensure you reference the local variables
value, variantList, Variant.from, Metadata.from, and Slices.wrappedBuffer in
your checks and error messages.
lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetTypeVisitor.java (1)

108-113: Address or remove TODO comments before merging.

Lines 109-110 contain development notes that should be resolved:

  • "todo is there validation here" — The checkArgument on line 111 validates field count, but the comment suggests uncertainty about completeness
  • "todo should I extract the metadata and values fields here" — Clarify whether additional field extraction is needed

If no additional validation or extraction is required, remove the TODOs to avoid confusion.

🧹 Suggested cleanup
         if (LogicalTypeAnnotation.variantType(Header.VERSION).equals(annotation)) {
-            // todo is there validation here
-            // todo should I extract the metadata and values fields here
             checkArgument(group.getFieldCount() == 2, "Invalid variant: expected 2 fields (metadata, value): %s", group);
             return visitor.variant(group);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetTypeVisitor.java`
around lines 108 - 113, The two inline TODO comments inside the
LogicalTypeAnnotation.variantType(Header.VERSION) branch are stale—either
implement the missing validation/extraction or remove them; since
checkArgument(group.getFieldCount() == 2, ...) already enforces the expected
structure, remove the "todo is there validation here" comment and either remove
or implement the "todo should I extract the metadata and values fields here": if
additional checks are required, extract fields from group (use
group.getFields()/group.getField(int) and validate types) and pass the extracted
metadata/value to visitor.variant or validate before calling
visitor.variant(group), otherwise delete both TODO lines to leave a clear code
path around LogicalTypeAnnotation.variantType(Header.VERSION), group, and
visitor.variant.
lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/RepLevelWriterProviders.java (1)

153-209: Consider extracting shared logic between VariantRepLevelWriterProvider and RowRepLevelWriterProvider.

The writeRepetitionLevels implementation is nearly identical to RowRepLevelWriterProvider (lines 106-150). Both follow the same null-handling pattern: write parentLevel directly for nulls, batch consecutive non-nulls to the nested writer.

A shared base class or a parameterized helper could reduce duplication.

♻️ Potential refactor approach
// Extract a common abstract class or helper method:
private static RepetitionLevelWriter createNestedBlockWriter(
        Block block,
        RepetitionLevelWriter nestedWriter,
        ColumnDescriptorValuesWriter encoder) {
    return new RepetitionLevelWriter() {
        private int offset;
        
        `@Override`
        public void writeRepetitionLevels(int parentLevel) {
            writeRepetitionLevels(parentLevel, block.getPositionCount());
        }
        
        `@Override`
        public void writeRepetitionLevels(int parentLevel, int positionsCount) {
            // shared implementation
        }
    };
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/RepLevelWriterProviders.java`
around lines 153 - 209, VariantRepLevelWriterProvider and
RowRepLevelWriterProvider duplicate the same writeRepetitionLevels logic;
extract the shared logic into a single helper or abstract factory and have both
providers use it. Implement a private static factory (e.g.,
createNestedBlockWriter or an abstract base class) that accepts the Block, the
nested RepetitionLevelWriter (from nestedWriterOptional.orElseThrow()), and the
ColumnDescriptorValuesWriter encoder, and returns a RepetitionLevelWriter which
implements writeRepetitionLevels (both overloads) using the existing
null-handling + batching logic; then replace the anonymous RepetitionLevelWriter
creation in VariantRepLevelWriterProvider and RowRepLevelWriterProvider to call
that helper. Ensure you keep the checks (requireNonNull, checkArgument) in place
and preserve offset state inside the returned writer.
core/trino-spi/src/test/java/io/trino/spi/block/TestVariantBlockBuilder.java (1)

46-58: Clarify the -1 sentinel values in the positions array.

The positions array {-1, 3, 0, 2, -1} contains -1 values at indices 0 and 4. The call appendPositions(source, positions, 1, length) starts at offset 1 and processes only length=3 positions, so the -1 entries are never read, but -1 as an array element is unusual and potentially confusing.

If this is intentional test padding to verify offset handling, consider adding a comment explaining the intent. If positions are meant to be valid indices, -1 would cause an ArrayIndexOutOfBoundsException if accidentally accessed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-spi/src/test/java/io/trino/spi/block/TestVariantBlockBuilder.java`
around lines 46 - 58, In testAppendPositionsUpdatesStatusWithOffsetSource, the
positions array passed to VariantBlockBuilder.appendPositions contains -1
sentinel values which are only there to pad entries outside the processed range;
update the test to either replace the -1s with valid indices or (preferable) add
a brief inline comment next to the positions array declaration explaining that
-1 is intentional padding and that appendPositions(source, positions, 1, length)
starts at offset 1 and processes only the three middle entries, so the -1s will
never be accessed; reference the test method name
testAppendPositionsUpdatesStatusWithOffsetSource and the appendPositions call
when making this change.
core/trino-main/src/main/java/io/trino/util/variant/PlannedObjectValue.java (1)

43-64: Potential partial state corruption on exception during finalize().

If an exception occurs during the nested plannedValue.finalize() calls (lines 58-60), the fieldIds array may be partially remapped while size remains -1. A subsequent call to finalize() would pass the guard check but re-apply sortedFieldIdMapping to already-remapped IDs, producing incorrect results.

Consider either:

  1. Setting a separate finalized boolean flag at the start
  2. Computing remapped IDs into a local array first, then assigning after all nested finalizations succeed
♻️ Suggested approach using a flag
     private int size = -1;
+    private boolean finalizing;

     `@Override`
     public void finalize(IntUnaryOperator sortedFieldIdMapping)
     {
         if (size >= 0) {
             throw new IllegalStateException("finalize() already called");
         }
+        if (finalizing) {
+            throw new IllegalStateException("finalize() already in progress");
+        }
+        finalizing = true;

         int maxFieldId = -1;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-main/src/main/java/io/trino/util/variant/PlannedObjectValue.java`
around lines 43 - 64, The finalize method currently mutates fieldIds before
calling each VariantWriter.PlannedValue.finalize, risking partial remapping if a
nested finalize throws; change finalize(IntUnaryOperator sortedFieldIdMapping)
to first compute remapped IDs into a local int[] remappedFieldIds (loop over
fieldIds and apply sortedFieldIdMapping), then run the loop that calls
plannedValue.finalize(sortedFieldIdMapping) and accumulates totalElementLength
without touching fieldIds, and only after all nested finalizations succeed
assign fieldIds = remappedFieldIds and set size = encodedObjectSize(...); keep
the existing guard on size so failed attempts leave the original state intact.
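The compute-then-commit alternative (option 2 above) can be sketched roughly as follows. This is a minimal illustration of the pattern, not the actual PlannedObjectValue code; the class and method names here are hypothetical stand-ins:

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: remap into a local array first, and only commit the
// new state after all failure-prone work has succeeded, so a thrown
// exception leaves the object in its original, retry-safe state.
public class CommitAfterSuccess {
    private int[] fieldIds = {3, 1, 2};
    private int size = -1;

    public void finalizeIds(IntUnaryOperator mapping) {
        if (size >= 0) {
            throw new IllegalStateException("finalize() already called");
        }
        // 1. Compute remapped IDs into a local array; fieldIds is untouched
        int[] remapped = new int[fieldIds.length];
        for (int i = 0; i < fieldIds.length; i++) {
            remapped[i] = mapping.applyAsInt(fieldIds[i]);
        }
        // 2. Failure-prone nested work (the nested finalize() calls) would run
        //    here; if anything throws, fieldIds and size are still consistent
        // 3. Commit only after everything succeeded
        fieldIds = remapped;
        size = fieldIds.length; // stand-in for encodedObjectSize(...)
    }

    public int[] fieldIds() { return fieldIds; }
    public int size() { return size; }

    public static void main(String[] args) {
        CommitAfterSuccess v = new CommitAfterSuccess();
        v.finalizeIds(id -> id * 10);
        System.out.println(java.util.Arrays.toString(v.fieldIds())); // [30, 10, 20]
    }
}
```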
core/trino-main/src/main/java/io/trino/operator/scalar/VariantToRowCast.java (1)

92-100: Unused connectorSession parameter.

The connectorSession parameter is declared but not used in the toRow method. If it's required for future use or to match a method handle signature convention, consider adding a comment. Otherwise, this creates unnecessary parameter passing overhead.

Is ConnectorSession commonly passed to Trino cast methods for future extensibility even when unused?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-main/src/main/java/io/trino/operator/scalar/VariantToRowCast.java`
around lines 92 - 100, The connectorSession parameter in VariantToRowCast.toRow
is declared but never used; either remove it from the signature (and update any
generated method handles/call sites) or explicitly mark it as intentionally
unused to avoid confusion and warnings. If the parameter must remain to satisfy
a method-handle/signature convention, rename it to unusedConnectorSession or add
a short comment and a `@SuppressWarnings`("unused") on the parameter (or method)
to document intent; otherwise remove connectorSession from the toRow method and
adjust callers/generation so the signature no longer includes it.
core/trino-main/src/main/java/io/trino/metadata/SignatureBinder.java (1)

644-695: Cache the filtered set of recursive row cast functions or extract the CAST function lookup outside the loop.

Both isRecursiveCastFromRow and isRecursiveCastToRow iterate through all registered CAST functions via metadata.getFunctions(...) on each invocation. While the underlying function metadata is cached in immutable maps, new CatalogFunctionMetadata wrapper objects are created on each call (see MetadataManager.getBuiltinFunctions lines 2716-2718). Since these methods are called from canCast() during type constraint solving, they may be invoked multiple times per query, creating unnecessary wrapper objects and redundant iterations.

Consider:

  1. Pre-computing and caching the set of CAST functions with row type constraints during initialization
  2. Storing them as instance fields in SignatureBinder to avoid repeated lookups
  3. At minimum, cache the result within the constraint-solving phase to avoid re-filtering the same functions
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-main/src/main/java/io/trino/metadata/SignatureBinder.java` around
lines 644 - 695, The methods isRecursiveCastFromRow and isRecursiveCastToRow
repeatedly call metadata.getFunctions(...) and re-create wrapper objects on each
invocation; cache the filtered CAST functions (or their Signatures) once and
reuse them. Extract the CatalogFunctionMetadata lookup and filtering logic out
of both methods (e.g., run once in the SignatureBinder constructor or via a
lazily-initialized final field) to produce a List/Set of CAST signatures that
have exactly one type variable constraint (no long constraints) and a row
variadic bound (typeVariableConstraint.isRowType()); then have
isRecursiveCastFromRow and isRecursiveCastToRow iterate that cached collection
and only perform the lightweight checks (matching toType/fromType against
signature.getReturnType()/getArgumentTypes()). Ensure the cached container uses
the same identifying symbols: metadata.getFunctions(...),
mangleOperatorName(CAST), Signature, TypeVariableConstraint,
isRecursiveCastFromRow, isRecursiveCastToRow.
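A rough sketch of the caching idea: hide the expensive filtered lookup behind a lazily-initialized supplier so repeated calls during constraint solving reuse one computed list. The hand-rolled memoizer below stands in for Guava's Suppliers.memoize (already on Trino's classpath); RowCastCache and loadCastSignatures are hypothetical names, not Trino APIs:

```java
import java.util.List;
import java.util.function.Supplier;

public class RowCastCache {
    private final Supplier<List<String>> castSignatures;
    private int loads; // counts how often the expensive lookup actually runs

    public RowCastCache() {
        this.castSignatures = memoize(this::loadCastSignatures);
    }

    private List<String> loadCastSignatures() {
        loads++;
        // stand-in for metadata.getFunctions(...) plus filtering for
        // CAST signatures with a single row-typed variable constraint
        return List.of("$operator$cast(row)");
    }

    public List<String> castSignatures() { return castSignatures.get(); }
    public int loads() { return loads; }

    // Minimal single-check memoizer (not thread-safe; use Guava's
    // Suppliers.memoize in real code)
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<>() {
            private T value;
            private boolean computed;
            @Override
            public T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        RowCastCache cache = new RowCastCache();
        cache.castSignatures();
        cache.castSignatures();
        System.out.println(cache.loads()); // lookup ran once despite two calls
    }
}
```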
lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/DefLevelWriterProviders.java (1)

345-410: Factor duplicated nullable-group def-level logic and add a type guard.

VariantDefLevelWriterProvider currently duplicates the same null-run batching logic used by RowDefLevelWriterProvider, which increases divergence risk. Also, unlike the row provider constructor, it does not fail fast on a wrong block kind.

Proposed hardening change
 VariantDefLevelWriterProvider(Block block, int maxDefinitionLevel)
 {
     this.block = requireNonNull(block, "block is null");
     this.maxDefinitionLevel = maxDefinitionLevel;
+    checkArgument(block.getUnderlyingValueBlock() instanceof VariantBlock, "block is not a variant block");
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/DefLevelWriterProviders.java`
around lines 345 - 410, VariantDefLevelWriterProvider duplicates the
nullable-group batching from RowDefLevelWriterProvider and lacks a constructor
type-guard; refactor by extracting the null-run batching loop into a shared
helper (e.g., a private static method in DefLevelWriterProviders like
processNullableRuns or writeNullableRuns) that takes the Block, offset,
positionsCount, nestedWriter and encoder and returns ValuesCount, then have both
VariantDefLevelWriterProvider.getDefinitionLevelWriter and
RowDefLevelWriterProvider call that helper instead of duplicating logic; also
add the same fast-fail check in VariantDefLevelWriterProvider's constructor that
RowDefLevelWriterProvider has (validate block.getBlockKind() ==
Block.Kind.VARIANT or the project’s equivalent) and throw an
IllegalArgumentException if it doesn't match.
lib/trino-orc/src/main/java/io/trino/orc/reader/VariantColumnReader.java (2)

98-106: Clarify variable reuse for readOffset.

The variable readOffset is reassigned to presentStream.countBitsSet(readOffset) on line 102, which can be confusing since it changes the meaning from "positions to skip" to "non-null values to skip in children". This works correctly but would benefit from a clarifying comment or a separate variable.

♻️ Optional: Use a separate variable for clarity
         if (readOffset > 0) {
             if (presentStream != null) {
                 // skip ahead the present bit reader, but count the set bits
                 // and use this as the skip size for the field readers
-                readOffset = presentStream.countBitsSet(readOffset);
+                int nonNullsToSkip = presentStream.countBitsSet(readOffset);
+                metadataReader.prepareNextRead(nonNullsToSkip);
+                valueReader.prepareNextRead(nonNullsToSkip);
+            }
+            else {
+                metadataReader.prepareNextRead(readOffset);
+                valueReader.prepareNextRead(readOffset);
             }
-            metadataReader.prepareNextRead(readOffset);
-            valueReader.prepareNextRead(readOffset);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/trino-orc/src/main/java/io/trino/orc/reader/VariantColumnReader.java`
around lines 98 - 106, The reuse of readOffset (reassigned from
positions-to-skip to non-null child-count via
presentStream.countBitsSet(readOffset)) is confusing; change this by introducing
a new variable (e.g., nonNullToSkip) to hold
presentStream.countBitsSet(readOffset) and pass that to
metadataReader.prepareNextRead(...) and valueReader.prepareNextRead(...), or at
minimum add a brief comment above the reassignment explaining that readOffset is
being converted from total positions to non-null child count before calling
metadataReader.prepareNextRead and valueReader.prepareNextRead; update
references to use the new variable where appropriate.

132-136: Minor: Extract duplicate RLE null block creation.

The same pattern for creating an RLE null block is duplicated for both metadataBlock and valueBlock.

♻️ Optional: Extract to a helper or constant
+            private static final Block NULL_VARBINARY_BLOCK = VARBINARY.createBlockBuilder(null, 1).appendNull().build();
+
             else {
                 // All values are null
-                metadataBlock = RunLengthEncodedBlock.create(VARBINARY.createBlockBuilder(null, 0).appendNull().build(), nextBatchSize);
-                valueBlock = RunLengthEncodedBlock.create(VARBINARY.createBlockBuilder(null, 0).appendNull().build(), nextBatchSize);
+                metadataBlock = RunLengthEncodedBlock.create(NULL_VARBINARY_BLOCK, nextBatchSize);
+                valueBlock = RunLengthEncodedBlock.create(NULL_VARBINARY_BLOCK, nextBatchSize);
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/trino-orc/src/main/java/io/trino/orc/reader/VariantColumnReader.java`
around lines 132 - 136, The duplicated creation of an RLE null block for
metadataBlock and valueBlock should be extracted into a shared helper/constant
to avoid repetition; add a private static helper (e.g., createRleNullBlock(int
runLength) or a constant RLE_NULL_BLOCK factory) that builds the
VARBINARY.createBlockBuilder(...).appendNull().build() and wraps it with
RunLengthEncodedBlock.create(..., runLength), then replace both assignments to
metadataBlock and valueBlock in VariantColumnReader with calls to that
helper/constant.
core/trino-spi/src/main/java/io/trino/spi/block/VariantBlockBuilder.java (1)

389-397: Minor: expectedEntries parameter not passed to nested builder factories.

The newBlockBuilderLike method creates nested builders using newBlockBuilderLike(blockBuilderStatus) without passing expectedEntries. The nested builders will use their default sizing.

♻️ Optional: Pass expectedEntries to nested builders
     `@Override`
     public BlockBuilder newBlockBuilderLike(int expectedEntries, BlockBuilderStatus blockBuilderStatus)
     {
         return new VariantBlockBuilder(
                 blockBuilderStatus,
-                (VariableWidthBlockBuilder) metadataBlockBuilder.newBlockBuilderLike(blockBuilderStatus),
-                (VariableWidthBlockBuilder) valuesBlockBuilder.newBlockBuilderLike(blockBuilderStatus),
+                (VariableWidthBlockBuilder) metadataBlockBuilder.newBlockBuilderLike(expectedEntries, blockBuilderStatus),
+                (VariableWidthBlockBuilder) valuesBlockBuilder.newBlockBuilderLike(expectedEntries, blockBuilderStatus),
                 new boolean[expectedEntries]);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-spi/src/main/java/io/trino/spi/block/VariantBlockBuilder.java`
around lines 389 - 397, The nested builders are created without forwarding
expectedEntries, so change the two calls to
metadataBlockBuilder.newBlockBuilderLike(...) and
valuesBlockBuilder.newBlockBuilderLike(...) to pass expectedEntries as the first
argument (i.e., call newBlockBuilderLike(expectedEntries, blockBuilderStatus))
when constructing a new VariantBlockBuilder in
VariantBlockBuilder.newBlockBuilderLike; keep the VariantBlockBuilder
constructor usage and the new boolean[expectedEntries] as-is.
core/trino-spi/src/main/java/io/trino/spi/variant/VariantUtils.java (1)

99-112: Document the invariant for 3-byte offset reads more prominently.

The comment on lines 104-108 explains the assumption that reading 4 bytes is safe for 3-byte offsets, but this invariant is subtle. If future code changes violate this assumption, it could cause out-of-bounds reads.

Consider adding a validation or making the invariant more explicit in the encoding layer that guarantees this property.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/trino-spi/src/main/java/io/trino/spi/variant/VariantUtils.java` around
lines 99 - 112, The readOffset method assumes a safe extra byte when handling
3-byte offsets (case 3 -> data.getInt(offset) & 0xFFFFFF), which is a subtle
invariant that can lead to OOB reads if violated; update the code to validate
this invariant or enforce it in the encoding layer by (a) adding an explicit
bounds check/assert in readOffset (or a helper like validateOffsetBytes) to
ensure data.length - offset >= 4 before reading 4 bytes, and/or (b) making the
encoder guarantee a padding byte after every 3-byte offset and documenting that
in the encoder’s write method so callers of readOffset (and the VariantUtils
class) can rely on it. Ensure references to readOffset and any encoder method
(the encoder write method that serializes field IDs/offsets) are used so
reviewers can find and verify the added validation or encoding guarantee.
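The suggested guard can be illustrated with a small stand-alone sketch: reading 4 bytes and masking to 24 bits is only safe when at least one byte exists past the 3-byte offset, so the case-3 branch validates before the wide read. This is an illustrative reimplementation over a plain byte[], not the Slice-based readOffset in VariantUtils:

```java
public class OffsetRead {
    // Little-endian unsigned offset read; offsetSize is 1, 2, or 3 bytes.
    static int readOffset(byte[] data, int offset, int offsetSize) {
        return switch (offsetSize) {
            case 1 -> data[offset] & 0xFF;
            case 2 -> (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
            case 3 -> {
                // The masked 4-byte read below touches data[offset + 3];
                // fail loudly instead of reading out of bounds when the
                // "trailing byte exists" invariant is broken.
                if (data.length - offset < 4) {
                    throw new IllegalArgumentException(
                            "3-byte offset at " + offset + " needs a trailing byte");
                }
                int wide = (data[offset] & 0xFF)
                        | ((data[offset + 1] & 0xFF) << 8)
                        | ((data[offset + 2] & 0xFF) << 16)
                        | ((data[offset + 3] & 0xFF) << 24);
                yield wide & 0xFFFFFF;
            }
            default -> throw new IllegalArgumentException("bad offset size: " + offsetSize);
        };
    }

    public static void main(String[] args) {
        byte[] data = {0x01, 0x02, 0x03, 0x00}; // trailing padding byte present
        System.out.println(readOffset(data, 0, 3)); // 0x030201 = 197121
    }
}
```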
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/trino-main/src/main/java/io/trino/type/VariantOperators.java`:
- Around line 42-68: The dereference operators currently call
Variant#getArrayLength() and Variant#getObjectField() directly and will let
wrong-kind cases throw IllegalStateException; update both dereference overloads
to validate the Variant kind first and throw a
TrinoException(INVALID_FUNCTION_ARGUMENT, ...) with a clear user-facing message
when the value is not the expected container: for the long-index overload,
verify the Variant is an array (e.g., Variant.isArray() or equivalent) before
calling getArrayLength() and then run checkArrayIndex; for the fieldName
overload, verify the Variant is an object (e.g., Variant.isObject() or
equivalent) before calling getObjectField() and return null or the field value
as before.

In `@core/trino-main/src/main/java/io/trino/util/variant/VariantUtil.java`:
- Around line 447-475: The STRING and SHORT_STRING branches in asJavaDecimal
currently call new BigDecimal(...) which can throw NumberFormatException and
bypass the intended INVALID_CAST_ARGUMENT handling; wrap the
string-to-BigDecimal conversion inside a try/catch for NumberFormatException in
both the STRING case (inside the PRIMITIVE -> STRING branch) and the
SHORT_STRING case and rethrow a TrinoException(INVALID_CAST_ARGUMENT, "Cannot
cast input variant to DECIMAL(" + precision + "," + scale + ")") (or similar
message) so malformed decimal strings are mapped to the proper
INVALID_CAST_ARGUMENT path instead of leaking NumberFormatException.
- Around line 203-239: The PR misses support for TIME_NTZ_MICROS in string/JSON
casts: add a case for TIME_NTZ_MICROS in the PRIMITIVE branch of
asVarchar()/asVarchar-like logic in VariantUtil (the switch handling
variant.primitiveType()) to format TIME_NTZ_MICROS into a VARCHAR using the same
DateTimes formatting utilities used for other temporal types, and update
toJsonValue()/toJsonValue-like method (and the similar handling around the other
occurrence at ~1248-1249) to return a JSON string for TIME_NTZ_MICROS instead of
throwing; reference the primitive name TIME_NTZ_MICROS and the methods
DateTimes.format...(), asVarchar()/asVarchar-like switch, and
toJsonValue()/toJsonValue-like method to locate the edits.
- Around line 519-529: The time-of-day extraction uses the % operator which
yields negative remainders for negative timestamps; update the two branches in
VariantUtil (cases TIMESTAMP_UTC_MICROS/TIMESTAMP_NTZ_MICROS and
TIMESTAMP_UTC_NANOS/TIMESTAMP_NTZ_NANOS) to use Math.floorMod when computing
micros and nanos (replace "variant.getTimestampMicros() % MICROSECONDS_PER_DAY"
and "variant.getTimestampNanos() % NANOSECONDS_PER_DAY") and also use
Math.floorMod for the final wrap after round (replace "% PICOSECONDS_PER_DAY")
so timePicos and the rounded picoseconds always fall into [0,
PICOSECONDS_PER_DAY). Ensure you reference the local variables micros, nanos,
timePicos, round(...) and constants MICROSECONDS_PER_DAY, NANOSECONDS_PER_DAY,
PICOSECONDS_PER_DAY.
- Around line 166-197: The predicate VariantUtil.canCastFromVariant currently
omits JsonType, preventing specialization for VARIANT -> ARRAY<JSON>,
MAP<VARCHAR, JSON>, and JSON row fields even though createBlockBuilderAppender
provides a JsonBlockBuilderAppender; add JsonType to the initial instanceof
whitelist in canCastFromVariant so it returns true for JsonType (and thus allows
ArrayType/MapType/RowType paths involving JSON to pass). Update the list of
allowed types in canCastFromVariant to include JsonType, keeping the existing
ArrayType, MapType (with Varchar key check) and RowType handling unchanged.

In `@core/trino-main/src/test/java/io/trino/type/TestVariantOperators.java`:
- Around line 1163-1176: The test rebuilds shared metadata by calling
Variant.ofArray to create leafArray, which sorts the ("b","a") ids and prevents
exercising the same-size array-header remap; replace the Variant.ofArray usage
and instead construct the array payload without rebuilding metadata so the
unsorted Metadata instance (Metadata.of(List.of(utf8Slice("b"),
utf8Slice("a")))) is preserved—either by adding a small helper that encodes the
array payload bytes directly or by constructing the Variant array node using the
same low-level constructor used by createObjectWithSortedFields so leafArray
retains the unsorted metadata and the remap path in the test (symbols: Metadata,
Variant.ofArray, leafArray, createObjectWithSortedFields).

In `@core/trino-spi/src/main/java/io/trino/spi/block/VariantBlock.java`:
- Around line 55-66: The factory currently ignores child nulls when
isNullOptional is empty; update create(...) so when isNullOptional.isEmpty() you
assert the child blocks contain no nulls (e.g., call Block.mayHaveNull() and
fail or iterate/verify positions if mayHaveNull() is true) for both metadata and
values before returning createInternal; keep existing checks (checkArrayRange,
verifyPositionsAreNull) when isNullOptional.isPresent() and use
createInternal(0, positionCount, isNullOptional.orElse(null), metadata, values)
as before.
- Around line 239-253: copyWithAppendedNull() currently appends the null to the
underlying full metadata/values and keeps the original startOffset, so sliced
VariantBlock instances end up referencing an existing element instead of the
appended null; change copyWithAppendedNull() to first take the current logical
slice of metadata and values (e.g. use metadata.getRegion(startOffset,
positionCount) and values.getRegion(startOffset, positionCount) or their
equivalent) when startOffset != 0 or the slice is not the whole block, then call
copyWithAppendedNull() on those sliced blocks, reset the new startOffset to 0,
build a new isNull sized for positionCount+1 (copying only the logical slice),
and return the new VariantBlock(startOffset=0, positionCount+1, newIsNull,
newMetadata, newValues) so the appended null is at the new logical last
position.

In `@core/trino-spi/src/main/java/io/trino/spi/variant/Header.java`:
- Around line 202-207: PrimitiveType.fromHeader currently computes
primitiveTypeBits and indexes values()[primitiveTypeBits] which can throw
ArrayIndexOutOfBoundsException for malformed headers; update fromHeader(byte
header) to validate that primitiveTypeBits is within
0..(PrimitiveType.values().length-1) and, if out of range, throw a clear
IllegalArgumentException (or a custom CorruptHeaderException) that includes the
header byte and the computed primitiveTypeBits; keep the existing behavior for
valid values and return values()[primitiveTypeBits] when in-range.
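The Header.fromHeader hardening can be sketched like this. The enum constants and bit layout below are abbreviated illustrations (the real Variant spec packs the primitive type into the upper six bits), not the actual Header class:

```java
public class HeaderCheck {
    enum PrimitiveType { NULL, BOOLEAN_TRUE, BOOLEAN_FALSE, INT8 } // abbreviated

    static PrimitiveType fromHeader(byte header) {
        int primitiveTypeBits = (header >> 2) & 0x3F; // illustrative bit layout
        PrimitiveType[] values = PrimitiveType.values();
        // Range-check before indexing so a malformed header byte produces an
        // explanatory exception instead of ArrayIndexOutOfBoundsException
        if (primitiveTypeBits >= values.length) {
            throw new IllegalArgumentException(
                    "Invalid header 0x%02X: primitive type bits %d out of range"
                            .formatted(header, primitiveTypeBits));
        }
        return values[primitiveTypeBits];
    }

    public static void main(String[] args) {
        System.out.println(fromHeader((byte) 0b0000_1100)); // bits = 3
    }
}
```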

In `@core/trino-spi/src/main/java/io/trino/spi/variant/Metadata.java`:
- Around line 277-291: The strict increasing check in validateFully() rejects
zero-length field names because consecutive offsets can be equal; update the
validation in the loop that uses readOffset(metadata, position, offsetSize) and
the checkArgument(i == 0 || value > previous) so it permits non-decreasing
offsets (use >= instead of >) while keeping the first-offset check (i != 0 ||
value == 0) and the final equality check against dictionaryLength intact; modify
the checkArgument call in validateFully() accordingly.
- Around line 103-165: Metadata.of(Collection<Slice>) currently allows duplicate
field names which breaks assumptions in id(Slice) and lookups; add a
duplicate-name guard early in Metadata.of (before computing
dictionarySize/dictionaryLength or before building offsets) by iterating
fieldNames and tracking seen names (e.g., a HashSet of Slice or
Slice.toString/byte content comparison) and if a duplicate is detected throw an
IllegalArgumentException with a clear message ("duplicate field name: " + name);
keep the rest of the method unchanged so offsets/dictionary building still use
the original iteration order and preserve use of VariantUtils.isSorted and
metadataHeader.

In `@core/trino-spi/src/main/java/io/trino/spi/variant/VariantFieldRemapper.java`:
- Around line 117-130: The SAME_SIZE optimization is wrongly enabled whenever
originalFieldIdEncodedWidth equals getOffsetSize(maxFieldId); update the
condition that sets remapMode = RemapMode.SAME_SIZE to require both widths be 1
byte to avoid skipping recalculation for nested objects (check
originalFieldIdEncodedWidth and getOffsetSize(maxFieldId) explicitly equal 1).
Locate the branch that assigns remapMode (the identity/else-if/else block around
originalFieldIdEncodedWidth, getOffsetSize, and RemapMode.SAME_SIZE) and change
the predicate so SAME_SIZE is only chosen when both sizes are 1; otherwise fall
through to RemapMode.RESIZE. Ensure references to originalFieldIdEncodedWidth,
getOffsetSize(maxFieldId), remapMode, and RemapMode.SAME_SIZE/RESIZE are
preserved.

In `@core/trino-spi/src/test/java/io/trino/spi/variant/TestVariant.java`:
- Around line 103-122: The test arrays in testByte (method testByte) contain a
duplicate -42 where one should be positive 42; update the byte array in testByte
(and analogous arrays in testShort, testInt, and testLong methods) to replace
the duplicated negative value with the corresponding positive value (e.g.,
change one -42 to 42) so both positive and negative cases are exercised; locate
the arrays used with Variant.ofByte / ofShort / ofInt / ofLong in those test
methods and modify the literal values accordingly.

In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergAvroDataConversion.java`:
- Around line 254-268: The code directly casts block to VariantBlock which can
throw ClassCastException when block is wrapped (e.g., DictionaryBlock or
RunLengthEncodedBlock); before casting, unwrap to the underlying value block
(call block.getUnderlyingValueBlock()) and then cast that result to
VariantBlock, and continue using variantBlock.getRawMetadata(),
variantBlock.getRawValues(), and variantBlock.getRawOffset() as before so
wrapped encodings are handled safely.

In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.java`:
- Around line 51-52: The TODO comment in HiveSchemaUtil referencing "Support
variant type" is stale because the VARIANT case is already implemented (case
VARIANT -> "struct<metadata:binary,value:binary>"). Remove or update that TODO
so the code no longer claims VARIANT is unsupported; ensure the remaining
comment (if any) correctly reflects that VARIANT is handled in the VARIANT
branch of HiveSchemaUtil.

---

Outside diff comments:
In
`@plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergAvroDataConversion.java`:
- Around line 313-318: In IcebergAvroDataConversion, the VarbinaryType branch
incorrectly falls through after handling FIXED: inside the conditional that
checks icebergType.typeId().equals(FIXED) (within the VarbinaryType handling)
you must write the byte[] to the builder and then immediately return to avoid
the subsequent cast to ByteBuffer; update the method (the VarbinaryType handling
block in IcebergAvroDataConversion) to add a return after
VARBINARY.writeSlice(builder, Slices.wrappedBuffer((byte[]) object)) so FIXED
byte[] cases don't reach the wrappedHeapBuffer((ByteBuffer) object) call.

---

Nitpick comments:
In `@core/trino-main/src/main/java/io/trino/metadata/SignatureBinder.java`:
- Around line 644-695: The methods isRecursiveCastFromRow and
isRecursiveCastToRow repeatedly call metadata.getFunctions(...) and re-create
wrapper objects on each invocation; cache the filtered CAST functions (or their
Signatures) once and reuse them. Extract the CatalogFunctionMetadata lookup and
filtering logic out of both methods (e.g., run once in the SignatureBinder
constructor or via a lazily-initialized final field) to produce a List/Set of
CAST signatures that have exactly one type variable constraint (no long
constraints) and a row variadic bound (typeVariableConstraint.isRowType()); then
have isRecursiveCastFromRow and isRecursiveCastToRow iterate that cached
collection and only perform the lightweight checks (matching toType/fromType
against signature.getReturnType()/getArgumentTypes()). Ensure the cached
container uses the same identifying symbols: metadata.getFunctions(...),
mangleOperatorName(CAST), Signature, TypeVariableConstraint,
isRecursiveCastFromRow, isRecursiveCastToRow.

In
`@core/trino-main/src/main/java/io/trino/operator/scalar/VariantToRowCast.java`:
- Around line 92-100: The connectorSession parameter in VariantToRowCast.toRow
is declared but never used; either remove it from the signature (and update any
generated method handles/call sites) or explicitly mark it as intentionally
unused to avoid confusion and warnings. If the parameter must remain to satisfy
a method-handle/signature convention, rename it to unusedConnectorSession or add
a short comment and a `@SuppressWarnings`("unused") on the parameter (or method)
to document intent; otherwise remove connectorSession from the toRow method and
adjust callers/generation so the signature no longer includes it.

In `@core/trino-main/src/main/java/io/trino/util/variant/PlannedObjectValue.java`:
- Around line 43-64: The finalize method currently mutates fieldIds before
calling each VariantWriter.PlannedValue.finalize, risking partial remapping if a
nested finalize throws; change finalize(IntUnaryOperator sortedFieldIdMapping)
to first compute remapped IDs into a local int[] remappedFieldIds (loop over
fieldIds and apply sortedFieldIdMapping), then run the loop that calls
plannedValue.finalize(sortedFieldIdMapping) and accumulates totalElementLength
without touching fieldIds, and only after all nested finalizations succeed
assign fieldIds = remappedFieldIds and set size = encodedObjectSize(...); keep
the existing guard on size so failed attempts leave the original state intact.

In `@core/trino-spi/src/main/java/io/trino/spi/block/VariantBlockBuilder.java`:
- Around line 389-397: The nested builders are created without forwarding
expectedEntries, so change the two calls to
metadataBlockBuilder.newBlockBuilderLike(...) and
valuesBlockBuilder.newBlockBuilderLike(...) to pass expectedEntries as the first
argument (i.e., call newBlockBuilderLike(expectedEntries, blockBuilderStatus))
when constructing a new VariantBlockBuilder in
VariantBlockBuilder.newBlockBuilderLike; keep the VariantBlockBuilder
constructor usage and the new boolean[expectedEntries] as-is.

In `@core/trino-spi/src/main/java/io/trino/spi/variant/Int2IntOpenHashMap.java`:
- Around line 22-23: The sentinel DEFAULT_RETURN_VALUE = -1 conflicts with legal
stored values; update Int2IntOpenHashMap to guard against storing -1 by
validating in put(...) and putIfAbsent(...) (and any insert helpers) and throw
IllegalArgumentException if value == DEFAULT_RETURN_VALUE, and add a short
javadoc to get(), putIfAbsent(), DEFAULT_RETURN_VALUE stating callers must not
store -1 and to use containsKey() to disambiguate if they relied on the old
behavior; this makes get()/putIfAbsent() safe to keep returning
DEFAULT_RETURN_VALUE while preventing ambiguous stored values.
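A minimal stand-in sketch of that guard, using a boxed `HashMap` rather than the open-addressed implementation; `GuardedIntMap` is an illustrative name, not the SPI class. Rejecting the sentinel at insertion time is what keeps `get()` unambiguous.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative guard: storing the sentinel is rejected up front, so a
// DEFAULT_RETURN_VALUE from get() always means "absent".
public class GuardedIntMap
{
    public static final int DEFAULT_RETURN_VALUE = -1;

    private final Map<Integer, Integer> map = new HashMap<>();

    // Callers must not store -1; use containsKey() to disambiguate if needed
    public void put(int key, int value)
    {
        if (value == DEFAULT_RETURN_VALUE) {
            throw new IllegalArgumentException("value must not equal DEFAULT_RETURN_VALUE (-1)");
        }
        map.put(key, value);
    }

    public int get(int key)
    {
        Integer value = map.get(key);
        return (value == null) ? DEFAULT_RETURN_VALUE : value;
    }

    public boolean containsKey(int key)
    {
        return map.containsKey(key);
    }
}
```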

In `@core/trino-spi/src/main/java/io/trino/spi/variant/VariantUtils.java`:
- Around line 99-112: The readOffset method assumes a safe extra byte when
handling 3-byte offsets (case 3 -> data.getInt(offset) & 0xFFFFFF), which is a
subtle invariant that can lead to OOB reads if violated; update the code to
validate this invariant or enforce it in the encoding layer by (a) adding an
explicit bounds check/assert in readOffset (or a helper like
validateOffsetBytes) to ensure data.length - offset >= 4 before reading 4 bytes,
and/or (b) making the encoder guarantee a padding byte after every 3-byte offset
and documenting that in the encoder’s write method so callers of readOffset (and
the VariantUtils class) can rely on it. Ensure references to readOffset and any
encoder method (the encoder write method that serializes field IDs/offsets) are
used so reviewers can find and verify the added validation or encoding
guarantee.
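The bounds check being asked for can be sketched in isolation; `readThreeByteOffset` is a hypothetical helper operating on a plain `byte[]`, and the real `VariantUtils.readOffset` signature (which works on Slices) may differ. The key idea: the 3-byte case performs a 4-byte read and masks, so the extra byte must be proven to exist.

```java
// Illustrative sketch: decode a 3-byte little-endian offset via a masked
// 4-byte read, but validate the padding byte instead of assuming it.
public class OffsetReader
{
    public static int readThreeByteOffset(byte[] data, int offset)
    {
        // The masked 4-byte read touches data[offset + 3], so require it
        if (data.length - offset < 4) {
            throw new IndexOutOfBoundsException(
                    "3-byte offset at " + offset + " needs a trailing padding byte (length " + data.length + ")");
        }
        int value = (data[offset] & 0xFF)
                | ((data[offset + 1] & 0xFF) << 8)
                | ((data[offset + 2] & 0xFF) << 16)
                | ((data[offset + 3] & 0xFF) << 24);
        return value & 0xFFFFFF;
    }
}
```

Alternatively, the encoder can guarantee the padding byte and document that invariant, trading the per-read check for an encoding-side contract.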

In
`@core/trino-spi/src/test/java/io/trino/spi/block/TestVariantBlockBuilder.java`:
- Around line 46-58: In testAppendPositionsUpdatesStatusWithOffsetSource, the
positions array passed to VariantBlockBuilder.appendPositions contains -1
sentinel values which are only there to pad entries outside the processed range;
update the test to either replace the -1s with valid indices or (preferable) add
a brief inline comment next to the positions array declaration explaining that
-1 is intentional padding and that appendPositions(source, positions, 1, length)
starts at offset 1 and processes only the three middle entries, so the -1s will
never be accessed; reference the test method name
testAppendPositionsUpdatesStatusWithOffsetSource and the appendPositions call
when making this change.

In `@lib/trino-orc/src/main/java/io/trino/orc/reader/VariantColumnReader.java`:
- Around line 98-106: The reuse of readOffset (reassigned from positions-to-skip
to non-null child-count via presentStream.countBitsSet(readOffset)) is
confusing; change this by introducing a new variable (e.g., nonNullToSkip) to
hold presentStream.countBitsSet(readOffset) and pass that to
metadataReader.prepareNextRead(...) and valueReader.prepareNextRead(...), or at
minimum add a brief comment above the reassignment explaining that readOffset is
being converted from total positions to non-null child count before calling
metadataReader.prepareNextRead and valueReader.prepareNextRead; update
references to use the new variable where appropriate.
- Around line 132-136: The duplicated creation of an RLE null block for
metadataBlock and valueBlock should be extracted into a shared helper/constant
to avoid repetition; add a private static helper (e.g., createRleNullBlock(int
runLength) or a constant RLE_NULL_BLOCK factory) that builds the
VARBINARY.createBlockBuilder(...).appendNull().build() and wraps it with
RunLengthEncodedBlock.create(..., runLength), then replace both assignments to
metadataBlock and valueBlock in VariantColumnReader with calls to that
helper/constant.

In
`@lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetTypeVisitor.java`:
- Around line 108-113: The two inline TODO comments inside the
LogicalTypeAnnotation.variantType(Header.VERSION) branch are stale—either
implement the missing validation/extraction or remove them; since
checkArgument(group.getFieldCount() == 2, ...) already enforces the expected
structure, remove the "todo is there validation here" comment and either remove
or implement the "todo should I extract the metadata and values fields here": if
additional checks are required, extract fields from group (use
group.getFields()/group.getField(int) and validate types) and pass the extracted
metadata/value to visitor.variant or validate before calling
visitor.variant(group), otherwise delete both TODO lines to leave a clear code
path around LogicalTypeAnnotation.variantType(Header.VERSION), group, and
visitor.variant.

In
`@lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/DefLevelWriterProviders.java`:
- Around line 345-410: VariantDefLevelWriterProvider duplicates the
nullable-group batching from RowDefLevelWriterProvider and lacks a constructor
type-guard; refactor by extracting the null-run batching loop into a shared
helper (e.g., a private static method in DefLevelWriterProviders like
processNullableRuns or writeNullableRuns) that takes the Block, offset,
positionsCount, nestedWriter and encoder and returns ValuesCount, then have both
VariantDefLevelWriterProvider.getDefinitionLevelWriter and
RowDefLevelWriterProvider call that helper instead of duplicating logic; also
add the same fast-fail check in VariantDefLevelWriterProvider's constructor that
RowDefLevelWriterProvider has (validate block.getBlockKind() ==
Block.Kind.VARIANT or the project’s equivalent) and throw an
IllegalArgumentException if it doesn't match.

In
`@lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/RepLevelWriterProviders.java`:
- Around line 153-209: VariantRepLevelWriterProvider and
RowRepLevelWriterProvider duplicate the same writeRepetitionLevels logic;
extract the shared logic into a single helper or abstract factory and have both
providers use it. Implement a private static factory (e.g.,
createNestedBlockWriter or an abstract base class) that accepts the Block, the
nested RepetitionLevelWriter (from nestedWriterOptional.orElseThrow()), and the
ColumnDescriptorValuesWriter encoder, and returns a RepetitionLevelWriter which
implements writeRepetitionLevels (both overloads) using the existing
null-handling + batching logic; then replace the anonymous RepetitionLevelWriter
creation in VariantRepLevelWriterProvider and RowRepLevelWriterProvider to call
that helper. Ensure you keep the checks (requireNonNull, checkArgument) in place
and preserve offset state inside the returned writer.

In
`@plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java`:
- Around line 5039-5060: The test method testNestedVariant should use a
TestTable in a try-with-resources block instead of manually creating and
dropping "test_nested_variant" so the table is always cleaned up on assertion
failure; replace the explicit assertUpdate("CREATE TABLE ...") and final
assertUpdate("DROP TABLE ...") with creating a TestTable (new
TestTable(this::queryRunner, "test_nested_variant", "variant_array
array(variant), variant_row row(v variant, i integer) WITH (FORMAT_VERSION =
3)")) in a try-with-resources and use table.getName() in the subsequent
INSERT/SELECT assertions so the resource is auto-closed and the table removed
even if assertions fail.
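The cleanup behavior being requested is the standard try-with-resources guarantee: `close()` runs before control reaches the catch, even when an assertion throws mid-block. A stand-alone sketch with a hypothetical `FakeTestTable` (standing in for Trino's `TestTable`, whose `close()` drops the table):

```java
// Stand-in for TestTable: close() is where the real class issues DROP TABLE.
public class TableCleanup
{
    static class FakeTestTable
            implements AutoCloseable
    {
        boolean dropped;
        final String name = "test_nested_variant";

        @Override
        public void close()
        {
            dropped = true; // the real TestTable drops the table here
        }
    }

    public static FakeTestTable runWithFailingAssertion()
    {
        FakeTestTable observed = null;
        try (FakeTestTable table = new FakeTestTable()) {
            observed = table;
            // a failing assertion mid-test does not leak the table
            throw new AssertionError("simulated assertion failure on " + table.name);
        }
        catch (AssertionError expected) {
            // close() already ran before we got here
        }
        return observed;
    }
}
```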

In
`@testing/trino-testing/src/main/java/io/trino/testing/TestingTrinoClient.java`:
- Around line 337-343: In the VARIANT decoding branch in TestingTrinoClient (the
block checking if (type == VARIANT)), add explicit defensive checks before
casting: verify that value is a List, that the list size is exactly 2, and that
both elements are byte[] (use instanceof checks) so you can throw a clear
IllegalArgumentException with a descriptive message if any check fails; only
then call Variant.from(Metadata.from(Slices.wrappedBuffer((byte[]) first)),
Slices.wrappedBuffer((byte[]) second)). Ensure you reference the local variables
value, variantList, Variant.from, Metadata.from, and Slices.wrappedBuffer in
your checks and error messages.

📥 Commits

Reviewing files that changed from the base of the PR and between 40f81aa and 0bcbd0d.

📒 Files selected for processing (91)
  • client/trino-client/src/main/java/io/trino/client/ClientStandardTypes.java
  • client/trino-client/src/main/java/io/trino/client/JsonDecodingUtils.java
  • core/trino-main/src/main/java/io/trino/metadata/BlockEncodingManager.java
  • core/trino-main/src/main/java/io/trino/metadata/SignatureBinder.java
  • core/trino-main/src/main/java/io/trino/metadata/SystemFunctionBundle.java
  • core/trino-main/src/main/java/io/trino/metadata/TypeRegistry.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/ArrayToVariantCast.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/JsonToRowCast.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/MapToVariantCast.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/RowToJsonCast.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/RowToVariantCast.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/VariantToArrayCast.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/VariantToMapCast.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/VariantToRowCast.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/annotations/OperatorValidator.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/timestamp/VarcharToTimestampCast.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/timestamptz/VarcharToTimestampWithTimeZoneCast.java
  • core/trino-main/src/main/java/io/trino/server/protocol/JsonEncodingUtils.java
  • core/trino-main/src/main/java/io/trino/type/DecimalCasts.java
  • core/trino-main/src/main/java/io/trino/type/VariantFunctions.java
  • core/trino-main/src/main/java/io/trino/type/VariantOperators.java
  • core/trino-main/src/main/java/io/trino/util/variant/ArrayVariantWriter.java
  • core/trino-main/src/main/java/io/trino/util/variant/JsonVariantWriter.java
  • core/trino-main/src/main/java/io/trino/util/variant/MapVariantWriter.java
  • core/trino-main/src/main/java/io/trino/util/variant/NullPlannedValue.java
  • core/trino-main/src/main/java/io/trino/util/variant/PlannedArrayValue.java
  • core/trino-main/src/main/java/io/trino/util/variant/PlannedObjectValue.java
  • core/trino-main/src/main/java/io/trino/util/variant/PrimitiveArrayVariantWriter.java
  • core/trino-main/src/main/java/io/trino/util/variant/PrimitiveMapVariantWriter.java
  • core/trino-main/src/main/java/io/trino/util/variant/PrimitiveVariantEncoder.java
  • core/trino-main/src/main/java/io/trino/util/variant/RowVariantWriter.java
  • core/trino-main/src/main/java/io/trino/util/variant/VariantCastException.java
  • core/trino-main/src/main/java/io/trino/util/variant/VariantUtil.java
  • core/trino-main/src/main/java/io/trino/util/variant/VariantVariantWriter.java
  • core/trino-main/src/main/java/io/trino/util/variant/VariantWriter.java
  • core/trino-main/src/test/java/io/trino/block/TestVariantBlock.java
  • core/trino-main/src/test/java/io/trino/metadata/TestSignatureBinder.java
  • core/trino-main/src/test/java/io/trino/type/AbstractTestType.java
  • core/trino-main/src/test/java/io/trino/type/TestRowOperators.java
  • core/trino-main/src/test/java/io/trino/type/TestVariantFunctions.java
  • core/trino-main/src/test/java/io/trino/type/TestVariantOperators.java
  • core/trino-main/src/test/java/io/trino/type/TestVariantType.java
  • core/trino-spi/pom.xml
  • core/trino-spi/src/main/java/io/trino/spi/block/VariableWidthBlockBuilder.java
  • core/trino-spi/src/main/java/io/trino/spi/block/VariantBlock.java
  • core/trino-spi/src/main/java/io/trino/spi/block/VariantBlockBuilder.java
  • core/trino-spi/src/main/java/io/trino/spi/block/VariantBlockEncoding.java
  • core/trino-spi/src/main/java/io/trino/spi/type/StandardTypes.java
  • core/trino-spi/src/main/java/io/trino/spi/type/VariantType.java
  • core/trino-spi/src/main/java/io/trino/spi/variant/Header.java
  • core/trino-spi/src/main/java/io/trino/spi/variant/Int2IntOpenHashMap.java
  • core/trino-spi/src/main/java/io/trino/spi/variant/Metadata.java
  • core/trino-spi/src/main/java/io/trino/spi/variant/ObjectFieldIdValue.java
  • core/trino-spi/src/main/java/io/trino/spi/variant/Variant.java
  • core/trino-spi/src/main/java/io/trino/spi/variant/VariantDecoder.java
  • core/trino-spi/src/main/java/io/trino/spi/variant/VariantEncoder.java
  • core/trino-spi/src/main/java/io/trino/spi/variant/VariantFieldRemapper.java
  • core/trino-spi/src/main/java/io/trino/spi/variant/VariantUtils.java
  • core/trino-spi/src/main/java/module-info.java
  • core/trino-spi/src/test/java/io/trino/spi/block/TestVariantBlockBuilder.java
  • core/trino-spi/src/test/java/io/trino/spi/block/TestVariantBlockEncoding.java
  • core/trino-spi/src/test/java/io/trino/spi/block/TestingBlockEncodingSerde.java
  • core/trino-spi/src/test/java/io/trino/spi/variant/TestMetadata.java
  • core/trino-spi/src/test/java/io/trino/spi/variant/TestVariant.java
  • docs/src/main/sphinx/connector/iceberg.md
  • docs/src/main/sphinx/functions.md
  • docs/src/main/sphinx/functions/variant.md
  • docs/src/main/sphinx/language/types.md
  • lib/trino-orc/src/main/java/io/trino/orc/reader/ColumnReaders.java
  • lib/trino-orc/src/main/java/io/trino/orc/reader/VariantColumnReader.java
  • lib/trino-orc/src/main/java/io/trino/orc/writer/ColumnWriters.java
  • lib/trino-orc/src/main/java/io/trino/orc/writer/VariantColumnWriter.java
  • lib/trino-parquet/src/main/java/io/trino/parquet/ParquetTypeUtils.java
  • lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetReader.java
  • lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetTypeVisitor.java
  • lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetWriters.java
  • lib/trino-parquet/src/main/java/io/trino/parquet/writer/VariantColumnWriter.java
  • lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/DefLevelWriterProviders.java
  • lib/trino-parquet/src/main/java/io/trino/parquet/writer/repdef/RepLevelWriterProviders.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/ExpressionConverter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergAvroDataConversion.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergParquetColumnIOConverter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergUtil.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsWriter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TypeConverter.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/HiveSchemaUtil.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcMetrics.java
  • plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/util/OrcTypeConverter.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
  • testing/trino-testing/src/main/java/io/trino/testing/TestingTrinoClient.java
💤 Files with no reviewable changes (3)
  • core/trino-main/src/main/java/io/trino/operator/scalar/JsonToRowCast.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/RowToJsonCast.java
  • core/trino-main/src/main/java/io/trino/operator/scalar/timestamp/VarcharToTimestampCast.java

Collapsed comment threads:
  • core/trino-main/src/main/java/io/trino/type/VariantOperators.java
  • core/trino-spi/src/main/java/io/trino/spi/variant/Metadata.java
  • core/trino-spi/src/test/java/io/trino/spi/variant/TestVariant.java
@dain force-pushed the variant branch 2 times, most recently from 7306fd5 to 18d8196 (March 22, 2026 20:59)
@electrum
Member

@coderabbitai check that this PR has sufficient test coverage, both for the variant code and for the Iceberg integration

/// This needs special handling because the cast is applied to each field of ROW individually.
private boolean isRecursiveCastFromRow(Type toType)
{
for (CatalogFunctionMetadata cast : metadata.getFunctions(null, new CatalogSchemaFunctionName(GlobalSystemConnector.NAME, BUILTIN_SCHEMA, mangleOperatorName(CAST)))) {
Member

This can use

metadata.getBuiltinFunctions(mangleOperatorName(CAST))

Member Author

Unfortunately, getBuiltinFunctions is not public

/// This needs special handling because the cast is applied to each field of ROW individually.
private boolean isRecursiveCastToRow(Type fromType)
{
for (CatalogFunctionMetadata cast : metadata.getFunctions(null, new CatalogSchemaFunctionName(GlobalSystemConnector.NAME, BUILTIN_SCHEMA, mangleOperatorName(CAST)))) {
Member

This can use

metadata.getBuiltinFunctions(mangleOperatorName(CAST))

Member Author

Unfortunately, getBuiltinFunctions is not public

Collapsed comment threads:
  • core/trino-main/src/main/java/io/trino/metadata/SignatureBinder.java (2, outdated)
  • core/trino-spi/src/main/java/io/trino/spi/variant/VariantUtils.java (outdated)
  • core/trino-spi/src/main/java/io/trino/spi/variant/VariantFieldRemapper.java (2, outdated)
  • core/trino-spi/pom.xml (2)
if (parser.nextToken() != END_ARRAY) {
throw illegalToken(parser);
}
return List.of(metadata, value);
Member

As a user, I expect the variant type to return a JSON-like result instead of a binary array. The metadata is an implementation detail, and I don’t see much benefit in exposing it.

trino> SELECT CAST('{"abc": 123}' AS variant);
                       _col0
----------------------------------------------------
 [01 00 00, 31 7b 22 61 62 63 22 3a 20 31 32 33 7d]

Member

I agree that we should return VARIANT as JSON text by default.
A client should be able to opt in (via client capabilities) to receiving VARIANT binary data losslessly.
The Trino CLI and JDBC driver can opt into this; JDBC can expose VARIANT via ResultSet.getObject.

Member Author

Please see the comment above from David. This was always intended as temporary code, but for now we're just going to use JSON and add the pure VARIANT version back later, with opt-in controls.

Collapsed comment thread (outdated): core/trino-spi/src/main/java/io/trino/spi/type/VariantType.java
"(NULL, NULL)");
}
}

Member

The test coverage in the Iceberg connector seems too limited. Please add more tests:

  • All column types as value fields across the three file formats
  • Product test to ensure that Spark can read tables using our variant type

Member Author

This is the base test class used to test all Iceberg file formats, so all formats should be covered.

I also added a new test that exercises variants of every kind. I do not think this is strictly necessary: if you look at the Iceberg code, it never actually inspects the variants; it is binary in and binary out.

I added a Spark product test covering all variant types.

@trinodb deleted a comment from coderabbitai Bot Mar 28, 2026
@dain
Member Author

dain commented Mar 30, 2026

I talked to @martint and he has OKed merging this.

@dain merged commit e791b2c into trinodb:master Mar 30, 2026
106 checks passed
@dain deleted the variant branch March 30, 2026 17:42
@github-actions bot added this to the 481 milestone Mar 30, 2026
@ebyhr mentioned this pull request Mar 31, 2026
@soumilshah1995

When are we planning to release 481?


Labels

cla-signed · docs · iceberg (Iceberg connector) · stale-ignore · syntax-needs-review

Development

Successfully merging this pull request may close these issues.

Add support for variant type in Iceberg

7 participants