
Reject varbinary partition columns in Delta Lake connector#29438

Open
1fanwang wants to merge 1 commit into trinodb:master from 1fanwang:fix/delta-lake-varbinary-partition

@1fanwang

Closes #24155.

Creating a Delta Lake table with a VARBINARY partition column succeeded, but the table was unusable afterwards: every INSERT failed deep in the write path with 'Unsupported type for partition: varbinary' from DeltaLakeWriteUtils.toPartitionValue, and even reading a Spark-written varbinary-partitioned table failed with 'Unable to parse value [...] from column part with type varbinary' from TransactionLogParser.deserializeColumnValue. The Delta protocol's binary partition-value encoding is unimplemented on both the read and the write side.
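
To illustrate why the error only appeared at INSERT time, here is a minimal sketch of a partition-value serializer in the spirit of DeltaLakeWriteUtils.toPartitionValue. It assumes a simplified, hypothetical PartitionType enum instead of Trino's Type hierarchy, and the class name PartitionValueSketch is also hypothetical. Delta Lake stores partition values as strings in the transaction log, so each supported type needs an explicit string encoding; a type without one is only detected when the first row is written.

```java
import java.util.Locale;

// Hypothetical, simplified type model; the real connector works with io.trino.spi.type.Type.
enum PartitionType { BIGINT, VARCHAR, BOOLEAN, VARBINARY }

final class PartitionValueSketch
{
    private PartitionValueSketch() {}

    // Converts a partition value to the string form stored in the transaction log.
    static String toPartitionValue(PartitionType type, Object value)
    {
        if (value == null) {
            return null;
        }
        return switch (type) {
            case BIGINT -> Long.toString((Long) value);
            case VARCHAR -> (String) value;
            case BOOLEAN -> Boolean.toString((Boolean) value);
            // No branch implements the binary encoding, so VARBINARY lands here,
            // and nothing fails until a row is actually written.
            default -> throw new IllegalArgumentException(
                    "Unsupported type for partition: " + type.name().toLowerCase(Locale.ROOT));
        };
    }
}
```

Nothing in this path runs at CREATE time, which is why a table with an unwritable partition column could be created in the first place.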

DeltaLakeMetadata.validateTableColumns already rejects array, map, and row types for partition columns. Adding varbinary to the same check makes it fire at CREATE TABLE, CTAS, and getNewTableLayout time, before the table is registered, so users get a clear error instead of a half-broken table. Hive's isValidPartitionType already excludes varbinary for the same reasons, so this brings Delta Lake to parity with Hive, consistent with the existing comment in the file noting that the surrounding utility code was copied from HiveWriteUtils.
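
A minimal sketch of the CREATE-time check, assuming hypothetical stand-ins (ColumnKind, ColumnDef, and PartitionColumnCheck are not Trino classes) for the connector's column metadata. It folds varbinary into the same predicate as array, map, and row, so a bad partition column is rejected in one place before anything is registered. The error wording is illustrative; for varbinary it matches the message in this PR, but the connector's message for the nested types may differ.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-ins for Trino's column types and ColumnMetadata.
enum ColumnKind { INTEGER, VARCHAR, VARBINARY, ARRAY, MAP, ROW }

record ColumnDef(String name, ColumnKind kind) {}

final class PartitionColumnCheck
{
    // Nested types were already rejected; VARBINARY is the addition.
    private static final Set<ColumnKind> UNSUPPORTED_PARTITION_KINDS =
            Set.of(ColumnKind.ARRAY, ColumnKind.MAP, ColumnKind.ROW, ColumnKind.VARBINARY);

    private PartitionColumnCheck() {}

    static void validatePartitionColumns(List<ColumnDef> columns, List<String> partitionedBy)
    {
        for (ColumnDef column : columns) {
            if (partitionedBy.contains(column.name()) && UNSUPPORTED_PARTITION_KINDS.contains(column.kind())) {
                // Illustrative message text
                throw new IllegalArgumentException(
                        "Using " + column.kind().name().toLowerCase(Locale.ROOT) + " type on partitioned columns is unsupported");
            }
        }
    }
}
```

For example, validating (x integer, part varbinary) with partitioned_by = ARRAY['part'] throws, while the same columns with partitioned_by = ARRAY['x'] pass, because a varbinary column that is not a partition column is still allowed.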

Tests

  • New TestDeltaLakeBasic#testVarbinaryPartitionColumnRejected: an in-process test (no Docker) that asserts the validation message on CREATE.
  • Existing TestDeltaLakeConnectorTest#testCreateTableWithUnsupportedPartitionType and testCreateTableAsSelectWithUnsupportedPartitionType extended to cover varbinary alongside the existing array/map/row cases.
  • Existing testInsertIntoUnsupportedVarbinaryPartitionType (which has carried a TODO: see #24155 since the issue was filed) is removed; the CREATE-side rejection makes it unreachable.

Creating a Delta Lake table with a varbinary partition column previously
succeeded, then any subsequent INSERT failed deep in the write path with
'Unsupported type for partition: varbinary' from DeltaLakeWriteUtils.
The Delta protocol's binary partition-value encoding is also not
implemented on the read side, so Spark-written varbinary-partitioned
tables cannot be read either.

Match the existing array/map/row handling and reject varbinary partition
columns up front in checkPartitionColumns. CREATE TABLE, CTAS, and the
table-layout path all go through validateTableColumns, so the new check
fires before any row is written.
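
The claim that CREATE TABLE, CTAS, and the table-layout path share one validation step can be sketched with a toy metadata class. The method names mirror Trino's ConnectorMetadata entry points (createTable, beginCreateTable, getNewTableLayout), but ToyDeltaMetadata, ToyColumn, and all signatures here are hypothetical simplifications, and the in-memory registeredTables list stands in for the metastore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy column: a name plus a lowercase Trino type name.
record ToyColumn(String name, String type) {}

// Toy model of the connector's metadata entry points; not Trino's real API.
final class ToyDeltaMetadata
{
    private static final Set<String> UNSUPPORTED_PARTITION_TYPES = Set.of("array", "map", "row", "varbinary");

    // Stands in for the metastore: a table is only recorded once validation passes.
    final List<String> registeredTables = new ArrayList<>();

    // Shared check, playing the role of validateTableColumns.
    private static void validateTableColumns(List<ToyColumn> columns, List<String> partitionedBy)
    {
        for (ToyColumn column : columns) {
            if (partitionedBy.contains(column.name()) && UNSUPPORTED_PARTITION_TYPES.contains(column.type())) {
                throw new IllegalArgumentException("Unsupported partition column type: " + column.type());
            }
        }
    }

    // CREATE TABLE: validate, then register.
    void createTable(String table, List<ToyColumn> columns, List<String> partitionedBy)
    {
        validateTableColumns(columns, partitionedBy);
        registeredTables.add(table);
    }

    // CTAS: the check runs before any SELECT row is written (registration is simplified here).
    void beginCreateTable(String table, List<ToyColumn> columns, List<String> partitionedBy)
    {
        validateTableColumns(columns, partitionedBy);
        registeredTables.add(table);
    }

    // Planning the layout for a new table runs the same check.
    void getNewTableLayout(List<ToyColumn> columns, List<String> partitionedBy)
    {
        validateTableColumns(columns, partitionedBy);
    }
}
```

Because every entry point calls the same check first, a rejected definition never reaches the registration step, which is the "clear error instead of a half-broken table" property described above.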

Update TestDeltaLakeConnectorTest.testCreateTableWithUnsupportedPartitionType
and testCreateTableAsSelectWithUnsupportedPartitionType to assert the
new error, drop the obsolete testInsertIntoUnsupportedVarbinaryPartitionType
that expected the deep-write failure, and add a fast in-process variant
to TestDeltaLakeBasic.

Fixes trinodb#24155

Signed-off-by: 1fanwang <1fannnw@gmail.com>
github-actions bot added the delta-lake (Delta Lake connector) and cla-signed labels on May 12, 2026
```java
        "(x int, part varbinary) WITH (partitioned_by = ARRAY['part'])")) {
    assertQueryFails("INSERT INTO " + table.getName() + " VALUES (1, X'01')", "Unsupported type for partition: varbinary");
}
assertQueryFails(
```
Member

There is no need to add the same tests to both TestDeltaLakeBasic and TestDeltaLakeConnectorTest.

```java
if (columns.stream().filter(column -> partitionColumnNames.contains(column.getName()))
        .anyMatch(column -> column.getType().equals(VARBINARY))) {
    throw new TrinoException(DELTA_LAKE_INVALID_SCHEMA, "Using varbinary type on partitioned columns is unsupported");
}
```
Member

Please merge this condition into L1746-1748.

Development

Successfully merging this pull request may close these issues.

Cannot insert varbinary values into partitioned columns in Delta Lake

2 participants