Reject varbinary partition columns in Delta Lake connector#29438
1fanwang wants to merge 1 commit into
Conversation
Creating a Delta Lake table with a varbinary partition column previously succeeded, then any subsequent INSERT failed deep in the write path with 'Unsupported type for partition: varbinary' from DeltaLakeWriteUtils. The Delta protocol's binary partition-value encoding is also not implemented on the read side, so Spark-written varbinary-partitioned tables cannot be read either.

Match the existing array/map/row handling and reject varbinary partition columns up front in checkPartitionColumns. CREATE TABLE, CTAS, and the table-layout path all go through validateTableColumns, so the new check fires before any row is written.

Update TestDeltaLakeConnectorTest.testCreateTableWithUnsupportedPartitionType and testCreateTableAsSelectWithUnsupportedPartitionType to assert the new error, drop the obsolete testInsertIntoUnsupportedVarbinaryPartitionType that expected the deep-write failure, and add a fast in-process variant to TestDeltaLakeBasic.

Fixes trinodb#24155

Signed-off-by: 1fanwang <1fannnw@gmail.com>
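For readers unfamiliar with the connector internals, the shape of the up-front check can be sketched as a standalone snippet. This is a simplified, hypothetical illustration: the real logic lives in `DeltaLakeMetadata` and operates on Trino `Type` objects and a `TrinoException` with `DELTA_LAKE_INVALID_SCHEMA`, not on plain type-name strings as here.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified sketch of the up-front partition-column check.
// The actual implementation in DeltaLakeMetadata compares Trino Type objects
// and throws TrinoException(DELTA_LAKE_INVALID_SCHEMA, ...).
public class PartitionTypeCheck
{
    // Partition column types the connector cannot encode as Delta partition values
    private static final Set<String> UNSUPPORTED_PARTITION_TYPES =
            Set.of("array", "map", "row", "varbinary");

    public static boolean isUnsupportedPartitionType(String baseTypeName)
    {
        return UNSUPPORTED_PARTITION_TYPES.contains(baseTypeName);
    }

    public static void checkPartitionColumns(List<String> partitionColumnTypes)
    {
        for (String type : partitionColumnTypes) {
            if (isUnsupportedPartitionType(type)) {
                // Fail at CREATE/CTAS time, before any row is written
                throw new IllegalArgumentException(
                        "Using " + type + " type on partitioned columns is unsupported");
            }
        }
    }

    public static void main(String[] args)
    {
        checkPartitionColumns(List.of("integer", "varchar")); // accepted
        try {
            checkPartitionColumns(List.of("integer", "varbinary"));
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of failing here rather than in the write path is that the user gets the error at table-creation time instead of being left with a registered but unwritable table.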
ebyhr reviewed May 12, 2026
```java
        "(x int, part varbinary) WITH (partitioned_by = ARRAY['part'])")) {
    assertQueryFails("INSERT INTO " + table.getName() + " VALUES (1, X'01')", "Unsupported type for partition: varbinary");
}
assertQueryFails(
```
There is no need to add the same tests to both TestDeltaLakeBasic and TestDeltaLakeConnectorTest.
```java
if (columns.stream().filter(column -> partitionColumnNames.contains(column.getName()))
        .anyMatch(column -> column.getType().equals(VARBINARY))) {
    throw new TrinoException(DELTA_LAKE_INVALID_SCHEMA, "Using varbinary type on partitioned columns is unsupported");
}
```
Please merge this condition into L1746-1748.
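The reviewer's suggestion is to fold the new varbinary test into the existing unsupported-partition-type condition rather than adding a second stream pass. A hypothetical sketch of that merged shape, simplified to plain type-name strings (the real code compares Trino `Type` objects and lives in `DeltaLakeMetadata`):

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the merged condition: one pass over the partition
// column types that rejects array/map/row and varbinary together, instead of
// a separate stream dedicated to varbinary.
public class MergedPartitionCheck
{
    private static final Set<String> UNSUPPORTED =
            Set.of("array", "map", "row", "varbinary");

    public static void validatePartitionTypes(List<String> partitionColumnTypes)
    {
        partitionColumnTypes.stream()
                .filter(UNSUPPORTED::contains)
                .findFirst()
                .ifPresent(type -> {
                    // In Trino this would be TrinoException(DELTA_LAKE_INVALID_SCHEMA, ...)
                    throw new IllegalArgumentException(
                            "Using " + type + " type on partitioned columns is unsupported");
                });
    }

    public static void main(String[] args)
    {
        validatePartitionTypes(List.of("integer", "date")); // accepted
        try {
            validatePartitionTypes(List.of("varbinary"));
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A single condition keeps the error message uniform across all unsupported partition types and avoids iterating the column list twice.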
Closes #24155.
Creating a Delta Lake table with a `VARBINARY` partition column succeeded, but the table was unusable after that: every `INSERT` blew up deep in the write path with `Unsupported type for partition: varbinary` from `DeltaLakeWriteUtils.toPartitionValue`, and even reading a Spark-written varbinary-partitioned table surfaced `Unable to parse value [...] from column part with type varbinary` from `TransactionLogParser.deserializeColumnValue`. The Delta protocol's binary partition-value encoding is unimplemented on both sides.

`DeltaLakeMetadata.validateTableColumns` already rejects `array`, `map`, and `row` for partition columns. Adding `varbinary` to the same check fires at CREATE / CTAS / `getNewTableLayout` time, before the table is registered, so users get a clear error instead of a half-broken table. Hive's `isValidPartitionType` already excludes `varbinary` for the same reasons, so this brings Delta Lake to parity, matching the comment in the file noting that the surrounding util was copied from `HiveWriteUtils`.

Tests

- `TestDeltaLakeBasic#testVarbinaryPartitionColumnRejected`: in-process test, no Docker, asserts the validation message on CREATE.
- `TestDeltaLakeConnectorTest#testCreateTableWithUnsupportedPartitionType` and `testCreateTableAsSelectWithUnsupportedPartitionType` extended to cover `varbinary` alongside the existing `array`/`map`/`row` cases.
- `testInsertIntoUnsupportedVarbinaryPartitionType` (which carried a `TODO: see #24155` since the issue was filed) is removed; the CREATE-side reject makes it unreachable.