Fix duplicate field error when partition column already exists in schema #587
simbadzina wants to merge 2 commits into
This shouldn't be needed, because partition columns should not exist inside the schema to begin with. Rather than here, this should be fixed on the caller side: if the schema already contains the columns, no partition columns should be passed into this method in the first place.
The two callers of addPartitionColsToSchema are both inside SchemaUtilities (getCasePreservedSchemaForTable:79 and the merge branch of getAvroSchemaForTable:118-124). Deduplicating here fits the contract "ensure partition columns are present on the schema".
…y exists in schema
When a Hive view projects a partition column as a regular field in its schema, addPartitionColsToSchema() would attempt to add it again, causing AvroRuntimeException: "Duplicate field X in record". The fix skips partition columns that already exist in the schema by name.
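A minimal sketch of the skip-by-name idea described in the commit message, not the actual patch. It models schema fields as plain name strings instead of Avro Schema.Field objects, and the class and method names (PartitionColumnDedupSketch, addPartitionCols) are hypothetical. It also assumes exact, case-sensitive name matching, which the PR does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PartitionColumnDedupSketch {

    // Hypothetical stand-in for addPartitionColsToSchema(): fields are modeled
    // as names only. Appends each partition column unless a field with the
    // same name is already present in the schema.
    static List<String> addPartitionCols(List<String> schemaFields, List<String> partitionCols) {
        List<String> result = new ArrayList<>(schemaFields);
        Set<String> existingNames = new HashSet<>(schemaFields);
        for (String partitionCol : partitionCols) {
            // Set.add returns false when the name is already present,
            // so an already-projected partition column is skipped.
            if (existingNames.add(partitionCol)) {
                result.add(partitionCol);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A view that already projects its partition column as a regular field.
        List<String> viewFields = Arrays.asList("id", "datepartition");
        List<String> partitionCols = Arrays.asList("datepartition");
        // Nothing new is appended, so the field count stays at 2.
        System.out.println(addPartitionCols(viewFields, partitionCols).size()); // prints 2
    }
}
```

Because the check lives inside the helper, calling it a second time with the same partition columns returns the same field list, which matches the "ensure partition columns are present" reading of the contract.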
Summary
- addPartitionColsToSchema() blindly appends partition columns without checking if they already exist in the schema, causing AvroRuntimeException: Duplicate field X in record
- When a Hive view projects a partition column as a regular field, addPartitionColsToSchema() tries to add it again
- The fix tracks existing field names in a Set and skips partition columns that are already present
Changes
- SchemaUtilities.addPartitionColsToSchema(): Added deduplication check before adding partition columns
- SchemaUtilitiesTests: Added two tests: one verifying duplicates are skipped, one verifying normal partition column addition still works
Test plan
- testAddPartitionColsToSchemaSkipsDuplicates fails without fix, passes with fix
- testAddPartitionColsToSchemaAddsNewPartitionCol confirms normal behavior unchanged
- Existing tests (testBaseTableWithPartition, testSelectStarWithPartition, testSelectPartitionColumn, testUnionSelectStarFromPartitionTable)
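The shape of the two new tests can be sketched in plain Java, without Avro or a test framework. Everything here is an assumption for illustration: buildRecord is a stand-in that rejects repeated names with an IllegalStateException (the real failure is an AvroRuntimeException), and addWithoutFix / addWithFix are hypothetical models of the pre-fix and post-fix behavior, not the code in SchemaUtilities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PartitionColTestPlanSketch {

    // Stand-in for record construction: rejects repeated field names, loosely
    // mirroring the "Duplicate field X in record" failure described in the PR.
    static List<String> buildRecord(List<String> fieldNames) {
        Set<String> seen = new HashSet<>();
        for (String name : fieldNames) {
            if (!seen.add(name)) {
                throw new IllegalStateException("Duplicate field " + name + " in record");
            }
        }
        return new ArrayList<>(fieldNames);
    }

    // Assumed pre-fix behavior: append every partition column unconditionally.
    static List<String> addWithoutFix(List<String> fields, List<String> partitionCols) {
        List<String> all = new ArrayList<>(fields);
        all.addAll(partitionCols);
        return buildRecord(all);
    }

    // Assumed post-fix behavior: skip partition columns whose name already exists.
    static List<String> addWithFix(List<String> fields, List<String> partitionCols) {
        List<String> all = new ArrayList<>(fields);
        Set<String> existing = new HashSet<>(fields);
        for (String col : partitionCols) {
            if (existing.add(col)) {
                all.add(col);
            }
        }
        return buildRecord(all);
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "datepartition");

        // Duplicate case: fails without the fix, passes with it.
        boolean threw = false;
        try {
            addWithoutFix(fields, Arrays.asList("datepartition"));
        } catch (IllegalStateException e) {
            threw = true;
        }
        check(threw, "unconditional append should reject the duplicate");
        check(addWithFix(fields, Arrays.asList("datepartition")).equals(fields),
                "duplicate partition column should be skipped");

        // Normal case: a genuinely new partition column is still appended.
        check(addWithFix(Arrays.asList("id"), Arrays.asList("datepartition")).equals(fields),
                "new partition column should be added");

        System.out.println("all checks passed");
    }
}
```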