[Coral-Schema] Skip duplicate partition columns in getAvroSchemaForTable Hive-merge path #601
Draft
simbadzina wants to merge 1 commit into
Force-pushed from c396840 to dd82e4a.
…hema from Hive cols
getAvroSchemaForTable and convertHiveSchemaToAvro both appended the table's partition columns to the regular column list before converting to Avro. When the regular column list already contained a partition column by name, the resulting Avro record failed with "Duplicate field X in record". The fix skips partition columns whose names are already present, mirroring the dedup already applied in addPartitionColsToSchema. It is extracted into a shared appendMissingPartitionCols helper and covered by tests for convertHiveSchemaToAvro.
Force-pushed from dd82e4a to 585f841.
Summary
Follow-up to #587 covering a sibling code path in the same file.
`SchemaUtilities.getAvroSchemaForTable()` (non-Avro-serde branch) builds a Hive column list from `table.getSd().getCols()` and appends `getPartitionCols(table)` before merging with the original Avro schema via `MergeHiveSchemaWithAvro.visit()`. If the regular column list already contains a column with the same name as a partition key, the merge produces a record with duplicate fields and fails at `SchemaUtilities.copyRecord` with `AvroRuntimeException: Duplicate field X in record`.
The same pattern exists in the fallback `convertHiveSchemaToAvro()`. Both call sites now go through a small shared helper that skips partition columns whose names are already present in the column list.
Changes
- `SchemaUtilities`: Added private helper `appendMissingPartitionCols(cols, table)` that dedups by column name; replaced the two existing `cols.addAll(getPartitionCols(table))` call sites (`getAvroSchemaForTable` and `convertHiveSchemaToAvro`) with calls to the helper.
- `SchemaUtilitiesTests`: Added two tests for `convertHiveSchemaToAvro` covering both the duplicate-skipping case and the normal-append case.
Notes
- … `addPartitionColsToSchema` path. The two PRs target different stack traces and can be reviewed independently.
- `RecordSchema.setFields` ← `SchemaUtilities.copyRecord` ← `MergeHiveSchemaWithAvro.struct` ← `SchemaUtilities.getAvroSchemaForTable`.
Test plan
- `./gradlew coral-schema:test` passes
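The name-based dedup listed under Changes, and the two scenarios the new tests cover, can be sketched as below. This is a dependency-free illustration, not the Coral code: `Column` is a hypothetical stand-in for Hive's `FieldSchema`, and this version of `appendMissingPartitionCols` takes the partition columns as a list rather than the Hive `Table`. It assumes the helper mutates the column list in place, as the `cols.addAll(...)` call it replaces does, and compares names exactly as given; whether the real helper normalizes case is not stated here. Java 16+ is assumed for records and `Stream.toList()`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionColDedupSketch {

  // Hypothetical stand-in for Hive's FieldSchema: only name and type matter here.
  record Column(String name, String type) {}

  // Appends each partition column to cols unless a column with the same name is
  // already present. Mutates cols in place (assumption, matching cols.addAll(...)).
  // Names are compared exactly; the real helper's case handling is not stated.
  static void appendMissingPartitionCols(List<Column> cols, List<Column> partitionCols) {
    Set<String> seen = new HashSet<>();
    for (Column c : cols) {
      seen.add(c.name());
    }
    for (Column p : partitionCols) {
      // Set.add returns false when the name is already in the set, so duplicates are skipped.
      if (seen.add(p.name())) {
        cols.add(p);
      }
    }
  }

  public static void main(String[] args) {
    List<Column> partitionCols = List.of(new Column("datepartition", "string"));

    // Duplicate-skipping case: the partition key also appears among the regular columns.
    List<Column> withDup = new ArrayList<>(List.of(
        new Column("id", "bigint"), new Column("datepartition", "string")));
    appendMissingPartitionCols(withDup, partitionCols);
    System.out.println(withDup.stream().map(Column::name).toList());

    // Normal-append case: the partition key is not among the regular columns.
    List<Column> noDup = new ArrayList<>(List.of(
        new Column("id", "bigint"), new Column("name", "string")));
    appendMissingPartitionCols(noDup, partitionCols);
    System.out.println(noDup.stream().map(Column::name).toList());
  }
}
```

Running `main` should print `[id, datepartition]` for the duplicate case, since the partition key is already present, and `[id, name, datepartition]` for the normal case. Collecting the existing names into a `HashSet` first keeps the check linear in the number of columns instead of rescanning the list for every partition key.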