[Coral-Schema] Skip duplicate partition columns in getAvroSchemaForTable Hive-merge path by simbadzina · Pull Request #601 · linkedin/coral

simbadzina · 2026-04-17T00:58:26Z

Summary

Follow-up to #587 covering a sibling code path in the same file.

SchemaUtilities.getAvroSchemaForTable() (non-Avro-serde branch) builds a Hive column list from table.getSd().getCols() and appends getPartitionCols(table) before merging with the original Avro schema via MergeHiveSchemaWithAvro.visit(). If the regular column list already contains a column with the same name as a partition key, the merge produces a record with duplicate fields and fails at SchemaUtilities.copyRecord with AvroRuntimeException: Duplicate field X in record.

The same pattern exists in the fallback convertHiveSchemaToAvro(). Both call sites now go through a small shared helper that skips partition columns whose names are already present in the column list.
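A minimal sketch of the dedup logic described above, not the code in this PR. It assumes a hypothetical `Col` stand-in for Hive's `FieldSchema`, takes the partition columns as a list rather than reading them from the table, and compares names case-insensitively (an assumption; the PR only says the helper dedups by column name).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PartitionColDedupSketch {

    // Hypothetical stand-in for Hive's FieldSchema: only a name and a type string.
    public static final class Col {
        public final String name;
        public final String type;

        public Col(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Appends each partition column to cols unless a column with the same
    // name is already present. Case-insensitive matching is an assumption.
    public static void appendMissingPartitionCols(List<Col> cols, List<Col> partitionCols) {
        Set<String> seen = new HashSet<>();
        for (Col c : cols) {
            seen.add(c.name.toLowerCase(Locale.ROOT));
        }
        for (Col p : partitionCols) {
            // Set.add returns false when the name was already seen, so the column is skipped.
            if (seen.add(p.name.toLowerCase(Locale.ROOT))) {
                cols.add(p);
            }
        }
    }

    public static void main(String[] args) {
        // Regular columns that already include the partition key "datepartition".
        List<Col> cols = new ArrayList<>(Arrays.asList(
            new Col("id", "bigint"),
            new Col("datepartition", "string")));
        List<Col> partitionCols = Arrays.asList(
            new Col("datepartition", "string"),
            new Col("region", "string"));

        appendMissingPartitionCols(cols, partitionCols);

        List<String> names = new ArrayList<>();
        for (Col c : cols) {
            names.add(c.name);
        }
        System.out.println(names); // prints [id, datepartition, region]
    }
}
```

With a plain `addAll`, the list above would contain `datepartition` twice, which is the input that later triggers the duplicate-field error during the merge.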

Changes

SchemaUtilities: Added a private helper appendMissingPartitionCols(cols, table) that deduplicates by column name, and replaced the two existing cols.addAll(getPartitionCols(table)) call sites (in getAvroSchemaForTable and convertHiveSchemaToAvro) with calls to the helper.
SchemaUtilitiesTests: Added two tests for convertHiveSchemaToAvro covering both the duplicate-skipping case and the normal-append case.

Notes

This is complementary to #587 ("Fix duplicate field error when partition column already exists in schema"), which addresses the addPartitionColsToSchema path. The two PRs target different stack traces and can be reviewed independently.
Stack trace this PR fixes (observed in production): RecordSchema.setFields ← SchemaUtilities.copyRecord ← MergeHiveSchemaWithAvro.struct ← SchemaUtilities.getAvroSchemaForTable.

Test plan

New tests pass against the fixed branch
./gradlew coral-schema:test passes

…hema from Hive cols

getAvroSchemaForTable and convertHiveSchemaToAvro both appended the table's partition columns to the regular column list before converting to Avro. When the regular column list already contained a partition column by name, the resulting Avro record failed with "Duplicate field X in record". This mirrors the dedup already applied in addPartitionColsToSchema. Extracted into a shared appendMissingPartitionCols helper and covered by tests for convertHiveSchemaToAvro.

simbadzina force-pushed the sdzinama/dedup-partition-cols-merge-hive-path branch from c396840 to dd82e4a on April 21, 2026 17:14

simbadzina force-pushed the sdzinama/dedup-partition-cols-merge-hive-path branch from dd82e4a to 585f841 on April 21, 2026 17:15

simbadzina wants to merge 1 commit into linkedin:master from simbadzina:sdzinama/dedup-partition-cols-merge-hive-path
