Add support for deletion vector in Iceberg#24882
Conversation
3104a24 to
6ff0af1
Compare
2c25635 to
8f37aff
Compare
0df8334 to
38a1b97
Compare
4d53e64 to
29e196e
Compare
d2912f1 to
f7d4b20
Compare
68d4564 to
f3f8642
Compare
findinpath
left a comment
There was a problem hiding this comment.
I see that slight code refactoring is needed around the way we handle pass task data when doing data file creation & position deletes merge handling.
Very nice to see the functionality already working.
I see we're boldly going with v3 format version for doing the testing, while the default is v2 - maybe we should duplicate the testing for now to be on the safe-side.
Looking forward to release this functionality to the public ❤️
There was a problem hiding this comment.
This method feels somehow artificial for the generic IcebergFileWriter
There was a problem hiding this comment.
I see it used in IcebergMetadat#finishWrite for PUFFIN only from within the CommitTaskData
I don't have right now a solution to suggest, but I feel we can do better also with CommitTaskData.
It currently contains a set of seemingly fields.
Did we explore the avenue of having DataCommitTaskData & PositionDeleteCommitTaskData (& DeletionVectorCommitTaskData) ?
This comment was marked as outdated.
This comment was marked as outdated.
9684b95 to
1c09d52
Compare
1c09d52 to
e5ed0cf
Compare
|
Addressed comments. |
e5ed0cf to
d7d71c3
Compare
|
Addressed comments. |
0303555 to
8971cf6
Compare
|
Squashed commits into one and added |
There was a problem hiding this comment.
Pull Request Overview
This PR adds support for deletion vectors in Iceberg by introducing new file writers and updating related components to work with the V3 deletion vector specification. Key changes include:
- Adding new classes DeletionVectorWriter and DeletionVectorFileWriter to handle deletion vector file creation and merging.
- Updating methods and constructors across multiple files (e.g. DeleteManager, IcebergWritableTableHandle, IcebergFileWriterFactory) to propagate and work with deletion vector related data.
- Adjusting error messages, configuration defaults, and file format handling to support the new V3 spec while maintaining compatibility with format version 2 for Spark.
Reviewed Changes
Copilot reviewed 43 out of 44 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/delete/DeletionVectorWriter.java | Introduces writer for deletion vector files using PUFFIN format. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/delete/DeletionVectorFileWriter.java | Implements file writing logic for deletion vectors including merging previous deletes. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/delete/DeleteManager.java | Updates delete predicate creation to support deletion vectors. |
| Various Iceberg* files | Update writers, metadata, and configuration to include deletion vector support in the merge/commit paths. |
| docs/src/main/sphinx/connector/iceberg.md | Updates documentation for the supported Iceberg format versions. |
Files not reviewed (1)
- plugin/trino-iceberg/pom.xml: Language not supported
Comments suppressed due to low confidence (3)
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergFileWriterFactory.java:175
- Consider validating that DELETE_FILE_POS exists in POSITION_DELETE_SCHEMA.columns() before using its index to avoid a -1 result, which could lead to unexpected behavior.
int positionChannel = POSITION_DELETE_SCHEMA.columns().indexOf(DELETE_FILE_POS);
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/CommitTaskData.java:28
- [nitpick] Ensure that the transition from IcebergFileFormat to FileFormat is coordinated across the codebase to maintain consistent file format handling.
FileFormat fileFormat,
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java:3166
- [nitpick] Consider including additional contextual information (e.g., the associated data file path) in the error message to improve debuggability when deletion vector metadata is missing.
deleteBuilder.withContentOffset(task.deletionVectorContentOffset().orElseThrow(() -> new IllegalArgumentException("deletionVectorContentOffset is required for deletion vector")));
8971cf6 to
94f838c
Compare
94f838c to
fbb5780
Compare
fbb5780 to
4d11dca
Compare
Description
Add support for deletion vectors (DV) in Iceberg.
A main difference from V2 positional deletes is that only single DV file is allowed per data file.
We need to merge the previous DV to the new DV.
Iceberg says "Position delete files are deprecated in v3", but it's actually prohibited.
Iceberg library throws exceptions if we add legacy position delete files to V3 tables.
The default format version stays at 2 for compatibility with Spark.
Fixes #24457
Release notes