Add read support for Iceberg Parquet encryption in the Iceberg connector. #28389
sopel39 wants to merge 3 commits into
Conversation
cc @osscm
This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Karol Sobczak.
@sopel39 please add a bit of color to the description of the PR as well as to the commit message. The PR comes with approximately 2K LOC. Add functional notes to the description about what Iceberg Parquet encryption is good for (release notes would be welcome as well).
Pull request overview
This PR adds read support for Iceberg Parquet encryption in Trino’s Iceberg (and Lakehouse Iceberg) connectors by wiring Iceberg encryption configuration, propagating encryption metadata through planning/splits, and enabling Parquet footer/column decryption in the Parquet reader.
Changes:
- Introduces Iceberg encryption configuration and `EncryptionManagerFactory`, and wires it into connector modules and table operations so encrypted manifests/metadata can be read.
- Extends split and table-changes plumbing to carry Parquet decryption-related data and uses it in `IcebergPageSourceProvider` when reading Parquet.
- Enhances the Parquet `MetadataReader`/`FileDecryptionProperties` to support rejecting plaintext files when decryption is expected (with a session override).
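The plaintext guardrail in the last bullet could look roughly like the sketch below. This is not the `MetadataReader` code from this PR: the class and method names are hypothetical, and the check relies only on the Parquet trailing magic bytes (`PAR1` for a plaintext footer, `PARE` for an encrypted footer). A `PAR1` file written in plaintext-footer mode can still contain encrypted columns, so a real implementation has to inspect the footer metadata as well.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and not taken from Trino.
final class PlaintextGuardSketch
{
    // Trailing 4-byte magic of a Parquet file.
    private static final byte[] PLAINTEXT_FOOTER_MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ENCRYPTED_FOOTER_MAGIC = "PARE".getBytes(StandardCharsets.US_ASCII);

    private PlaintextGuardSketch() {}

    /**
     * Returns true when the footer is encrypted and false when it is plaintext
     * and reading it is permitted. Throws when a plaintext footer is found
     * although decryption is expected and plaintext files are not allowed.
     */
    static boolean checkFooter(byte[] fileTail, boolean decryptionExpected, boolean plaintextFilesAllowed)
    {
        if (fileTail.length < 4) {
            throw new IllegalArgumentException("File is too short to be a Parquet file");
        }
        byte[] magic = Arrays.copyOfRange(fileTail, fileTail.length - 4, fileTail.length);
        if (Arrays.equals(magic, ENCRYPTED_FOOTER_MAGIC)) {
            return true;
        }
        if (!Arrays.equals(magic, PLAINTEXT_FOOTER_MAGIC)) {
            throw new IllegalArgumentException("Not a Parquet file");
        }
        if (decryptionExpected && !plaintextFilesAllowed) {
            throw new IllegalStateException("Plaintext Parquet file found where an encrypted file was expected");
        }
        return false;
    }
}
```

Failing by default and requiring an explicit opt-in mirrors the behavior described above: an unencrypted file in an encrypted table is treated as a potential tampering signal unless the operator allows it.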
Reviewed changes
Copilot reviewed 59 out of 59 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| plugin/trino-lakehouse/src/main/java/io/trino/plugin/lakehouse/LakehouseIcebergModule.java | Wires Iceberg encryption config/factory into the Lakehouse Iceberg module. |
| plugin/trino-iceberg/src/test/java/org/apache/iceberg/snowflake/TestTrinoSnowflakeCatalog.java | Updates Snowflake catalog test wiring for new encryption-aware provider constructor. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/util/EncryptedFileTestUtils.java | Adds test utilities for writing encrypted/plaintext Parquet files and delete files. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestParquetPredicates.java | Adds tests for building Parquet file decryption properties from split data. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestingFileMetastoreKeyManagementClient.java | Adds a testing KMS client implementation. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java | Adjusts v3 tests to allow encrypted-table reads and updates cleanup logic. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergSplitSource.java | Adds tests for deriving Parquet decryption data during split generation. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergParquetEncryption.java | Adds end-to-end tests for reading encrypted Parquet (data + deletes) and plaintext handling. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergPageSourceProvider.java | Updates page source provider test wiring for encryption factory + split changes. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergOrcMetricsCollection.java | Updates table-ops provider construction to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergNodeLocalDynamicSplitPruning.java | Updates split construction and page source factory wiring for new encryption-related parameters. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergMergeAppend.java | Updates table-ops provider construction to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/IcebergTestUtils.java | Updates test session/table-ops provider wiring for encryption config/factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/encryption/TestIcebergEncryptionConfig.java | Adds config mapping/defaults tests for IcebergEncryptionConfig. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/nessie/TestTrinoNessieCatalog.java | Updates Nessie catalog provider constructor to accept encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/hms/TestTrinoHiveCatalogWithHiveMetastore.java | Updates HMS catalog construction to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/glue/TestTrinoGlueCatalog.java | Updates Glue catalog construction to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/file/TestTrinoHiveCatalogWithFileMetastore.java | Updates file-metastore provider wiring to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/file/TestAbstractIcebergTableOperations.java | Updates table operations construction to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/BaseTrinoCatalogTest.java | Updates test session property wiring for new encryption config parameter. |
| plugin/trino-iceberg/src/main/java/org/apache/iceberg/snowflake/SnowflakeIcebergTableOperations.java | Passes encryption factory into base Iceberg table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/system/files/TrinoManifestFile.java | Adds manifest key metadata support for encrypted manifests. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/system/files/FilesTableSplitSource.java | Adds table properties to files-table splits (used for encryption manager creation). |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/system/files/FilesTableSplit.java | Extends files-table split to carry table properties and account for memory. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/system/files/FilesTablePageSource.java | Switches files-table reading to accept an Iceberg FileIO directly. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java | Derives Parquet decryption data from Iceberg key metadata during split planning. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplit.java | Adds optional Parquet decryption data to splits and retained-size accounting. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSessionProperties.java | Adds session property to allow plaintext files when encryption is enabled. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProviderFactory.java | Injects EncryptionManagerFactory into the page source provider factory. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java | Plumbs Parquet decryption properties into Parquet footer read + reader creation. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergModule.java | Binds encryption config and default encryption manager factory. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java | Allows encrypted-table reads but blocks encrypted-table writes. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesSplitSource.java | Adds Parquet decryption data extraction for table-changes splits. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesSplit.java | Extends table-changes split to carry Parquet decryption data. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesFunctionProcessor.java | Passes split decryption info into page source provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesFunctionHandle.java | Adds storage properties to the function handle. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesFunction.java | Populates table-changes function handle with table properties. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/encryption/IcebergEncryptionConfig.java | Adds catalog-level Iceberg encryption configuration. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/encryption/EncryptionManagerFactory.java | Introduces abstraction to create EncryptionManager from table properties. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/encryption/EncryptionAwareFileIO.java | Wraps EncryptingFileIO while preserving FileIO.properties(). |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/encryption/DefaultEncryptionManagerFactory.java | Default factory that builds KMS client and creates encryption managers. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/delete/DeleteFile.java | Adds optional Parquet decryption data to delete-file metadata. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/snowflake/SnowflakeIcebergTableOperationsProvider.java | Injects encryption factory into Snowflake table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/nessie/IcebergNessieTableOperationsProvider.java | Injects encryption factory into Nessie table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/nessie/IcebergNessieTableOperations.java | Passes encryption factory to base table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/jdbc/IcebergJdbcTableOperationsProvider.java | Injects encryption factory into JDBC table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/jdbc/IcebergJdbcTableOperations.java | Passes encryption factory to base table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/TrinoHiveCatalog.java | Makes invalidateTableCache public (used by new tests). |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/HiveMetastoreTableOperationsProvider.java | Injects encryption factory into HMS table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/HiveMetastoreTableOperations.java | Passes encryption factory to base metastore table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/AbstractMetastoreTableOperations.java | Passes encryption factory to base Iceberg table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/GlueIcebergTableOperationsProvider.java | Injects encryption factory into Glue table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/GlueIcebergTableOperations.java | Passes encryption factory to base table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/file/IcebergFileMetastoreEncryptionConfig.java | Adds file-metastore-specific encryption config (currently not wired). |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/file/FileMetastoreTableOperationsProvider.java | Injects encryption factory into file-metastore table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/file/FileMetastoreTableOperations.java | Passes encryption factory to base metastore table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/AbstractIcebergTableOperations.java | Maintains encryption manager + wraps FileIO for encrypted manifests/metadata. |
| lib/trino-parquet/src/main/java/io/trino/parquet/reader/MetadataReader.java | Adds plaintext-file detection/guardrail when decryption properties are present. |
| lib/trino-parquet/src/main/java/io/trino/parquet/crypto/FileDecryptionProperties.java | Adds plaintextFilesAllowed option to decryption properties. |
@sopel39 Karol, my request was about making the PR easier to grasp (and eventually to maintain) by testing it end to end against real-world KMS implementations. With that, I was not implying the |
@ebyhr @findinpath removed |
"TBLPROPERTIES ('write.format.default'='PARQUET', 'encryption.key-id'='%s')",
sparkTableName, keyId));
onSpark().executeQuery(format("INSERT INTO %s VALUES (1, 'alice'), (2, 'bob'), (3, 'charlie')", sparkTableName));
let's verify as well that spark's read content is the same as trino's
assertQueryFailure(() -> onTrino().executeQuery("INSERT INTO " + trinoTableName + " VALUES (4, 'dave')"))
.hasMessageContaining("Writing to encrypted Iceberg tables is not supported");
can we check as well $files metadata table content ?
"CREATE TABLE %s (id INT, name STRING) USING ICEBERG " +
"TBLPROPERTIES ('write.format.default'='PARQUET', 'encryption.key-id'='%s')",
sparkTableName, keyId));
onSpark().executeQuery(format("INSERT INTO %s VALUES (1, 'alice'), (2, 'bob'), (3, 'charlie')", sparkTableName));
[root@hadoop-master /]# hdfs dfs -copyToLocal /user/hive/warehouse/t1/data/00002-2-7efcb6e4-7dd0-45c8-9882-c371bfb1992a-0-00001.parquet .
[root@hadoop-master /]# ls
00002-2-7efcb6e4-7dd0-45c8-9882-c371bfb1992a-0-00001.parquet anaconda-post.log bin dev docker etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
[root@hadoop-master /]# exit
exit
(base) ➜ ~ docker cp ptl-hadoop-master:/00002-2-7efcb6e4-7dd0-45c8-9882-c371bfb1992a-0-00001.parquet .
Successfully copied 654B (transferred 2.56kB) to /Users/marius/.
(base) ➜ ~ parquet schema 00002-2-7efcb6e4-7dd0-45c8-9882-c371bfb1992a-0-00001.parquet
{
"type" : "record",
"name" : "table",
"fields" : [ {
"name" : "id",
"type" : "int"
}, {
"name" : "name",
"type" : "string"
} ]
}
(base) ➜ ~ parquet cat 00002-2-7efcb6e4-7dd0-45c8-9882-c371bfb1992a-0-00001.parquet
{"id": 3, "name": "charlie"}
content doesn't seem encrypted to me.
Iceberg 1.10.1 does not support reading encrypted Avro manifest list files (apache/iceberg#7770 is not in the 1.10.x branch), and the 1.10.1 Hive catalog also lacks the wiring to engage StandardEncryptionManager on writes (apache/iceberg#13066 and apache/iceberg#15272 are not in 1.10.x). An end-to-end test of encrypted Iceberg tables with the Hive catalog therefore requires Iceberg 1.11.0+ on both sides (Spark and Trino).
While I've managed to make Spark write encrypted tables with the Hive catalog, Trino cannot read them using 1.10 (the encrypted Avro read fails). I wonder whether it makes sense to support PME without the Hive catalog, or whether we should just wait for 1.11.
Stack your PR on a branch that feeds from #26640
Iceberg table encryption protects the confidentiality and integrity of table data at rest in untrusted storage. Data files, delete files, manifest files, and manifest list files are encrypted and tamper-proofed before being sent to storage, using per-file encryption keys and an envelope encryption scheme backed by a KMS (Key Management Service).
Wire the Iceberg encryption subsystem into Trino so that tables with the `encryption.key-id` table property can be read. Writing to encrypted tables is not yet supported and is explicitly blocked.
What's included:
- Catalog-level KMS configuration via the `iceberg.encryption.kms-type` property, which accepts `AWS` or `GCP`.
- `EncryptionManagerFactory` that creates an Iceberg `EncryptionManager` from table properties, used to unwrap per-file DEKs via the configured KMS.
- `EncryptionAwareFileIO` wrapper so that table operations (manifest/metadata reads) go through the Iceberg `EncryptingFileIO`.
- Propagation of Parquet decryption data (file encryption key + AAD prefix) through splits to the Parquet reader.
- Plaintext file detection: by default, reading an unencrypted file in an encrypted table fails; overridable via the `iceberg.encryption.plaintext-files-allowed-for-encrypted-tables` catalog property or the `plaintext_files_allowed_for_encrypted_tables` catalog session property.
- Write blocking on all data-producing paths (INSERT, MERGE, OPTIMIZE, CREATE OR REPLACE, add_files, materialized view refresh).
- Support for encrypted delete files (position and equality deletes) and table changes.
See https://iceberg.apache.org/docs/nightly/encryption/ for details on Iceberg's encryption model.
Co-Authored-By: kamijin_fanta <kamijin@live.jp>
Co-Authored-By: Yuya Ebihara <ebyhry@gmail.com>
Description
Add read support for Iceberg Parquet encryption in the Iceberg connector.
Iceberg table encryption protects the confidentiality and integrity of table data at rest
in untrusted storage. Data files, delete files, manifest files, and manifest list files are
encrypted and tamper-proofed before being sent to storage, using per-file encryption keys
and an envelope encryption scheme backed by a KMS (Key Management Service).
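For readers unfamiliar with envelope encryption, the following is a minimal, self-contained sketch of the idea using only JDK crypto APIs. It is not Iceberg's or Trino's implementation: the class and method names are hypothetical, a locally generated AES key stands in for the KMS-held master key, and `AESWrap` stands in for the KMS wrap/unwrap call.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Illustrative sketch of envelope encryption; all names here are hypothetical.
final class EnvelopeEncryptionSketch
{
    private static final int GCM_TAG_BITS = 128;
    private static final int GCM_NONCE_BYTES = 12;

    private EnvelopeEncryptionSketch() {}

    static SecretKey newAesKey()
    {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] newNonce()
    {
        byte[] nonce = new byte[GCM_NONCE_BYTES];
        new SecureRandom().nextBytes(nonce);
        return nonce;
    }

    // Wraps the per-file DEK with the KEK. In a real deployment this is a KMS call.
    static byte[] wrapKey(SecretKey kek, SecretKey dek)
    {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, kek);
            return cipher.wrap(dek);
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static SecretKey unwrapKey(SecretKey kek, byte[] wrappedDek)
    {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, kek);
            return (SecretKey) cipher.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // AES-GCM with the AAD bound to the ciphertext; mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE.
    static byte[] crypt(int mode, SecretKey dek, byte[] nonce, byte[] aad, byte[] input)
    {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, dek, new GCMParameterSpec(GCM_TAG_BITS, nonce));
            cipher.updateAAD(aad);
            return cipher.doFinal(input);
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        SecretKey kek = newAesKey(); // stand-in for the KMS master key
        SecretKey dek = newAesKey(); // per-file data encryption key
        byte[] nonce = newNonce();
        byte[] aad = "file-0001".getBytes(StandardCharsets.UTF_8); // hypothetical AAD value

        byte[] wrappedDek = wrapKey(kek, dek); // stored next to the file metadata
        byte[] ciphertext = crypt(Cipher.ENCRYPT_MODE, dek, nonce, aad, "row data".getBytes(StandardCharsets.UTF_8));

        SecretKey recovered = unwrapKey(kek, wrappedDek);
        byte[] plaintext = crypt(Cipher.DECRYPT_MODE, recovered, nonce, aad, ciphertext);
        System.out.println(new String(plaintext, StandardCharsets.UTF_8));
    }
}
```

Only the wrapped DEK needs to travel with the file, so the master key never leaves the KMS, and decrypting with a mismatched AAD fails the GCM tag check, which is what gives the tamper-proofing property.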
This PR wires the Iceberg encryption subsystem into Trino so that tables with the
`encryption.key-id` table property can be read. Writing to encrypted tables is not yet
supported and is explicitly blocked.
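As a rough illustration of the write guard, the check can be reduced to a table-property lookup. This is a simplified sketch, not the `IcebergMetadata` code: the class and method names are hypothetical, and `UnsupportedOperationException` is a stand-in for whatever exception type the connector actually throws. The property name and the error message are the ones used in this PR.

```java
import java.util.Map;

// Hypothetical sketch of the write guard; not the connector's actual code.
final class EncryptedTableWriteGuardSketch
{
    static final String ENCRYPTION_KEY_ID = "encryption.key-id";

    private EncryptedTableWriteGuardSketch() {}

    // A table counts as encrypted when the key-id table property is set.
    static boolean isEncrypted(Map<String, String> tableProperties)
    {
        return tableProperties.containsKey(ENCRYPTION_KEY_ID);
    }

    // Intended to be called at the start of every data-producing operation.
    static void checkWritable(Map<String, String> tableProperties)
    {
        if (isEncrypted(tableProperties)) {
            throw new UnsupportedOperationException("Writing to encrypted Iceberg tables is not supported");
        }
    }
}
```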
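What's included
- Catalog-level KMS configuration via the `iceberg.encryption.kms-type` property, which accepts `AWS` or `GCP`
- `EncryptionManagerFactory` that creates an Iceberg `EncryptionManager` from table properties, used to unwrap per-file DEKs via the configured KMS
- `EncryptionAwareFileIO` wrapper so that table operations (manifest/metadata reads) go through the Iceberg `EncryptingFileIO`
- Propagation of Parquet decryption data (file encryption key + AAD prefix) through splits to the Parquet reader
- Plaintext file detection: by default, reading an unencrypted file in an encrypted table fails; overridable via the `iceberg.encryption.plaintext-files-allowed-for-encrypted-tables` catalog property or the `plaintext_files_allowed_for_encrypted_tables` catalog session property
- Write blocking on all data-producing paths (INSERT, MERGE, OPTIMIZE, CREATE OR REPLACE, add_files, materialized view refresh)
- Support for encrypted delete files (position and equality deletes) and table changes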
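To make the split propagation concrete, here is a simplified sketch of a split that optionally carries Parquet decryption data. The types below are hypothetical and much smaller than `IcebergSplit`; the retained-size arithmetic counts only payload bytes and ignores object overhead, which real accounting would include.

```java
import java.util.Optional;

// Hypothetical sketch; these are not the connector's split classes.
final class SplitDecryptionSketch
{
    private SplitDecryptionSketch() {}

    // File encryption key plus AAD prefix, as derived during split planning.
    static final class DecryptionData
    {
        private final byte[] fileKey;
        private final byte[] aadPrefix;

        DecryptionData(byte[] fileKey, byte[] aadPrefix)
        {
            // Defensive copies so callers cannot mutate key material afterwards.
            this.fileKey = fileKey.clone();
            this.aadPrefix = aadPrefix.clone();
        }

        long retainedBytes()
        {
            return (long) fileKey.length + aadPrefix.length;
        }
    }

    static final class Split
    {
        private final String path;
        private final Optional<DecryptionData> decryptionData;

        Split(String path, Optional<DecryptionData> decryptionData)
        {
            this.path = path;
            this.decryptionData = decryptionData;
        }

        // Empty for plaintext files; present when the reader must decrypt.
        boolean requiresDecryption()
        {
            return decryptionData.isPresent();
        }

        // Simplified: payload bytes only, no object headers.
        long retainedBytes()
        {
            return path.length() + decryptionData.map(DecryptionData::retainedBytes).orElse(0L);
        }
    }
}
```

Keeping the decryption data optional lets plaintext and encrypted files share one split type, and the reader only builds decryption properties when the value is present.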
Additional context and related issues
See https://iceberg.apache.org/docs/nightly/encryption/ for details on
Iceberg's encryption model.
Release notes
Release notes are required, with the following suggested text:
Iceberg
28389)