Add read support for Iceberg Parquet encryption in the Iceberg connector. #28389

Open
sopel39 wants to merge 3 commits into trinodb:master from sopel39:ks/iceberg_enc

@sopel39 sopel39 commented Feb 20, 2026

Description

Add read support for Iceberg Parquet encryption in the Iceberg connector.

Iceberg table encryption protects the confidentiality and integrity of table data at rest
in untrusted storage. Data files, delete files, manifest files, and manifest list files are
encrypted and tamper-proofed before being sent to storage, using per-file encryption keys
and an envelope encryption scheme backed by a KMS (Key Management Service).
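
A minimal sketch of the envelope encryption idea described above, using only JDK classes. It is an illustration, not the Iceberg or Trino implementation: the class `EnvelopeEncryptionSketch`, its method names, and the use of a local AES key as a stand-in for the KMS master key are all assumptions. A per-file data encryption key (DEK) encrypts the file contents with AES-GCM, the file identity is bound in as additional authenticated data (AAD), and only the wrapped DEK is stored next to the data.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration class; not part of Trino or Iceberg.
public class EnvelopeEncryptionSketch
{
    private static final int NONCE_LENGTH = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Stand-in for a KMS "wrap" call: encrypts the per-file DEK under the master key.
    public static byte[] wrapKey(SecretKey masterKey, byte[] dek)
    {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, masterKey);
            return cipher.wrap(new SecretKeySpec(dek, "AES"));
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException("Key wrap failed", e);
        }
    }

    // Stand-in for a KMS "unwrap" call: recovers the DEK from its wrapped form.
    public static byte[] unwrapKey(SecretKey masterKey, byte[] wrappedDek)
    {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, masterKey);
            return cipher.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY).getEncoded();
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException("Key unwrap failed", e);
        }
    }

    // Encrypts with AES-GCM; the random nonce is prepended to the ciphertext.
    public static byte[] encrypt(byte[] dek, byte[] aad, byte[] plaintext)
    {
        try {
            byte[] nonce = new byte[NONCE_LENGTH];
            RANDOM.nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(dek, "AES"), new GCMParameterSpec(TAG_BITS, nonce));
            cipher.updateAAD(aad);
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] message = new byte[NONCE_LENGTH + ciphertext.length];
            System.arraycopy(nonce, 0, message, 0, NONCE_LENGTH);
            System.arraycopy(ciphertext, 0, message, NONCE_LENGTH, ciphertext.length);
            return message;
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Decrypts and verifies the GCM tag; a wrong key, wrong AAD, or modified bytes all fail here.
    public static byte[] decrypt(byte[] dek, byte[] aad, byte[] message)
    {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(dek, "AES"),
                    new GCMParameterSpec(TAG_BITS, message, 0, NONCE_LENGTH));
            cipher.updateAAD(aad);
            return cipher.doFinal(message, NONCE_LENGTH, message.length - NONCE_LENGTH);
        }
        catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption or integrity check failed", e);
        }
    }

    public static void main(String[] args)
            throws Exception
    {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey masterKey = generator.generateKey(); // would live inside the KMS

        byte[] dek = new byte[16];
        RANDOM.nextBytes(dek);
        byte[] wrappedDek = wrapKey(masterKey, dek); // stored alongside the file
        byte[] aad = "data/file-0001.parquet".getBytes(StandardCharsets.UTF_8); // hypothetical file identity
        byte[] stored = encrypt(dek, aad, "secret rows".getBytes(StandardCharsets.UTF_8));

        byte[] recovered = decrypt(unwrapKey(masterKey, wrappedDek), aad, stored);
        System.out.println(new String(recovered, StandardCharsets.UTF_8));
    }
}
```

Binding the file identity as AAD is what makes the ciphertext tamper-evident in this sketch: decrypting the same bytes under a different AAD fails the tag check.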

This PR wires the Iceberg encryption subsystem into Trino so that tables with the
encryption.key-id table property can be read. Writing to encrypted tables is not yet
supported and is explicitly blocked.

What's included

  • Catalog-level KMS configuration via the iceberg.encryption.kms-type property,
    which accepts AWS or GCP
  • EncryptionManagerFactory that creates an Iceberg EncryptionManager from table
    properties, used to unwrap per-file DEKs via the configured KMS
  • EncryptionAwareFileIO wrapper so that table operations (manifest/metadata reads)
    go through the Iceberg EncryptingFileIO
  • Propagation of Parquet decryption data (file encryption key + AAD prefix) through
    splits to the Parquet reader
  • Plaintext file detection: by default, reading an unencrypted file in an encrypted
    table fails; overridable via the iceberg.encryption.plaintext-files-allowed-for-encrypted-tables
    catalog property or the plaintext_files_allowed_for_encrypted_tables catalog session
    property
  • Write blocking on all data-producing paths (INSERT, MERGE, OPTIMIZE, CREATE OR REPLACE,
    add_files, materialized view refresh)
  • Support for encrypted delete files (position and equality deletes) and table changes

Additional context and related issues

See https://iceberg.apache.org/docs/nightly/encryption/ for details on
Iceberg's encryption model.

Release notes

Release notes are required, with the following suggested text:

Iceberg

  • Add read support for encrypted Parquet tables in Iceberg. ({issue}28389)

@cla-bot cla-bot Bot added the cla-signed label Feb 20, 2026
@github-actions github-actions Bot added the iceberg Iceberg connector label Feb 20, 2026
@sopel39
Member Author

sopel39 commented Feb 20, 2026

cc @osscm

@github-actions

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions Bot added the stale label Mar 16, 2026
@github-actions

github-actions Bot commented Apr 7, 2026

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions Bot closed this Apr 7, 2026
@sopel39 sopel39 reopened this Apr 8, 2026
@github-actions github-actions Bot removed the stale label Apr 8, 2026
@cla-bot

cla-bot Bot commented Apr 9, 2026

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Karol Sobczak.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: `git config --list | grep email`
  2. If not, set it up using `git config --global user.email email@example.com`
  3. Make sure that the git commit email is configured in your GitHub account settings; see https://github.com/settings/emails

@cla-bot cla-bot Bot removed the cla-signed label Apr 9, 2026
@sopel39 sopel39 changed the title from "Add support for Iceberg Parquet encryption (File metastore)" to "Add support for Iceberg Parquet encryption" Apr 9, 2026
@cla-bot cla-bot Bot added the cla-signed label Apr 9, 2026
@sopel39 sopel39 marked this pull request as ready for review April 9, 2026 16:01
@sopel39 sopel39 requested review from ebyhr and raunaqmorarka April 9, 2026 16:01
@findinpath findinpath requested a review from Copilot April 13, 2026 08:51
@findinpath
Contributor

findinpath commented Apr 13, 2026

@sopel39 please add more detail to the PR description and to the commit message.

The PR comes with approximately 2K LOC.

Add functional notes to the description explaining what Iceberg Parquet encryption is good for:
https://iceberg.apache.org/docs/nightly/encryption/

(Release notes would be welcome as well.)


Copilot AI left a comment

Pull request overview

This PR adds read support for Iceberg Parquet encryption in Trino’s Iceberg (and Lakehouse Iceberg) connectors by wiring Iceberg encryption configuration, propagating encryption metadata through planning/splits, and enabling Parquet footer/column decryption in the Parquet reader.

Changes:

  • Introduces Iceberg encryption configuration + EncryptionManagerFactory, and wires it into connector modules and table operations so encrypted manifests/metadata can be read.
  • Extends split and table-changes plumbing to carry Parquet decryption-related data and uses it in IcebergPageSourceProvider when reading Parquet.
  • Enhances the Parquet MetadataReader/FileDecryptionProperties to support rejecting plaintext files when decryption is expected (with a session override).
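
One way a plaintext-file guardrail like the one in the last bullet could work is sketched below. This is an assumption-laden illustration, not the logic in `MetadataReader`: the class `ParquetMagicCheck` and its methods are hypothetical. It relies on the Parquet format convention that files written in encrypted-footer mode end with the magic bytes `PARE`, while files with a plaintext footer end with `PAR1`. A `PAR1` file may still have encrypted columns, so a real reader would also need to inspect the footer metadata; the sketch covers only the magic-byte step.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of Trino.
public class ParquetMagicCheck
{
    private static final byte[] PLAINTEXT_FOOTER_MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ENCRYPTED_FOOTER_MAGIC = "PARE".getBytes(StandardCharsets.US_ASCII);

    public enum FooterMode
    {
        PLAINTEXT, ENCRYPTED
    }

    // Classifies a file by the last four bytes of its tail.
    public static FooterMode footerMode(byte[] fileTail)
    {
        if (fileTail.length < 4) {
            throw new IllegalArgumentException("Not a Parquet file: too short");
        }
        byte[] magic = Arrays.copyOfRange(fileTail, fileTail.length - 4, fileTail.length);
        if (Arrays.equals(magic, ENCRYPTED_FOOTER_MAGIC)) {
            return FooterMode.ENCRYPTED;
        }
        if (Arrays.equals(magic, PLAINTEXT_FOOTER_MAGIC)) {
            return FooterMode.PLAINTEXT;
        }
        throw new IllegalArgumentException("Not a Parquet file: unexpected magic");
    }

    // Fails when decryption is expected, the footer is plaintext, and the override is off.
    public static void checkFooter(byte[] fileTail, boolean decryptionExpected, boolean plaintextFilesAllowed)
    {
        if (decryptionExpected && !plaintextFilesAllowed && footerMode(fileTail) == FooterMode.PLAINTEXT) {
            throw new IllegalStateException("Plaintext Parquet file found where an encrypted file was expected");
        }
    }

    public static void main(String[] args)
    {
        byte[] encryptedTail = "....PARE".getBytes(StandardCharsets.US_ASCII);
        byte[] plaintextTail = "....PAR1".getBytes(StandardCharsets.US_ASCII);
        System.out.println(footerMode(encryptedTail));
        System.out.println(footerMode(plaintextTail));
        checkFooter(plaintextTail, true, true); // allowed through the override
    }
}
```

The `plaintextFilesAllowed` flag plays the role of the catalog or session override mentioned in the PR description.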

Reviewed changes

Copilot reviewed 59 out of 59 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| plugin/trino-lakehouse/src/main/java/io/trino/plugin/lakehouse/LakehouseIcebergModule.java | Wires Iceberg encryption config/factory into the Lakehouse Iceberg module. |
| plugin/trino-iceberg/src/test/java/org/apache/iceberg/snowflake/TestTrinoSnowflakeCatalog.java | Updates Snowflake catalog test wiring for new encryption-aware provider constructor. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/util/EncryptedFileTestUtils.java | Adds test utilities for writing encrypted/plaintext Parquet files and delete files. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestParquetPredicates.java | Adds tests for building Parquet file decryption properties from split data. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestingFileMetastoreKeyManagementClient.java | Adds a testing KMS client implementation. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV3.java | Adjusts v3 tests to allow encrypted-table reads and updates cleanup logic. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergSplitSource.java | Adds tests for deriving Parquet decryption data during split generation. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergParquetEncryption.java | Adds end-to-end tests for reading encrypted Parquet (data + deletes) and plaintext handling. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergPageSourceProvider.java | Updates page source provider test wiring for encryption factory + split changes. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergOrcMetricsCollection.java | Updates table-ops provider construction to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergNodeLocalDynamicSplitPruning.java | Updates split construction and page source factory wiring for new encryption-related parameters. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergMergeAppend.java | Updates table-ops provider construction to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/IcebergTestUtils.java | Updates test session/table-ops provider wiring for encryption config/factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/encryption/TestIcebergEncryptionConfig.java | Adds config mapping/defaults tests for IcebergEncryptionConfig. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/nessie/TestTrinoNessieCatalog.java | Updates Nessie catalog provider constructor to accept encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/hms/TestTrinoHiveCatalogWithHiveMetastore.java | Updates HMS catalog construction to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/glue/TestTrinoGlueCatalog.java | Updates Glue catalog construction to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/file/TestTrinoHiveCatalogWithFileMetastore.java | Updates file-metastore provider wiring to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/file/TestAbstractIcebergTableOperations.java | Updates table operations construction to include encryption factory. |
| plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/BaseTrinoCatalogTest.java | Updates test session property wiring for new encryption config parameter. |
| plugin/trino-iceberg/src/main/java/org/apache/iceberg/snowflake/SnowflakeIcebergTableOperations.java | Passes encryption factory into base Iceberg table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/system/files/TrinoManifestFile.java | Adds manifest key metadata support for encrypted manifests. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/system/files/FilesTableSplitSource.java | Adds table properties to files-table splits (used for encryption manager creation). |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/system/files/FilesTableSplit.java | Extends files-table split to carry table properties and account for memory. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/system/files/FilesTablePageSource.java | Switches files-table reading to accept an Iceberg FileIO directly. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java | Derives Parquet decryption data from Iceberg key metadata during split planning. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplit.java | Adds optional Parquet decryption data to splits and retained-size accounting. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSessionProperties.java | Adds session property to allow plaintext files when encryption is enabled. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProviderFactory.java | Injects EncryptionManagerFactory into the page source provider factory. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java | Plumbs Parquet decryption properties into Parquet footer read + reader creation. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergModule.java | Binds encryption config and default encryption manager factory. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java | Allows encrypted-table reads but blocks encrypted-table writes. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesSplitSource.java | Adds Parquet decryption data extraction for table-changes splits. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesSplit.java | Extends table-changes split to carry Parquet decryption data. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesFunctionProcessor.java | Passes split decryption info into page source provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesFunctionHandle.java | Adds storage properties to the function handle. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/functions/tablechanges/TableChangesFunction.java | Populates table-changes function handle with table properties. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/encryption/IcebergEncryptionConfig.java | Adds catalog-level Iceberg encryption configuration. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/encryption/EncryptionManagerFactory.java | Introduces abstraction to create EncryptionManager from table properties. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/encryption/EncryptionAwareFileIO.java | Wraps EncryptingFileIO while preserving FileIO.properties(). |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/encryption/DefaultEncryptionManagerFactory.java | Default factory that builds KMS client and creates encryption managers. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/delete/DeleteFile.java | Adds optional Parquet decryption data to delete-file metadata. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/snowflake/SnowflakeIcebergTableOperationsProvider.java | Injects encryption factory into Snowflake table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/nessie/IcebergNessieTableOperationsProvider.java | Injects encryption factory into Nessie table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/nessie/IcebergNessieTableOperations.java | Passes encryption factory to base table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/jdbc/IcebergJdbcTableOperationsProvider.java | Injects encryption factory into JDBC table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/jdbc/IcebergJdbcTableOperations.java | Passes encryption factory to base table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/TrinoHiveCatalog.java | Makes invalidateTableCache public (used by new tests). |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/HiveMetastoreTableOperationsProvider.java | Injects encryption factory into HMS table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/HiveMetastoreTableOperations.java | Passes encryption factory to base metastore table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/AbstractMetastoreTableOperations.java | Passes encryption factory to base Iceberg table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/GlueIcebergTableOperationsProvider.java | Injects encryption factory into Glue table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/GlueIcebergTableOperations.java | Passes encryption factory to base table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/file/IcebergFileMetastoreEncryptionConfig.java | Adds file-metastore-specific encryption config (currently not wired). |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/file/FileMetastoreTableOperationsProvider.java | Injects encryption factory into file-metastore table-ops provider. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/file/FileMetastoreTableOperations.java | Passes encryption factory to base metastore table operations. |
| plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/AbstractIcebergTableOperations.java | Maintains encryption manager + wraps FileIO for encrypted manifests/metadata. |
| lib/trino-parquet/src/main/java/io/trino/parquet/reader/MetadataReader.java | Adds plaintext-file detection/guardrail when decryption properties are present. |
| lib/trino-parquet/src/main/java/io/trino/parquet/crypto/FileDecryptionProperties.java | Adds plaintextFilesAllowed option to decryption properties. |

@cla-bot cla-bot Bot removed the cla-signed label Apr 15, 2026
@findinpath
Contributor

> @ebyhr @findinpath added KMS loading test. In 1.10 Azure KMS is not yet available

@sopel39 Karol, my request was about making the PR easier to grasp (and eventually to maintain) by testing it end to end against real-world KMS implementations. By that, I did not mean the io.trino.plugin.iceberg.encryption.TestKmsClientInstantiation unit test.

@cla-bot cla-bot Bot removed the cla-signed label Apr 27, 2026
@sopel39 sopel39 requested a review from ebyhr April 27, 2026 16:47
@cla-bot cla-bot Bot removed the cla-signed label Apr 29, 2026
@cla-bot cla-bot Bot added the cla-signed label Apr 29, 2026
@sopel39
Member Author

sopel39 commented Apr 29, 2026

@ebyhr @findinpath removed kms-impl, added a LocalStack-based product test (PT)

Comment thread docs/src/main/sphinx/connector/iceberg.md
Comment thread docs/src/main/sphinx/connector/iceberg.md
"TBLPROPERTIES ('write.format.default'='PARQUET', 'encryption.key-id'='%s')",
sparkTableName, keyId));
onSpark().executeQuery(format("INSERT INTO %s VALUES (1, 'alice'), (2, 'bob'), (3, 'charlie')", sparkTableName));

Contributor

let's verify as well that spark's read content is the same as trino's


assertQueryFailure(() -> onTrino().executeQuery("INSERT INTO " + trinoTableName + " VALUES (4, 'dave')"))
.hasMessageContaining("Writing to encrypted Iceberg tables is not supported");

Contributor

can we check as well $files metadata table content ?

"CREATE TABLE %s (id INT, name STRING) USING ICEBERG " +
"TBLPROPERTIES ('write.format.default'='PARQUET', 'encryption.key-id'='%s')",
sparkTableName, keyId));
onSpark().executeQuery(format("INSERT INTO %s VALUES (1, 'alice'), (2, 'bob'), (3, 'charlie')", sparkTableName));
Contributor

[root@hadoop-master /]# hdfs dfs -copyToLocal /user/hive/warehouse/t1/data/00002-2-7efcb6e4-7dd0-45c8-9882-c371bfb1992a-0-00001.parquet .
[root@hadoop-master /]# ls
00002-2-7efcb6e4-7dd0-45c8-9882-c371bfb1992a-0-00001.parquet  anaconda-post.log  bin  dev  docker  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
[root@hadoop-master /]# exit
exit
(base) ➜  ~ docker cp ptl-hadoop-master:/00002-2-7efcb6e4-7dd0-45c8-9882-c371bfb1992a-0-00001.parquet  .
Successfully copied 654B (transferred 2.56kB) to /Users/marius/.
(base) ➜  ~ parquet schema 00002-2-7efcb6e4-7dd0-45c8-9882-c371bfb1992a-0-00001.parquet 
{
  "type" : "record",
  "name" : "table",
  "fields" : [ {
    "name" : "id",
    "type" : "int"
  }, {
    "name" : "name",
    "type" : "string"
  } ]
}
(base) ➜  ~ parquet cat 00002-2-7efcb6e4-7dd0-45c8-9882-c371bfb1992a-0-00001.parquet 
{"id": 3, "name": "charlie"}

content doesn't seem encrypted to me.

Member Author

@sopel39 sopel39 Apr 30, 2026

Iceberg 1.10.1 does not support reading encrypted Avro manifest list files (apache/iceberg#7770 is not in the 1.10.x branch), and its Hive catalog also lacks the wiring to engage StandardEncryptionManager on writes (apache/iceberg#13066 and apache/iceberg#15272 are not in 1.10.x). An end-to-end test of encrypted Iceberg tables with the Hive catalog therefore requires Iceberg 1.11.0+ on both sides (Spark and Trino).

While I've managed to make Spark write encrypted tables with the Hive catalog, Trino on 1.10 cannot read them (the encrypted Avro read fails). I wonder whether it makes sense to support PME (Parquet modular encryption) without the Hive catalog, or whether we should just wait for 1.11.

Contributor

@findinpath findinpath Apr 30, 2026

Stack your PR on a branch that feeds from #26640

Iceberg table encryption protects the confidentiality and integrity of
table data at rest in untrusted storage. Data files, delete files,
manifest files, and manifest list files are encrypted and tamper-proofed
before being sent to storage, using per-file encryption keys and an
envelope encryption scheme backed by a KMS (Key Management Service).

Wire the Iceberg encryption subsystem into Trino so that tables with
the `encryption.key-id` table property can be read. Writing to
encrypted tables is not yet supported and is explicitly blocked.

What's included:
- Catalog-level KMS configuration via the `iceberg.encryption.kms-type`
  property, which accepts `AWS` or `GCP`.
- `EncryptionManagerFactory` that creates an Iceberg `EncryptionManager`
  from table properties, used to unwrap per-file DEKs via the configured
  KMS.
- `EncryptionAwareFileIO` wrapper so that table operations
  (manifest/metadata reads) go through the Iceberg `EncryptingFileIO`.
- Propagation of Parquet decryption data (file encryption key + AAD
  prefix) through splits to the Parquet reader.
- Plaintext file detection: by default, reading an unencrypted file in
  an encrypted table fails; overridable via the
  `iceberg.encryption.plaintext-files-allowed-for-encrypted-tables`
  catalog property or the `plaintext_files_allowed_for_encrypted_tables`
  catalog session property.
- Write blocking on all data-producing paths (INSERT, MERGE, OPTIMIZE,
  CREATE OR REPLACE, add_files, materialized view refresh).
- Support for encrypted delete files (position and equality deletes) and
  table changes.

See https://iceberg.apache.org/docs/nightly/encryption/ for details on
Iceberg's encryption model.

Co-Authored-By: kamijin_fanta <kamijin@live.jp>
Co-Authored-By: Yuya Ebihara <ebyhry@gmail.com>