
Core: Add support for HashiCorp Vault KMS client #16075

Open
ebyhr wants to merge 1 commit into apache:main from ebyhr:ebi/hashicorp-vault


@ebyhr
Member

@ebyhr ebyhr commented Apr 22, 2026

Restores #14451 with additional changes:

  • Extracted the module so non-HashiCorp users can avoid the dependency
  • Added hashicorp to encryption.kms-type
  • Added test with hashicorp/vault docker image
  • Replaced environment variables with properties
  • Made Vault KMS client serializable

Mailing list: https://lists.apache.org/thread/1269w5pzoy723sr1c6xxq8jg02zcf3on

Fixes #14437

@ebyhr ebyhr force-pushed the ebi/hashicorp-vault branch from a988363 to 7e97bf6 Compare April 22, 2026 07:24
@github-actions github-actions Bot added the spark label Apr 22, 2026
@ebyhr ebyhr marked this pull request as draft April 23, 2026 11:37
@ebyhr ebyhr force-pushed the ebi/hashicorp-vault branch 2 times, most recently from fe0aed2 to a896b32 Compare April 24, 2026 00:19
@ebyhr ebyhr marked this pull request as ready for review April 24, 2026 01:16
@ebyhr ebyhr force-pushed the ebi/hashicorp-vault branch 2 times, most recently from d0d8db7 to c458222 Compare April 24, 2026 02:04
@ebyhr
Member Author

ebyhr commented Apr 24, 2026

@pvary could you take a look at this PR when you have a moment?

cc: @mrendi29

Comment thread spark/v4.1/spark-runtime/runtime-deps.txt Outdated
@ebyhr ebyhr force-pushed the ebi/hashicorp-vault branch from c458222 to 6bc3124 Compare April 25, 2026 01:36
@pvary
Contributor

pvary commented Apr 27, 2026

I think we should talk on the dev list about adding a new project and supporting a new KMS client.

@ebyhr ebyhr force-pushed the ebi/hashicorp-vault branch 3 times, most recently from a96a6d5 to a4ff3a3 Compare May 1, 2026 01:55
@mrendi29

mrendi29 commented May 2, 2026

tysm @ebyhr for picking this up.

I will test this KMS implementation against our staging env on Monday and get back to you if I spot anything.


@caushie-akamai caushie-akamai left a comment


I tried reproducing this today with the following steps:

  1. Build the jars: ./gradlew build :iceberg-hashicorp:build -DsparkVersions=3.5 -x test -x integrationTest -x revapi --parallel
  2. Copy the iceberg jar and the hashicorp jar into Spark
  3. Start a spark-sql shell with the Vault configuration

But I am getting a java.lang.NoClassDefFoundError: org/apache/hc/core5/http/HttpEntity.
Should we also shade the HTTP jars here, or would it be better to change the imports to the shaded version? WDYT?


    String configuredVaultToken = newProperties.get(VaultProperties.VAULT_TOKEN_PROP);
    appRoleId = newProperties.get(VaultProperties.VAULT_ROLE_ID_PROP);
    appSecretId = newProperties.get(VaultProperties.VAULT_SECRET_ID_PROP);

What are your thoughts about using an env var instead of explicitly specifying secretID in the config?

I believe this would be beneficial for folks who use spark-operator in k8s and would not like sensitive credentials to live in the spark config.

Internally we also grab the secretID from an environment variable to avoid this.

Member Author


Environment variables don't align well with Trino's catalog-based architecture, because in theory the value can differ per catalog (even if that's not practical). I've added support for both properties and environment variables.
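For illustration only, a lookup that supports both sources might look roughly like the sketch below. The property key, the environment variable name, the class name, and the precedence (property first, then environment variable) are all assumptions made for this example; they are not taken from the PR.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class VaultCredentialResolver {
  // Hypothetical names: the PR's real property keys and environment
  // variable names are not shown in this thread.
  static final String SECRET_ID_PROP = "vault.secret-id";
  static final String SECRET_ID_ENV = "VAULT_SECRET_ID";

  /**
   * Returns the catalog property when it is set and non-empty; otherwise falls
   * back to the environment variable. This precedence is an assumption.
   * The environment is passed in as a function so the lookup can be tested.
   */
  static Optional<String> resolve(
      Map<String, String> properties,
      Function<String, String> env,
      String propKey,
      String envKey) {
    String fromProps = properties.get(propKey);
    if (fromProps != null && !fromProps.isEmpty()) {
      return Optional.of(fromProps);
    }
    String fromEnv = env.apply(envKey);
    if (fromEnv != null && !fromEnv.isEmpty()) {
      return Optional.of(fromEnv);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // No catalog property is set here, so only the process environment is consulted.
    Optional<String> secretId =
        resolve(Map.of(), System::getenv, SECRET_ID_PROP, SECRET_ID_ENV);
    System.out.println(secretId.isPresent() ? "secret ID found" : "secret ID not configured");
  }
}
```

With this shape, a per-catalog property can still override a process-wide environment variable, while deployments that keep secrets out of the engine config can rely on the environment alone.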


Sounds good to me. This way we support both query engines.

    if (httpClient == null) {
      synchronized (this) {
        if (httpClient == null) {
          httpClient = HttpClients.createDefault();

WDYT about also allowing clients to configure SSL settings for the HTTP client? Some folks may run an mTLS Vault deployment or need to trust their Vault deployment's CA.

I believe the BetterCloud Vault driver had a similar solution: https://github.com/BetterCloud/vault-java-driver/blob/master/src/main/java/com/bettercloud/vault/SslConfig.java#L44

I am trying to find out whether the same is possible with the Apache HTTP client.

Member Author


Adding TLS support with the Apache HTTP client is straightforward. However, I'd like to keep the initial PR focused. Can we add TLS support in a follow-up?
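As a rough sketch of what such a follow-up could build on, the JDK alone can produce an SSLContext that trusts a custom CA held in a trust store. The class and method names below are hypothetical, and how the resulting context is handed to the Apache HTTP client is deliberately left out because it depends on the client version in use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class VaultTlsSketch {
  /** Loads a trust store (for example, one holding the Vault deployment's CA) from disk. */
  static KeyStore loadTrustStore(Path path, char[] password)
      throws IOException, GeneralSecurityException {
    KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
    try (InputStream in = Files.newInputStream(path)) {
      trustStore.load(in, password);
    }
    return trustStore;
  }

  /** Builds an SSLContext whose trust decisions come from the given trust store. */
  static SSLContext sslContextFor(KeyStore trustStore) throws GeneralSecurityException {
    TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(trustStore);
    SSLContext context = SSLContext.getInstance("TLS");
    // First argument is null: no client certificate. An mTLS setup would pass
    // KeyManagers built from a client key store here instead.
    context.init(null, tmf.getTrustManagers(), null);
    return context;
  }

  public static void main(String[] args) throws Exception {
    // An empty in-memory store keeps the example runnable without any files;
    // a real deployment would call loadTrustStore with its own path and password.
    KeyStore empty = KeyStore.getInstance(KeyStore.getDefaultType());
    empty.load(null, null);
    SSLContext context = sslContextFor(empty);
    System.out.println("Built SSLContext for protocol " + context.getProtocol());
  }
}
```

Keeping the trust store path and password as configuration inputs would let the same code cover both a private CA and the default JVM trust store.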

@ebyhr ebyhr force-pushed the ebi/hashicorp-vault branch from a4ff3a3 to 44744c4 Compare May 6, 2026 20:17
Co-Authored-By: Endi Caushi <42871239+mrendi29@users.noreply.github.com>
@ebyhr ebyhr force-pushed the ebi/hashicorp-vault branch from 44744c4 to 446bbee Compare May 7, 2026 03:59
@ebyhr
Member Author

ebyhr commented May 7, 2026

@caushie-akamai I've added the following lines to iceberg-hashicorp project in build.gradle:

    apply plugin: 'com.gradleup.shadow'

    build.dependsOn shadowJar

Could you please confirm whether the NoClassDefFoundError issue has been resolved?

@mrendi29

TYSM @ebyhr, I'll try to test this against staging tomorrow morning and get back to you.



Development

Successfully merging this pull request may close these issues.

Add Hashicorp Vault as encryption client

4 participants