-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Deliver key metadata for encryption of data files #9359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -32,6 +32,7 @@ public class StandardEncryptionManager implements EncryptionManager { | |
| private final transient KeyManagementClient kmsClient; | ||
| private final String tableKeyId; | ||
| private final int dataKeyLength; | ||
| private final boolean nativeDataEncryption; | ||
|
|
||
| private transient volatile SecureRandom lazyRNG = null; | ||
|
|
||
|
|
@@ -41,7 +42,10 @@ public class StandardEncryptionManager implements EncryptionManager { | |
| * @param kmsClient Client of KMS used to wrap/unwrap keys in envelope encryption | ||
| */ | ||
| public StandardEncryptionManager( | ||
| String tableKeyId, int dataKeyLength, KeyManagementClient kmsClient) { | ||
| String tableKeyId, | ||
| int dataKeyLength, | ||
| KeyManagementClient kmsClient, | ||
| boolean nativeDataEncryption) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think that this should be passed in. The encryption manager needs to support files that use both native encryption (Parquet) and files that use AES GCM streams (Avro). There is no way to set this correctly because the behavior depends on the file type. |
||
| Preconditions.checkNotNull(tableKeyId, "Invalid encryption key ID: null"); | ||
| Preconditions.checkArgument( | ||
| dataKeyLength == 16 || dataKeyLength == 24 || dataKeyLength == 32, | ||
|
|
@@ -51,6 +55,7 @@ public StandardEncryptionManager( | |
| this.tableKeyId = tableKeyId; | ||
| this.kmsClient = kmsClient; | ||
| this.dataKeyLength = dataKeyLength; | ||
| this.nativeDataEncryption = nativeDataEncryption; | ||
| } | ||
|
|
||
| @Override | ||
|
|
@@ -67,7 +72,15 @@ public InputFile decrypt(EncryptedInputFile encrypted) { | |
| @Override | ||
| public Iterable<InputFile> decrypt(Iterable<EncryptedInputFile> encrypted) { | ||
| // Bulk decrypt is only applied to data files. Returning source input files for parquet. | ||
| return Iterables.transform(encrypted, this::decrypt); | ||
| if (nativeDataEncryption) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is no way to know the intended file format or whether it will use native encryption at this point. I think this needs to always return the result of |
||
| return Iterables.transform(encrypted, this::getSourceFile); | ||
| } else { | ||
| return Iterables.transform(encrypted, this::decrypt); | ||
| } | ||
| } | ||
|
|
||
| private InputFile getSourceFile(EncryptedInputFile encryptedFile) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Style: Iceberg method names should not include |
||
| return encryptedFile.encryptedInputFile(); | ||
| } | ||
|
|
||
| private SecureRandom workerRNG() { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -31,7 +31,7 @@ | |
| import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; | ||
| import org.apache.iceberg.types.Types; | ||
|
|
||
| class StandardKeyMetadata implements EncryptionKeyMetadata, IndexedRecord { | ||
| public class StandardKeyMetadata implements EncryptionKeyMetadata, IndexedRecord { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't make this class public because it implements an Avro interface. If it needs to be public, we will have to extract |
||
| private static final byte V1 = 1; | ||
| private static final Schema SCHEMA_V1 = | ||
| new Schema( | ||
|
|
@@ -73,11 +73,11 @@ static Map<Byte, org.apache.avro.Schema> supportedAvroSchemaVersions() { | |
| return avroSchemaVersions; | ||
| } | ||
|
|
||
| ByteBuffer encryptionKey() { | ||
| public ByteBuffer encryptionKey() { | ||
| return encryptionKey; | ||
| } | ||
|
|
||
| ByteBuffer aadPrefix() { | ||
| public ByteBuffer aadPrefix() { | ||
| return aadPrefix; | ||
| } | ||
|
|
||
|
|
@@ -95,7 +95,7 @@ static StandardKeyMetadata castOrParse(EncryptionKeyMetadata keyMetadata) { | |
| return parse(kmBuffer); | ||
| } | ||
|
|
||
| static StandardKeyMetadata parse(ByteBuffer buffer) { | ||
| public static StandardKeyMetadata parse(ByteBuffer buffer) { | ||
| try { | ||
| return KEY_METADATA_DECODER.decode(buffer); | ||
| } catch (IOException e) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not supported? The AES GCM stream could easily be used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also run
TestAvroFileSpliton Avro inside of AES GCM streamsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How nice. I expected Avro table encryption to work directly with AES GCM Streams - but not without some hiccups and fixes, since I never ran this usecase before. Turns out it just works out of box.
Now I have a functioning e2e unitest that encrypts/decrypts an Iceberg table with the Avro data format. The unitest is based on Spark SQL and catalog clients, so will go into the integration PR.
I'll add an encrypting version of the
TestAvroFileSplitto this PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll verify of course where the file length comes from for the Avro reader.