
Add Paimon Connector #26634

Open
leaves12138 wants to merge 1 commit into trinodb:master from leaves12138:PaimonConnector-Read

Conversation

@leaves12138

@leaves12138 leaves12138 commented Sep 15, 2025

Description

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture. Having graduated, it is now an Apache top-level project. I want to introduce Apache Paimon as a connector for Trino, just like the Iceberg or Hudi connectors.

Paimon website: https://paimon.apache.org/
Paimon GitHub home page: https://github.com/apache/paimon

Supported: read from Paimon

Additional context and related issues

@cla-bot cla-bot Bot added the cla-signed label Sep 15, 2025
@leaves12138 leaves12138 changed the title [Paimon] Introduce Paimmon Connector [WIP] [Paimon] Introduce Paimmon Connector Sep 15, 2025
@ebyhr ebyhr added the needs-docs This pull request requires changes to the documentation label Sep 15, 2025
@leaves12138 leaves12138 force-pushed the PaimonConnector-Read branch 2 times, most recently from ccb374a to 8ecbde8 Compare September 16, 2025 09:42
@ebyhr ebyhr changed the title [WIP] [Paimon] Introduce Paimmon Connector Add Paimmon Connector Sep 16, 2025
@ebyhr ebyhr marked this pull request as draft September 16, 2025 13:07
@github-actions github-actions Bot added the docs label Sep 17, 2025
@leaves12138 leaves12138 marked this pull request as ready for review September 17, 2025 08:59
@leaves12138
Author

leaves12138 commented Sep 17, 2025

@ebyhr Hi, could you help review this pull request? I have stripped out all the write actions such as CREATE TABLE, ALTER TABLE, and INSERT. Only the ability to read from Paimon is kept in this pull request. I will submit pull requests to support INSERT and CREATE/ALTER TABLE operations in the future.

@ebyhr ebyhr removed the needs-docs This pull request requires changes to the documentation label Sep 18, 2025
Comment thread docs/src/main/sphinx/connector.md Outdated
@leaves12138 leaves12138 force-pushed the PaimonConnector-Read branch 2 times, most recently from 0f61144 to fbee577 Compare September 29, 2025 03:19
@ebyhr ebyhr force-pushed the PaimonConnector-Read branch 2 times, most recently from d6b3623 to 7d5aed2 Compare October 1, 2025 23:03
@github-actions

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions Bot added the stale label Oct 23, 2025
@leaves12138
Author

@ebyhr Hi, is there anything I can do to get this pull request merged? Please leave comments and I will fix them as soon as possible.

@leaves12138 leaves12138 requested a review from kbhatianr October 24, 2025 07:16
Comment thread plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java Outdated
Comment thread plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java Outdated
Comment thread plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java Outdated
@github-actions github-actions Bot removed the stale label Oct 24, 2025
@leaves12138
Author

Fixed as requested

Comment thread plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java Outdated
@github-actions

github-actions Bot commented Dec 8, 2025

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@eskabetxe
Member

LGTM

@github-actions github-actions Bot removed the stale label Feb 16, 2026
@github-actions

github-actions Bot commented Mar 9, 2026

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions Bot added the stale label Mar 9, 2026
@findepi findepi changed the title Add Paimmon Connector Add Paimon Connector Mar 9, 2026
@github-actions github-actions Bot removed the stale label Mar 10, 2026
Contributor

@chenjian2664 chenjian2664 left a comment


Skimmed table handle

return table;
}

public ConnectorTableMetadata tableMetadata(ConnectorSession session, TrinoCatalog catalog)
Contributor


Let's move all session-related methods out; they would likely fit better in the PaimonMetadata class.

Author


OK

table(session, catalog).comment());
}

public List<ColumnMetadata> columnMetadatas(ConnectorSession session, TrinoCatalog catalog)
Contributor


ditto

Author


DONE

.collect(Collectors.toList());
}

public PaimonColumnHandle columnHandle(ConnectorSession session, TrinoCatalog catalog, String field)
Contributor


ditto

Author


DONE

return PaimonColumnHandle.of(originFieldNames.get(index), dataField.type(), dataField.id());
}

public PaimonTableHandle withFilter(TupleDomain<PaimonColumnHandle> filter)
Contributor


The method seems unused; let's remove it.

Author


OK

return new PaimonTableHandle(schemaName, tableName, dynamicOptions, filter, projectedColumns, limit);
}

public PaimonTableHandle withColumns(Set<PaimonColumnHandle> projectedColumns)
Contributor


ditto

Author


OK

@JsonProperty("projection") Set<PaimonColumnHandle> projectedColumns,
@JsonProperty("limit") OptionalLong limit)
{
this.schemaName = schemaName;
Contributor


Can this field be null? If not, add a non-null check.

Author


Added

private final OptionalLong limit;
private final Map<String, String> dynamicOptions;

private transient Table table;
Contributor


Let's avoid premature optimization for the first version. We can load the table on each access. As noted in my earlier reviews in this round, the loading logic likely belongs in the metadata class, which would also be a better place to introduce caching later.

Author


OK

private final String schemaName;
private final String tableName;
private final TupleDomain<PaimonColumnHandle> predicate;
private final Set<PaimonColumnHandle> projectedColumns;
Contributor


Remove the field if it's not used

Author


OK


import static org.assertj.core.api.Assertions.assertThat;

final class TestPaimonTableHandle
Contributor


Let's remove the test; it's not necessary to test the JSON de/serialization.

Author


OK

@ebyhr ebyhr removed their request for review March 18, 2026 05:01
Comment on lines +70 to +78
binder -> {
binder.bind(ClassLoader.class).toInstance(PaimonConnectorFactory.class.getClassLoader());
binder.bind(NodeVersion.class).toInstance(new NodeVersion(context.getCurrentNode().getVersion()));
binder.bind(TypeManager.class).toInstance(context.getTypeManager());
binder.bind(OpenTelemetry.class).toInstance(context.getOpenTelemetry());
binder.bind(Node.class).toInstance(context.getCurrentNode());
binder.bind(Tracer.class).toInstance(context.getTracer());
binder.bind(CatalogName.class).toInstance(new CatalogName(catalogName));
},

Suggested change
-binder -> {
-    binder.bind(ClassLoader.class).toInstance(PaimonConnectorFactory.class.getClassLoader());
-    binder.bind(NodeVersion.class).toInstance(new NodeVersion(context.getCurrentNode().getVersion()));
-    binder.bind(TypeManager.class).toInstance(context.getTypeManager());
-    binder.bind(OpenTelemetry.class).toInstance(context.getOpenTelemetry());
-    binder.bind(Node.class).toInstance(context.getCurrentNode());
-    binder.bind(Tracer.class).toInstance(context.getTracer());
-    binder.bind(CatalogName.class).toInstance(new CatalogName(catalogName));
-},
+new ConnectorContextModule(catalogName, context),
+binder -> binder.bind(ClassLoader.class).toInstance(PaimonConnectorFactory.class.getClassLoader()),

(We need to import io.trino.plugin.base.ConnectorContextModule as well.)

I was experimenting with filesystem caching in the Paimon connector, but when fs.cache.enabled=true, FileSystemModule installs AlluxioFileSystemCacheModule, which binds ConsistentHashingHostAddressProvider. That class requires NodeManager to be injected, but PaimonConnectorFactory manually binds only a subset of the connector context objects (e.g. Node, NodeVersion, TypeManager, OpenTelemetry, Tracer, CatalogName) and omits NodeManager.

All other connectors that use FileSystemModule (Iceberg, Hive, Delta Lake) end up using ConnectorContextModule, which provides all standard context bindings, including NodeManager.

Maybe this can be implemented? Another option is to add binder.bind(NodeManager.class).toInstance(context.getNodeManager()); directly as a binding.
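For illustration, here is a minimal sketch of that second option as a standalone Guice module (the class name is hypothetical; NodeManager and ConnectorContext.getNodeManager() are the SPI pieces referenced above). Using ConnectorContextModule as in the suggestion is still the simpler route, since it installs all of the standard context bindings at once.

import com.google.inject.Binder;
import com.google.inject.Module;
import io.trino.spi.NodeManager;
import io.trino.spi.connector.ConnectorContext;

// Hypothetical module: binds only the NodeManager that
// ConsistentHashingHostAddressProvider needs when fs.cache.enabled=true.
public class NodeManagerBindingModule
        implements Module
{
    private final ConnectorContext context;

    public NodeManagerBindingModule(ConnectorContext context)
    {
        this.context = context;
    }

    @Override
    public void configure(Binder binder)
    {
        binder.bind(NodeManager.class).toInstance(context.getNodeManager());
    }
}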

Contributor


good catch, cc @leaves12138

Author


Thanks, fixed this!

@eskabetxe
Member

@leaves12138 can you check? do you need help here?

}
return columnHandles;
}


@jeancmsantana jeancmsantana Mar 23, 2026


Suggested change
@Override
public Optional<ConstraintApplicationResult<ConnectorTableHandle>> applyFilter(
ConnectorSession session,
ConnectorTableHandle tableHandle,
Constraint constraint)
{
PaimonTableHandle paimonTableHandle = (PaimonTableHandle) tableHandle;
Optional<PaimonFilterExtractor.TrinoFilter> trinoFilter =
PaimonFilterExtractor.extract(session, catalog, paimonTableHandle, constraint);
if (trinoFilter.isEmpty()) {
return Optional.empty();
}
PaimonTableHandle newHandle = paimonTableHandle.withFilter(trinoFilter.get().filter());
return Optional.of(new ConstraintApplicationResult<>(newHandle, trinoFilter.get().remainFilter(), constraint.getExpression(), false));
}

(import io.trino.spi.connector.Constraint, import io.trino.spi.connector.ConstraintApplicationResult, import io.trino.spi.expression.ConnectorExpression)

After testing some queries, I noticed that we are missing predicate pushdown support, which causes significant performance issues, even though all the downstream code for filtering already exists:

  • PaimonTableHandle has a predicate field and a withFilter() method.
  • PaimonFilterExtractor.extract() converts Trino's Constraint into a TrinoFilter (split between predicates the connector enforces and those Trino must still apply).
  • PaimonFilterConverter translates the TupleDomain into Paimon-native Predicate objects.
  • PaimonSplitManager already calls readBuilder.withFilter(predicate) for partition/split pruning.
  • PaimonPageSourceProvider already passes the predicate to ORC stripe-level and Parquet row-group filtering.

However, PaimonFilterExtractor.extract() is never called, except in tests.

Contributor


@jeancmsantana Let's keep the first version simple; it is intentional and OK to not support predicate pushdown at this time.

@leaves12138 leaves12138 force-pushed the PaimonConnector-Read branch from a1df282 to 37904e4 Compare March 24, 2026 02:47
@leaves12138
Author

Thanks for all your help!!! I will fix all the comments in one or two days~

@leaves12138 leaves12138 marked this pull request as draft March 24, 2026 05:13
@leaves12138 leaves12138 force-pushed the PaimonConnector-Read branch 2 times, most recently from 11759c4 to 3dff5ae Compare March 26, 2026 15:39
@leaves12138 leaves12138 marked this pull request as ready for review March 26, 2026 15:51
@leaves12138
Author

@leaves12138 can you check? do you need help here?

Thanks for asking, I will fix them all

@leaves12138 leaves12138 force-pushed the PaimonConnector-Read branch from 3dff5ae to 123060d Compare March 26, 2026 15:56
@leaves12138
Author

Fixed as requested~

@jeancmsantana

great work @leaves12138!

@eskabetxe
Member

@chenjian2664 could you recheck please

@martint
Member

martint commented Apr 20, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough

This pull request introduces a new Apache Paimon connector for Trino, enabling users to query Paimon-format tables via Trino SQL. The implementation adds a complete connector plugin with configuration management, metadata discovery, split generation, page source handling, and filesystem/Hive metastore catalog backends. Documentation is provided for configuration, type mappings, and SQL support. Supporting infrastructure includes type conversion utilities, file I/O wrappers, and comprehensive test coverage.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 8

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (28)
docs/src/main/sphinx/connector/paimon.md-216-219 (1)

216-219: ⚠️ Potential issue | 🟠 Major

Fix truncated metadata table output.

The example output appears truncated with incomplete column names (ending in "min_>" and "NULL>"). This makes the example unclear and unhelpful for users. Consider either wrapping the output properly, showing fewer columns, or formatting the table differently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/main/sphinx/connector/paimon.md` around lines 216 - 219, The example
metadata table in the Paimon doc shows truncated column names and values (e.g.,
the header fragment "partition | bucket | ... | file_size_in_bytes | min_>" and
the value ending "NULL>"), so update the table output in the paimon.md example
to avoid truncation: replace the current single-line wrapped table with either a
fully expanded multi-line table (showing complete column names like "min_values"
or whatever the full name is) or a reduced column set that fits the layout
(e.g., show only "partition | bucket | file_path | file_format | record_count |
file_size_in_bytes"), or render it as a fenced code block with no wrapping so
the full file_path and trailing values (the s3://... path and the NULL value)
are visible and not cut off.
docs/src/main/sphinx/connector/paimon.md-139-142 (1)

139-142: ⚠️ Potential issue | 🟠 Major

Fix grammar, capitalization, and documentation style.

Multiple issues in this section:

  1. "tpch" should be "TPC-H" (proper benchmark name)
  2. "s3" should be "S3" (proper product name)
  3. "which mocked" is missing "is" → "which is mocked"
  4. "minio" should be "MinIO" (proper product name)
  5. "My paimon.properties" uses first person, inappropriate for technical documentation
  6. "shows as below" is informal
📝 Proposed fix
-In the following example queries, `nation` is the Paimon append table loaded by tpch.
-The target file system is s3, which mocked by minio in the local environment.
+In the following example queries, `nation` is the Paimon append table loaded by TPC-H.
+The target file system is S3, which is mocked by MinIO in the local environment.
 
-My paimon.properties in etc/catalog shows as below:
+The `paimon.properties` file in `etc/catalog` contains the following:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/main/sphinx/connector/paimon.md` around lines 139 - 142, Reword the
paragraph to correct capitalization and tone: replace "tpch" with "TPC-H", "s3"
with "S3", and "minio" with "MinIO"; insert the missing "is" in "which is
mocked"; remove first-person phrasing like "My paimon.properties" and replace
informal "shows as below" with a formal phrase such as "is shown below"; for
example, produce a sentence that mentions the Paimon append table "nation"
loaded by TPC-H, that the target file system is S3 which is mocked by MinIO in
the local environment, and that the paimon.properties file in etc/catalog is
shown below to locate the change around the fragments "nation", "Paimon append
table", "etc/catalog", and "paimon.properties".
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonFilterConverter.java-304-306 (1)

304-306: ⚠️ Potential issue | 🟠 Major

CharType predicate values must apply trailing-space trimming to match Trino's char comparison semantics.

Trino stores CHAR(n) values with trailing spaces removed internally. When building Paimon predicates for CHAR columns, the predicate values must use truncateToLengthAndTrimSpaces() to strip trailing spaces, matching the behavior in TupleDomainOrcPredicate (line 238). Currently, CharType and VarcharType are handled identically without trimming, which will produce incorrect results for CHAR(n) equality and range comparisons against padded values.

Apply truncateToLengthAndTrimSpaces() to CharType values in the conversion logic (around line 304-306), and add a test case for CHAR(n) = 'abc' pushdown where the stored value contains trailing padding to verify correct predicate behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonFilterConverter.java`
around lines 304 - 306, The current conversion in PaimonFilterConverter treats
CharType and VarcharType identically; modify the branch handling text types so
that when type instanceof CharType you first call
truncateToLengthAndTrimSpaces(...) on the trinoNativeValue bytes (using the same
utility used in TupleDomainOrcPredicate) and then pass the trimmed result to
BinaryString.fromBytes, while leaving VarcharType behavior unchanged; update/add
a unit/integration test asserting pushdown for CHAR(n) = 'abc' where the stored
value contains trailing spaces to verify the trimmed predicate matches Trino
semantics.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonConfig.java-42-49 (1)

42-49: ⚠️ Potential issue | 🟠 Major

Validate that paimon.warehouse is configured.

warehouse is required by toOptions(), but the getter is not annotated as required. Add @NotNull so missing catalog config fails during Trino configuration validation instead of later in catalog initialization.

Suggested validation
+    @NotNull
     public String getWarehouse()
     {
         return warehouse;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonConfig.java`
around lines 42 - 49, Add validation to ensure paimon.warehouse is required by
annotating the PaimonConfig.getWarehouse() method with `@NotNull` so Trino's
configuration validation fails early; locate the PaimonConfig class (methods
getWarehouse and setWarehouse and the toOptions() usage) and add the
javax.validation.constraints.NotNull annotation to the getter, keeping the
existing `@Config` and `@ConfigDescription` on setWarehouse unchanged.
plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/PaimonTablesInitializer.java-189-198 (1)

189-198: ⚠️ Potential issue | 🟠 Major

Use the same schema for creation and data loading.

The table is created in schemaName, but data is loaded from session.getSchema().orElse(schemaName). If the session has a default schema that differs from schemaName, the initializer writes to the wrong database or fails to find the table.

Suggested fix
             createPaimonTable(paimonCatalog, paimonTableMeta);
-            createTableWithData(paimonCatalog, session.getSchema().orElse(schemaName), tpchTable, queryRunner);
+            createTableWithData(paimonCatalog, schemaName, tpchTable, queryRunner);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/PaimonTablesInitializer.java`
around lines 189 - 198, The table is created under schemaName but data is loaded
using session.getSchema().orElse(schemaName), which can mismatch; change
createTableWithData usage so it uses the same schema as creation (i.e.,
schemaName) or compute a single targetSchema variable and use it for both
createPaimonTable and createTableWithData to ensure the table creation and data
loading operate on the same schema; update the call sites for createPaimonTable
and createTableWithData accordingly (referencing createPaimonTable,
createTableWithData, schemaName, and session.getSchema().orElse(schemaName)).
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMergePageSourceWrapper.java-67-77 (1)

67-77: ⚠️ Potential issue | 🟠 Major

Build row-id fields in row-type order, not page-channel order.

RowBlock fields are read by ordinal. Iterating channels and checking fieldToIndex.containsValue(i) reorders the row-id fields by projected channel index, which can put values under the wrong row field when projection order differs from the row-id field order.

Suggested fix
         Block[] newBlocks = new Block[nextPage.getChannelCount() + 1];
-        Block[] rowIdBlocks = new Block[fieldToIndex.size()];
-        for (int i = 0, idx = 0; i < nextPage.getChannelCount(); i++) {
+        for (int i = 0; i < nextPage.getChannelCount(); i++) {
             Block block = nextPage.getBlock(i);
             newBlocks[i] = block;
-            if (fieldToIndex.containsValue(i)) {
-                rowIdBlocks[idx] = block;
-                idx++;
-            }
+        }
+        Block[] rowIdBlocks = new Block[fieldToIndex.size()];
+        int idx = 0;
+        for (int channelIndex : fieldToIndex.values()) {
+            rowIdBlocks[idx++] = nextPage.getBlock(channelIndex);
         }
         newBlocks[nextPage.getChannelCount()] = RowBlock.fromFieldBlocks(rowCount, rowIdBlocks);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMergePageSourceWrapper.java`
around lines 67 - 77, The code builds rowIdBlocks in page-channel order by
checking fieldToIndex.containsValue(i) while iterating nextPage.getBlock(i),
which misorders fields when projection differs; instead populate rowIdBlocks in
row-field (ordinal) order: iterate over the field ordinals in fieldToIndex (or
over sorted fieldToIndex.keySet()), for each field get the corresponding channel
index (fieldToIndex.get(fieldOrdinal)) and set rowIdBlocks[fieldOrdinal] =
nextPage.getBlock(channelIndex), then call RowBlock.fromFieldBlocks with that
rowIdBlocks; update PaimonMergePageSourceWrapper where newBlocks/rowIdBlocks are
constructed to use this mapping.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceWrapper.java-65-77 (1)

65-77: ⚠️ Potential issue | 🟠 Major

Do not require completed positions unless a deletion vector is applied.

deletionVector is optional, but Line 67 always calls orElseThrow() before checking whether a page exists or whether filtering is needed. That can fail reads where the wrapper is constructed with an empty deletion vector (when deletionFiles.get().get(i) is null) or when the source reaches EOF without reporting completed positions.

Suggested guard
     public SourcePage getNextSourcePage()
     {
-        int startPosition = (int) source.getCompletedPositions().orElseThrow();
         SourcePage next = source.getNextSourcePage();
         if (next == null) {
             return next;
         }
+        if (deletionVector.isEmpty()) {
+            return next;
+        }
 
         int pageCount = next.getPositionCount();
 
-        return deletionVector
-                .map(deletionVector -> convertToRetained(next, deletionVector, startPosition, pageCount))
-                .orElse(next);
+        int startPosition = (int) source.getCompletedPositions().orElseThrow();
+        return convertToRetained(next, deletionVector.get(), startPosition, pageCount);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceWrapper.java`
around lines 65 - 77, The code currently calls
source.getCompletedPositions().orElseThrow() unconditionally in
getNextSourcePage(), which can throw when no deletionVector is present or at
EOF; change the logic to first get SourcePage next = source.getNextSourcePage()
and return null/next early if next == null or deletionVector.isEmpty(), and only
then (inside the deletionVector.map lambda or after confirming
deletionVector.isPresent()) call source.getCompletedPositions().orElseThrow() to
obtain startPosition and pass it to convertToRetained(next, deletionVector,
startPosition, pageCount); this ensures completed positions are only required
when a deletionVector is applied.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMergePageSourceWrapper.java-40-93 (1)

40-93: ⚠️ Potential issue | 🟠 Major

Delegate the full ConnectorPageSource contract.

This wrapper does not override isBlocked(), getCompletedPositions(), and getMetrics(), which means it returns defaults that hide the underlying source's behavior: NOT_BLOCKED, OptionalLong.empty(), and Metrics.EMPTY respectively. If the wrapped source implements custom blocking or metrics, the engine will not see them and may busy-poll instead of respecting asynchronous blocking.

Delegation additions
+import io.trino.spi.metrics.Metrics;
+
 import java.io.IOException;
 import java.util.Map;
+import java.util.OptionalLong;
+import java.util.concurrent.CompletableFuture;
@@
     public long getCompletedBytes()
     {
         return pageSource.getCompletedBytes();
     }
+
+    @Override
+    public OptionalLong getCompletedPositions()
+    {
+        return pageSource.getCompletedPositions();
+    }
@@
     public void close()
             throws IOException
     {
         pageSource.close();
     }
+
+    @Override
+    public CompletableFuture<?> isBlocked()
+    {
+        return pageSource.isBlocked();
+    }
+
+    @Override
+    public Metrics getMetrics()
+    {
+        return pageSource.getMetrics();
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMergePageSourceWrapper.java`
around lines 40 - 93, PaimonMergePageSourceWrapper currently fails to delegate
the full ConnectorPageSource contract: implement and delegate isBlocked(),
getCompletedPositions(), and getMetrics() to the wrapped pageSource (same
pattern as getCompletedBytes/getReadTimeNanos/isFinished/getMemoryUsage/close)
so the engine sees the underlying source's blocking, completed positions and
metrics; update the PaimonMergePageSourceWrapper class to add overrides for
isBlocked(), getCompletedPositions(), and getMetrics() that simply call through
to pageSource.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PositionOutputStreamWrapper.java-42-64 (1)

42-64: ⚠️ Potential issue | 🟠 Major

Advance position only after a successful write.

The current code increments position before the wrapped stream validates and writes. If outputStream.write(...) throws, getPos() reports bytes that were never written.

🐛 Proposed fix
     public void write(int b)
             throws IOException
     {
-        position++;
         outputStream.write(b);
+        position++;
     }
@@
     public void write(byte[] bytes)
             throws IOException
     {
-        position += bytes.length;
         outputStream.write(bytes);
+        position += bytes.length;
     }
@@
     public void write(byte[] bytes, int off, int len)
             throws IOException
     {
-        position += len;
         outputStream.write(bytes, off, len);
+        position += len;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PositionOutputStreamWrapper.java`
around lines 42 - 64, The PositionOutputStreamWrapper increments position before
delegating to outputStream, so getPos() can be wrong if write throws; update
write(int), write(byte[]), and write(byte[], int, int) to perform
outputStream.write(...) first and only increment position after that call
succeeds (e.g., call outputStream.write(b) then position++ in write(int), call
outputStream.write(bytes) then position += bytes.length in write(byte[]), and
call outputStream.write(bytes, off, len) then position += len in write(byte[],
int, int)) so position reflects only successfully written bytes.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSplitManager.java-65-71 (1)

65-71: ⚠️ Potential issue | 🟠 Major

Guard split-weight calculation when all row counts are zero.

For non-empty split lists where every rowCount() is 0, (double) split.rowCount() / maxRowCount becomes NaN, so the weight clamp does not work.

🐛 Proposed fix
             long maxRowCount = splits.stream().mapToLong(Split::rowCount).max().orElse(0L);
             double minimumSplitWeight = PaimonSessionProperties.getMinimumSplitWeight(session);
             return new PaimonSplitSource(splits.stream()
                     .map(split -> PaimonSplit.fromSplit(
                             split,
-                            Math.min(Math.max((double) split.rowCount() / maxRowCount, minimumSplitWeight), 1.0)))
+                            maxRowCount > 0
+                                    ? Math.min(Math.max((double) split.rowCount() / maxRowCount, minimumSplitWeight), 1.0)
+                                    : minimumSplitWeight))
                     .collect(Collectors.toList()));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSplitManager.java`
around lines 65 - 71, When all splits have rowCount() == 0, maxRowCount becomes
0 and (double) split.rowCount() / maxRowCount yields NaN, so guard the division:
in PaimonSplitManager compute maxRowCount and if maxRowCount == 0 use a default
ratio of 0.0 for all splits (or otherwise treat ratio as minimumSplitWeight)
before applying Math.min/Math.max; update the mapping passed to
PaimonSplit.fromSplit so it uses ratio = (maxRowCount == 0 ? 0.0 : (double)
split.rowCount() / maxRowCount) and then clamp it with minimumSplitWeight and
1.0, and return the PaimonSplitSource as before.
plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/TrinoHiveCatalog.java-221-239 (1)

221-239: ⚠️ Potential issue | 🟠 Major

Return when ignoreIfExists is true.

When the Hive table already exists and ignoreIfExists is true, this currently falls through to hiveMetastore.createTable(...), which can fail on the duplicate table.

🐛 Proposed fix
         if (hiveMetastore.getTable(identifier.getDatabaseName(), identifier.getTableName()).isPresent()) {
-            if (!ignoreIfExists) {
-                throw new TableAlreadyExistException(identifier);
+            if (ignoreIfExists) {
+                return;
             }
+            throw new TableAlreadyExistException(identifier);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/TrinoHiveCatalog.java`
around lines 221 - 239, The code checks hiveMetastore.getTable(...) and throws
TableAlreadyExistException when ignoreIfExists is false, but still proceeds to
call hiveMetastore.createTable(...) and baseCatalog.createTable(...); change the
flow so that if the table exists and ignoreIfExists is true you return
immediately (skip both hiveMetastore.createTable and baseCatalog.createTable),
i.e., add an early return right after the existence check when ignoreIfExists is
true; adjust the block around hiveMetastore.getTable(...),
TableAlreadyExistException, and baseCatalog.createTable(...) accordingly to
avoid attempting to create a duplicate table.
plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/TrinoHiveCatalog.java-205-210 (1)

205-210: ⚠️ Potential issue | 🟠 Major

Honor ignoreIfNotExists before dropping from Hive.

hiveMetastore.dropTable(...) runs unconditionally, so dropTable(..., true) can still fail when the Hive table is absent and leave base-catalog cleanup unattempted.

🐛 Proposed fix
     public void dropTable(Identifier identifier, boolean ignoreIfNotExists)
             throws TableNotExistException
     {
+        if (hiveMetastore.getTable(identifier.getDatabaseName(), identifier.getTableName()).isEmpty()) {
+            if (ignoreIfNotExists) {
+                return;
+            }
+            throw new TableNotExistException(identifier);
+        }
         hiveMetastore.dropTable(identifier.getDatabaseName(), identifier.getTableName(), false);
         baseCatalog.dropTable(identifier, ignoreIfNotExists);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/TrinoHiveCatalog.java`
around lines 205 - 210, In TrinoHiveCatalog.dropTable(Identifier identifier,
boolean ignoreIfNotExists) the call to hiveMetastore.dropTable(...) runs
unconditionally and can throw when ignoreIfNotExists is true; change the logic
to first check ignoreIfNotExists and only call hiveMetastore.dropTable if the
Hive table exists or when ignoreIfNotExists is false (e.g., guard or catch the
specific not-found exception), then always invoke
baseCatalog.dropTable(identifier, ignoreIfNotExists); ensure you reference the
existing hiveMetastore.dropTable and baseCatalog.dropTable calls and preserve
the method signature and behavior for the non-ignore case.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTypeUtils.java-93-101 (1)

93-101: ⚠️ Potential issue | 🟠 Major

Clamp bounded Paimon varchar lengths before creating Trino varchar.

The code can fail for Paimon bounded VarCharType lengths that exceed Trino's VarcharType.MAX_LENGTH (Integer.MAX_VALUE - 1) but don't equal Paimon's MAX_LENGTH (Integer.MAX_VALUE). Trino's createVarcharType() validates and rejects such lengths. This mirrors the asymmetry in the CharType visitor (line 90), which already applies clamping.

Proposed fix
             if (varCharType.getLength() == VarCharType.MAX_LENGTH) {
                 return VarcharType.createUnboundedVarcharType();
             }
             return VarcharType.createVarcharType(
-                    varCharType.getLength());
+                    Math.min(VarcharType.MAX_LENGTH, varCharType.getLength()));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTypeUtils.java`
around lines 93 - 101, The VarChar visitor in PaimonTypeUtils should clamp
bounded Paimon lengths to Trino's max before calling
VarcharType.createVarcharType to avoid rejection for values >
VarcharType.MAX_LENGTH; update visit(VarCharType) to mirror the CharType
handling by computing int len = Math.min(varCharType.getLength(),
VarcharType.MAX_LENGTH) when varCharType.getLength() != VarCharType.MAX_LENGTH,
then call VarcharType.createVarcharType(len) (use VarCharType.MAX_LENGTH and
VarcharType.MAX_LENGTH constants and the visit(VarCharType) method and
VarcharType.createVarcharType reference to locate the change).
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonColumnHandle.java-46-50 (1)

46-50: ⚠️ Potential issue | 🟠 Major

Replace @JsonProperty with @JsonIgnore on isRowId() method.

isRowId() is derived from columnName and should not be serialized as a JSON property. This follows the established pattern in Trino where derived helper methods on ColumnHandle implementations (Iceberg, Delta Lake, MongoDB, OpenSearch, etc.) use @JsonIgnore. Marking it with @JsonProperty would add a non-reconstructible rowId field to the serialized JSON, breaking deserialization round-trips.

Suggested fix
-    @JsonProperty
+    @JsonIgnore
     public boolean isRowId()
     {
         return TRINO_ROW_ID_NAME.equals(columnName);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonColumnHandle.java`
around lines 46 - 50, The isRowId() accessor on PaimonColumnHandle is a derived
helper based on columnName and must not be serialized; replace the `@JsonProperty`
annotation on the isRowId() method with `@JsonIgnore` so the derived rowId is
excluded from JSON and does not break deserialization round-trips for
PaimonColumnHandle.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTypeUtils.java-270-345 (1)

270-345: ⚠️ Potential issue | 🟠 Major

Move row field ID generation to be scoped per row type conversion.

The singleton TrinoToPaimonTypeVisitor.INSTANCE maintains a shared AtomicInteger currentIndex that is never reset. When toPaimonType() is called multiple times (e.g., in PaimonTestingUtils.prepareSchema() at line 72 for each table column, or for nested row types), field IDs for subsequent row conversions begin from the previous counter value instead of 0. Additionally, field.getName().get() at line 342 throws NoSuchElementException if a Trino row field has no name (anonymous rows).

Proposed fix
         case io.trino.spi.type.RowType rowType -> {
+            AtomicInteger currentIndex = new AtomicInteger();
             List<DataField> dataFields = rowType.getFields().stream()
                     .map(field -> new DataField(
                             currentIndex.getAndIncrement(),
-                            field.getName().get(),
+                            field.getName().orElseThrow(() -> new UnsupportedOperationException("Anonymous row fields are not supported")),
                             visit(field.getType())))
                     .collect(Collectors.toList());
             return new RowType(true, dataFields);
         }

Remove the instance field private final AtomicInteger currentIndex = new AtomicInteger(0); at line 270.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTypeUtils.java`
around lines 270 - 345, The TrinoToPaimonTypeVisitor.INSTANCE uses a shared
field currentIndex which is never reset so row field IDs keep incrementing
across separate toPaimonType/visit calls and nested rows; also
field.getName().get() can throw for anonymous Trino row fields. Fix by removing
the instance-level AtomicInteger currentIndex and make field-id generation local
per-row conversion inside the visit(Type) branch handling
io.trino.spi.type.RowType: create a fresh counter or use the stream index to
assign ids starting at 0 for that row, and replace field.getName().get() with a
safe retrieval (e.g., field.getName().orElse("") or a deterministic generated
name) before constructing DataField; update references in
TrinoToPaimonTypeVisitor.visit/rowType handling accordingly so nested rows get
their own id scope.
plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonQueryRunner.java-112-112 (1)

112-112: ⚠️ Potential issue | 🟠 Major

Guard the optional data loader before initialization.

builder() allows building without setDataLoader(...), but build() unconditionally dereferences dataLoader.

🐛 Proposed fix
-                dataLoader.initializeTables(session, queryRunner, SCHEMA_NAME);
+                if (dataLoader != null) {
+                    dataLoader.initializeTables(session, queryRunner, SCHEMA_NAME);
+                }
                 return queryRunner;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonQueryRunner.java`
at line 112, The build() path unconditionally dereferences dataLoader by calling
dataLoader.initializeTables(...), but builder() allows omitting
setDataLoader(...); update PaimonQueryRunner.build()/constructor to null-check
or guard the optional dataLoader before calling initializeTables: check that
dataLoader != null (or use Optional) and only call
dataLoader.initializeTables(session, queryRunner, SCHEMA_NAME) when present, and
ensure any related fields (dataLoader) are stored/typed to reflect optionality
so callers won't NPE.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/DirectPaimonPageSource.java-91-105 (1)

91-105: ⚠️ Potential issue | 🟠 Major

Close every queued source even when one close fails.

The current loop exits on the first IOException, leaving later page sources open. Also clear current/pageSourceQueue so repeated close() calls do not double-close the same sources.

🧹 Proposed fix
     @Override
-    public void close()
+    public void close()
+            throws IOException
     {
-        try {
-            if (current != null) {
+        IOException failure = null;
+        if (current != null) {
+            try {
                 current.close();
             }
-            for (ConnectorPageSource source : pageSourceQueue) {
+            catch (IOException e) {
+                failure = e;
+            }
+            finally {
+                current = null;
+            }
+        }
+        ConnectorPageSource source;
+        while ((source = pageSourceQueue.poll()) != null) {
+            try {
                 source.close();
             }
-        }
-        catch (IOException e) {
-            throw new UncheckedIOException(e);
+            catch (IOException e) {
+                if (failure == null) {
+                    failure = e;
+                }
+                else {
+                    failure.addSuppressed(e);
+                }
+            }
+        }
+        if (failure != null) {
+            throw failure;
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/DirectPaimonPageSource.java`
around lines 91 - 105, The close() method currently stops on the first
IOException and leaves later ConnectorPageSource instances open; update
DirectPaimonPageSource.close() to attempt closing every source (first close
current, then iterate pageSourceQueue) while catching IOExceptions per-source,
collect the first exception (or add subsequent ones as suppressed), and after
all closes clear current and pageSourceQueue to avoid double-closing on repeated
calls; finally, if any IOException occurred, rethrow it as an
UncheckedIOException with suppressed exceptions preserved.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java-128-137 (1)

128-137: ⚠️ Potential issue | 🟠 Major

Normalize the lookup key for mixed-case column names.

lowerCaseFieldNames is searched with the original field, so a column like aCa can fail lookup even though it exists.

🐛 Proposed fix
+import static java.util.Locale.ENGLISH;
+
     public PaimonColumnHandle columnHandle(Table paimonTable, String field)
     {
         List<String> lowerCaseFieldNames = fieldNames(paimonTable.rowType());
         List<String> originFieldNames = paimonTable.rowType().getFieldNames();
-        int index = lowerCaseFieldNames.indexOf(field);
+        int index = lowerCaseFieldNames.indexOf(field.toLowerCase(ENGLISH));
         if (index == -1) {
             throw new RuntimeException(format("Cannot find field %s in schema %s", field, lowerCaseFieldNames));
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java`
around lines 128 - 137, In PaimonMetadata.columnHandle, the lookup uses
lowerCaseFieldNames but searches with the original field string which breaks
mixed-case names; change the search to use a normalized key (e.g.,
field.toLowerCase(Locale.ROOT)) when computing index =
lowerCaseFieldNames.indexOf(...), ensuring you import/use Locale.ROOT or
equivalent for stable lowercasing; keep the rest of the method
(originFieldNames, DataField retrieval, and PaimonColumnHandle.of(...))
unchanged.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/DirectPaimonPageSource.java-61-69 (1)

61-69: ⚠️ Potential issue | 🟠 Major

Do not advance to the next source on a transient null page.

getNextSourcePage() can return null before isFinished() is true. Closing and advancing here can drop data from page sources that are temporarily unable to produce a page.

🐛 Proposed fix
     public SourcePage getNextSourcePage()
     {
         while (current != null) {
             SourcePage dataPage = current.getNextSourcePage();
             if (dataPage != null) {
                 return dataPage;
             }
+            if (!current.isFinished()) {
+                return null;
+            }
             advance();
         }
         return null;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/DirectPaimonPageSource.java`
around lines 61 - 69, The code in DirectPaimonPageSource.getNextSourcePage()
currently advances to the next source whenever current.getNextSourcePage()
returns null, which can drop pages from sources that are temporarily unable to
produce a page; change the logic so that when current.getNextSourcePage()
returns null you only call advance() if the current source reports it is
finished (e.g., current.isFinished()); otherwise return null to indicate a
transient lack of data. Update getNextSourcePage() to check
current.getNextSourcePage() -> if non-null return it; if null then if
current.isFinished() call advance() and continue the loop, else return null.
Ensure you reference the methods current.getNextSourcePage(),
current.isFinished(), advance(), and the class DirectPaimonPageSource in your
change.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java-189-204 (1)

189-204: ⚠️ Potential issue | 🟠 Major

Do not swallow all metadata runtime failures.

Catching every RuntimeException makes real metadata failures look like missing columns/tables. Only tolerate table-not-found races and propagate other failures.

🐛 Proposed fix
         return tableNames.stream()
                 .map(table -> {
                     List<ColumnMetadata> columnMetadata;
                     try {
-                        getTableHandle(session, table, Optional.empty(), Optional.empty());
+                        ConnectorTableHandle tableHandle = getTableHandle(session, table, Optional.empty(), Optional.empty());
+                        if (tableHandle == null) {
+                            return Pair.of(table, (List<ColumnMetadata>) null);
+                        }
                         Table paimonTable = table(session, table.getSchemaName(), table.getTableName());
                         columnMetadata = columnMetadatas(paimonTable);
                     }
-                    catch (RuntimeException e) {
-                        // Error when getting column metadata, return null
+                    catch (TableNotFoundException e) {
                         return Pair.of(table, (List<ColumnMetadata>) null);
                     }
 
                     return Pair.of(table, columnMetadata);
                 }).filter(p -> p.getRight() != null)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java`
around lines 189 - 204, The current stream silently swallows all
RuntimeException in the mapping (inside PaimonMetadata where getTableHandle,
table and columnMetadatas are called), hiding real metadata errors; change the
catch to only tolerate the table-not-found case and rethrow all other failures —
e.g., catch RuntimeException e, detect table-missing (replace with the actual
exception type your codebase uses such as TableNotFoundException or check error
code/message) and return Pair.of(table, null) only for that case, otherwise
rethrow the exception so getTableHandle/table/columnMetadatas errors propagate.
plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonQueryRunner.java-104-104 (1)

104-104: ⚠️ Potential issue | 🟠 Major

Pass the query runner's base data directory to the testing plugin, not the filesystem root.

Path.resolve("/") with an absolute path returns the filesystem root instead of a child directory. The TestingPaimonPlugin expects localFileSystemRootPath to be the testing context's root path, which should be the query runner's base data directory—not /.

-                queryRunner.installPlugin(new TestingPaimonPlugin(queryRunner.getCoordinator().getBaseDataDir().resolve("/")));
+                queryRunner.installPlugin(new TestingPaimonPlugin(queryRunner.getCoordinator().getBaseDataDir()));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonQueryRunner.java`
at line 104, The code is passing the filesystem root by calling
queryRunner.getCoordinator().getBaseDataDir().resolve("/") to
TestingPaimonPlugin; instead, pass the query runner's base data directory itself
so the plugin receives the test context root. Update the call in
PaimonQueryRunner where queryRunner.installPlugin(...) creates the
TestingPaimonPlugin to use queryRunner.getCoordinator().getBaseDataDir() (remove
the .resolve("/")) so TestingPaimonPlugin receives the correct
localFileSystemRootPath.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSessionProperties.java-49-54 (1)

49-54: ⚠️ Potential issue | 🟠 Major

Add validation to minimum_split_weight session property to reject invalid proportions.

This session property currently accepts any Double value without constraints. Invalid values like 0, negative numbers, NaN, or Infinity will propagate to split weight creation and cause IllegalArgumentException at split generation time. The property should be constrained to the valid range (0, 1] using a decoder in the session property definition, consistent with how other Trino connectors handle split weight parameters (e.g., Hive's minimum_assigned_split_weight).

The clamping logic in PaimonSplitManager.getSplits() (line 70) only partially mitigates this by enforcing the upper bound; it does not prevent invalid values from being used when the session property itself is invalid.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSessionProperties.java`
around lines 49 - 54, The MINIMUM_SPLIT_WEIGHT session property in
PaimonSessionProperties allows any Double and must be constrained to (0, 1]; add
a decoder/validator when building the PropertyMetadata for MINIMUM_SPLIT_WEIGHT
in PaimonSessionProperties that rejects NaN, infinite, <=0 and >1 (accepting
values >0 up to and including 1) and throws a clear exception for invalid inputs
so invalid proportions cannot be stored; this mirrors the validation approach
used for similar properties and prevents bad values from reaching
PaimonSplitManager.getSplits().
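
A sketch of what such a validated definition could look like (the property name and description follow this comment; the default value is left as a parameter, and the (0, 1] check mirrors the validation used for Hive's minimum_assigned_split_weight):

import io.trino.spi.TrinoException;
import io.trino.spi.session.PropertyMetadata;

import static io.trino.spi.StandardErrorCode.INVALID_SESSION_PROPERTY;
import static io.trino.spi.session.PropertyMetadata.doubleProperty;

public final class MinimumSplitWeightProperty
{
    private MinimumSplitWeightProperty() {}

    // Hypothetical factory; defaultValue would come from the connector config.
    public static PropertyMetadata<Double> create(double defaultValue)
    {
        return doubleProperty(
                "minimum_split_weight",
                "Minimum weight assigned to a split, as a proportion in (0, 1]",
                defaultValue,
                value -> {
                    if (!Double.isFinite(value) || value <= 0 || value > 1) {
                        throw new TrinoException(INVALID_SESSION_PROPERTY,
                                "minimum_split_weight must be in the range (0, 1]: " + value);
                    }
                },
                false);
    }
}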
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSource.java-250-286 (1)

250-286: ⚠️ Potential issue | 🟠 Major

Add handling for high-precision timestamp-with-time-zone conversions.

The LongTimestampWithTimeZone branch at line 285 checks type.equals(TIMESTAMP_TZ_MILLIS), but TIMESTAMP_TZ_MILLIS is a short timestamp (long java type). High-precision timestamp-with-time-zone types (TIMESTAMP_TZ_MICROS, TIMESTAMP_TZ_NANOS, TIMESTAMP_TZ_PICOS) are mapped as LongTimestampWithTimeZoneType objects by PaimonTypeUtils for precisions > 3, so they will fail the assertion. These types require explicit handling to convert the epoch millis and fractional seconds.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSource.java`
around lines 250 - 286, The LongTimestampWithTimeZone branch only allows
TIMESTAMP_TZ_MILLIS but must handle high‑precision TZ types
(TIMESTAMP_TZ_MICROS, TIMESTAMP_TZ_NANOS, TIMESTAMP_TZ_PICOS /
LongTimestampWithTimeZoneType) returned by PaimonTypeUtils; change the guard to
detect LongTimestampWithTimeZoneType (or check for type instance/one of
TIMESTAMP_TZ_* constants) and convert the Paimon Timestamp into the Trino
long-timestamp-with-time-zone representation by calling
fromEpochMillisAndFraction(timestamp.getMillisecond(), fractionalPart, UTC_KEY)
where fractionalPart is computed from the Timestamp fractional component at the
appropriate precision (micros→micro fraction, nanos→nano fraction, etc.), then
call type.writeObject(output, <that value>) instead of asserting only
TIMESTAMP_TZ_MILLIS.
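
A minimal sketch of that conversion (the helper class is hypothetical; it assumes Paimon's Timestamp exposes getMillisecond() and getNanoOfMillisecond(), and uses the LongTimestampWithTimeZone.fromEpochMillisAndFraction call named above):

import io.trino.spi.type.LongTimestampWithTimeZone;
import org.apache.paimon.data.Timestamp;

import static io.trino.spi.type.TimeZoneKey.UTC_KEY;

public final class PaimonTimestampConversions
{
    private PaimonTimestampConversions() {}

    // Combine the epoch milliseconds with the sub-millisecond part (nanos -> picos)
    // to build Trino's high-precision timestamp-with-time-zone value at UTC.
    public static LongTimestampWithTimeZone toLongTimestampWithTimeZone(Timestamp timestamp)
    {
        int picosOfMilli = timestamp.getNanoOfMillisecond() * 1_000;
        return LongTimestampWithTimeZone.fromEpochMillisAndFraction(
                timestamp.getMillisecond(),
                picosOfMilli,
                UTC_KEY);
    }
}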
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileIO.java-202-231 (1)

202-231: ⚠️ Potential issue | 🟠 Major

Fragile error handling: matching on IOException message substrings.

Classifying rename failures by searching for the strings "Target location already exists" and "rename failed" inside e.getMessage() is brittle — messages differ across TrinoFileSystem implementations (S3, GCS, ABFS, local) and can change between Trino versions with no API-level warning. A legitimate rename failure today will be converted into a silent return false.

Prefer checking preconditions explicitly before attempting the rename (e.g. fileSystem.newInputFile(target).exists() for the "already exists" case), or catch the specific filesystem exception type if one is available.

Also on Line 220: e.getMessage() can be null → NullPointerException. If you keep the string-matching, guard with Objects.toString(e.getMessage(), "").
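A rough sketch of the precondition-based shape (doRename stands in for the existing renameDirectory/renameFileAnyway logic, and the fileSystem field is assumed from the surrounding class):

```java
import io.trino.filesystem.Location;
import org.apache.paimon.fs.Path;

import java.io.IOException;

public boolean rename(Path sourcePath, Path targetPath)
        throws IOException
{
    Location target = Location.of(targetPath.toString());
    // Fail fast when the target exists instead of classifying IOException messages afterwards.
    if (fileSystem.newInputFile(target).exists()
            || fileSystem.directoryExists(target).orElse(false)) {
        return false;
    }
    // Delegate to the existing rename logic and let genuine failures propagate.
    return doRename(sourcePath, targetPath); // placeholder for the current code path
}
```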

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileIO.java`
around lines 202 - 231, The rename() method uses fragile string-matching on
e.getMessage() and can NPE; instead, proactively check target existence before
attempting a rename (e.g. use fileSystem.newInputFile(targetLocation).exists()
and/or fileSystem.directoryExists(targetLocation).orElse(false)) and return
false if the target already exists, then perform the rename using the existing
logic (renameDirectory / renameFileAnyway) inside the RENAME_LOCK for local
URIs; if you must keep exception handling, avoid matching on nullable messages
by wrapping e.getMessage() with Objects.toString(..., "") and prefer catching a
more specific filesystem exception type if available rather than IOException.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceProvider.java-286-391 (1)

286-391: ⚠️ Potential issue | 🟠 Major

createPageSource wraps exceptions indiscriminately and loses their error codes and recoverability signals.

Two concerns in the try/catch structure:

  1. Lines 367–369 catch Exception and rethrow as a bare RuntimeException(e) — any TrinoException (e.g. NOT_SUPPORTED, authorization) thrown while building per-file page sources is demoted to a generic 500, losing the error code. Prefer throw e; for RuntimeException subtypes, and only wrap the checked ones (IOException → UncheckedIOException).

  2. The outer catch on Line 388 does the same for the entire method body, so paimonSplit.decodeSplit() or ReadBuilder failures also lose their original types.

♻️ Proposed refactor (pattern)
-                catch (Exception e) {
-                    throw new RuntimeException(e);
-                }
+                catch (IOException e) {
+                    throw new UncheckedIOException(e);
+                }
+                catch (RuntimeException e) {
+                    throw e;
+                }
+                catch (Exception e) {
+                    throw new RuntimeException(e);
+                }

Apply the same pattern to the outer catch (Exception e) at Line 388.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceProvider.java`
around lines 286 - 391, The createPageSource method currently catches Exception
in two places and rethrows new RuntimeException(e), which strips original
RuntimeException/TrinoException types and error codes; update both catch blocks
(the inner catch around per-file processing and the outer catch around the whole
method) to: if (e instanceof RuntimeException || e instanceof TrinoException)
throw (RuntimeException) e; else if (e instanceof IOException) throw new
UncheckedIOException((IOException) e); else throw new RuntimeException(e);
ensure you keep the same method symbol names (createPageSource) and preserve the
existing logic around FileIndexPredicate,
PaimonPageSourceWrapper/DirectPaimonPageSource and ReadBuilder while replacing
the two Exception handlers with this pattern.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java-107-111 (1)

107-111: ⚠️ Potential issue | 🟠 Major

Avoid the direct ByteArrayBlock cast.

The underlying block may be a dictionary/RLE/lazy wrapper, in which case the cast throws ClassCastException. Use the same readNativeValue idiom as the other getters (with TINYINT) so it works regardless of block implementation.

🛡️ Proposed fix
     @Override
     public byte getByte(int i)
     {
-        return ((ByteArrayBlock) singlePage.getBlock(i)).getByte(0);
+        long value = (long) readNativeValue(TINYINT, singlePage.getBlock(i), 0);
+        checkArgument(value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE, "Value out of range for byte: %s", value);
+        return (byte) value;
     }

(and add import static io.trino.spi.type.TinyintType.TINYINT;)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java`
around lines 107 - 111, In PaimonRow.getByte replace the unsafe cast to
ByteArrayBlock by calling readNativeValue on the block obtained from
singlePage.getBlock(i) and converting to a byte using the TINYINT type (use
readNativeValue(TINYINT, singlePage.getBlock(i), 0) and extract the byte), so the
code works with dictionary/RLE/lazy wrappers; also add the static import for
io.trino.spi.type.TinyintType.TINYINT and remove the direct ByteArrayBlock cast.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileIO.java-171-191 (1)

171-191: ⚠️ Potential issue | 🟠 Major

Non-recursive delete doesn't detect subdirectories and still deletes them.

listFiles(location).hasNext() only enumerates files, not subdirectories, so the emptiness check passes for a directory that contains only subdirectories, and deleteDirectory then recursively removes everything. This violates the documented POSIX-ish delete(path, recursive=false) contract Paimon relies on and can cause silent data loss if callers use the non-recursive form as a safety check.

Include listDirectories in the emptiness test (or delete only the target dir via a non-recursive primitive).

🐛 Proposed fix
         if (fileSystem.directoryExists(location).orElse(false)) {
             if (!recursive) {
-                if (fileSystem.listFiles(location).hasNext()) {
-                    throw new IOException("Directory " + location + " is not empty");
-                }
+                if (fileSystem.listFiles(location).hasNext()
+                        || !fileSystem.listDirectories(location).isEmpty()) {
+                    throw new IOException("Directory " + location + " is not empty");
+                }
             }
             fileSystem.deleteDirectory(location);
             return true;
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileIO.java`
around lines 171 - 191, The non-recursive delete in delete(Path, boolean)
currently checks only fileSystem.listFiles(location).hasNext(), which misses
subdirectories and allows fileSystem.deleteDirectory(location) to remove
contents recursively; update the emptiness check to also inspect
fileSystem.listDirectories(location).hasNext() (or otherwise use a non-recursive
delete primitive) so that when recursive==false you throw the IOException if
either files or subdirectories exist before calling
fileSystem.deleteDirectory(location); keep existFile(...) and
fileSystem.deleteFile(...) behavior unchanged.
plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/hms/TrinoHivePaimonCatalog.java-102-116 (1)

102-116: ⚠️ Potential issue | 🟠 Major

listTables doesn't filter for Paimon-backed tables.

Both branches return every Hive table in the namespace regardless of whether it's actually a Paimon table. Users will see non-Paimon tables in SHOW TABLES, and subsequent metadata/load operations on them will blow up (currently via the "non-Paimon" TableNotFoundException in loadTableSchema). Filter with isPaimonTable(...) as other Trino connectors (Iceberg, Delta) do with their format predicates.

Additionally, the "no namespace" branch walks every database and every table sequentially via the metastore — for deployments with many databases this can be extremely slow. Prefer the metastore's bulk listing when available, or parallelize.

🐛 Proposed fix sketch
-        if (namespace.isPresent()) {
-            String databaseName = namespace.get();
-            if (!databaseExists(session, databaseName)) {
-                throw new SchemaNotFoundException("Database does not exist: " + databaseName);
-            }
-            return metastore.getTables(databaseName).stream().map(t -> t.tableName().getTableName()).collect(Collectors.toList());
-        }
-
-        return metastore.getAllDatabases().stream().flatMap(name -> metastore.getTables(name).stream())
-                .map(t -> t.tableName().getTableName())
-                .collect(Collectors.toList());
+        List<String> databases = namespace
+                .map(List::of)
+                .orElseGet(metastore::getAllDatabases);
+        return databases.stream()
+                .flatMap(db -> metastore.getTables(db).stream())
+                .map(t -> metastore.getTable(t.tableName().getDatabaseName(), t.tableName().getTableName()))
+                .flatMap(Optional::stream)
+                .filter(TrinoHivePaimonCatalog::isPaimonTable)
+                .map(t -> t.getTableName())
+                .collect(toImmutableList());

(Also replace the existence check throwing a message with new SchemaNotFoundException(databaseName) — see separate comment.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/hms/TrinoHivePaimonCatalog.java`
around lines 102 - 116, listTables currently returns every Hive table; change it
to only include Paimon-backed tables by applying isPaimonTable(...) to each
metastore table result (e.g., wrap metastore.getTables(databaseName).stream()
and the flatMap branch to filter t -> isPaimonTable(t.tableName()) before
mapping to tableName()); for the no-namespace branch avoid sequential
per-database roundtrips by using metastore bulk listing APIs if available or by
parallelizing the per-database metastore.getTables calls (e.g., parallelStream)
to reduce latency; also replace the thrown SchemaNotFoundException message in
the namespace-exists check with new SchemaNotFoundException(databaseName) as
suggested.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: afb02a8e-59ff-4d29-8c11-dbebc30e4233

📥 Commits

Reviewing files that changed from the base of the PR and between 2cc99a8 and 123060d.

⛔ Files ignored due to path filters (1)
  • docs/src/main/sphinx/static/img/paimon.png is excluded by !**/*.png
📒 Files selected for processing (66)
  • core/trino-server/src/main/provisio/trino.xml
  • docs/release-template.md
  • docs/src/main/sphinx/connector.md
  • docs/src/main/sphinx/connector/paimon.md
  • docs/src/main/sphinx/sql/drop-catalog.md
  • plugin/trino-paimon/pom.xml
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/CatalogType.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/DirectPaimonPageSource.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/EncodingUtils.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonColumnHandle.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonConfig.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonConnector.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonConnectorFactory.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonErrorCode.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonFilterConverter.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMergePageSourceWrapper.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadataFactory.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonModule.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonOrcDataSource.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSource.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceProvider.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceWrapper.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPlugin.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSessionProperties.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSplit.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSplitManager.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSplitSource.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTableHandle.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTableOptionUtils.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTableOptions.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTransactionManager.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTypeUtils.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/AbstractPaimonTrinoCatalog.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/PaimonCatalogModule.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/TrinoCatalog.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/TrinoCatalogFactory.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/file/PaimonFileSystemCatalogModule.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/file/TrinoFileSystemCatalogFactory.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/file/TrinoFileSystemPaimonCatalog.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/hms/PaimonHiveCatalogModule.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/hms/TrinoHiveCatalogFactory.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/hms/TrinoHivePaimonCatalog.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonDirectoryFileStatus.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileIO.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileStatus.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonInputStreamWrapper.java
  • plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PositionOutputStreamWrapper.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonQueryRunner.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonTestUtils.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/SimpleTableTestHelper.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonConfig.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonConnectorTest.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonITCase.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonMinioConnectorSmokeTest.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonPlugin.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonRow.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonSplit.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonTrinoType.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestingPaimonConnectorFactory.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestingPaimonPlugin.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/PaimonTablesInitializer.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/PaimonTestingUtils.java
  • plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/TrinoHiveCatalog.java
  • pom.xml

Comment on lines +156 to +158
```sql
USE use paimon.tests;

```

⚠️ Potential issue | 🔴 Critical

Fix duplicated USE keyword in SQL example.

The SQL statement has "USE use" with the keyword duplicated, which is a syntax error.

🐛 Proposed fix
```sql
-USE use paimon.tests;
+USE paimon.tests;

 SELECT * FROM nation LIMIT 1;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/main/sphinx/connector/paimon.md` around lines 156 - 158, Remove the
duplicated "USE" keyword in the SQL example: find the SQL snippet containing
"USE use paimon.tests;" and change it to "USE paimon.tests;" so the example is
valid (the SELECT statement that follows should remain unchanged).

Comment on lines +297 to +302
if (type.equals(TIMESTAMP_TZ_MILLIS)) {
if (trinoNativeValue instanceof Long) {
return trinoNativeValue;
}
return Timestamp.fromEpochMillis(((LongTimestampWithTimeZone) trinoNativeValue).getEpochMillis());
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Trino spi short TIMESTAMP WITH TIME ZONE packed long representation DateTimeEncoding

💡 Result:

In Trino's SPI (trino-spi), TIMESTAMP WITH TIME ZONE values are stored in blocks as packed 64-bit longs using DateTimeEncoding utilities. The timestamp portion represents milliseconds since the Unix epoch, shifted left to make room for timezone information packed in the lower bits (approximately 12 bits for timezone offset, limiting the effective timestamp range to about 2^52 ms or ~285,000 years around the epoch). For higher precision (p>3), LongTimestampWithTimeZone is used instead of short representations. ShortTimestampType (for p<=6 without TZ) uses microseconds from epoch as long. The exact bit layout is defined in io.trino.spi.type.DateTimeEncoding.packDateTimeWithZone(millis, timeZoneKey), which throws on millis overflow due to shifting.



🏁 Script executed:

# Locate the file mentioned in the review
find . -name "PaimonFilterConverter.java" -type f

Repository: trinodb/trino

Length of output: 142


🏁 Script executed:

# Once found, examine the specific lines 297-302
fd -t f "PaimonFilterConverter.java" --exec cat -n {} \; | head -320 | tail -30

Repository: trinodb/trino

Length of output: 1415


🏁 Script executed:

# Search for TIMESTAMP_TZ_MILLIS definition
rg "TIMESTAMP_TZ_MILLIS" -B 2 -A 2

Repository: trinodb/trino

Length of output: 50370


🏁 Script executed:

# Search for DateTimeEncoding class and unpackMillisUtc method
rg "class DateTimeEncoding|def unpackMillisUtc|unpackMillisUtc" --type java -B 2 -A 5

Repository: trinodb/trino

Length of output: 50370


Unpack TIMESTAMP_TZ_MILLIS packed long before using as epoch milliseconds.

Trino's short TIMESTAMP WITH TIME ZONE (precision ≤ 3) is stored as a packed 64-bit long via DateTimeEncoding.packDateTimeWithZone(millisUtc, zoneKey), where high bits hold millis-since-epoch and low bits hold the timezone key. Returning the packed value as-is feeds garbage into Paimon's predicate, causing incorrect row filtering.

🐛 Fix
        if (type.equals(TIMESTAMP_TZ_MILLIS)) {
-           if (trinoNativeValue instanceof Long) {
-               return trinoNativeValue;
+           if (trinoNativeValue instanceof Long packed) {
+               return Timestamp.fromEpochMillis(DateTimeEncoding.unpackMillisUtc(packed));
            }
            return Timestamp.fromEpochMillis(((LongTimestampWithTimeZone) trinoNativeValue).getEpochMillis());
        }

(Add import io.trino.spi.type.DateTimeEncoding;.)
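For reference, a tiny round-trip showing why the raw long is unusable as epoch millis (values are illustrative):

```java
import io.trino.spi.type.DateTimeEncoding;
import io.trino.spi.type.TimeZoneKey;

public class PackedTimestampDemo
{
    public static void main(String[] args)
    {
        long epochMillis = 1_700_000_000_000L;
        // High bits carry millis since epoch, low bits carry the zone key.
        long packed = DateTimeEncoding.packDateTimeWithZone(epochMillis, TimeZoneKey.UTC_KEY);
        System.out.println(packed == epochMillis);                                   // false: the raw value is not epoch millis
        System.out.println(DateTimeEncoding.unpackMillisUtc(packed) == epochMillis); // true after unpacking
    }
}
```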

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonFilterConverter.java`
around lines 297 - 302, The TIMESTAMP_TZ_MILLIS branch in PaimonFilterConverter
is returning Trino's packed long (DateTimeEncoding.packDateTimeWithZone)
directly, feeding a packed value to Paimon; unpack the millis before
constructing the Timestamp: when trinoNativeValue is a Long extract millis =
DateTimeEncoding.unpackMillisUtc((Long) trinoNativeValue) (import
io.trino.spi.type.DateTimeEncoding), and when it's a LongTimestampWithTimeZone
use its getEpochMillis(); then call Timestamp.fromEpochMillis(millis) and return
that value so Paimon receives raw epoch milliseconds.

Comment on lines +49 to +57
@Override
protected Slice readTailInternal(int length)
throws IOException
{
long readStart = System.nanoTime();
Slice tail = input.readTail(length);
stats.readDataBytesPerSecond(tail.length(), System.nanoTime() - readStart);
return input.readTail(length);
}

⚠️ Potential issue | 🔴 Critical

Critical: readTailInternal reads the tail twice and returns the unmeasured second read.

input.readTail(length) is invoked twice — the first result is timed and fed into stats, but the second (unmeasured) result is returned. This doubles I/O for every ORC footer/stripe-tail read, and the returned bytes are not the ones reflected in readDataBytesPerSecond. Return the originally read tail.

🐛 Fix
     @Override
     protected Slice readTailInternal(int length)
             throws IOException
     {
         long readStart = System.nanoTime();
         Slice tail = input.readTail(length);
         stats.readDataBytesPerSecond(tail.length(), System.nanoTime() - readStart);
-        return input.readTail(length);
+        return tail;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonOrcDataSource.java`
around lines 49 - 57, The method readTailInternal currently calls
input.readTail(length) twice: once to measure and record bytes via
stats.readDataBytesPerSecond using the local variable Slice tail, then it
incorrectly returns a second call to input.readTail(length), causing duplicate
I/O and mismatched metrics. Fix by returning the previously read and measured
Slice tail instead of calling input.readTail(length) again; update the return
statement in readTailInternal to return the local variable tail so the measured
data matches the returned data.

Comment on lines +16 to +17
import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableList;
import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableMap;

⚠️ Potential issue | 🔴 Critical

Wrong Guava imports — from the GCS-connector shaded package.

import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableList;
import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableMap;

These are repackaged Guava classes shipped inside gcs-connector's shaded jar. Depending on them from the Paimon plugin is almost certainly unintended:

  • It introduces a hard, fragile dependency on the GCS connector's internal shading layout — any upgrade of the GCS connector that renames the shade prefix breaks the plugin.
  • At runtime these classes may not be visible on the plugin classpath → NoClassDefFoundError / ClassNotFoundException.
  • They are not the same ImmutableList/ImmutableMap the rest of Trino uses, so types from this file won't be interoperable with the rest of the codebase despite compiling.

Replace with the standard Guava imports. Elsewhere this file already uses com.google.common.collect.ImmutableList.toImmutableList / ImmutableSet.toImmutableSet static imports, so regular Guava is available.

🐛 Proposed fix
-import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableList;
-import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableMap;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceProvider.java`
around lines 16 - 17, The file imports shaded Guava classes from the GCS
connector; replace the two repackaged imports with the standard Guava ones (use
com.google.common.collect.ImmutableList and
com.google.common.collect.ImmutableMap) so PaimonPageSourceProvider and its
usages reference the real Guava types rather than the GCS-connector shaded
variants; remove the com.google.cloud.hadoop.repackaged.gcs... imports and add
the com.google.common.collect imports so types remain compatible with the rest
of the codebase.

Comment on lines +121 to +135
@Override
public int getInt(int i)
{
if (rowType.getTypeAt(i).getTypeRoot() == DataTypeRoot.TIME_WITHOUT_TIME_ZONE) {
if (cacheTypes[i] == null) {
cacheTypes[i] = TimeType.createTimeType(((org.apache.paimon.types.TimeType) rowType.getTypeAt(i)).getPrecision());
}
SqlTime sqlTime = (SqlTime) cacheTypes[i].getObjectValue(singlePage.getBlock(i), 0);
return (int) (sqlTime.getPicos() / 1_000_000);
}

long value = (long) readNativeValue(INTEGER, singlePage.getBlock(i), 0);
checkArgument(value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE, "Value out of range for int: %s", value);
return (int) value;
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Apache Paimon InternalRow getInt TIME_WITHOUT_TIME_ZONE milliseconds of day

💡 Result:

In Apache Paimon, for a TIME_WITHOUT_TIME_ZONE field (TIME type), InternalRow.getInt(pos) returns the time as milliseconds since midnight (00:00:00.000). This is the standard internal representation for TIME WITHOUT TIME ZONE in Paimon, consistent with common database practices where TIME is stored as an integer representing milliseconds-of-day (0 to 86399999). Paimon's DataGetters interface (implemented by InternalRow) provides typed accessors like getInt for INT-compatible types, and TIME fits this as a compact int storage. Official docs confirm TIME as time without time zone up to nanosecond precision but defaults to int milliseconds for precision 0-2.



🏁 Script executed:

#!/bin/bash
# Locate the PaimonRow.java file
find . -name "PaimonRow.java" -type f

Repository: trinodb/trino

Length of output: 130


🏁 Script executed:

#!/bin/bash
# Check the code at lines 121-135
cat -n plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java | sed -n '121,135p'

Repository: trinodb/trino

Length of output: 889


🏁 Script executed:

#!/bin/bash
# Search for SqlTime class and getPicos() method documentation in Trino
rg -A 5 "class SqlTime" --type java

Repository: trinodb/trino

Length of output: 2365


🏁 Script executed:

#!/bin/bash
# Check if toIntExact is used elsewhere in the codebase (for proposed fix)
rg "toIntExact" plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/ --type java

Repository: trinodb/trino

Length of output: 890


🏁 Script executed:

#!/bin/bash
# Get the full SqlTime class to see getPicos() method signature
cat core/trino-spi/src/main/java/io/trino/spi/type/SqlTime.java

Repository: trinodb/trino

Length of output: 4064


🏁 Script executed:

#!/bin/bash
# Search for other TIME conversions in the codebase to understand Trino's practices
rg -A 3 "getPicos\(\)" --type java | head -50

Repository: trinodb/trino

Length of output: 4969


🏁 Script executed:

#!/bin/bash
# Find scalePicosToMillis function to see how it handles rounding
rg "scalePicosToMillis|PICOSECONDS_PER_MILLISECOND" --type java -B 2 -A 5 | head -60

Repository: trinodb/trino

Length of output: 8676


🏁 Script executed:

#!/bin/bash
# Check for rounding patterns with picoseconds conversions
rg "roundDiv|Math\\.ceil|Math\\.floor" --type java | grep -i "picos\|millis" | head -20

Repository: trinodb/trino

Length of output: 3802


🏁 Script executed:

#!/bin/bash
# Get the TimeConversions utility class
cat plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/encoder/json/format/util/TimeConversions.java

Repository: trinodb/trino

Length of output: 6361


Fix incorrect TIME-to-milliseconds conversion.

SqlTime.getPicos() returns picoseconds-of-day, but the current code divides by 1_000_000 (converting to microseconds), while Paimon's InternalRow.getInt() for TIME_WITHOUT_TIME_ZONE requires milliseconds-of-day. The divisor must be 1_000_000_000 (not 1_000_000), making the current return value off by a factor of 1000.

Additionally, use scalePicosToMillis() (or implement proper rounding) instead of plain truncation to handle sub-millisecond precision correctly. Trino's standard pattern for picos-to-millis conversion uses rounding via rescaleWithRounding(), which applies (value + 500_000_000) / 1_000_000_000 logic for correct nearest-millisecond rounding.

Suggested fix
-            SqlTime sqlTime = (SqlTime) cacheTypes[i].getObjectValue(singlePage.getBlock(i), 0);
-            return (int) (sqlTime.getPicos() / 1_000_000);
+            SqlTime sqlTime = (SqlTime) cacheTypes[i].getObjectValue(singlePage.getBlock(i), 0);
+            // Paimon TIME values are milliseconds-of-day; SqlTime stores picoseconds-of-day.
+            return toIntExact((sqlTime.getPicos() + 500_000_000L) / 1_000_000_000L);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java`
around lines 121 - 135, The getInt method incorrectly converts
SqlTime.getPicos() (picoseconds-of-day) by dividing by 1_000_000; update the
TIME_WITHOUT_TIME_ZONE branch in PaimonRow.getInt to convert picoseconds to
milliseconds-of-day by dividing by 1_000_000_000 and apply proper rounding (use
the existing rescaleWithRounding/scalePicosToMillis helper if available, or
implement rounding as (picos + 500_000_000) / 1_000_000_000) before casting to
int; adjust references to cacheTypes, TimeType.createTimeType, SqlTime.getPicos,
and the getInt return so the value is milliseconds-of-day with
nearest-millisecond rounding.

Comment on lines +162 to +181
@Override
public Decimal getDecimal(int i, int decimalPrecision, int decimalScale)
{
Object value =
readNativeValue(
DecimalType.createDecimalType(decimalPrecision, decimalScale),
singlePage.getBlock(i),
0);
if (decimalPrecision <= MAX_SHORT_PRECISION) {
return Decimal.fromUnscaledLong((Long) value, decimalPrecision, decimalScale);
}
else {
long high = ((Int128) value).getHigh();
long low = ((Int128) value).getLow();
BigInteger bigIntegerValue =
BigInteger.valueOf(high).shiftLeft(64).add(BigInteger.valueOf(low));
BigDecimal bigDecimalValue = new BigDecimal(bigIntegerValue, decimalScale);
return Decimal.fromBigDecimal(bigDecimalValue, decimalPrecision, decimalScale);
}
}

⚠️ Potential issue | 🔴 Critical

Incorrect Int128 → BigInteger conversion for long decimals.

BigInteger.valueOf(low) treats low as signed, but the lower 64 bits of Int128 must be interpreted as unsigned. Whenever low has its high bit set, the resulting BigInteger is off by 2^64, producing silently-wrong decimals.

Use Int128.toBigInteger() directly, which handles the sign/two's-complement conversion correctly.

🐛 Proposed fix
     @Override
     public Decimal getDecimal(int i, int decimalPrecision, int decimalScale)
     {
         Object value =
                 readNativeValue(
                         DecimalType.createDecimalType(decimalPrecision, decimalScale),
                         singlePage.getBlock(i),
                         0);
         if (decimalPrecision <= MAX_SHORT_PRECISION) {
             return Decimal.fromUnscaledLong((Long) value, decimalPrecision, decimalScale);
         }
-        else {
-            long high = ((Int128) value).getHigh();
-            long low = ((Int128) value).getLow();
-            BigInteger bigIntegerValue =
-                    BigInteger.valueOf(high).shiftLeft(64).add(BigInteger.valueOf(low));
-            BigDecimal bigDecimalValue = new BigDecimal(bigIntegerValue, decimalScale);
-            return Decimal.fromBigDecimal(bigDecimalValue, decimalPrecision, decimalScale);
-        }
+        BigDecimal bigDecimalValue = new BigDecimal(((Int128) value).toBigInteger(), decimalScale);
+        return Decimal.fromBigDecimal(bigDecimalValue, decimalPrecision, decimalScale);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java`
around lines 162 - 181, In PaimonRow.getDecimal the manual Int128 → BigInteger
conversion treats the lower 64 bits as signed causing wrong values; replace the
high/low construction with a proper two's-complement conversion by calling
Int128.toBigInteger() on the (Int128) value, then construct the BigDecimal from
that BigInteger with the decimalScale and pass it to
Decimal.fromBigDecimal(decimalPrecision, decimalScale) as before (update the
branch handling large decimals in getDecimal to use Int128.toBigInteger()).

Comment on lines +183 to +194
@Override
public Timestamp getTimestamp(int i, int timestampPrecision)
{
if (rowType.getTypeAt(i).getTypeRoot() == DataTypeRoot.TIMESTAMP_WITH_LOCAL_TIME_ZONE) {
SqlTimestampWithTimeZone sqlTimestampWithTimeZone = (SqlTimestampWithTimeZone) TimestampWithTimeZoneType.createTimestampWithTimeZoneType(timestampPrecision).getObjectValue(singlePage.getBlock(i), 0);
return Timestamp.fromEpochMillis(sqlTimestampWithTimeZone.getEpochMillis(), sqlTimestampWithTimeZone.getPicosOfMilli() * 1000);
}
else {
SqlTimestamp sqlTimestamp = (SqlTimestamp) TimestampType.createTimestampType(timestampPrecision).getObjectValue(singlePage.getBlock(i), 0);
return Timestamp.fromLocalDateTime(sqlTimestamp.toLocalDateTime());
}
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Apache Paimon Timestamp.fromEpochMillis signature nanosOfMillisecond

💡 Result:

In Apache Paimon, the org.apache.paimon.data.Timestamp class has two fromEpochMillis factory methods: 1. public static Timestamp fromEpochMillis(long milliseconds) - Creates a Timestamp from milliseconds since 1970-01-01 00:00:00, setting nanos-of-millisecond to zero. 2. public static Timestamp fromEpochMillis(long milliseconds, int nanosOfMillisecond) - Creates a Timestamp from milliseconds since 1970-01-01 00:00:00 and nanoseconds within that millisecond (0 to 999,999).



🏁 Script executed:

# Search for SqlTimestampWithTimeZone in the codebase to understand getPicosOfMilli()
find . -name "*.java" -type f | xargs grep -l "SqlTimestampWithTimeZone" | head -20

Repository: trinodb/trino

Length of output: 1708


🏁 Script executed:

# Look at the actual PaimonRow.java file to confirm the implementation
find . -name "PaimonRow.java" -type f

Repository: trinodb/trino

Length of output: 130


🏁 Script executed:

# Read SqlTimestampWithTimeZone to understand getPicosOfMilli()
cat -n ./core/trino-spi/src/main/java/io/trino/spi/type/SqlTimestampWithTimeZone.java

Repository: trinodb/trino

Length of output: 7498


🏁 Script executed:

# Read PaimonRow.java around lines 183-194
sed -n '175,200p' ./plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java

Repository: trinodb/trino

Length of output: 1414


Fix unit conversion from picoseconds to nanoseconds in getTimestamp for TIMESTAMP WITH LOCAL TIME ZONE.

SqlTimestampWithTimeZone.getPicosOfMilli() returns picoseconds-of-millisecond (0..999,999,999), but Timestamp.fromEpochMillis(long, int) expects nanoseconds-of-millisecond (0..999,999). The current * 1000 multiplies the picosecond value out of range (up to ~10^12); it should be / 1000 to convert picos→nanos.

Proposed fix
-            return Timestamp.fromEpochMillis(sqlTimestampWithTimeZone.getEpochMillis(), sqlTimestampWithTimeZone.getPicosOfMilli() * 1000);
+            return Timestamp.fromEpochMillis(
+                    sqlTimestampWithTimeZone.getEpochMillis(),
+                    (int) (sqlTimestampWithTimeZone.getPicosOfMilli() / 1000));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java`
around lines 183 - 194, In PaimonRow.getTimestamp, the
TIMESTAMP_WITH_LOCAL_TIME_ZONE branch incorrectly converts picoseconds to
nanoseconds by multiplying picosOfMilli by 1000; change the conversion to divide
SqlTimestampWithTimeZone.getPicosOfMilli() by 1000 before passing to
Timestamp.fromEpochMillis so you supply nanoseconds-of-millisecond (int) not
picoseconds (long). Locate the block inside getTimestamp that constructs
SqlTimestampWithTimeZone and replace the picos -> nanos conversion (currently "*
1000") with "/ 1000" and ensure the resulting value is cast/typed to the int
parameter expected by Timestamp.fromEpochMillis.

Comment on lines +48 to +77
public static List<OptionInfo> getOptionInfos()
{
List<OptionInfo> optionInfos = new ArrayList<>();
List<OptionWithMetaInfo> optionWithMetaInfos = extractConfigOptions(CoreOptions.class);
String className = "";
for (OptionWithMetaInfo optionWithMetaInfo : optionWithMetaInfos) {
if (shouldSkip(optionWithMetaInfo.field.getName())) {
continue;
}

Type genericType = optionWithMetaInfo.field.getGenericType();
if (genericType instanceof ParameterizedType parameterizedType) {
Type[] actualTypeArguments = parameterizedType.getActualTypeArguments();
for (Type actualTypeArgument : actualTypeArguments) {
if (actualTypeArgument instanceof Class<?>) {
className = ((Class<?>) actualTypeArgument).getSimpleName();
}
}
}

optionInfos.add(
new OptionInfo(
convertOptionKey(optionWithMetaInfo.option.key()),
optionWithMetaInfo.option.key(),
buildClass(className),
isEnum(className),
className));
}
return optionInfos;
}

⚠️ Potential issue | 🔴 Critical

Bug: className leaks across loop iterations.

className is declared once outside the for-loop (Line 52) and is only reassigned inside the if (genericType instanceof ParameterizedType ...) branch. For any field whose generic type is not a ParameterizedType (or whose actual type argument is not a Class<?>), the new OptionInfo will be constructed with the previous iteration's className, leading to wrong isEnum/buildClass() values and malformed property metadata in PaimonTableOptions.

Reset className at the start of each iteration (or declare it inside the loop):

🐛 Fix
-        String className = "";
         for (OptionWithMetaInfo optionWithMetaInfo : optionWithMetaInfos) {
             if (shouldSkip(optionWithMetaInfo.field.getName())) {
                 continue;
             }
 
+            String className = "";
             Type genericType = optionWithMetaInfo.field.getGenericType();
             if (genericType instanceof ParameterizedType parameterizedType) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTableOptionUtils.java`
around lines 48 - 77, In getOptionInfos(), className is declared outside the
for-loop and can leak previous values into OptionInfo construction; move the
declaration of className (or reset it to empty) at the start of each iteration
of the for (OptionWithMetaInfo optionWithMetaInfo : optionWithMetaInfos) loop so
that when examining optionWithMetaInfo.field.getGenericType() (and the
ParameterizedType/actualTypeArgument handling) you only use the className
derived for that field when calling buildClass(className) and isEnum(className)
to construct the new OptionInfo.

@eskabetxe
Member

thanks @martint

@leaves12138 can you check?

@eskabetxe
Member

@leaves12138 do you need any help here?
