Add Paimon Connector #26634
Conversation
ccb374a to
8ecbde8
Compare
8d197cb to
8af7c2d
Compare
|
@ebyhr Hi, could you help me review this pull request? I have stripped out all write actions such as CREATE TABLE, ALTER TABLE, and INSERT. Only the ability to read from Paimon is kept in this pull request. I will submit pull requests to support INSERT and CREATE/ALTER TABLE operations in the future. |
be1c9cc to
ca77568
Compare
0f61144 to
fbee577
Compare
d6b3623 to
7d5aed2
Compare
|
This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack. |
|
@ebyhr Hi, is there anything I could do to get this pull request merged? Please leave your comments and I will fix them as soon as possible. |
7d5aed2 to
855534b
Compare
|
Fixed as requested |
|
This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack. |
|
LGTM |
|
This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack. |
chenjian2664
left a comment
Skimmed table handle
    return table;
}

public ConnectorTableMetadata tableMetadata(ConnectorSession session, TrinoCatalog catalog)
Let's move all session-related methods out, they would likely fit better in the PaimonMetadata class.
            table(session, catalog).comment());
}

public List<ColumnMetadata> columnMetadatas(ConnectorSession session, TrinoCatalog catalog)
            .collect(Collectors.toList());
}

public PaimonColumnHandle columnHandle(ConnectorSession session, TrinoCatalog catalog, String field)
    return PaimonColumnHandle.of(originFieldNames.get(index), dataField.type(), dataField.id());
}

public PaimonTableHandle withFilter(TupleDomain<PaimonColumnHandle> filter)
The method seems unused, let's remove it
    return new PaimonTableHandle(schemaName, tableName, dynamicOptions, filter, projectedColumns, limit);
}

public PaimonTableHandle withColumns(Set<PaimonColumnHandle> projectedColumns)
| @JsonProperty("projection") Set<PaimonColumnHandle> projectedColumns, | ||
| @JsonProperty("limit") OptionalLong limit) | ||
| { | ||
| this.schemaName = schemaName; |
Can this field be null? If it cannot, add a non-null check.
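If the answer is that these fields must never be null, a minimal sketch of the constructor with the checks added could look like this. The parameter order follows the snippets above; the exact field names, annotations, and any defensive copies are assumptions for illustration, not the final code:

```java
import static java.util.Objects.requireNonNull;

// Hypothetical constructor body for PaimonTableHandle, shown only to illustrate the
// requested non-null checks; adjust names and annotations to match the actual class.
public PaimonTableHandle(
        String schemaName,
        String tableName,
        Map<String, String> dynamicOptions,
        TupleDomain<PaimonColumnHandle> filter,
        Set<PaimonColumnHandle> projectedColumns,
        OptionalLong limit)
{
    this.schemaName = requireNonNull(schemaName, "schemaName is null");
    this.tableName = requireNonNull(tableName, "tableName is null");
    this.dynamicOptions = requireNonNull(dynamicOptions, "dynamicOptions is null");
    this.predicate = requireNonNull(filter, "filter is null");
    this.projectedColumns = requireNonNull(projectedColumns, "projectedColumns is null");
    this.limit = requireNonNull(limit, "limit is null");
}
```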
private final OptionalLong limit;
private final Map<String, String> dynamicOptions;

private transient Table table;
Let's avoid premature optimization for the first version. We can load the table on each access. As noted in my earlier reviews in this round, the loading logic likely belongs in the metadata class, which would also be a better place to introduce caching later.
private final String schemaName;
private final String tableName;
private final TupleDomain<PaimonColumnHandle> predicate;
private final Set<PaimonColumnHandle> projectedColumns;
Remove the field if it's not used
import static org.assertj.core.api.Assertions.assertThat;

final class TestPaimonTableHandle
Let's remove the test, it's not necessary to test the json de/serialization.
binder -> {
    binder.bind(ClassLoader.class).toInstance(PaimonConnectorFactory.class.getClassLoader());
    binder.bind(NodeVersion.class).toInstance(new NodeVersion(context.getCurrentNode().getVersion()));
    binder.bind(TypeManager.class).toInstance(context.getTypeManager());
    binder.bind(OpenTelemetry.class).toInstance(context.getOpenTelemetry());
    binder.bind(Node.class).toInstance(context.getCurrentNode());
    binder.bind(Tracer.class).toInstance(context.getTracer());
    binder.bind(CatalogName.class).toInstance(new CatalogName(catalogName));
},
Suggested change:
-binder -> {
-    binder.bind(ClassLoader.class).toInstance(PaimonConnectorFactory.class.getClassLoader());
-    binder.bind(NodeVersion.class).toInstance(new NodeVersion(context.getCurrentNode().getVersion()));
-    binder.bind(TypeManager.class).toInstance(context.getTypeManager());
-    binder.bind(OpenTelemetry.class).toInstance(context.getOpenTelemetry());
-    binder.bind(Node.class).toInstance(context.getCurrentNode());
-    binder.bind(Tracer.class).toInstance(context.getTracer());
-    binder.bind(CatalogName.class).toInstance(new CatalogName(catalogName));
-},
+new ConnectorContextModule(catalogName, context),
+binder -> binder.bind(ClassLoader.class).toInstance(PaimonConnectorFactory.class.getClassLoader()),
(we need to import io.trino.plugin.base.ConnectorContextModule as well)
I was experimenting with filesystem caching with the Paimon connector, but when fs.cache.enabled=true, FileSystemModule installs AlluxioFileSystemCacheModule, which binds ConsistentHashingHostAddressProvider. That class requires NodeManager to be injected, but PaimonConnectorFactory manually binds only a subset of the connector context objects (e.g. Node, NodeVersion, TypeManager, OpenTelemetry, Tracer, CatalogName) and omits NodeManager.
All other connectors that use FileSystemModule (Iceberg, Hive, Delta Lake) end up using ConnectorContextModule, which provides all standard context bindings, including NodeManager.
Maybe this can be implemented? Another option is to add binder.bind(NodeManager.class).toInstance(context.getNodeManager()); directly as a bind.
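For illustration, a sketch of that second option, keeping the manual bindings but adding the missing NodeManager one. The module list is copied from the snippet above; ConnectorContext#getNodeManager() is the standard SPI accessor, and this is only a sketch of the workaround, not the preferred fix:

```java
binder -> {
    binder.bind(ClassLoader.class).toInstance(PaimonConnectorFactory.class.getClassLoader());
    binder.bind(NodeVersion.class).toInstance(new NodeVersion(context.getCurrentNode().getVersion()));
    binder.bind(TypeManager.class).toInstance(context.getTypeManager());
    binder.bind(OpenTelemetry.class).toInstance(context.getOpenTelemetry());
    binder.bind(Node.class).toInstance(context.getCurrentNode());
    binder.bind(Tracer.class).toInstance(context.getTracer());
    binder.bind(CatalogName.class).toInstance(new CatalogName(catalogName));
    // Added: required by ConsistentHashingHostAddressProvider when fs.cache.enabled=true
    // installs AlluxioFileSystemCacheModule.
    binder.bind(NodeManager.class).toInstance(context.getNodeManager());
},
```

Switching to ConnectorContextModule, as in the suggestion above, would avoid this class of omission entirely since it provides all the standard context bindings at once.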
|
@leaves12138 can you check? do you need help here? |
    }
    return columnHandles;
}
@Override
public Optional<ConstraintApplicationResult<ConnectorTableHandle>> applyFilter(
        ConnectorSession session,
        ConnectorTableHandle tableHandle,
        Constraint constraint)
{
    PaimonTableHandle paimonTableHandle = (PaimonTableHandle) tableHandle;
    Optional<PaimonFilterExtractor.TrinoFilter> trinoFilter =
            PaimonFilterExtractor.extract(session, catalog, paimonTableHandle, constraint);
    if (trinoFilter.isEmpty()) {
        return Optional.empty();
    }
    PaimonTableHandle newHandle = paimonTableHandle.withFilter(trinoFilter.get().filter());
    return Optional.of(new ConstraintApplicationResult<>(newHandle, trinoFilter.get().remainFilter(), constraint.getExpression(), false));
}
(import io.trino.spi.connector.Constraint, import io.trino.spi.connector.ConstraintApplicationResult, import io.trino.spi.expression.ConnectorExpression)
After testing some queries, I noticed that we are missing predicate pushdown support, which causes significant performance issues, even though all the downstream code for filtering already exists:
- PaimonTableHandle has a predicate field and a withFilter() method.
- PaimonFilterExtractor.extract() converts Trino's Constraint into a TrinoFilter (split between predicates the connector enforces and those Trino must still apply).
- PaimonFilterConverter translates the TupleDomain into Paimon-native Predicate objects.
- PaimonSplitManager already calls readBuilder.withFilter(predicate) for partition/split pruning.
- PaimonPageSourceProvider already passes the predicate to ORC stripe-level and Parquet row-group filtering.
However, PaimonFilterExtractor.extract() is never called, except in tests.
@jeancmsantana Let's keep the first version simple; it is intentional and OK to not support predicate pushdown at this time.
a1df282 to
37904e4
Compare
|
Thanks for all your help!!! I will fix all the comments in one or two days~ |
11759c4 to
3dff5ae
Compare
Thanks for asking, I will fix them all |
3dff5ae to
123060d
Compare
|
Fixed as requested~ |
|
great work @leaves12138! |
|
@chenjian2664 could you recheck please |
|
@coderabbitai review |
✅ Actions performed: Review triggered.
|
📝 Walkthrough: This pull request introduces a new Apache Paimon connector for Trino, enabling users to query Paimon-format tables via Trino SQL. The implementation adds a complete connector plugin with configuration management, metadata discovery, split generation, page source handling, and filesystem/Hive metastore catalog backends. Documentation is provided for configuration, type mappings, and SQL support. Supporting infrastructure includes type conversion utilities, file I/O wrappers, and comprehensive test coverage. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. |
Actionable comments posted: 8
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (28)
docs/src/main/sphinx/connector/paimon.md-216-219 (1)
216-219: ⚠️ Potential issue | 🟠 Major: Fix truncated metadata table output.
The example output appears truncated with incomplete column names (ending in "min_>" and "NULL>"). This makes the example unclear and unhelpful for users. Consider either wrapping the output properly, showing fewer columns, or formatting the table differently.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/src/main/sphinx/connector/paimon.md` around lines 216 - 219, The example metadata table in the Paimon doc shows truncated column names and values (e.g., the header fragment "partition | bucket | ... | file_size_in_bytes | min_>" and the value ending "NULL>"), so update the table output in the paimon.md example to avoid truncation: replace the current single-line wrapped table with either a fully expanded multi-line table (showing complete column names like "min_values" or whatever the full name is) or a reduced column set that fits the layout (e.g., show only "partition | bucket | file_path | file_format | record_count | file_size_in_bytes"), or render it as a fenced code block with no wrapping so the full file_path and trailing values (the s3://... path and the NULL value) are visible and not cut off.docs/src/main/sphinx/connector/paimon.md-139-142 (1)
139-142: ⚠️ Potential issue | 🟠 Major: Fix grammar, capitalization, and documentation style.
Multiple issues in this section:
- "tpch" should be "TPC-H" (proper benchmark name)
- "s3" should be "S3" (proper product name)
- "which mocked" is missing "is" → "which is mocked"
- "minio" should be "MinIO" (proper product name)
- "My paimon.properties" uses first person, inappropriate for technical documentation
- "shows as below" is informal
📝 Proposed fix
-In the following example queries, `nation` is the Paimon append table loaded by tpch. -The target file system is s3, which mocked by minio in the local environment. +In the following example queries, `nation` is the Paimon append table loaded by TPC-H. +The target file system is S3, which is mocked by MinIO in the local environment. -My paimon.properties in etc/catalog shows as below: +The `paimon.properties` file in `etc/catalog` contains the following:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/src/main/sphinx/connector/paimon.md` around lines 139 - 142, Reword the paragraph to correct capitalization and tone: replace "tpch" with "TPC-H", "s3" with "S3", and "minio" with "MinIO"; insert the missing "is" in "which is mocked"; remove first-person phrasing like "My paimon.properties" and replace informal "shows as below" with a formal phrase such as "is shown below"; for example, produce a sentence that mentions the Paimon append table "nation" loaded by TPC-H, that the target file system is S3 which is mocked by MinIO in the local environment, and that the paimon.properties file in etc/catalog is shown below to locate the change around the fragments "nation", "Paimon append table", "etc/catalog", and "paimon.properties".plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonFilterConverter.java-304-306 (1)
304-306: ⚠️ Potential issue | 🟠 Major: CharType predicate values must apply trailing-space trimming to match Trino's char comparison semantics.
Trino stores
CHAR(n)values with trailing spaces removed internally. When building Paimon predicates forCHARcolumns, the predicate values must usetruncateToLengthAndTrimSpaces()to strip trailing spaces, matching the behavior inTupleDomainOrcPredicate(line 238). Currently,CharTypeandVarcharTypeare handled identically without trimming, which will produce incorrect results forCHAR(n)equality and range comparisons against padded values.Apply
truncateToLengthAndTrimSpaces()toCharTypevalues in the conversion logic (around line 304-306), and add a test case forCHAR(n) = 'abc'pushdown where the stored value contains trailing padding to verify correct predicate behavior.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonFilterConverter.java` around lines 304 - 306, The current conversion in PaimonFilterConverter treats CharType and VarcharType identically; modify the branch handling text types so that when type instanceof CharType you first call truncateToLengthAndTrimSpaces(...) on the trinoNativeValue bytes (using the same utility used in TupleDomainOrcPredicate) and then pass the trimmed result to BinaryString.fromBytes, while leaving VarcharType behavior unchanged; update/add a unit/integration test asserting pushdown for CHAR(n) = 'abc' where the stored value contains trailing spaces to verify the trimmed predicate matches Trino semantics.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonConfig.java-42-49 (1)
42-49: ⚠️ Potential issue | 🟠 Major: Validate that `paimon.warehouse` is configured.
warehouseis required bytoOptions(), but the getter is not annotated as required. Add@NotNullso missing catalog config fails during Trino configuration validation instead of later in catalog initialization.Suggested validation
+ `@NotNull` public String getWarehouse() { return warehouse; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonConfig.java` around lines 42 - 49, Add validation to ensure paimon.warehouse is required by annotating the PaimonConfig.getWarehouse() method with `@NotNull` so Trino's configuration validation fails early; locate the PaimonConfig class (methods getWarehouse and setWarehouse and the toOptions() usage) and add the javax.validation.constraints.NotNull annotation to the getter, keeping the existing `@Config` and `@ConfigDescription` on setWarehouse unchanged.plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/PaimonTablesInitializer.java-189-198 (1)
189-198: ⚠️ Potential issue | 🟠 Major: Use the same schema for creation and data loading.
The table is created in
schemaName, but data is loaded fromsession.getSchema().orElse(schemaName). If the session has a default schema that differs fromschemaName, the initializer writes to the wrong database or fails to find the table.Suggested fix
createPaimonTable(paimonCatalog, paimonTableMeta); - createTableWithData(paimonCatalog, session.getSchema().orElse(schemaName), tpchTable, queryRunner); + createTableWithData(paimonCatalog, schemaName, tpchTable, queryRunner); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/PaimonTablesInitializer.java` around lines 189 - 198, The table is created under schemaName but data is loaded using session.getSchema().orElse(schemaName), which can mismatch; change createTableWithData usage so it uses the same schema as creation (i.e., schemaName) or compute a single targetSchema variable and use it for both createPaimonTable and createTableWithData to ensure the table creation and data loading operate on the same schema; update the call sites for createPaimonTable and createTableWithData accordingly (referencing createPaimonTable, createTableWithData, schemaName, and session.getSchema().orElse(schemaName)).plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMergePageSourceWrapper.java-67-77 (1)
67-77: ⚠️ Potential issue | 🟠 Major: Build row-id fields in row-type order, not page-channel order.
RowBlockfields are read by ordinal. Iterating channels and checkingfieldToIndex.containsValue(i)reorders the row-id fields by projected channel index, which can put values under the wrong row field when projection order differs from the row-id field order.Suggested fix
Block[] newBlocks = new Block[nextPage.getChannelCount() + 1]; - Block[] rowIdBlocks = new Block[fieldToIndex.size()]; - for (int i = 0, idx = 0; i < nextPage.getChannelCount(); i++) { + for (int i = 0; i < nextPage.getChannelCount(); i++) { Block block = nextPage.getBlock(i); newBlocks[i] = block; - if (fieldToIndex.containsValue(i)) { - rowIdBlocks[idx] = block; - idx++; - } + } + Block[] rowIdBlocks = new Block[fieldToIndex.size()]; + int idx = 0; + for (int channelIndex : fieldToIndex.values()) { + rowIdBlocks[idx++] = nextPage.getBlock(channelIndex); } newBlocks[nextPage.getChannelCount()] = RowBlock.fromFieldBlocks(rowCount, rowIdBlocks);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMergePageSourceWrapper.java` around lines 67 - 77, The code builds rowIdBlocks in page-channel order by checking fieldToIndex.containsValue(i) while iterating nextPage.getBlock(i), which misorders fields when projection differs; instead populate rowIdBlocks in row-field (ordinal) order: iterate over the field ordinals in fieldToIndex (or over sorted fieldToIndex.keySet()), for each field get the corresponding channel index (fieldToIndex.get(fieldOrdinal)) and set rowIdBlocks[fieldOrdinal] = nextPage.getBlock(channelIndex), then call RowBlock.fromFieldBlocks with that rowIdBlocks; update PaimonMergePageSourceWrapper where newBlocks/rowIdBlocks are constructed to use this mapping.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceWrapper.java-65-77 (1)
65-77: ⚠️ Potential issue | 🟠 Major: Do not require completed positions unless a deletion vector is applied.
deletionVectoris optional, but Line 67 always callsorElseThrow()before checking whether a page exists or whether filtering is needed. That can fail reads where the wrapper is constructed with an empty deletion vector (whendeletionFiles.get().get(i)isnull) or when the source reaches EOF without reporting completed positions.Suggested guard
public SourcePage getNextSourcePage() { - int startPosition = (int) source.getCompletedPositions().orElseThrow(); SourcePage next = source.getNextSourcePage(); if (next == null) { return next; } + if (deletionVector.isEmpty()) { + return next; + } int pageCount = next.getPositionCount(); - return deletionVector - .map(deletionVector -> convertToRetained(next, deletionVector, startPosition, pageCount)) - .orElse(next); + int startPosition = (int) source.getCompletedPositions().orElseThrow(); + return convertToRetained(next, deletionVector.get(), startPosition, pageCount); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceWrapper.java` around lines 65 - 77, The code currently calls source.getCompletedPositions().orElseThrow() unconditionally in getNextSourcePage(), which can throw when no deletionVector is present or at EOF; change the logic to first get SourcePage next = source.getNextSourcePage() and return null/next early if next == null or deletionVector.isEmpty(), and only then (inside the deletionVector.map lambda or after confirming deletionVector.isPresent()) call source.getCompletedPositions().orElseThrow() to obtain startPosition and pass it to convertToRetained(next, deletionVector, startPosition, pageCount); this ensures completed positions are only required when a deletionVector is applied.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMergePageSourceWrapper.java-40-93 (1)
40-93: ⚠️ Potential issue | 🟠 Major: Delegate the full ConnectorPageSource contract.
This wrapper does not override
isBlocked(),getCompletedPositions(), andgetMetrics(), which means it returns defaults that hide the underlying source's behavior:NOT_BLOCKED,OptionalLong.empty(), andMetrics.EMPTYrespectively. If the wrapped source implements custom blocking or metrics, the engine will not see them and may busy-poll instead of respecting asynchronous blocking.Delegation additions
+import io.trino.spi.metrics.Metrics; + import java.io.IOException; import java.util.Map; +import java.util.OptionalLong; +import java.util.concurrent.CompletableFuture; @@ public long getCompletedBytes() { return pageSource.getCompletedBytes(); } + + `@Override` + public OptionalLong getCompletedPositions() + { + return pageSource.getCompletedPositions(); + } @@ public void close() throws IOException { pageSource.close(); } + + `@Override` + public CompletableFuture<?> isBlocked() + { + return pageSource.isBlocked(); + } + + `@Override` + public Metrics getMetrics() + { + return pageSource.getMetrics(); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMergePageSourceWrapper.java` around lines 40 - 93, PaimonMergePageSourceWrapper currently fails to delegate the full ConnectorPageSource contract: implement and delegate isBlocked(), getCompletedPositions(), and getMetrics() to the wrapped pageSource (same pattern as getCompletedBytes/getReadTimeNanos/isFinished/getMemoryUsage/close) so the engine sees the underlying source's blocking, completed positions and metrics; update the PaimonMergePageSourceWrapper class to add overrides for isBlocked(), getCompletedPositions(), and getMetrics() that simply call through to pageSource.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PositionOutputStreamWrapper.java-42-64 (1)
42-64: ⚠️ Potential issue | 🟠 Major: Advance `position` only after a successful write.
The current code increments `position` before the wrapped stream validates and writes. If `outputStream.write(...)` throws, `getPos()` reports bytes that were never written.
🐛 Proposed fix
public void write(int b) throws IOException { - position++; outputStream.write(b); + position++; } @@ public void write(byte[] bytes) throws IOException { - position += bytes.length; outputStream.write(bytes); + position += bytes.length; } @@ public void write(byte[] bytes, int off, int len) throws IOException { - position += len; outputStream.write(bytes, off, len); + position += len; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PositionOutputStreamWrapper.java` around lines 42 - 64, The PositionOutputStreamWrapper increments position before delegating to outputStream, so getPos() can be wrong if write throws; update write(int), write(byte[]), and write(byte[], int, int) to perform outputStream.write(...) first and only increment position after that call succeeds (e.g., call outputStream.write(b) then position++ in write(int), call outputStream.write(bytes) then position += bytes.length in write(byte[]), and call outputStream.write(bytes, off, len) then position += len in write(byte[], int, int)) so position reflects only successfully written bytes.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSplitManager.java-65-71 (1)
65-71: ⚠️ Potential issue | 🟠 Major: Guard split-weight calculation when all row counts are zero.
For non-empty split lists where every
rowCount()is0,(double) split.rowCount() / maxRowCountbecomesNaN, so the weight clamp does not work.🐛 Proposed fix
long maxRowCount = splits.stream().mapToLong(Split::rowCount).max().orElse(0L); double minimumSplitWeight = PaimonSessionProperties.getMinimumSplitWeight(session); return new PaimonSplitSource(splits.stream() .map(split -> PaimonSplit.fromSplit( split, - Math.min(Math.max((double) split.rowCount() / maxRowCount, minimumSplitWeight), 1.0))) + maxRowCount > 0 + ? Math.min(Math.max((double) split.rowCount() / maxRowCount, minimumSplitWeight), 1.0) + : minimumSplitWeight)) .collect(Collectors.toList()));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSplitManager.java` around lines 65 - 71, When all splits have rowCount() == 0, maxRowCount becomes 0 and (double) split.rowCount() / maxRowCount yields NaN, so guard the division: in PaimonSplitManager compute maxRowCount and if maxRowCount == 0 use a default ratio of 0.0 for all splits (or otherwise treat ratio as minimumSplitWeight) before applying Math.min/Math.max; update the mapping passed to PaimonSplit.fromSplit so it uses ratio = (maxRowCount == 0 ? 0.0 : (double) split.rowCount() / maxRowCount) and then clamp it with minimumSplitWeight and 1.0, and return the PaimonSplitSource as before.plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/TrinoHiveCatalog.java-221-239 (1)
221-239: ⚠️ Potential issue | 🟠 Major: Return when `ignoreIfExists` is true.
When the Hive table already exists and `ignoreIfExists` is true, this currently falls through to `hiveMetastore.createTable(...)`, which can fail on the duplicate table.
🐛 Proposed fix
if (hiveMetastore.getTable(identifier.getDatabaseName(), identifier.getTableName()).isPresent()) { - if (!ignoreIfExists) { - throw new TableAlreadyExistException(identifier); + if (ignoreIfExists) { + return; } + throw new TableAlreadyExistException(identifier); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/TrinoHiveCatalog.java` around lines 221 - 239, The code checks hiveMetastore.getTable(...) and throws TableAlreadyExistException when ignoreIfExists is false, but still proceeds to call hiveMetastore.createTable(...) and baseCatalog.createTable(...); change the flow so that if the table exists and ignoreIfExists is true you return immediately (skip both hiveMetastore.createTable and baseCatalog.createTable), i.e., add an early return right after the existence check when ignoreIfExists is true; adjust the block around hiveMetastore.getTable(...), TableAlreadyExistException, and baseCatalog.createTable(...) accordingly to avoid attempting to create a duplicate table.plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/TrinoHiveCatalog.java-205-210 (1)
205-210: ⚠️ Potential issue | 🟠 Major: Honor `ignoreIfNotExists` before dropping from Hive.
`hiveMetastore.dropTable(...)` runs unconditionally, so `dropTable(..., true)` can still fail when the Hive table is absent and leave base-catalog cleanup unattempted.
🐛 Proposed fix
public void dropTable(Identifier identifier, boolean ignoreIfNotExists) throws TableNotExistException { + if (hiveMetastore.getTable(identifier.getDatabaseName(), identifier.getTableName()).isEmpty()) { + if (ignoreIfNotExists) { + return; + } + throw new TableNotExistException(identifier); + } hiveMetastore.dropTable(identifier.getDatabaseName(), identifier.getTableName(), false); baseCatalog.dropTable(identifier, ignoreIfNotExists); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/TrinoHiveCatalog.java` around lines 205 - 210, In TrinoHiveCatalog.dropTable(Identifier identifier, boolean ignoreIfNotExists) the call to hiveMetastore.dropTable(...) runs unconditionally and can throw when ignoreIfNotExists is true; change the logic to first check ignoreIfNotExists and only call hiveMetastore.dropTable if the Hive table exists or when ignoreIfNotExists is false (e.g., guard or catch the specific not-found exception), then always invoke baseCatalog.dropTable(identifier, ignoreIfNotExists); ensure you reference the existing hiveMetastore.dropTable and baseCatalog.dropTable calls and preserve the method signature and behavior for the non-ignore case.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTypeUtils.java-93-101 (1)
93-101: ⚠️ Potential issue | 🟠 Major: Clamp bounded Paimon varchar lengths before creating Trino varchar.
The code can fail for Paimon bounded
VarCharTypelengths that exceed Trino'sVarcharType.MAX_LENGTH(Integer.MAX_VALUE - 1) but don't equal Paimon'sMAX_LENGTH(Integer.MAX_VALUE). Trino'screateVarcharType()validates and rejects such lengths. This mirrors the asymmetry in theCharTypevisitor (line 90), which already applies clamping.Proposed fix
if (varCharType.getLength() == VarCharType.MAX_LENGTH) { return VarcharType.createUnboundedVarcharType(); } return VarcharType.createVarcharType( - varCharType.getLength()); + Math.min(VarcharType.MAX_LENGTH, varCharType.getLength()));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTypeUtils.java` around lines 93 - 101, The VarChar visitor in PaimonTypeUtils should clamp bounded Paimon lengths to Trino's max before calling VarcharType.createVarcharType to avoid rejection for values > VarcharType.MAX_LENGTH; update visit(VarCharType) to mirror the CharType handling by computing int len = Math.min(varCharType.getLength(), VarcharType.MAX_LENGTH) when varCharType.getLength() != VarCharType.MAX_LENGTH, then call VarcharType.createVarcharType(len) (use VarCharType.MAX_LENGTH and VarcharType.MAX_LENGTH constants and the visit(VarCharType) method and VarcharType.createVarcharType reference to locate the change).plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonColumnHandle.java-46-50 (1)
46-50: ⚠️ Potential issue | 🟠 Major: Replace `@JsonProperty` with `@JsonIgnore` on the `isRowId()` method.
`isRowId()` is derived from `columnName` and should not be serialized as a JSON property. This follows the established pattern in Trino where derived helper methods on `ColumnHandle` implementations (Iceberg, Delta Lake, MongoDB, OpenSearch, etc.) use `@JsonIgnore`. Marking it with `@JsonProperty` would add a non-reconstructible `rowId` field to the serialized JSON, breaking deserialization round-trips.
Suggested fix
- `@JsonProperty` + `@JsonIgnore` public boolean isRowId() { return TRINO_ROW_ID_NAME.equals(columnName); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonColumnHandle.java` around lines 46 - 50, The isRowId() accessor on PaimonColumnHandle is a derived helper based on columnName and must not be serialized; replace the `@JsonProperty` annotation on the isRowId() method with `@JsonIgnore` so the derived rowId is excluded from JSON and does not break deserialization round-trips for PaimonColumnHandle.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTypeUtils.java-270-345 (1)
270-345: ⚠️ Potential issue | 🟠 Major: Move row field ID generation to be scoped per row type conversion.
The singleton
TrinoToPaimonTypeVisitor.INSTANCEmaintains a sharedAtomicInteger currentIndexthat is never reset. WhentoPaimonType()is called multiple times (e.g., inPaimonTestingUtils.prepareSchema()at line 72 for each table column, or for nested row types), field IDs for subsequent row conversions begin from the previous counter value instead of0. Additionally,field.getName().get()at line 342 throwsNoSuchElementExceptionif a Trino row field has no name (anonymous rows).Proposed fix
case io.trino.spi.type.RowType rowType -> { + AtomicInteger currentIndex = new AtomicInteger(); List<DataField> dataFields = rowType.getFields().stream() .map(field -> new DataField( currentIndex.getAndIncrement(), - field.getName().get(), + field.getName().orElseThrow(() -> new UnsupportedOperationException("Anonymous row fields are not supported")), visit(field.getType()))) .collect(Collectors.toList()); return new RowType(true, dataFields); }Remove the instance field
private final AtomicInteger currentIndex = new AtomicInteger(0);at line 270.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTypeUtils.java` around lines 270 - 345, The TrinoToPaimonTypeVisitor.INSTANCE uses a shared field currentIndex which is never reset so row field IDs keep incrementing across separate toPaimonType/visit calls and nested rows; also field.getName().get() can throw for anonymous Trino row fields. Fix by removing the instance-level AtomicInteger currentIndex and make field-id generation local per-row conversion inside the visit(Type) branch handling io.trino.spi.type.RowType: create a fresh counter or use the stream index to assign ids starting at 0 for that row, and replace field.getName().get() with a safe retrieval (e.g., field.getName().orElse("") or a deterministic generated name) before constructing DataField; update references in TrinoToPaimonTypeVisitor.visit/rowType handling accordingly so nested rows get their own id scope.plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonQueryRunner.java-112-112 (1)
112-112: ⚠️ Potential issue | 🟠 Major: Guard the optional data loader before initialization.
builder()allows building withoutsetDataLoader(...), butbuild()unconditionally dereferencesdataLoader.🐛 Proposed fix
- dataLoader.initializeTables(session, queryRunner, SCHEMA_NAME); + if (dataLoader != null) { + dataLoader.initializeTables(session, queryRunner, SCHEMA_NAME); + } return queryRunner;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonQueryRunner.java` at line 112, The build() path unconditionally dereferences dataLoader by calling dataLoader.initializeTables(...), but builder() allows omitting setDataLoader(...); update PaimonQueryRunner.build()/constructor to null-check or guard the optional dataLoader before calling initializeTables: check that dataLoader != null (or use Optional) and only call dataLoader.initializeTables(session, queryRunner, SCHEMA_NAME) when present, and ensure any related fields (dataLoader) are stored/typed to reflect optionality so callers won't NPE.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/DirectPaimonPageSource.java-91-105 (1)
91-105: ⚠️ Potential issue | 🟠 Major: Close every queued source even when one close fails.
The current loop exits on the first
IOException, leaving later page sources open. Also clearcurrent/pageSourceQueueso repeatedclose()calls do not double-close the same sources.🧹 Proposed fix
`@Override` - public void close() + public void close() + throws IOException { - try { - if (current != null) { + IOException failure = null; + if (current != null) { + try { current.close(); } - for (ConnectorPageSource source : pageSourceQueue) { + catch (IOException e) { + failure = e; + } + finally { + current = null; + } + } + ConnectorPageSource source; + while ((source = pageSourceQueue.poll()) != null) { + try { source.close(); } - } - catch (IOException e) { - throw new UncheckedIOException(e); + catch (IOException e) { + if (failure == null) { + failure = e; + } + else { + failure.addSuppressed(e); + } + } + } + if (failure != null) { + throw failure; } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/DirectPaimonPageSource.java` around lines 91 - 105, The close() method currently stops on the first IOException and leaves later ConnectorPageSource instances open; update DirectPaimonPageSource.close() to attempt closing every source (first close current, then iterate pageSourceQueue) while catching IOExceptions per-source, collect the first exception (or add subsequent ones as suppressed), and after all closes clear current and pageSourceQueue to avoid double-closing on repeated calls; finally, if any IOException occurred, rethrow it as an UncheckedIOException with suppressed exceptions preserved.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java-128-137 (1)
128-137: ⚠️ Potential issue | 🟠 Major: Normalize the lookup key for mixed-case column names.
lowerCaseFieldNamesis searched with the originalfield, so a column likeaCacan fail lookup even though it exists.🐛 Proposed fix
+import static java.util.Locale.ENGLISH; + public PaimonColumnHandle columnHandle(Table paimonTable, String field) { List<String> lowerCaseFieldNames = fieldNames(paimonTable.rowType()); List<String> originFieldNames = paimonTable.rowType().getFieldNames(); - int index = lowerCaseFieldNames.indexOf(field); + int index = lowerCaseFieldNames.indexOf(field.toLowerCase(ENGLISH)); if (index == -1) { throw new RuntimeException(format("Cannot find field %s in schema %s", field, lowerCaseFieldNames)); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java` around lines 128 - 137, In PaimonMetadata.columnHandle, the lookup uses lowerCaseFieldNames but searches with the original field string which breaks mixed-case names; change the search to use a normalized key (e.g., field.toLowerCase(Locale.ROOT)) when computing index = lowerCaseFieldNames.indexOf(...), ensuring you import/use Locale.ROOT or equivalent for stable lowercasing; keep the rest of the method (originFieldNames, DataField retrieval, and PaimonColumnHandle.of(...)) unchanged.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/DirectPaimonPageSource.java-61-69 (1)
61-69: ⚠️ Potential issue | 🟠 Major: Do not advance to the next source on a transient `null` page.
`getNextSourcePage()` can return `null` before `isFinished()` is true. Closing and advancing here can drop data from page sources that are temporarily unable to produce a page.
🐛 Proposed fix
public SourcePage getNextSourcePage() { while (current != null) { SourcePage dataPage = current.getNextSourcePage(); if (dataPage != null) { return dataPage; } + if (!current.isFinished()) { + return null; + } advance(); } return null; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/DirectPaimonPageSource.java` around lines 61 - 69, The code in DirectPaimonPageSource.getNextSourcePage() currently advances to the next source whenever current.getNextSourcePage() returns null, which can drop pages from sources that are temporarily unable to produce a page; change the logic so that when current.getNextSourcePage() returns null you only call advance() if the current source reports it is finished (e.g., current.isFinished()); otherwise return null to indicate a transient lack of data. Update getNextSourcePage() to check current.getNextSourcePage() -> if non-null return it; if null then if current.isFinished() call advance() and continue the loop, else return null. Ensure you reference the methods current.getNextSourcePage(), current.isFinished(), advance(), and the class DirectPaimonPageSource in your change.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java-189-204 (1)
189-204: ⚠️ Potential issue | 🟠 Major: Do not swallow all metadata runtime failures.
Catching every
RuntimeExceptionmakes real metadata failures look like missing columns/tables. Only tolerate table-not-found races and propagate other failures.🐛 Proposed fix
return tableNames.stream() .map(table -> { List<ColumnMetadata> columnMetadata; try { - getTableHandle(session, table, Optional.empty(), Optional.empty()); + ConnectorTableHandle tableHandle = getTableHandle(session, table, Optional.empty(), Optional.empty()); + if (tableHandle == null) { + return Pair.of(table, (List<ColumnMetadata>) null); + } Table paimonTable = table(session, table.getSchemaName(), table.getTableName()); columnMetadata = columnMetadatas(paimonTable); } - catch (RuntimeException e) { - // Error when getting column metadata, return null + catch (TableNotFoundException e) { return Pair.of(table, (List<ColumnMetadata>) null); } return Pair.of(table, columnMetadata); }).filter(p -> p.getRight() != null)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java` around lines 189 - 204, The current stream silently swallows all RuntimeException in the mapping (inside PaimonMetadata where getTableHandle, table and columnMetadatas are called), hiding real metadata errors; change the catch to only tolerate the table-not-found case and rethrow all other failures — e.g., catch RuntimeException e, detect table-missing (replace with the actual exception type your codebase uses such as TableNotFoundException or check error code/message) and return Pair.of(table, null) only for that case, otherwise rethrow the exception so getTableHandle/table/columnMetadatas errors propagate.plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonQueryRunner.java-104-104 (1)
104-104: ⚠️ Potential issue | 🟠 Major: Pass the query runner's base data directory to the testing plugin, not the filesystem root.
Path.resolve("/")with an absolute path returns the filesystem root instead of a child directory. The TestingPaimonPlugin expectslocalFileSystemRootPathto be the testing context's root path, which should be the query runner's base data directory—not/.- queryRunner.installPlugin(new TestingPaimonPlugin(queryRunner.getCoordinator().getBaseDataDir().resolve("/"))); + queryRunner.installPlugin(new TestingPaimonPlugin(queryRunner.getCoordinator().getBaseDataDir()));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonQueryRunner.java` at line 104, The code is passing the filesystem root by calling queryRunner.getCoordinator().getBaseDataDir().resolve("/") to TestingPaimonPlugin; instead, pass the query runner's base data directory itself so the plugin receives the test context root. Update the call in PaimonQueryRunner where queryRunner.installPlugin(...) creates the TestingPaimonPlugin to use queryRunner.getCoordinator().getBaseDataDir() (remove the .resolve("/")) so TestingPaimonPlugin receives the correct localFileSystemRootPath.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSessionProperties.java-49-54 (1)
49-54: ⚠️ Potential issue | 🟠 Major: Add validation to the `minimum_split_weight` session property to reject invalid proportions.
This session property currently accepts any
Doublevalue without constraints. Invalid values like0, negative numbers,NaN, orInfinitywill propagate to split weight creation and causeIllegalArgumentExceptionat split generation time. The property should be constrained to the valid range(0, 1]using a decoder in the session property definition, consistent with how other Trino connectors handle split weight parameters (e.g., Hive'sminimum_assigned_split_weight).The clamping logic in
PaimonSplitManager.getSplits()(line 70) only partially mitigates this by enforcing the upper bound; it does not prevent invalid values from being used when the session property itself is invalid.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSessionProperties.java` around lines 49 - 54, The MINIMUM_SPLIT_WEIGHT session property in PaimonSessionProperties allows any Double and must be constrained to (0, 1]; add a decoder/validator when building the PropertyMetadata for MINIMUM_SPLIT_WEIGHT in PaimonSessionProperties that rejects NaN, infinite, <=0 and >1 (accepting values >0 up to and including 1) and throws a clear exception for invalid inputs so invalid proportions cannot be stored; this mirrors the validation approach used for similar properties and prevents bad values from reaching PaimonSplitManager.getSplits().plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSource.java-250-286 (1)
250-286: ⚠️ Potential issue | 🟠 Major: Add handling for high-precision timestamp-with-time-zone conversions.
The
LongTimestampWithTimeZonebranch at line 285 checkstype.equals(TIMESTAMP_TZ_MILLIS), butTIMESTAMP_TZ_MILLISis a short timestamp (long java type). High-precision timestamp-with-time-zone types (TIMESTAMP_TZ_MICROS,TIMESTAMP_TZ_NANOS,TIMESTAMP_TZ_PICOS) are mapped asLongTimestampWithTimeZoneTypeobjects byPaimonTypeUtilsfor precisions > 3, so they will fail the assertion. These types require explicit handling to convert the epoch millis and fractional seconds.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSource.java` around lines 250 - 286, The LongTimestampWithTimeZone branch only allows TIMESTAMP_TZ_MILLIS but must handle high‑precision TZ types (TIMESTAMP_TZ_MICROS, TIMESTAMP_TZ_NANOS, TIMESTAMP_TZ_PICOS / LongTimestampWithTimeZoneType) returned by PaimonTypeUtils; change the guard to detect LongTimestampWithTimeZoneType (or check for type instance/one of TIMESTAMP_TZ_* constants) and convert the Paimon Timestamp into the Trino long-timestamp-with-time-zone representation by calling fromEpochMillisAndFraction(timestamp.getMillisecond(), fractionalPart, UTC_KEY) where fractionalPart is computed from the Timestamp fractional component at the appropriate precision (micros→micro fraction, nanos→nano fraction, etc.), then call type.writeObject(output, <that value>) instead of asserting only TIMESTAMP_TZ_MILLIS.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileIO.java-202-231 (1)
202-231: ⚠️ Potential issue | 🟠 Major: Fragile error handling: matching on `IOException` message substrings.
Classifying rename failures by searching for the strings
"Target location already exists"and"rename failed"insidee.getMessage()is brittle — messages differ acrossTrinoFileSystemimplementations (S3, GCS, ABFS, local) and can change between Trino versions with no API-level warning. A legitimate rename failure today will be converted into a silentreturn false.Prefer checking preconditions explicitly before attempting the rename (e.g.
fileSystem.newInputFile(target).exists()for the "already exists" case), or catch the specific filesystem exception type if one is available.Also on Line 220:
e.getMessage()can benull→NullPointerException. If you keep the string-matching, guard withObjects.toString(e.getMessage(), "").🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileIO.java` around lines 202 - 231, The rename() method uses fragile string-matching on e.getMessage() and can NPE; instead, proactively check target existence before attempting a rename (e.g. use fileSystem.newInputFile(targetLocation).exists() and/or fileSystem.directoryExists(targetLocation).orElse(false)) and return false if the target already exists, then perform the rename using the existing logic (renameDirectory / renameFileAnyway) inside the RENAME_LOCK for local URIs; if you must keep exception handling, avoid matching on nullable messages by wrapping e.getMessage() with Objects.toString(..., "") and prefer catching a more specific filesystem exception type if available rather than IOException.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceProvider.java-286-391 (1)
286-391: ⚠️ Potential issue | 🟠 Major: `createPageSource` swallows causes and loses recoverability signals.
Lines 367–369 catch
Exceptionand rethrow as a bareRuntimeException(e)— anyTrinoException(e.g.NOT_SUPPORTED, authorization) thrown while building per-file page sources is demoted to a generic 500, losing the error code. Preferthrow e;forRuntimeExceptionsubtypes, and only wrap the checked ones (IOException→UncheckedIOException).The outer catch on Line 388 does the same for the entire method body, so
paimonSplit.decodeSplit()orReadBuilderfailures also lose their original types.♻️ Proposed refactor (pattern)
- catch (Exception e) { - throw new RuntimeException(e); - } + catch (IOException e) { + throw new UncheckedIOException(e); + } + catch (RuntimeException e) { + throw e; + } + catch (Exception e) { + throw new RuntimeException(e); + }Apply the same pattern to the outer
catch (Exception e)at Line 388.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceProvider.java` around lines 286 - 391, The createPageSource method currently catches Exception in two places and rethrows new RuntimeException(e), which strips original RuntimeException/TrinoException types and error codes; update both catch blocks (the inner catch around per-file processing and the outer catch around the whole method) to: if (e instanceof RuntimeException || e instanceof TrinoException) throw (RuntimeException) e; else if (e instanceof IOException) throw new UncheckedIOException((IOException) e); else throw new RuntimeException(e); ensure you keep the same method symbol names (createPageSource) and preserve the existing logic around FileIndexPredicate, PaimonPageSourceWrapper/DirectPaimonPageSource and ReadBuilder while replacing the two Exception handlers with this pattern.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java-107-111 (1)
107-111: ⚠️ Potential issue | 🟠 Major: Avoid the direct `ByteArrayBlock` cast.
ClassCastException. Use the samereadNativeValueidiom as the other getters (withTINYINT) so it works regardless of block implementation.🛡️ Proposed fix
`@Override` public byte getByte(int i) { - return ((ByteArrayBlock) singlePage.getBlock(i)).getByte(0); + long value = (long) readNativeValue(TINYINT, singlePage.getBlock(i), 0); + checkArgument(value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE, "Value out of range for byte: %s", value); + return (byte) value; }(and add
import static io.trino.spi.type.TinyintType.TINYINT;)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java` around lines 107 - 111, In PaimonRow.getByte replace the unsafe cast to ByteArrayBlock by calling readNativeValue on the block obtained from singlePage.getBlock(i) and converting to a byte using the TINYINT type (use readNativeValue(singlePage.getBlock(i), TINYINT) and extract the byte), so the code works with dictionary/RLE/lazy wrappers; also add the static import for io.trino.spi.type.TinyintType.TINYINT and remove the direct ByteArrayBlock cast.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileIO.java-171-191 (1)
171-191: ⚠️ Potential issue | 🟠 Major: Non-recursive `delete` doesn't detect subdirectories and still deletes them.
listFiles(location).hasNext()only enumerates files, not sub-directories. A directory that contains only subdirectories returnsfalse, anddeleteDirectorythen recursively removes everything. This violates the documented POSIX-ishdelete(path, recursive=false)contract Paimon relies on and can cause silent data loss if callers use the non-recursive form as a safety check.Include
listDirectoriesin the emptiness test (or delete only the target dir via a non-recursive primitive).🐛 Proposed fix
if (fileSystem.directoryExists(location).orElse(false)) { if (!recursive) { - if (fileSystem.listFiles(location).hasNext()) { - throw new IOException("Directory " + location + " is not empty"); - } + if (fileSystem.listFiles(location).hasNext() + || !fileSystem.listDirectories(location).isEmpty()) { + throw new IOException("Directory " + location + " is not empty"); + } } fileSystem.deleteDirectory(location); return true; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileIO.java` around lines 171 - 191, The non-recursive delete in delete(Path, boolean) currently checks only fileSystem.listFiles(location).hasNext(), which misses subdirectories and allows fileSystem.deleteDirectory(location) to remove contents recursively; update the emptiness check to also inspect fileSystem.listDirectories(location).hasNext() (or otherwise use a non-recursive delete primitive) so that when recursive==false you throw the IOException if either files or subdirectories exist before calling fileSystem.deleteDirectory(location); keep existFile(...) and fileSystem.deleteFile(...) behavior unchanged.plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/hms/TrinoHivePaimonCatalog.java-102-116 (1)
102-116: ⚠️ Potential issue | 🟠 Major: `listTables` doesn't filter for Paimon-backed tables.
SHOW TABLES, and subsequent metadata/load operations on them will blow up (currently via the "non-Paimon"TableNotFoundExceptioninloadTableSchema). Filter withisPaimonTable(...)as other Trino connectors (Iceberg, Delta) do with their format predicates.Additionally, the "no namespace" branch walks every database and every table sequentially via the metastore — for deployments with many databases this can be extremely slow. Prefer the metastore's bulk listing when available, or parallelize.
🐛 Proposed fix sketch
- if (namespace.isPresent()) { - String databaseName = namespace.get(); - if (!databaseExists(session, databaseName)) { - throw new SchemaNotFoundException("Database does not exist: " + databaseName); - } - return metastore.getTables(databaseName).stream().map(t -> t.tableName().getTableName()).collect(Collectors.toList()); - } - - return metastore.getAllDatabases().stream().flatMap(name -> metastore.getTables(name).stream()) - .map(t -> t.tableName().getTableName()) - .collect(Collectors.toList()); + List<String> databases = namespace + .map(List::of) + .orElseGet(metastore::getAllDatabases); + return databases.stream() + .flatMap(db -> metastore.getTables(db).stream()) + .map(t -> metastore.getTable(t.tableName().getDatabaseName(), t.tableName().getTableName())) + .flatMap(Optional::stream) + .filter(TrinoHivePaimonCatalog::isPaimonTable) + .map(t -> t.getTableName()) + .collect(toImmutableList());(Also replace the existence check throwing a message with
new SchemaNotFoundException(databaseName)— see separate comment.)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/hms/TrinoHivePaimonCatalog.java` around lines 102 - 116, listTables currently returns every Hive table; change it to only include Paimon-backed tables by applying isPaimonTable(...) to each metastore table result (e.g., wrap metastore.getTables(databaseName).stream() and the flatMap branch to filter t -> isPaimonTable(t.tableName()) before mapping to tableName()); for the no-namespace branch avoid sequential per-database roundtrips by using metastore bulk listing APIs if available or by parallelizing the per-database metastore.getTables calls (e.g., parallelStream) to reduce latency; also replace the thrown SchemaNotFoundException message in the namespace-exists check with new SchemaNotFoundException(databaseName) as suggested.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: afb02a8e-59ff-4d29-8c11-dbebc30e4233
⛔ Files ignored due to path filters (1)
docs/src/main/sphinx/static/img/paimon.png is excluded by !**/*.png
📒 Files selected for processing (66)
- core/trino-server/src/main/provisio/trino.xml
- docs/release-template.md
- docs/src/main/sphinx/connector.md
- docs/src/main/sphinx/connector/paimon.md
- docs/src/main/sphinx/sql/drop-catalog.md
- plugin/trino-paimon/pom.xml
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/CatalogType.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/DirectPaimonPageSource.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/EncodingUtils.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonColumnHandle.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonConfig.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonConnector.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonConnectorFactory.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonErrorCode.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonFilterConverter.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMergePageSourceWrapper.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadata.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonMetadataFactory.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonModule.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonOrcDataSource.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSource.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceProvider.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceWrapper.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPlugin.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSessionProperties.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSplit.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSplitManager.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonSplitSource.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTableHandle.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTableOptionUtils.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTableOptions.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTransactionManager.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTypeUtils.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/AbstractPaimonTrinoCatalog.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/PaimonCatalogModule.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/TrinoCatalog.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/TrinoCatalogFactory.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/file/PaimonFileSystemCatalogModule.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/file/TrinoFileSystemCatalogFactory.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/file/TrinoFileSystemPaimonCatalog.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/hms/PaimonHiveCatalogModule.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/hms/TrinoHiveCatalogFactory.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/catalog/hms/TrinoHivePaimonCatalog.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonDirectoryFileStatus.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileIO.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonFileStatus.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PaimonInputStreamWrapper.java
- plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/fileio/PositionOutputStreamWrapper.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonQueryRunner.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/PaimonTestUtils.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/SimpleTableTestHelper.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonConfig.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonConnectorTest.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonITCase.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonMinioConnectorSmokeTest.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonPlugin.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonRow.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonSplit.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestPaimonTrinoType.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestingPaimonConnectorFactory.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/TestingPaimonPlugin.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/PaimonTablesInitializer.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/PaimonTestingUtils.java
- plugin/trino-paimon/src/test/java/io/trino/plugin/paimon/testing/TrinoHiveCatalog.java
- pom.xml
| ```sql | ||
| USE use paimon.tests; | ||
|
|
Fix duplicated USE keyword in SQL example.
The SQL statement has "USE use" with the keyword duplicated, which is a syntax error.
🐛 Proposed fix
```sql
-USE use paimon.tests;
+USE paimon.tests;
SELECT * FROM nation LIMIT 1;
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ```sql | |
| USE use paimon.tests; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/src/main/sphinx/connector/paimon.md` around lines 156 - 158, Remove the
duplicated "USE" keyword in the SQL example: find the SQL snippet containing
"USE use paimon.tests;" and change it to "USE paimon.tests;" so the example is
valid (the SELECT statement that follows should remain unchanged).
| if (type.equals(TIMESTAMP_TZ_MILLIS)) { | ||
| if (trinoNativeValue instanceof Long) { | ||
| return trinoNativeValue; | ||
| } | ||
| return Timestamp.fromEpochMillis(((LongTimestampWithTimeZone) trinoNativeValue).getEpochMillis()); | ||
| } |
🧩 Analysis chain
🌐 Web query:
Trino spi short TIMESTAMP WITH TIME ZONE packed long representation DateTimeEncoding
💡 Result:
In Trino's SPI (trino-spi), TIMESTAMP WITH TIME ZONE values are stored in blocks as packed 64-bit longs using DateTimeEncoding utilities. The timestamp portion represents milliseconds since the Unix epoch, shifted left to make room for timezone information packed in the lower bits (approximately 12 bits for timezone offset, limiting the effective timestamp range to about 2^52 ms or ~285,000 years around the epoch). For higher precision (p>3), LongTimestampWithTimeZone is used instead of short representations. ShortTimestampType (for p<=6 without TZ) uses microseconds from epoch as long. The exact bit layout is defined in io.trino.spi.type.DateTimeEncoding.packDateTimeWithZone(millis, timeZoneKey), which throws on millis overflow due to shifting.
Citations:
- 1: https://javadoc.io/static/io.trino/trino-spi/479/trino.spi/io/trino/spi/type/package-summary.html
- 2: Support for nanosecond/microsecond precision in TIMESTAMP and TIMESTAMP WITH TIME ZONE #1284
- 3: Error reading from Iceberg tables with very large timestamps in Parquet files #25837
- 4: https://github.com/trinodb/trino/blob/eef66628759d7244c176f62be45f3d9f0e5a1a5d/core/trino-spi/src/main/java/io/trino/spi/type/ShortTimestampType.java
🏁 Script executed:
# Locate the file mentioned in the review
find . -name "PaimonFilterConverter.java" -type fRepository: trinodb/trino
Length of output: 142
🏁 Script executed:
# Once found, examine the specific lines 297-302
fd -t f "PaimonFilterConverter.java" --exec cat -n {} \; | head -320 | tail -30Repository: trinodb/trino
Length of output: 1415
🏁 Script executed:
# Search for TIMESTAMP_TZ_MILLIS definition
rg "TIMESTAMP_TZ_MILLIS" -B 2 -A 2Repository: trinodb/trino
Length of output: 50370
🏁 Script executed:
# Search for DateTimeEncoding class and unpackMillisUtc method
rg "class DateTimeEncoding|def unpackMillisUtc|unpackMillisUtc" --type java -B 2 -A 5Repository: trinodb/trino
Length of output: 50370
Unpack TIMESTAMP_TZ_MILLIS packed long before using as epoch milliseconds.
Trino's short TIMESTAMP WITH TIME ZONE (precision ≤ 3) is stored as a packed 64-bit long via DateTimeEncoding.packDateTimeWithZone(millisUtc, zoneKey), where high bits hold millis-since-epoch and low bits hold the timezone key. Returning the packed value as-is feeds garbage into Paimon's predicate, causing incorrect row filtering.
🐛 Fix
if (type.equals(TIMESTAMP_TZ_MILLIS)) {
if (trinoNativeValue instanceof Long packed) {
- return trinoNativeValue;
+ return Timestamp.fromEpochMillis(DateTimeEncoding.unpackMillisUtc(packed));
}
return Timestamp.fromEpochMillis(((LongTimestampWithTimeZone) trinoNativeValue).getEpochMillis());
}
(Add import io.trino.spi.type.DateTimeEncoding;.)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (type.equals(TIMESTAMP_TZ_MILLIS)) { | |
| if (trinoNativeValue instanceof Long) { | |
| return trinoNativeValue; | |
| } | |
| return Timestamp.fromEpochMillis(((LongTimestampWithTimeZone) trinoNativeValue).getEpochMillis()); | |
| } | |
| if (type.equals(TIMESTAMP_TZ_MILLIS)) { | |
| if (trinoNativeValue instanceof Long packed) { | |
| return Timestamp.fromEpochMillis(DateTimeEncoding.unpackMillisUtc(packed)); | |
| } | |
| return Timestamp.fromEpochMillis(((LongTimestampWithTimeZone) trinoNativeValue).getEpochMillis()); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonFilterConverter.java`
around lines 297 - 302, The TIMESTAMP_TZ_MILLIS branch in PaimonFilterConverter
is returning Trino's packed long (DateTimeEncoding.packDateTimeWithZone)
directly, feeding a packed value to Paimon; unpack the millis before
constructing the Timestamp: when trinoNativeValue is a Long extract millis =
DateTimeEncoding.unpackMillisUtc((Long) trinoNativeValue) (import
io.trino.spi.type.DateTimeEncoding), and when it's a LongTimestampWithTimeZone
use its getEpochMillis(); then call Timestamp.fromEpochMillis(millis) and return
that value so Paimon receives raw epoch milliseconds.
| @Override | ||
| protected Slice readTailInternal(int length) | ||
| throws IOException | ||
| { | ||
| long readStart = System.nanoTime(); | ||
| Slice tail = input.readTail(length); | ||
| stats.readDataBytesPerSecond(tail.length(), System.nanoTime() - readStart); | ||
| return input.readTail(length); | ||
| } |
Critical: readTailInternal reads the tail twice and returns the unmeasured second read.
input.readTail(length) is invoked twice — the first result is timed and fed into stats, but the second (unmeasured) result is returned. This doubles I/O for every ORC footer/stripe-tail read, and the returned bytes are not the ones reflected in readDataBytesPerSecond. Return the originally read tail.
🐛 Fix
`@Override`
protected Slice readTailInternal(int length)
throws IOException
{
long readStart = System.nanoTime();
Slice tail = input.readTail(length);
stats.readDataBytesPerSecond(tail.length(), System.nanoTime() - readStart);
- return input.readTail(length);
+ return tail;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonOrcDataSource.java`
around lines 49 - 57, The method readTailInternal currently calls
input.readTail(length) twice: once to measure and record bytes via
stats.readDataBytesPerSecond using the local variable Slice tail, then it
incorrectly returns a second call to input.readTail(length), causing duplicate
I/O and mismatched metrics. Fix by returning the previously read and measured
Slice tail instead of calling input.readTail(length) again; update the return
statement in readTailInternal to return the local variable tail so the measured
data matches the returned data.
| import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableList; | ||
| import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableMap; |
Wrong Guava imports — from the GCS-connector shaded package.
import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableList;
import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableMap;
These are repackaged Guava classes shipped inside gcs-connector's shaded jar. Depending on them from the Paimon plugin is almost certainly unintended:
- It introduces a hard, fragile dependency on the GCS connector's internal shading layout — any upgrade of the GCS connector that renames the shade prefix breaks the plugin.
- At runtime these classes may not be visible on the plugin classpath → NoClassDefFoundError/ClassNotFoundException.
- They are not the same ImmutableList/ImmutableMap the rest of Trino uses, so types from this file won't be interoperable with the rest of the codebase despite compiling.
Replace with the standard Guava imports. Elsewhere this file already uses com.google.common.collect.ImmutableList.toImmutableList / ImmutableSet.toImmutableSet static imports, so regular Guava is available.
🐛 Proposed fix
-import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableList;
-import com.google.cloud.hadoop.repackaged.gcs.com.google.common.collect.ImmutableMap;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonPageSourceProvider.java`
around lines 16 - 17, The file imports shaded Guava classes from the GCS
connector; replace the two repackaged imports with the standard Guava ones (use
com.google.common.collect.ImmutableList and
com.google.common.collect.ImmutableMap) so PaimonPageSourceProvider and its
usages reference the real Guava types rather than the GCS-connector shaded
variants; remove the com.google.cloud.hadoop.repackaged.gcs... imports and add
the com.google.common.collect imports so types remain compatible with the rest
of the codebase.
| @Override | ||
| public int getInt(int i) | ||
| { | ||
| if (rowType.getTypeAt(i).getTypeRoot() == DataTypeRoot.TIME_WITHOUT_TIME_ZONE) { | ||
| if (cacheTypes[i] == null) { | ||
| cacheTypes[i] = TimeType.createTimeType(((org.apache.paimon.types.TimeType) rowType.getTypeAt(i)).getPrecision()); | ||
| } | ||
| SqlTime sqlTime = (SqlTime) cacheTypes[i].getObjectValue(singlePage.getBlock(i), 0); | ||
| return (int) (sqlTime.getPicos() / 1_000_000); | ||
| } | ||
|
|
||
| long value = (long) readNativeValue(INTEGER, singlePage.getBlock(i), 0); | ||
| checkArgument(value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE, "Value out of range for int: %s", value); | ||
| return (int) value; | ||
| } |
🧩 Analysis chain
🌐 Web query:
Apache Paimon InternalRow getInt TIME_WITHOUT_TIME_ZONE milliseconds of day
💡 Result:
In Apache Paimon, for a TIME_WITHOUT_TIME_ZONE field (TIME type), InternalRow.getInt(pos) returns the time as milliseconds since midnight (00:00:00.000). This is the standard internal representation for TIME WITHOUT TIME ZONE in Paimon, consistent with common database practices where TIME is stored as an integer representing milliseconds-of-day (0 to 86399999). Paimon's DataGetters interface (implemented by InternalRow) provides typed accessors like getInt for INT-compatible types, and TIME fits this as a compact int storage. Official docs confirm TIME as time without time zone up to nanosecond precision but defaults to int milliseconds for precision 0-2.
Citations:
- 1: https://paimon.apache.org/docs/1.1/api/java/org/apache/paimon/types/package-frame.html
- 2: https://paimon.apache.org/docs/1.1/api/java/org/apache/paimon/utils/InternalRowUtils.html
- 3: https://paimon.apache.org/docs/1.2/concepts/data-types/
- 4: https://paimon.apache.org/docs/1.0/concepts/data-types/
- 5: https://paimon.apache.org/docs/master/concepts/data-types/
- 6: https://paimon.apache.org/docs/0.9/api/java/org/apache/paimon/types/package-summary.html
- 7: https://paimon.apache.org/docs/1.1/api/java/org/apache/paimon/data/DataGetters.html
- 8: https://gitbox.apache.org/repos/asf/paimon.git
- 9: https://paimon.apache.org/docs/1.1/api/java/org/apache/paimon/spark/AbstractSparkInternalRow.html
- 10: https://paimon.apache.org/docs/1.1/api/java/org/apache/paimon/spark/SparkInternalRowWrapper.html
🏁 Script executed:
#!/bin/bash
# Locate the PaimonRow.java file
find . -name "PaimonRow.java" -type fRepository: trinodb/trino
Length of output: 130
🏁 Script executed:
#!/bin/bash
# Check the code at lines 121-135
cat -n plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java | sed -n '121,135p'
Repository: trinodb/trino
Length of output: 889
🏁 Script executed:
#!/bin/bash
# Search for SqlTime class and getPicos() method documentation in Trino
rg -A 5 "class SqlTime" --type javaRepository: trinodb/trino
Length of output: 2365
🏁 Script executed:
#!/bin/bash
# Check if toIntExact is used elsewhere in the codebase (for proposed fix)
rg "toIntExact" plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/ --type javaRepository: trinodb/trino
Length of output: 890
🏁 Script executed:
#!/bin/bash
# Get the full SqlTime class to see getPicos() method signature
cat core/trino-spi/src/main/java/io/trino/spi/type/SqlTime.java
Repository: trinodb/trino
Length of output: 4064
🏁 Script executed:
#!/bin/bash
# Search for other TIME conversions in the codebase to understand Trino's practices
rg -A 3 "getPicos\(\)" --type java | head -50Repository: trinodb/trino
Length of output: 4969
🏁 Script executed:
#!/bin/bash
# Find scalePicosToMillis function to see how it handles rounding
rg "scalePicosToMillis|PICOSECONDS_PER_MILLISECOND" --type java -B 2 -A 5 | head -60Repository: trinodb/trino
Length of output: 8676
🏁 Script executed:
#!/bin/bash
# Check for rounding patterns with picoseconds conversions
rg "roundDiv|Math\\.ceil|Math\\.floor" --type java | grep -i "picos\|millis" | head -20Repository: trinodb/trino
Length of output: 3802
🏁 Script executed:
#!/bin/bash
# Get the TimeConversions utility class
cat plugin/trino-kafka/src/main/java/io/trino/plugin/kafka/encoder/json/format/util/TimeConversions.java
Repository: trinodb/trino
Length of output: 6361
Fix incorrect TIME-to-milliseconds conversion.
SqlTime.getPicos() returns picoseconds-of-day, but the current code divides by 1_000_000 (converting to microseconds), while Paimon's InternalRow.getInt() for TIME_WITHOUT_TIME_ZONE requires milliseconds-of-day. The divisor must be 1_000_000_000 (not 1_000_000), making the current return value off by a factor of 1000.
Additionally, use scalePicosToMillis() (or implement proper rounding) instead of plain truncation to handle sub-millisecond precision correctly. Trino's standard pattern for picos-to-millis conversion uses rounding via rescaleWithRounding(), which applies (value + 500_000_000) / 1_000_000_000 logic for correct nearest-millisecond rounding.
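As a self-contained reference (plain Java, helper and class names hypothetical), the picoseconds-of-day to milliseconds-of-day conversion with nearest-millisecond rounding looks like this:
```java
import static java.lang.Math.toIntExact;

class PicosOfDayToMillisExample
{
    private static final long PICOS_PER_MILLI = 1_000_000_000L;

    // SqlTime.getPicos() is picoseconds since midnight; Paimon's getInt() contract
    // for TIME is milliseconds since midnight, so divide by 10^9 with rounding.
    static int picosOfDayToMillisOfDay(long picosOfDay)
    {
        return toIntExact((picosOfDay + PICOS_PER_MILLI / 2) / PICOS_PER_MILLI);
    }

    public static void main(String[] args)
    {
        // 12:00:00 plus half a millisecond, expressed in picoseconds, rounds up to 43_200_001 ms.
        long picos = 43_200_000L * PICOS_PER_MILLI + 500_000_000L;
        System.out.println(picosOfDayToMillisOfDay(picos)); // prints 43200001
    }
}
```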
Suggested fix
- SqlTime sqlTime = (SqlTime) cacheTypes[i].getObjectValue(singlePage.getBlock(i), 0);
- return (int) (sqlTime.getPicos() / 1_000_000);
+ SqlTime sqlTime = (SqlTime) cacheTypes[i].getObjectValue(singlePage.getBlock(i), 0);
+ // Paimon TIME values are milliseconds-of-day; SqlTime stores picoseconds-of-day.
+ return toIntExact((sqlTime.getPicos() + 500_000_000L) / 1_000_000_000L);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java`
around lines 121 - 135, The getInt method incorrectly converts
SqlTime.getPicos() (picoseconds-of-day) by dividing by 1_000_000; update the
TIME_WITHOUT_TIME_ZONE branch in PaimonRow.getInt to convert picoseconds to
milliseconds-of-day by dividing by 1_000_000_000 and apply proper rounding (use
the existing rescaleWithRounding/scalePicosToMillis helper if available, or
implement rounding as (picos + 500_000_000) / 1_000_000_000) before casting to
int; adjust references to cacheTypes, TimeType.createTimeType, SqlTime.getPicos,
and the getInt return so the value is milliseconds-of-day with
nearest-millisecond rounding.
| @Override | ||
| public Decimal getDecimal(int i, int decimalPrecision, int decimalScale) | ||
| { | ||
| Object value = | ||
| readNativeValue( | ||
| DecimalType.createDecimalType(decimalPrecision, decimalScale), | ||
| singlePage.getBlock(i), | ||
| 0); | ||
| if (decimalPrecision <= MAX_SHORT_PRECISION) { | ||
| return Decimal.fromUnscaledLong((Long) value, decimalPrecision, decimalScale); | ||
| } | ||
| else { | ||
| long high = ((Int128) value).getHigh(); | ||
| long low = ((Int128) value).getLow(); | ||
| BigInteger bigIntegerValue = | ||
| BigInteger.valueOf(high).shiftLeft(64).add(BigInteger.valueOf(low)); | ||
| BigDecimal bigDecimalValue = new BigDecimal(bigIntegerValue, decimalScale); | ||
| return Decimal.fromBigDecimal(bigDecimalValue, decimalPrecision, decimalScale); | ||
| } | ||
| } |
Incorrect Int128 → BigInteger conversion for long decimals.
BigInteger.valueOf(low) treats low as signed, but the lower 64 bits of Int128 must be interpreted as unsigned. Whenever low has its high bit set, the resulting BigInteger is off by 2^64, producing silently-wrong decimals.
Use Int128.toBigInteger() directly, which handles the sign/two's-complement conversion correctly.
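A standalone illustration of the signedness pitfall, using only java.math.BigInteger (no Trino types assumed):
```java
import java.math.BigInteger;

class Int128SignednessExample
{
    // Correct reassembly: signed high word shifted left, plus the low word read as unsigned.
    static BigInteger fromHighLow(long high, long low)
    {
        return BigInteger.valueOf(high).shiftLeft(64)
                .add(new BigInteger(Long.toUnsignedString(low)));
    }

    public static void main(String[] args)
    {
        long high = 0L;
        long low = -1L; // bit pattern 0xFFFF_FFFF_FFFF_FFFF, i.e. 2^64 - 1 when read unsigned
        // Signed interpretation of the low word is off by 2^64: 0 * 2^64 + (-1) = -1
        BigInteger wrong = BigInteger.valueOf(high).shiftLeft(64).add(BigInteger.valueOf(low));
        System.out.println(wrong);                  // -1
        System.out.println(fromHighLow(high, low)); // 18446744073709551615
    }
}
```
Int128.toBigInteger() takes care of this unsigned-low handling, which is why the suggested fix delegates to it.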
🐛 Proposed fix
`@Override`
public Decimal getDecimal(int i, int decimalPrecision, int decimalScale)
{
Object value =
readNativeValue(
DecimalType.createDecimalType(decimalPrecision, decimalScale),
singlePage.getBlock(i),
0);
if (decimalPrecision <= MAX_SHORT_PRECISION) {
return Decimal.fromUnscaledLong((Long) value, decimalPrecision, decimalScale);
}
- else {
- long high = ((Int128) value).getHigh();
- long low = ((Int128) value).getLow();
- BigInteger bigIntegerValue =
- BigInteger.valueOf(high).shiftLeft(64).add(BigInteger.valueOf(low));
- BigDecimal bigDecimalValue = new BigDecimal(bigIntegerValue, decimalScale);
- return Decimal.fromBigDecimal(bigDecimalValue, decimalPrecision, decimalScale);
- }
+ BigDecimal bigDecimalValue = new BigDecimal(((Int128) value).toBigInteger(), decimalScale);
+ return Decimal.fromBigDecimal(bigDecimalValue, decimalPrecision, decimalScale);
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @Override | |
| public Decimal getDecimal(int i, int decimalPrecision, int decimalScale) | |
| { | |
| Object value = | |
| readNativeValue( | |
| DecimalType.createDecimalType(decimalPrecision, decimalScale), | |
| singlePage.getBlock(i), | |
| 0); | |
| if (decimalPrecision <= MAX_SHORT_PRECISION) { | |
| return Decimal.fromUnscaledLong((Long) value, decimalPrecision, decimalScale); | |
| } | |
| else { | |
| long high = ((Int128) value).getHigh(); | |
| long low = ((Int128) value).getLow(); | |
| BigInteger bigIntegerValue = | |
| BigInteger.valueOf(high).shiftLeft(64).add(BigInteger.valueOf(low)); | |
| BigDecimal bigDecimalValue = new BigDecimal(bigIntegerValue, decimalScale); | |
| return Decimal.fromBigDecimal(bigDecimalValue, decimalPrecision, decimalScale); | |
| } | |
| } | |
| `@Override` | |
| public Decimal getDecimal(int i, int decimalPrecision, int decimalScale) | |
| { | |
| Object value = | |
| readNativeValue( | |
| DecimalType.createDecimalType(decimalPrecision, decimalScale), | |
| singlePage.getBlock(i), | |
| 0); | |
| if (decimalPrecision <= MAX_SHORT_PRECISION) { | |
| return Decimal.fromUnscaledLong((Long) value, decimalPrecision, decimalScale); | |
| } | |
| BigDecimal bigDecimalValue = new BigDecimal(((Int128) value).toBigInteger(), decimalScale); | |
| return Decimal.fromBigDecimal(bigDecimalValue, decimalPrecision, decimalScale); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java`
around lines 162 - 181, In PaimonRow.getDecimal the manual Int128 → BigInteger
conversion treats the lower 64 bits as signed causing wrong values; replace the
high/low construction with a proper two's-complement conversion by calling
Int128.toBigInteger() on the (Int128) value, then construct the BigDecimal from
that BigInteger with the decimalScale and pass it to
Decimal.fromBigDecimal(decimalPrecision, decimalScale) as before (update the
branch handling large decimals in getDecimal to use Int128.toBigInteger()).
| @Override | ||
| public Timestamp getTimestamp(int i, int timestampPrecision) | ||
| { | ||
| if (rowType.getTypeAt(i).getTypeRoot() == DataTypeRoot.TIMESTAMP_WITH_LOCAL_TIME_ZONE) { | ||
| SqlTimestampWithTimeZone sqlTimestampWithTimeZone = (SqlTimestampWithTimeZone) TimestampWithTimeZoneType.createTimestampWithTimeZoneType(timestampPrecision).getObjectValue(singlePage.getBlock(i), 0); | ||
| return Timestamp.fromEpochMillis(sqlTimestampWithTimeZone.getEpochMillis(), sqlTimestampWithTimeZone.getPicosOfMilli() * 1000); | ||
| } | ||
| else { | ||
| SqlTimestamp sqlTimestamp = (SqlTimestamp) TimestampType.createTimestampType(timestampPrecision).getObjectValue(singlePage.getBlock(i), 0); | ||
| return Timestamp.fromLocalDateTime(sqlTimestamp.toLocalDateTime()); | ||
| } | ||
| } |
🧩 Analysis chain
🌐 Web query:
Apache Paimon Timestamp.fromEpochMillis signature nanosOfMillisecond
💡 Result:
In Apache Paimon, the org.apache.paimon.data.Timestamp class has two fromEpochMillis factory methods: 1. public static Timestamp fromEpochMillis(long milliseconds) - Creates a Timestamp from milliseconds since 1970-01-01 00:00:00, setting nanos-of-millisecond to zero. 2. public static Timestamp fromEpochMillis(long milliseconds, int nanosOfMillisecond) - Creates a Timestamp from milliseconds since 1970-01-01 00:00:00 and nanoseconds within that millisecond (0 to 999,999).
Citations:
- 1: https://paimon.apache.org/docs/0.9/api/java/org/apache/paimon/data/Timestamp.html
- 2: https://paimon.apache.org/docs/1.1/api/java/org/apache/paimon/data/class-use/Timestamp.html
🏁 Script executed:
# Search for SqlTimestampWithTimeZone in the codebase to understand getPicosOfMilli()
find . -name "*.java" -type f | xargs grep -l "SqlTimestampWithTimeZone" | head -20Repository: trinodb/trino
Length of output: 1708
🏁 Script executed:
# Look at the actual PaimonRow.java file to confirm the implementation
find . -name "PaimonRow.java" -type fRepository: trinodb/trino
Length of output: 130
🏁 Script executed:
# Read SqlTimestampWithTimeZone to understand getPicosOfMilli()
cat -n ./core/trino-spi/src/main/java/io/trino/spi/type/SqlTimestampWithTimeZone.java
Repository: trinodb/trino
Length of output: 7498
🏁 Script executed:
# Read PaimonRow.java around lines 183-194
sed -n '175,200p' ./plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java
Repository: trinodb/trino
Length of output: 1414
Fix unit conversion from picoseconds to nanoseconds in getTimestamp for TIMESTAMP WITH LOCAL TIME ZONE.
SqlTimestampWithTimeZone.getPicosOfMilli() returns picoseconds-of-millisecond (0..999,999,999), but Timestamp.fromEpochMillis(long, int) expects nanoseconds-of-millisecond (0..999,999). The current * 1000 multiplies the picosecond value out of range (up to ~10^12); it should be / 1000 to convert picos→nanos.
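A quick arithmetic sketch (plain Java, no Trino or Paimon types) of the two directions:
```java
class PicosOfMilliExample
{
    public static void main(String[] args)
    {
        long picosOfMilli = 123_456_789L;               // 0.123456789 ms expressed in picoseconds
        int nanosOfMilli = (int) (picosOfMilli / 1000); // 123_456, inside the expected 0..999_999 range
        long multiplied = picosOfMilli * 1000;          // 123_456_789_000, far outside that range
        System.out.println(nanosOfMilli + " vs " + multiplied);
    }
}
```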
Proposed fix
- return Timestamp.fromEpochMillis(sqlTimestampWithTimeZone.getEpochMillis(), sqlTimestampWithTimeZone.getPicosOfMilli() * 1000);
+ return Timestamp.fromEpochMillis(
+ sqlTimestampWithTimeZone.getEpochMillis(),
+ (int) (sqlTimestampWithTimeZone.getPicosOfMilli() / 1000));
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @Override | |
| public Timestamp getTimestamp(int i, int timestampPrecision) | |
| { | |
| if (rowType.getTypeAt(i).getTypeRoot() == DataTypeRoot.TIMESTAMP_WITH_LOCAL_TIME_ZONE) { | |
| SqlTimestampWithTimeZone sqlTimestampWithTimeZone = (SqlTimestampWithTimeZone) TimestampWithTimeZoneType.createTimestampWithTimeZoneType(timestampPrecision).getObjectValue(singlePage.getBlock(i), 0); | |
| return Timestamp.fromEpochMillis(sqlTimestampWithTimeZone.getEpochMillis(), sqlTimestampWithTimeZone.getPicosOfMilli() * 1000); | |
| } | |
| else { | |
| SqlTimestamp sqlTimestamp = (SqlTimestamp) TimestampType.createTimestampType(timestampPrecision).getObjectValue(singlePage.getBlock(i), 0); | |
| return Timestamp.fromLocalDateTime(sqlTimestamp.toLocalDateTime()); | |
| } | |
| } | |
| `@Override` | |
| public Timestamp getTimestamp(int i, int timestampPrecision) | |
| { | |
| if (rowType.getTypeAt(i).getTypeRoot() == DataTypeRoot.TIMESTAMP_WITH_LOCAL_TIME_ZONE) { | |
| SqlTimestampWithTimeZone sqlTimestampWithTimeZone = (SqlTimestampWithTimeZone) TimestampWithTimeZoneType.createTimestampWithTimeZoneType(timestampPrecision).getObjectValue(singlePage.getBlock(i), 0); | |
| return Timestamp.fromEpochMillis( | |
| sqlTimestampWithTimeZone.getEpochMillis(), | |
| (int) (sqlTimestampWithTimeZone.getPicosOfMilli() / 1000)); | |
| } | |
| else { | |
| SqlTimestamp sqlTimestamp = (SqlTimestamp) TimestampType.createTimestampType(timestampPrecision).getObjectValue(singlePage.getBlock(i), 0); | |
| return Timestamp.fromLocalDateTime(sqlTimestamp.toLocalDateTime()); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonRow.java`
around lines 183 - 194, In PaimonRow.getTimestamp, the
TIMESTAMP_WITH_LOCAL_TIME_ZONE branch incorrectly converts picoseconds to
nanoseconds by multiplying picosOfMilli by 1000; change the conversion to divide
SqlTimestampWithTimeZone.getPicosOfMilli() by 1000 before passing to
Timestamp.fromEpochMillis so you supply nanoseconds-of-millisecond (int) not
picoseconds (long). Locate the block inside getTimestamp that constructs
SqlTimestampWithTimeZone and replace the picos -> nanos conversion (currently "*
1000") with "/ 1000" and ensure the resulting value is cast/typed to the int
parameter expected by Timestamp.fromEpochMillis.
| public static List<OptionInfo> getOptionInfos() | ||
| { | ||
| List<OptionInfo> optionInfos = new ArrayList<>(); | ||
| List<OptionWithMetaInfo> optionWithMetaInfos = extractConfigOptions(CoreOptions.class); | ||
| String className = ""; | ||
| for (OptionWithMetaInfo optionWithMetaInfo : optionWithMetaInfos) { | ||
| if (shouldSkip(optionWithMetaInfo.field.getName())) { | ||
| continue; | ||
| } | ||
|
|
||
| Type genericType = optionWithMetaInfo.field.getGenericType(); | ||
| if (genericType instanceof ParameterizedType parameterizedType) { | ||
| Type[] actualTypeArguments = parameterizedType.getActualTypeArguments(); | ||
| for (Type actualTypeArgument : actualTypeArguments) { | ||
| if (actualTypeArgument instanceof Class<?>) { | ||
| className = ((Class<?>) actualTypeArgument).getSimpleName(); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| optionInfos.add( | ||
| new OptionInfo( | ||
| convertOptionKey(optionWithMetaInfo.option.key()), | ||
| optionWithMetaInfo.option.key(), | ||
| buildClass(className), | ||
| isEnum(className), | ||
| className)); | ||
| } | ||
| return optionInfos; | ||
| } |
Bug: className leaks across loop iterations.
className is declared once outside the for-loop (Line 52) and is only reassigned inside the if (genericType instanceof ParameterizedType ...) branch. For any field whose generic type is not a ParameterizedType (or whose actual type argument is not a Class<?>), the new OptionInfo will be constructed with the previous iteration's className, leading to wrong isEnum/buildClass() values and malformed property metadata in PaimonTableOptions.
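A generic sketch (hypothetical names, no Paimon types) of how a per-iteration declaration prevents the stale value:
```java
import java.util.List;

class StaleLoopVariableExample
{
    public static void main(String[] args)
    {
        List<Object> values = List.of("first", 42);
        for (Object value : values) {
            // Declared per iteration: elements that do not match the guard keep the
            // default "" instead of inheriting the previous element's class name.
            String className = "";
            if (value instanceof String s) {
                className = s.getClass().getSimpleName();
            }
            System.out.println(className); // prints "String", then an empty line
        }
    }
}
```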
Reset className at the start of each iteration (or declare it inside the loop):
🐛 Fix
- String className = "";
for (OptionWithMetaInfo optionWithMetaInfo : optionWithMetaInfos) {
if (shouldSkip(optionWithMetaInfo.field.getName())) {
continue;
}
+ String className = "";
Type genericType = optionWithMetaInfo.field.getGenericType();
if (genericType instanceof ParameterizedType parameterizedType) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugin/trino-paimon/src/main/java/io/trino/plugin/paimon/PaimonTableOptionUtils.java`
around lines 48 - 77, In getOptionInfos(), className is declared outside the
for-loop and can leak previous values into OptionInfo construction; move the
declaration of className (or reset it to empty) at the start of each iteration
of the for (OptionWithMetaInfo optionWithMetaInfo : optionWithMetaInfos) loop so
that when examining optionWithMetaInfo.field.getGenericType() (and the
ParameterizedType/actualTypeArgument handling) you only use the className
derived for that field when calling buildClass(className) and isEnum(className)
to construct the new OptionInfo.
|
thanks @martint @leaves12138 can you check? |
|
@leaves12138 do you need any help here? |
Description
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture. After graduating, it became an Apache top-level project. I want to introduce Apache Paimon as a connector of Trino, just like the Iceberg or Hudi connectors.
Paimon website: https://paimon.apache.org/
Paimon GitHub home page: https://github.com/apache/paimon
Supported: read from Paimon
Additional context and related issues