Add Doris connector #29120
|
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
|
I submitted my signed CLA to cla@trino.io on April 1, 2026, but the CLA check is still failing. Please let me know if there is anything else I should do on my side. |
Force-pushed from 01c7cf2 to 95566bb.
Force-pushed from b59a6b4 to 34fc8af.
|
Compared to using the generic MySQL connector:
1. More data type support (including boolean and largeint). This new Doris connector properly supports and maps these types. As demonstrated in the test queries below, types like boolean, decimal(38,10), and various datetime scales are now natively supported and correctly formatted without casting errors. For example, the MySQL connector maps Doris boolean to tinyint.
2. Resolving the mixed-case identifier issue |
|
@ebyhr Hi! This PR introduces the initial Doris connector. Could you please help approve the workflow execution and take a look at the implementation when you have time? |
Force-pushed from ea8a08e to 4c26ca9.
Force-pushed from 4c26ca9 to db17789.
Force-pushed from 219d2ac to b6a67c3.
Introduce the initial Doris connector, including metadata access, query planning, type mapping, and read paths needed to query Doris tables from Trino. Add the doc, unit tests, and product tests needed.
Force-pushed from 6f343cc to 372d052.
|
Following the community’s suggestion, we narrowed the scope of this PR for the initial version of the connector. In this revision, we removed pushdown support to keep the PR minimal. Even without pushdown, the Doris connector is still faster than the MySQL connector in most query scenarios in our benchmark, thanks to the new connection approach and related improvements. Below is a comparison of the average execution time of all 22 TPC-H SF100 queries between:
We also include a performance comparison between:
|
|
Hi @ebyhr. Could you please take a look? I’d really appreciate your review. |
        </artifact>
    </artifactSet>

    <artifactSet to="plugin/doris">
Done.
I added plugin/trino-doris to the standard CI test matrix and excluded :trino-doris from the fallback test-other-modules list.
    * - `doris.largeint-mapping`
      - No
      - Mapping for Doris `LARGEINT`. Valid values are `VARCHAR` and `DECIMAL`.
        The default is `VARCHAR`.
Can we remove this config property and map to NUMBER type instead?
Also, is this required for the initial PR? I want to handle it as a follow-up.
Done.
I removed the doris.largeint-mapping config, enum, docs, and config tests.
LARGEINT is now mapped directly to Trino NUMBER, and the Arrow conversion path handles NUMBER values. Additionally, I have verified in a real test environment that -170141183460469231731687303715884105 and 170141183460469231731687303715884105 can be selected successfully.
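For context on why LARGEINT needs a wider target than BIGINT or a 38-digit DECIMAL, here is a minimal standalone sketch of the value range involved. It is not the connector's code: the class `LargeIntRange` and its methods are hypothetical, and the bounds assume the signed 128-bit range of -2^127 + 1 to 2^127 - 1 described for Doris LARGEINT.

```java
import java.math.BigInteger;

// Hypothetical helper, for illustration only; not part of the PR.
final class LargeIntRange
{
    // Assumed upper bound: 2^127 - 1 (39 decimal digits)
    static final BigInteger MAX = BigInteger.TWO.pow(127).subtract(BigInteger.ONE);
    // Assumed lower bound: -(2^127 - 1)
    static final BigInteger MIN = MAX.negate();

    private LargeIntRange() {}

    // True when the value lies within the assumed LARGEINT range
    static boolean fits(BigInteger value)
    {
        return value.compareTo(MIN) >= 0 && value.compareTo(MAX) <= 0;
    }

    // True when the value has at most 38 decimal digits, i.e. fits decimal(38,0)
    static boolean fitsDecimal38(BigInteger value)
    {
        return value.abs().toString().length() <= 38;
    }

    public static void main(String[] args)
    {
        // One of the values mentioned in the reply above
        BigInteger tested = new BigInteger("170141183460469231731687303715884105");
        System.out.println("fits LARGEINT: " + fits(tested));
        System.out.println("fits decimal(38,0): " + fitsDecimal38(tested));
        System.out.println("max fits decimal(38,0): " + fitsDecimal38(MAX));
    }
}
```

Under these assumptions the extreme values need 39 digits, which is one more than a 38-digit decimal can hold, so a fixed DECIMAL mapping would not cover the whole range.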
    public DorisMetadata(DorisMetadataClient metadataClient, DorisTypeMapper typeMapper)
    {
        this(metadataClient, typeMapper, Set.of());
    }
This constructor can be removed. It is called only from tests.
Done.
I removed the test-only DorisMetadata constructor and updated the tests to use the production constructor explicitly.
    @SuppressWarnings("deprecation")
    @Override
    public Map<SchemaTableName, List<ColumnMetadata>> listTableColumns(ConnectorSession session, SchemaTablePrefix prefix)
Please implement streamRelationColumns method instead.
Done.
I replaced the deprecated listTableColumns implementation with streamRelationColumns and return RelationColumnsMetadata for both tables and views.
    import io.trino.spi.connector.ConnectorSession;

    public interface DorisQueryEventListener
Addressed.
The listener was only a placeholder for future extension, so I removed the interface, beginQuery/cleanupQuery hooks, bindings, and related tests.
                throw t;
            }
        }
    }
Done.
I added a main method to DorisQueryRunner following the style used by other connector query runners.
    public void clearLastRequest()
    {
        lastRequest.set(null);
    }

    public Optional<Request> getLastRequest()
    {
        return Optional.ofNullable(lastRequest.get());
    }
Done.
Unused methods have been removed.
    import java.util.Optional;
    import java.util.OptionalLong;

    public interface DorisMetadataClient
Let's avoid having 2 methods with/without ConnectorSession parameter.
Done.
I simplified DorisMetadataClient and updated the JDBC implementation and tests accordingly.
    import static java.util.Objects.requireNonNull;

    @Execution(ExecutionMode.SAME_THREAD)
    final class TestDorisTypeMapping
Type mapping tests should cover more values, e.g. null, min, max, non-ASCII characters, the Julian-to-Gregorian switch, and DST.
Done.
I expanded TestDorisTypeMapping to cover nulls, min/max values, non-ASCII strings, Julian/Gregorian date boundaries, DST-like timestamp values, decimal boundaries, and char/varchar empty and trailing-space cases.
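To illustrate why the Julian/Gregorian boundary and DST values are worth testing, here is a small standalone sketch using only the JDK. It does not reproduce the PR's tests: the class `TemporalEdgeCases`, its methods, and the chosen dates and zone are assumptions picked for illustration.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper, for illustration only; not part of the PR.
final class TemporalEdgeCases
{
    private TemporalEdgeCases() {}

    // True when the local date-time does not exist in the zone (DST "spring forward" gap)
    static boolean isInDstGap(LocalDateTime localDateTime, ZoneId zone)
    {
        return zone.getRules().getValidOffsets(localDateTime).isEmpty();
    }

    // Day of month after 1582-10-04 in the hybrid Julian/Gregorian calendar used by java.util
    static int hybridDayAfterCutover()
    {
        GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        calendar.clear();
        calendar.set(1582, Calendar.OCTOBER, 4);
        calendar.add(Calendar.DAY_OF_MONTH, 1);
        return calendar.get(Calendar.DAY_OF_MONTH);
    }

    // Day of month after 1582-10-04 in the proleptic ISO calendar used by java.time
    static int prolepticDayAfterCutover()
    {
        return LocalDate.of(1582, 10, 4).plusDays(1).getDayOfMonth();
    }

    public static void main(String[] args)
    {
        ZoneId zone = ZoneId.of("America/New_York");
        System.out.println("02:30 in gap: " + isInDstGap(LocalDateTime.of(2021, 3, 14, 2, 30), zone));
        System.out.println("hybrid calendar next day: " + hybridDayAfterCutover());
        System.out.println("proleptic calendar next day: " + prolepticDayAfterCutover());
    }
}
```

The two calendar models disagree on the day after 1582-10-04, and some local timestamps do not exist in a given zone, so a conversion path that mixes `java.util` and `java.time` types can silently shift values at exactly these points.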
    try (Connection connection = openConnection(session);
            PreparedStatement statement = connection.prepareStatement(LIST_SCHEMAS_SQL.formatted(VISIBLE_SCHEMAS_PREDICATE));
            ResultSet resultSet = statement.executeQuery()) {
        List<String> schemas = new ArrayList<>();
Why do we need to build this query in the connector? Does JDBC driver's DatabaseMetaData not work?
We cannot replace the Doris metadata queries with DatabaseMetaData wholesale. The connector needs Doris-specific INFORMATION_SCHEMA fields such as ENGINE, TABLE_ROWS, and raw COLUMN_TYPE. JDBC DatabaseMetaData exposes standard table/column metadata but does not preserve these fields with the fidelity needed for Doris table filtering, row-count estimates, and Doris-specific type mapping.
Case-insensitive resolution is also handled explicitly because Trino normalizes identifiers to lowercase.
However, listSchemaNames could perhaps be changed to use JDBC DatabaseMetaData.getCatalogs(), but getTables / getTable / loadColumns / getTableRowCount cannot be switched wholesale, because the standard DatabaseMetaData API does not expose the Doris-specific fields required here.
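The case-insensitive resolution mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the class `CaseInsensitiveNames` and the method `resolveRemoteName` are hypothetical, and failing on ambiguous matches is one possible policy among several.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; not part of the PR.
final class CaseInsensitiveNames
{
    private CaseInsensitiveNames() {}

    // Maps a lowercase Trino identifier back to the remote name that Doris reports.
    // Returns empty when nothing matches and fails when several remote names
    // differ only by case, because the lowercase name is then ambiguous.
    static Optional<String> resolveRemoteName(List<String> remoteNames, String trinoName)
    {
        List<String> matches = remoteNames.stream()
                .filter(remote -> remote.toLowerCase(Locale.ENGLISH).equals(trinoName))
                .collect(Collectors.toList());
        if (matches.size() > 1) {
            throw new IllegalStateException(
                    String.format("Ambiguous name '%s': matches %s", trinoName, matches));
        }
        return matches.stream().findFirst();
    }

    public static void main(String[] args)
    {
        List<String> remote = List.of("Orders", "line_item");
        System.out.println(resolveRemoteName(remote, "orders"));
        System.out.println(resolveRemoteName(remote, "missing"));
    }
}
```

Resolving against the names the remote system actually reports keeps the original casing available for the SQL sent to Doris, while Trino continues to see lowercase identifiers.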
Add the Doris module to the CI test matrix and exclude it from the fallback module list. Remove the LARGEINT mapping configuration and map LARGEINT to NUMBER directly. Replace deprecated metadata column listing with streamRelationColumns. Simplify metadata APIs by keeping only ConnectorSession-aware methods. Remove unused query listener hooks and test request state. Add the QueryRunner main entry point and expand type mapping tests for nulls, boundary values, non-ASCII strings, date/time edge cases, and decimal limits.
|
Thanks for your review, @ebyhr. We have addressed the comments and pushed a new round of fixes. Could you please take another look when you have time? |
Expose the root cause from Doris Flight SQL failures in connector error messages, so users can see actionable Doris or network errors without debugging the wrapped ADBC exception. Document the FE and BE Flight SQL endpoint requirements, including BE arrow_flight_sql_port setup and public_host/proxy configuration for deployments where Doris returns internal BE addresses.
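As an illustration of surfacing a root cause from a wrapped exception, here is a minimal JDK-only sketch. It is not the connector's code: the class `RootCauseMessages`, its methods, and the message prefix are hypothetical, and plain JDK exceptions stand in for the ADBC exception types.

```java
// Hypothetical helper, for illustration only; not part of the PR.
final class RootCauseMessages
{
    private static final int MAX_HOPS = 100;

    private RootCauseMessages() {}

    // Walks the cause chain to its end; the hop limit guards against cyclic chains
    static Throwable rootCause(Throwable throwable)
    {
        Throwable current = throwable;
        for (int hops = 0; hops < MAX_HOPS; hops++) {
            Throwable cause = current.getCause();
            if (cause == null || cause == current) {
                break;
            }
            current = cause;
        }
        return current;
    }

    // Builds a user-facing message that ends with the root cause detail,
    // falling back to the exception class name when the message is missing
    static String describe(String prefix, Throwable throwable)
    {
        Throwable root = rootCause(throwable);
        String detail = root.getMessage() != null ? root.getMessage() : root.getClass().getName();
        return prefix + ": " + detail;
    }

    public static void main(String[] args)
    {
        Throwable wrapped = new RuntimeException(
                "driver error",
                new IllegalStateException("stream failed", new java.net.ConnectException("Connection refused")));
        System.out.println(describe("Doris Flight SQL query failed", wrapped));
    }
}
```

Keeping the original exception as the cause of whatever is rethrown preserves the full stack trace for debugging, while the message itself carries the actionable detail.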
Force-pushed from a3f0594 to 2299d73.




Description
Introduce the initial Doris connector implementation.
The connector uses three Doris interfaces:
It includes support for metadata access, type mapping, split generation, predicate and limit pushdown, and reading Doris tables from Trino.
This change also adds the required configuration, unit tests, integration tests, and product tests to validate the implementation and provide a baseline for future improvements.
Additional context and related issues
Compared with using the MySQL connector against Doris, this connector adds Doris-specific split planning, Flight SQL-based reads, and Doris-specific type handling.
Related discussions:
Testing includes:
BaseConnectorTest pattern
Follow-up changes may further improve performance, high availability, caching, and pushdown coverage.
Release notes
(x) Release notes are required, with the following suggested text: