Add StarRocks connector#29189

Draft
infocusmodereal wants to merge 3 commits into trinodb:master from infocusmodereal:starrocks-connector

Conversation

@infocusmodereal

@infocusmodereal infocusmodereal commented Apr 22, 2026

Description

Add an initial native read-only StarRocks connector.

The connector uses the StarRocks JDBC driver for metadata discovery and Apache Arrow Flight SQL for reads. This avoids MySQL-specific metadata assumptions that can break on StarRocks, such as relying on JDBC COLUMN_SIZE being present during type discovery.
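
As an illustration of the COLUMN_SIZE concern, here is a minimal sketch of how a VARCHAR length could be resolved without assuming the JDBC column is populated. The class and method names are hypothetical and are not taken from this PR; the sketch assumes the declared type text (for example `varchar(255)`) is available from metadata as a second source.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the PR's code: resolves a VARCHAR length
// without trusting JDBC COLUMN_SIZE to be present.
final class VarcharLengthResolver
{
    // Matches declarations such as "varchar(255)" or "VARCHAR ( 65533 )"
    private static final Pattern VARCHAR_WITH_LENGTH =
            Pattern.compile("(?i)^\\s*varchar\\s*\\(\\s*(\\d{1,9})\\s*\\)\\s*$");

    private VarcharLengthResolver() {}

    // columnSize is null when ResultSet.getInt("COLUMN_SIZE") was followed by wasNull() == true
    static OptionalInt resolveLength(Integer columnSize, String declaredType)
    {
        // Prefer the JDBC value when it is actually populated
        if (columnSize != null && columnSize > 0) {
            return OptionalInt.of(columnSize);
        }
        // Otherwise fall back to the length embedded in the declared type text
        if (declaredType != null) {
            Matcher matcher = VARCHAR_WITH_LENGTH.matcher(declaredType);
            if (matcher.matches()) {
                return OptionalInt.of(Integer.parseInt(matcher.group(1)));
            }
        }
        // No length known: the caller can map the column to an unbounded varchar
        return OptionalInt.empty();
    }
}
```

With this shape, a missing COLUMN_SIZE degrades to an unbounded type instead of failing type discovery for the whole table.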

The initial version focuses on:

  • stable schema, table, view, and column introspection
  • DESCRIBE and SHOW CREATE TABLE support
  • read-only SELECT queries over Arrow Flight SQL
  • predicate, limit, TopN, and basic aggregation pushdown
  • Flight SQL partitioned split planning with safe single-split fallback
  • optional TLS configuration for Flight SQL connections
  • conservative handling of StarRocks-specific and unsigned integer types
  • native read support for parseable ARRAY, MAP, and STRUCT values

Out of scope for v1 are writes, schema mutations, connector-managed DDL for complex types, and advanced optimizations beyond Flight SQL partition descriptors returned by StarRocks.
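
The "partitioned split planning with safe single-split fallback" item can be pictured roughly as follows. This is a sketch under stated assumptions, not the connector's implementation: partition descriptors are modeled as opaque strings, and `FlightSplitPlanning`, `planSplits`, and the `WHOLE_QUERY` marker are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch of split planning with a single-split fallback.
final class FlightSplitPlanning
{
    // Marker meaning "read the whole query result through one split"
    static final String WHOLE_QUERY = "<whole-query>";

    private FlightSplitPlanning() {}

    static List<String> planSplits(List<String> partitionDescriptors)
    {
        // No descriptors returned by the server: use one split for everything
        if (partitionDescriptors == null || partitionDescriptors.isEmpty()) {
            return List.of(WHOLE_QUERY);
        }
        // Any unusable descriptor makes the whole set suspect, so fall back
        for (String descriptor : partitionDescriptors) {
            if (descriptor == null || descriptor.isBlank()) {
                return List.of(WHOLE_QUERY);
            }
        }
        // Otherwise plan one split per descriptor
        return List.copyOf(partitionDescriptors);
    }
}
```

The design choice in the sketch is to fall back for the entire query rather than skip a bad descriptor, since dropping one partition would silently lose rows.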

Additional context and related issues

This work follows the design direction discussed in #28735 and is informed by the Doris Flight SQL connector in #29120, while remaining a StarRocks-specific implementation rather than extending the generic MySQL connector path.

Prior StarRocks-specific work for context:

The StarRocks-specific choices in this PR are:

  • metadata comes from the StarRocks JDBC driver and StarRocks INFORMATION_SCHEMA rather than MySQL connector metadata code
  • reads use Arrow Flight SQL rather than MySQL wire protocol
  • catalog-aware metadata supports starrocks.catalog-name, with fallback for StarRocks versions where INFORMATION_SCHEMA.COLUMNS.TABLE_CATALOG is absent or not populated
  • Flight SQL can use TLS via connector configuration
  • unsupported aggregate forms and non-parseable complex values fall back to Trino execution or conservative type mapping instead of breaking table introspection
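
To make "conservative type mapping" concrete, here is a hedged sketch of one possible widening scheme for unsigned integers and LARGEINT. The specific target types are assumptions for illustration and may differ from what this PR does; `ConservativeTypeMapping` and `toTrinoType` are hypothetical names. The reasoning is that an unsigned 64-bit value needs up to 20 decimal digits, and a 128-bit LARGEINT can need 39 digits, which exceeds Trino's maximum decimal precision of 38.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical mapping table, not the PR's code.
final class ConservativeTypeMapping
{
    private ConservativeTypeMapping() {}

    // Returns the Trino type name to use, or empty when this sketch has no special rule
    static Optional<String> toTrinoType(String sourceType)
    {
        String normalized = sourceType.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        switch (normalized) {
            case "tinyint unsigned":
                return Optional.of("smallint"); // 0..255 fits in 16 bits signed
            case "smallint unsigned":
                return Optional.of("integer"); // 0..65535 fits in 32 bits signed
            case "int unsigned":
                return Optional.of("bigint"); // 0..2^32-1 fits in 64 bits signed
            case "bigint unsigned":
                return Optional.of("decimal(20,0)"); // 2^64-1 has 20 digits
            case "largeint":
                return Optional.of("varchar"); // assumed fallback: up to 39 digits
            default:
                return Optional.empty();
        }
    }
}
```

Each rule only widens, so no source value can overflow the mapped type, at the cost of a less precise type on the Trino side.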

Validation includes:

  • unit tests for config, type mapping, metadata fallback behavior, Arrow conversion, query building, split planning, and page source behavior
  • connector smoke tests following the BaseConnectorSmokeTest pattern
  • integration smoke tests against a live StarRocks instance
  • local errorprone-compiler validation matching the previously failing CI job

Local validation commands:

  • ./mvnw -pl plugin/trino-starrocks test
  • ./mvnw -pl plugin/trino-starrocks clean compile test-compile -DskipTests -Dair.check.skip-all=true -P errorprone-compiler
  • ./mvnw -pl plugin/trino-starrocks -Dtest=TestStarRocksIntegrationSmokeTest -Dstarrocks.test.integration.enabled=true -Dstarrocks.test.jdbc-url=jdbc:starrocks://127.0.0.1:9031 -Dstarrocks.test.flight-sql-host=127.0.0.1 -Dstarrocks.test.flight-sql-port=9408 test

Local validation notes:

  • a Docker-based StarRocks environment was used outside the repository
  • the live validation ran against StarRocks 4.0.9-f647589
  • FE MySQL was exposed on 9031 and Arrow Flight SQL on 9408
  • the gated smoke test covered SHOW SCHEMAS, SHOW TABLES, DESCRIBE, SHOW CREATE TABLE, SELECT count(*), datetime reads, views, largeint fallback, and basic aggregation pushdown
  • earlier 100k-row local benchmark validation showed representative average latencies of 90.6 ms for SHOW TABLES, 198.6 ms for DESCRIBE, 241.6 ms for SELECT count(*), 252.2 ms for SELECT sum(amount), 253.0 ms for a grouped aggregate, and 261.8 ms for a filtered projection over 1k rows
  • these numbers are intended as local validation signals for the native JDBC metadata plus Arrow Flight SQL design, not as a formal performance claim

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## StarRocks connector
* Add the native read-only StarRocks connector backed by StarRocks JDBC metadata and Arrow Flight SQL reads. ({issue}`28735`)

@cla-bot

cla-bot Bot commented Apr 22, 2026

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: ivan.torres.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: git config --list | grep email
  2. If not, set it up using: git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@cla-bot

cla-bot Bot commented Apr 22, 2026

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@infocusmodereal
Author

@cla-bot check

@cla-bot cla-bot Bot added the cla-signed label Apr 28, 2026
@cla-bot

cla-bot Bot commented Apr 28, 2026

The cla-bot has been summoned, and re-checked this pull request!

@infocusmodereal infocusmodereal force-pushed the starrocks-connector branch 3 times, most recently from 12fd482 to 74da5e8 on April 28, 2026 18:35
Use StarRocks JDBC for metadata and Arrow Flight SQL for reads.

The connector is read-only in v1 and avoids MySQL-specific metadata
assumptions that break on StarRocks, including reliance on JDBC
COLUMN_SIZE during type discovery. It adds docs, unit tests,
connector smoke tests, and live integration smoke tests for schema,
table, view, and basic SELECT query coverage.
@infocusmodereal infocusmodereal changed the title from "[WIP] Add StarRocks connector" to "Add StarRocks connector" May 4, 2026