Skip to content

Fix common CI failures#29402

Merged
electrum merged 21 commits into
masterfrom
user/electrum/fix-ci
May 11, 2026
Merged

Fix common CI failures#29402
electrum merged 21 commits into
masterfrom
user/electrum/fix-ci

Conversation

@electrum
Copy link
Copy Markdown
Member

@electrum electrum commented May 9, 2026

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

@electrum electrum requested review from Copilot, dain, ebyhr and findepi May 9, 2026 22:13
@github-actions github-actions Bot added iceberg Iceberg connector delta-lake Delta Lake connector cassandra Cassandra connector google-sheets Google Sheets connector labels May 9, 2026
@electrum
Copy link
Copy Markdown
Member Author

electrum commented May 9, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 9, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 9, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR centralizes retry and transient-failure detection, tightens test synchronization and isolation, and improves cleanup in integration tests. It adds Maven Wrapper caching and verification in CI, refactors Google Sheets backoff helpers and test locking, requires QueryRunner for Cassandra visibility checks, extracts shared failure-verification helpers for Iceberg, wraps BigQuery execution with a retry policy, hardens Redshift/SQLServer retry predicates, and introduces Databricks failure classifiers with unit tests.

Possibly related PRs

  • trinodb/trino#29070: Changes similar filesystem configuration keys and related tests/builders; likely related to configuration and test updates.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.42.1)
testing/trino-testing/src/main/java/io/trino/testing/AbstractTestEngineOnlyQueries.java
testing/trino-testing/src/main/java/io/trino/testing/BaseConnectorTest.java

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets common CI flakiness by tightening retry/failure classification for Databricks-related product tests, improving test isolation/serialization for resource-constrained or externally dependent tests, and stabilizing CI setup performance via improved caching.

Changes:

  • Refactors Databricks transient/communication failure detection into reusable helpers and adds focused unit tests for the classification.
  • Serializes/selectively isolates flaky tests (Google Sheets live API tests, memory-heavy SPI test) and improves cleanup/uniqueness in Glue metastore tests.
  • Improves CI reliability/performance: caches Maven wrapper dists; adjusts Google Sheets HTTP backoff behavior; improves Cassandra visibility waiting via Trino.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
testing/trino-product-tests/src/test/java/io/trino/tests/product/utils/TestQueryExecutors.java Adds unit coverage for Databricks transient-failure classification.
testing/trino-product-tests/src/test/java/io/trino/tests/product/deltalake/util/TestDeltaLakeTestUtils.java Adds unit coverage for Databricks communication-failure classification.
testing/trino-product-tests/src/main/java/io/trino/tests/product/utils/QueryExecutors.java Extracts Databricks retry predicate into a helper and expands transient conditions.
testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/util/DeltaLakeTestUtils.java Centralizes regex compilation and extracts retry predicate for Databricks communication failures.
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/nessie/TestIcebergNessieCatalogConnectorSmokeTest.java Special-cases permissible concurrent-delete failures for Nessie; adds causal-chain assertions.
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorSmokeTest.java Extracts concurrent-delete failure assertion into an overridable hook.
plugin/trino-google-sheets/src/test/java/io/trino/plugin/google/sheets/TestGoogleSheetsWithoutMetadataSheetId.java Forces same-thread execution and resource locking for live API stability.
plugin/trino-google-sheets/src/test/java/io/trino/plugin/google/sheets/TestGoogleSheets.java Uses new backoff/handler helpers and serializes execution with a shared lock.
plugin/trino-google-sheets/src/main/java/io/trino/plugin/google/sheets/SheetsClient.java Makes backoff/handler creation per-request; broadens unsuccessful-response backoff to include 429 and 5xx.
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeSharedGlueMetastoreWithTableRedirections.java Serializes test, avoids name collisions, and ensures Glue table cleanup even on failure.
plugin/trino-cassandra/src/test/java/io/trino/plugin/cassandra/TestCassandraTypeMapping.java Passes QueryRunner into Cassandra test data setup for improved visibility checks.
plugin/trino-cassandra/src/test/java/io/trino/plugin/cassandra/CassandraCreateAndInsertDataSetup.java Waits for table visibility via Trino query instead of Cassandra system tables.
core/trino-spi/src/test/java/io/trino/spi/variant/TestMetadata.java Marks the test as isolated due to high memory usage.
.github/actions/setup/action.yml Adds caching/restoring of Maven wrapper distributions and verifies wrapper availability.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@electrum electrum force-pushed the user/electrum/fix-ci branch 2 times, most recently from 7026e10 to 1851a9b Compare May 9, 2026 22:39
@electrum electrum requested a review from Copilot May 10, 2026 01:23
@github-actions github-actions Bot added bigquery BigQuery connector redshift Redshift connector sqlserver SQLServer connector labels May 10, 2026
@electrum
Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java (1)

1584-1587: 💤 Low value

Consider adding a comment explaining the error message being matched.

The string literal "Visibility check was unavailable" appears to be a specific BigQuery error message related to eventual consistency. While string matching on error messages is inherently fragile (BigQuery could change the message format), this is likely the only way to detect this transient failure type.

📝 Optional: Add explanatory comment
 private static boolean isTransientBigQueryFailure(Throwable throwable)
 {
+    // BigQuery returns "Visibility check was unavailable" during eventual consistency windows.
+    // This is a transient error that typically resolves within seconds.
     return nullToEmpty(throwable.getMessage()).contains("Visibility check was unavailable");
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java`
around lines 1584 - 1587, The isTransientBigQueryFailure method currently
matches the literal "Visibility check was unavailable" with no context; add a
concise comment above isTransientBigQueryFailure explaining that this specific
BigQuery error string indicates transient eventual-consistency/visibility-check
failures (why we need to treat it as transient), note that string matching is
fragile but necessary here, and mention any observed BigQuery behavior or
versions that produced this message so future maintainers know the rationale for
using this exact literal.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
`@plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java`:
- Around line 1584-1587: The isTransientBigQueryFailure method currently matches
the literal "Visibility check was unavailable" with no context; add a concise
comment above isTransientBigQueryFailure explaining that this specific BigQuery
error string indicates transient eventual-consistency/visibility-check failures
(why we need to treat it as transient), note that string matching is fragile but
necessary here, and mention any observed BigQuery behavior or versions that
produced this message so future maintainers know the rationale for using this
exact literal.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 86cf92b1-fb48-44c4-b61b-695b74458c87

📥 Commits

Reviewing files that changed from the base of the PR and between 95aa507 and 888c419.

📒 Files selected for processing (12)
  • plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java
  • plugin/trino-cassandra/src/test/java/io/trino/plugin/cassandra/CassandraCreateAndInsertDataSetup.java
  • plugin/trino-cassandra/src/test/java/io/trino/plugin/cassandra/TestCassandraTable.java
  • plugin/trino-cassandra/src/test/java/io/trino/plugin/cassandra/TestCassandraTypeMapping.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorSmokeTest.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/nessie/TestIcebergNessieCatalogConnectorSmokeTest.java
  • plugin/trino-redshift/src/test/java/io/trino/plugin/redshift/TestingRedshiftServer.java
  • plugin/trino-sqlserver/src/test/java/io/trino/plugin/sqlserver/TestingSqlServer.java
  • testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/util/DeltaLakeTestUtils.java
  • testing/trino-product-tests/src/main/java/io/trino/tests/product/utils/QueryExecutors.java
  • testing/trino-product-tests/src/test/java/io/trino/tests/product/deltalake/util/TestDeltaLakeTestUtils.java
  • testing/trino-product-tests/src/test/java/io/trino/tests/product/utils/TestQueryExecutors.java
🚧 Files skipped from review as they are similar to previous changes (8)
  • testing/trino-product-tests/src/test/java/io/trino/tests/product/deltalake/util/TestDeltaLakeTestUtils.java
  • testing/trino-product-tests/src/main/java/io/trino/tests/product/utils/QueryExecutors.java
  • testing/trino-product-tests/src/test/java/io/trino/tests/product/utils/TestQueryExecutors.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/nessie/TestIcebergNessieCatalogConnectorSmokeTest.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorSmokeTest.java
  • plugin/trino-cassandra/src/test/java/io/trino/plugin/cassandra/TestCassandraTypeMapping.java
  • plugin/trino-cassandra/src/test/java/io/trino/plugin/cassandra/CassandraCreateAndInsertDataSetup.java
  • testing/trino-product-tests/src/main/java/io/trino/tests/product/deltalake/util/DeltaLakeTestUtils.java

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

@github-actions github-actions Bot added the hive Hive connector label May 10, 2026
@electrum electrum force-pushed the user/electrum/fix-ci branch 2 times, most recently from fdbdd51 to 1a1502a Compare May 11, 2026 06:53
@electrum electrum requested a review from Copilot May 11, 2026 06:53
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch May 11, 2026
electrum added 21 commits May 11, 2026 12:36
The shared ExponentialBackOff carried retry state across requests.
Tests were competing with each other while sharing API quota.
The Glue test could race inherited exact schema-listing checks
and leave its extra table behind after failures.
Trino sometimes queried the table before connector metadata saw
the Cassandra table created directly through CQL.
Nessie REST outages matched the same broad commit-failure prefix
as the expected stale-reference conflict.
Databricks tests retried arbitrary SQLExceptions, hiding
non-transient failures, while some cluster-state errors were missed.
SQL Server tests occasionally fail before startup when Testcontainers
cannot pull the MCR image. Treat that startup failure as retryable.
BigQuery sometimes rejects setup statements with a transient visibility
check error. Retry native setup calls so these service blips do not
fail the suite.
Databricks Delta tests can fail when Glue throws an internal metastore
NPE during table creation. Treat that startup-side failure as
transient.
Redshift cloud tests sometimes get connection refused after the cluster
is reported available. Retry that startup window like other Redshift
transient failures.
Azure can reject the intentionally expired URI during creation when a
one-second expiration rounds down to the service's current second.
HiveServer2 can reject Beeline connections immediately after the
Hive data lake starts.
The invalid rename path left source schemas behind, polluting shared
connector test databases.
The launcher mounted credentials at a path with a missing slash, so
Trino could not find them during multinode-gcs startup.
Connector query tests could fail when concurrent schema changes made
SHOW SCHEMAS and SHOW SCHEMAS LIKE observe different listings. The
LIKE and ESCAPE behavior is engine-side, so exercise it in the engine
suite instead.
The Kerberos smoke test can hide an early CTAS failure behind a
cleanup-time kill_query race. Surface the async query result so CI
shows whether the query failed early or completed before renewal was
exercised.
HDFS placement failures currently show only the client exception,
which hides why the NameNode rejected the live DataNode. Dump HDFS
and container state on real test failures, and verify a startup
write before running filesystem tests.
Product tests can fail during environment startup when Adoptium
transiently returns 5xx while downloading the JDK tarball.
Retry the download and remove partial archives so later attempts
do not try to extract a corrupt file.
@electrum electrum force-pushed the user/electrum/fix-ci branch from 6d646e0 to 4a9ac4a Compare May 11, 2026 19:45
@electrum electrum enabled auto-merge May 11, 2026 19:45
@electrum electrum added this pull request to the merge queue May 11, 2026
Merged via the queue into master with commit c4fb9ad May 11, 2026
127 checks passed
@electrum electrum deleted the user/electrum/fix-ci branch May 11, 2026 21:39
@github-actions github-actions Bot added this to the 481 milestone May 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bigquery BigQuery connector cassandra Cassandra connector cla-signed delta-lake Delta Lake connector google-sheets Google Sheets connector hive Hive connector iceberg Iceberg connector redshift Redshift connector sqlserver SQLServer connector

Development

Successfully merging this pull request may close these issues.

4 participants