Skip to content

Revert "Upgrade to Cassandra Java Driver 4.x"#301

Merged
tdcmeehan merged 1 commit into
prestodb:masterfrom
imjalpreet:revertCassandraDriverUpgrade
Apr 27, 2026
Merged

Revert "Upgrade to Cassandra Java Driver 4.x"#301
tdcmeehan merged 1 commit into
prestodb:masterfrom
imjalpreet:revertCassandraDriverUpgrade

Conversation

@imjalpreet
Copy link
Copy Markdown
Member

This reverts commit 0da8dcf.

This PR reverts the upgrade until Presto changes are ready, as this is blocking other Presto PRs that need a tempto upgrade.

@imjalpreet imjalpreet requested a review from tdcmeehan April 27, 2026 19:58
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Apr 27, 2026

Reviewer's Guide

Reverts the previous upgrade to the Cassandra Java Driver 4.x, restoring the 3.x driver API usage across the Cassandra query executor and batch loader, and aligning build, configuration, and Docker image versions back to those compatible with the older driver.

Class diagram for reverted Cassandra 3.x executor and batch loader

classDiagram
    class CassandraQueryExecutor {
        - Map typeMapping
        - Cluster cluster
        - Session session
        + CassandraQueryExecutor(Configuration configuration)
        + QueryResult executeQuery(String sql)
        + Session getSession()
        + ListString getColumnNames(String keySpace, String tableName)
        + boolean tableExists(String keySpace, String tableName)
        + ListString getTableNames(String keySpace)
        + void close()
        - void ensureConnected()
        - static JDBCType getJDBCType(DataType type)
    }

    class TypeNotSupportedException {
        + TypeNotSupportedException(DataType type)
    }

    class CassandraBatchLoader {
        - Session session
        - String insertQuery
        - int columnsCount
        - int batchRowsCount
        + CassandraBatchLoader(Session session, String tableName, ListString columnNames, int batchRowsCount)
        + void load(IteratorOfListObject rows)
        - static BatchStatement createBatchStatement()
    }

    CassandraQueryExecutor ..> Cluster : uses
    CassandraQueryExecutor ..> Session : manages
    CassandraQueryExecutor ..> ResultSet : uses
    CassandraQueryExecutor ..> KeyspaceMetadata : uses
    CassandraQueryExecutor ..> TableMetadata : uses
    CassandraQueryExecutor ..> ColumnMetadata : uses
    CassandraQueryExecutor ..> TypeNotSupportedException : throws

    CassandraBatchLoader ..> Session : uses
    CassandraBatchLoader ..> PreparedStatement : uses
    CassandraBatchLoader ..> BatchStatement : uses

    TypeNotSupportedException --|> RuntimeException
Loading

Architecture diagram for CassandraQueryExecutor with Cassandra 3.x driver

flowchart LR
    Config[Configuration
    databases.cassandra.host
    databases.cassandra.port] --> CQE[CassandraQueryExecutor]

    CQE --> ClusterObj[Cluster]
    ClusterObj --> SessionObj[Session]
    SessionObj --> CassandraNode[Cassandra container
    image cassandra:2.1.15]

    subgraph DockerEnvironment
        CassandraNode
    end

    CQE -->|executes CQL| SessionObj
    CassandraBatchLoader -->|inserts batched rows| SessionObj
Loading

File-Level Changes

Change Details Files
Revert CassandraQueryExecutor to use the DataStax 3.x Cluster/Session API and older metadata/types model.
  • Replace use of CqlSession with Cluster/Session and add lazy connection via ensureConnected()
  • Restore 3.x-style DataType static factories in JDBC type mapping and adjust null/type handling in getJDBCType
  • Switch metadata accessors (keyspaces, tables, columns) back to 3.x driver structures and adjust table existence checks
  • Change row retrieval to use row.getToken(i).getValue() and adjust column definitions iteration
tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java
Revert CassandraBatchLoader to the 3.x driver batch API.
  • Change session type from CqlSession to Session
  • Replace BatchStatementBuilder/DefaultBatchType with 3.x BatchStatement(Type.UNLOGGED)
  • Adjust batch size tracking and execution logic to match 3.x API usage
tempto-core/src/main/java/io/prestodb/tempto/internal/fulfillment/table/cassandra/CassandraBatchLoader.java
Align build and runtime Cassandra dependencies and configuration with the 3.x driver and Cassandra 2.x.
  • Downgrade cassandra driver version from 4.19.2 to 3.4.0 and switch dependency coordinates back to com.datastax.cassandra
  • Simplify Cassandra test configuration by removing datacenter/keyspace/table_manager_type fields tied to driver 4.x
  • Downgrade docker Cassandra image from 3.11.19 to 2.1.15 to match the restored driver expectations
build.gradle
tempto-examples/src/main/resources/tempto-configuration.yaml
tempto-examples/docker/docker-compose.yml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 5 issues, and left some high level feedback:

  • In getJDBCType, the null check was changed from jdbcType == null to type == null, which will now skip throwing TypeNotSupportedException for unmapped types and instead only fail if the input type itself is null; consider restoring the original check so unsupported Cassandra types are still surfaced.
  • The change from row.getObject(i) to row.getToken(i).getValue() in executeQuery significantly alters the semantics of result values (returning token values instead of actual cell contents) and will likely break existing expectations; consider keeping getObject or another accessor that returns the cell value.
  • With the introduction of ensureConnected, the close() method now only closes the Cluster and not the Session directly; verify that session lifecycle is correctly handled by the driver when closing the cluster or consider explicitly closing the session to avoid resource leaks.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `getJDBCType`, the null check was changed from `jdbcType == null` to `type == null`, which will now skip throwing `TypeNotSupportedException` for unmapped types and instead only fail if the input type itself is null; consider restoring the original check so unsupported Cassandra types are still surfaced.
- The change from `row.getObject(i)` to `row.getToken(i).getValue()` in `executeQuery` significantly alters the semantics of result values (returning token values instead of actual cell contents) and will likely break existing expectations; consider keeping `getObject` or another accessor that returns the cell value.
- With the introduction of `ensureConnected`, the `close()` method now only closes the `Cluster` and not the `Session` directly; verify that session lifecycle is correctly handled by the driver when closing the cluster or consider explicitly closing the session to avoid resource leaks.

## Individual Comments

### Comment 1
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="106" />
<code_context>
             List<Object> builderRow = newArrayList();
             for (int i = 0; i < types.size(); ++i) {
-                builderRow.add(row.getObject(i));
+                builderRow.add(row.getToken(i).getValue());
             }
             resultBuilder.addRow(builderRow);
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `row.getToken(i)` instead of `row.getObject(i)` is likely incorrect for fetching column values.

`getToken(i)` only returns partitioner tokens for partition key columns and does not represent the actual column values, especially for non-key columns. This changes the semantics of the query result versus `getObject(i)` and will produce incorrect data unless you explicitly intend to expose partition tokens. Please revert to `row.getObject(i)` (or a type-specific accessor) for returning row values.
</issue_to_address>

### Comment 2
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="166" />
<code_context>
     {
         JDBCType jdbcType = typeMapping.get(type);
-        if (jdbcType == null) {
+        if (type == null) {
             throw new TypeNotSupportedException(type);
         }
</code_context>
<issue_to_address>
**issue (bug_risk):** The null check in `getJDBCType` is applied to the wrong variable.

The original logic correctly used `jdbcType == null` to detect unsupported Cassandra `DataType`s. Changing the condition to `type == null` means unsupported but valid `DataType`s now return a null `jdbcType` instead of throwing, which can lead to NPEs or incorrect type handling downstream. Please restore the check to `if (jdbcType == null)`.
</issue_to_address>

### Comment 3
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="122" />
<code_context>
-                .map(Object::toString)
-                .collect(toList());
+        checkState(tableExists(keySpace, tableName), "table %s.%s does not exist", keySpace, tableName);
+        KeyspaceMetadata keyspaceMetadata = session.getCluster().getMetadata().getKeyspace(keySpace);
+        TableMetadata tableMetadata = keyspaceMetadata.getTable(tableName);
+        return tableMetadata.getColumns().stream().map(ColumnMetadata::getName).collect(toList());
</code_context>
<issue_to_address>
**issue (bug_risk):** `getColumnNames` uses `session` without ensuring it is initialized, risking an NPE.

`ensureConnected()` now handles session initialization, but this method only checks `tableExists`, which uses `cluster` and never initializes `session`. If `getColumnNames` is called before anything that triggers `ensureConnected()`, `session` will be `null` and `session.getCluster()` will throw. Please either call `ensureConnected()` at the start of this method or access metadata via `cluster` as in `tableExists`.
</issue_to_address>

### Comment 4
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="59-66" />
<code_context>
+                .put(DataType.cfloat(), JDBCType.REAL)
+                .put(DataType.cint(), JDBCType.INTEGER)
+                .put(DataType.smallint(), JDBCType.SMALLINT)
+                //.put(DataType.text(), JDBCType.NVARCHAR)
+                .put(DataType.time(), JDBCType.TIME)
+                .put(DataType.timestamp(), JDBCType.TIMESTAMP)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Dropping the mapping for `DataType.text()` may cause unsupported-type failures for `text` columns.

The old mapping handled `text` via `DataTypes.TEXT`, but the new version removes `DataType.text()` entirely. Even though Cassandra treats `text` and `varchar` as aliases, schemas may still declare `text`, and the driver may treat `DataType.text()` separately from `DataType.varchar()`. In that case, `text` columns would fail with `TypeNotSupportedException`. Consider mapping both `DataType.text()` and `DataType.varchar()` to the same JDBC type (e.g., `JDBCType.VARCHAR` or `NVARCHAR`, depending on the intent) rather than dropping `text` support.

```suggestion
                .put(DataType.cint(), JDBCType.INTEGER)
                .put(DataType.smallint(), JDBCType.SMALLINT)
                .put(DataType.text(), JDBCType.VARCHAR)
                .put(DataType.time(), JDBCType.TIME)
                .put(DataType.timestamp(), JDBCType.TIMESTAMP)
                .put(DataType.tinyint(), JDBCType.TINYINT)
                .put(DataType.varchar(), JDBCType.VARCHAR)
                .build();
```
</issue_to_address>

### Comment 5
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="114-117" />
<code_context>
     }

-    public CqlSession getSession()
+    public Session getSession()
     {
         return session;
</code_context>
<issue_to_address>
**suggestion (bug_risk):** `getSession()` may now return `null` due to lazy initialization.

With the previous eager construction, `getSession()` was guaranteed non-null; now it can return `null` unless `ensureConnected()` has been called first. If external callers depend on a non-null session, either call `ensureConnected()` from `getSession()` or clearly document/enforce that callers must use higher-level methods (e.g., `executeQuery()`) instead of `getSession()` directly.

```suggestion
    public Session getSession()
    {
        ensureConnected();
        return session;
    }
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

List<Object> builderRow = newArrayList();
for (int i = 0; i < types.size(); ++i) {
builderRow.add(row.getObject(i));
builderRow.add(row.getToken(i).getValue());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Using row.getToken(i) instead of row.getObject(i) is likely incorrect for fetching column values.

getToken(i) only returns partitioner tokens for partition key columns and does not represent the actual column values, especially for non-key columns. This changes the semantics of the query result versus getObject(i) and will produce incorrect data unless you explicitly intend to expose partition tokens. Please revert to row.getObject(i) (or a type-specific accessor) for returning row values.

{
JDBCType jdbcType = typeMapping.get(type);
if (jdbcType == null) {
if (type == null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): The null check in getJDBCType is applied to the wrong variable.

The original logic correctly used jdbcType == null to detect unsupported Cassandra DataTypes. Changing the condition to type == null means unsupported but valid DataTypes now return a null jdbcType instead of throwing, which can lead to NPEs or incorrect type handling downstream. Please restore the check to if (jdbcType == null).

.map(Object::toString)
.collect(toList());
checkState(tableExists(keySpace, tableName), "table %s.%s does not exist", keySpace, tableName);
KeyspaceMetadata keyspaceMetadata = session.getCluster().getMetadata().getKeyspace(keySpace);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): getColumnNames uses session without ensuring it is initialized, risking an NPE.

ensureConnected() now handles session initialization, but this method only checks tableExists, which uses cluster and never initializes session. If getColumnNames is called before anything that triggers ensureConnected(), session will be null and session.getCluster() will throw. Please either call ensureConnected() at the start of this method or access metadata via cluster as in tableExists.

Comment on lines +59 to 66
.put(DataType.cint(), JDBCType.INTEGER)
.put(DataType.smallint(), JDBCType.SMALLINT)
//.put(DataType.text(), JDBCType.NVARCHAR)
.put(DataType.time(), JDBCType.TIME)
.put(DataType.timestamp(), JDBCType.TIMESTAMP)
.put(DataType.tinyint(), JDBCType.TINYINT)
.put(DataType.varchar(), JDBCType.VARCHAR)
.build();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Dropping the mapping for DataType.text() may cause unsupported-type failures for text columns.

The old mapping handled text via DataTypes.TEXT, but the new version removes DataType.text() entirely. Even though Cassandra treats text and varchar as aliases, schemas may still declare text, and the driver may treat DataType.text() separately from DataType.varchar(). In that case, text columns would fail with TypeNotSupportedException. Consider mapping both DataType.text() and DataType.varchar() to the same JDBC type (e.g., JDBCType.VARCHAR or NVARCHAR, depending on the intent) rather than dropping text support.

Suggested change
.put(DataType.cint(), JDBCType.INTEGER)
.put(DataType.smallint(), JDBCType.SMALLINT)
//.put(DataType.text(), JDBCType.NVARCHAR)
.put(DataType.time(), JDBCType.TIME)
.put(DataType.timestamp(), JDBCType.TIMESTAMP)
.put(DataType.tinyint(), JDBCType.TINYINT)
.put(DataType.varchar(), JDBCType.VARCHAR)
.build();
.put(DataType.cint(), JDBCType.INTEGER)
.put(DataType.smallint(), JDBCType.SMALLINT)
.put(DataType.text(), JDBCType.VARCHAR)
.put(DataType.time(), JDBCType.TIME)
.put(DataType.timestamp(), JDBCType.TIMESTAMP)
.put(DataType.tinyint(), JDBCType.TINYINT)
.put(DataType.varchar(), JDBCType.VARCHAR)
.build();

Comment on lines +114 to 117
public Session getSession()
{
return session;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): getSession() may now return null due to lazy initialization.

With the previous eager construction, getSession() was guaranteed non-null; now it can return null unless ensureConnected() has been called first. If external callers depend on a non-null session, either call ensureConnected() from getSession() or clearly document/enforce that callers must use higher-level methods (e.g., executeQuery()) instead of getSession() directly.

Suggested change
public Session getSession()
{
return session;
}
public Session getSession()
{
ensureConnected();
return session;
}

@tdcmeehan tdcmeehan merged commit e2b8c8a into prestodb:master Apr 27, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants