Revert "Upgrade to Cassandra Java Driver 4.x" #301
Conversation
This reverts commit 0da8dcf.
Reviewer's Guide

Reverts the previous upgrade to the Cassandra Java Driver 4.x, restoring the 3.x driver API usage across the Cassandra query executor and batch loader, and aligning build, configuration, and Docker image versions back to those compatible with the older driver.

Class diagram for the reverted Cassandra 3.x executor and batch loader:

classDiagram
    class CassandraQueryExecutor {
        - Map typeMapping
        - Cluster cluster
        - Session session
        + CassandraQueryExecutor(Configuration configuration)
        + QueryResult executeQuery(String sql)
        + Session getSession()
        + List~String~ getColumnNames(String keySpace, String tableName)
        + boolean tableExists(String keySpace, String tableName)
        + List~String~ getTableNames(String keySpace)
        + void close()
        - void ensureConnected()
        - static JDBCType getJDBCType(DataType type)
    }
    class TypeNotSupportedException {
        + TypeNotSupportedException(DataType type)
    }
    class CassandraBatchLoader {
        - Session session
        - String insertQuery
        - int columnsCount
        - int batchRowsCount
        + CassandraBatchLoader(Session session, String tableName, List~String~ columnNames, int batchRowsCount)
        + void load(Iterator~List~Object~~ rows)
        - static BatchStatement createBatchStatement()
    }
    CassandraQueryExecutor ..> Cluster : uses
    CassandraQueryExecutor ..> Session : manages
    CassandraQueryExecutor ..> ResultSet : uses
    CassandraQueryExecutor ..> KeyspaceMetadata : uses
    CassandraQueryExecutor ..> TableMetadata : uses
    CassandraQueryExecutor ..> ColumnMetadata : uses
    CassandraQueryExecutor ..> TypeNotSupportedException : throws
    CassandraBatchLoader ..> Session : uses
    CassandraBatchLoader ..> PreparedStatement : uses
    CassandraBatchLoader ..> BatchStatement : uses
    TypeNotSupportedException --|> RuntimeException
Architecture diagram for CassandraQueryExecutor with the Cassandra 3.x driver:

flowchart LR
    Config[Configuration<br/>databases.cassandra.host<br/>databases.cassandra.port] --> CQE[CassandraQueryExecutor]
    CQE --> ClusterObj[Cluster]
    ClusterObj --> SessionObj[Session]
    SessionObj --> CassandraNode[Cassandra container<br/>image cassandra:2.1.15]
    subgraph DockerEnvironment
        CassandraNode
    end
    CQE -->|executes CQL| SessionObj
    CassandraBatchLoader -->|inserts batched rows| SessionObj
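The configuration keys shown in the diagram would typically be supplied through tempto's YAML configuration. A minimal fragment might look like this (the host and port values are illustrative, not taken from this PR):

```yaml
databases:
  cassandra:
    host: localhost
    port: 9042
```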
Hey - I've found 5 issues, and left some high level feedback:
- In `getJDBCType`, the null check was changed from `jdbcType == null` to `type == null`, which will now skip throwing `TypeNotSupportedException` for unmapped types and instead only fail if the input type itself is null; consider restoring the original check so unsupported Cassandra types are still surfaced.
- The change from `row.getObject(i)` to `row.getToken(i).getValue()` in `executeQuery` significantly alters the semantics of result values (returning token values instead of actual cell contents) and will likely break existing expectations; consider keeping `getObject` or another accessor that returns the cell value.
- With the introduction of `ensureConnected`, the `close()` method now only closes the `Cluster` and not the `Session` directly; verify that session lifecycle is correctly handled by the driver when closing the cluster or consider explicitly closing the session to avoid resource leaks.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `getJDBCType`, the null check was changed from `jdbcType == null` to `type == null`, which will now skip throwing `TypeNotSupportedException` for unmapped types and instead only fail if the input type itself is null; consider restoring the original check so unsupported Cassandra types are still surfaced.
- The change from `row.getObject(i)` to `row.getToken(i).getValue()` in `executeQuery` significantly alters the semantics of result values (returning token values instead of actual cell contents) and will likely break existing expectations; consider keeping `getObject` or another accessor that returns the cell value.
- With the introduction of `ensureConnected`, the `close()` method now only closes the `Cluster` and not the `Session` directly; verify that session lifecycle is correctly handled by the driver when closing the cluster or consider explicitly closing the session to avoid resource leaks.
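The explicit-close alternative mentioned in the last point can be sketched in isolation. This is a hypothetical `ConnectionHolder` with plain `AutoCloseable` fields standing in for the driver's `Session` and `Cluster`, not the actual tempto code: close the session before the cluster, and null both fields so a second `close()` is a no-op rather than a double-close.

```java
public class ConnectionHolder implements AutoCloseable {
    // Stand-ins for the driver's Session and Cluster.
    private AutoCloseable session;
    private AutoCloseable cluster;

    public ConnectionHolder(AutoCloseable session, AutoCloseable cluster) {
        this.session = session;
        this.cluster = cluster;
    }

    @Override
    public void close() throws Exception {
        // Close the session first, then the cluster; nulling each field
        // makes repeated close() calls idempotent.
        if (session != null) {
            session.close();
            session = null;
        }
        if (cluster != null) {
            cluster.close();
            cluster = null;
        }
    }
}
```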
## Individual Comments
### Comment 1
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="106" />
<code_context>
List<Object> builderRow = newArrayList();
for (int i = 0; i < types.size(); ++i) {
- builderRow.add(row.getObject(i));
+ builderRow.add(row.getToken(i).getValue());
}
resultBuilder.addRow(builderRow);
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `row.getToken(i)` instead of `row.getObject(i)` is likely incorrect for fetching column values.
`getToken(i)` only returns partitioner tokens for partition key columns and does not represent the actual column values, especially for non-key columns. This changes the semantics of the query result versus `getObject(i)` and will produce incorrect data unless you explicitly intend to expose partition tokens. Please revert to `row.getObject(i)` (or a type-specific accessor) for returning row values.
</issue_to_address>
### Comment 2
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="166" />
<code_context>
{
JDBCType jdbcType = typeMapping.get(type);
- if (jdbcType == null) {
+ if (type == null) {
throw new TypeNotSupportedException(type);
}
</code_context>
<issue_to_address>
**issue (bug_risk):** The null check in `getJDBCType` is applied to the wrong variable.
The original logic correctly used `jdbcType == null` to detect unsupported Cassandra `DataType`s. Changing the condition to `type == null` means unsupported but valid `DataType`s now return a null `jdbcType` instead of throwing, which can lead to NPEs or incorrect type handling downstream. Please restore the check to `if (jdbcType == null)`.
</issue_to_address>
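The fix this comment asks for can be shown with a standalone sketch (a hypothetical `TypeMapper` class with `String` keys standing in for the driver's `DataType`, and `IllegalArgumentException` standing in for `TypeNotSupportedException`): the guard must test the *result* of the map lookup, not the input key, so that unmapped but non-null types still throw.

```java
import java.sql.JDBCType;
import java.util.Map;

public class TypeMapper {
    // Stand-in for the DataType -> JDBCType mapping in the executor.
    private static final Map<String, JDBCType> TYPE_MAPPING = Map.of(
            "int", JDBCType.INTEGER,
            "varchar", JDBCType.VARCHAR);

    public static JDBCType getJDBCType(String type) {
        JDBCType jdbcType = TYPE_MAPPING.get(type);
        // Check the lookup result: with `type == null` instead, an unmapped
        // type would silently fall through and return null.
        if (jdbcType == null) {
            throw new IllegalArgumentException("Type not supported: " + type);
        }
        return jdbcType;
    }
}
```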
### Comment 3
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="122" />
<code_context>
- .map(Object::toString)
- .collect(toList());
+ checkState(tableExists(keySpace, tableName), "table %s.%s does not exist", keySpace, tableName);
+ KeyspaceMetadata keyspaceMetadata = session.getCluster().getMetadata().getKeyspace(keySpace);
+ TableMetadata tableMetadata = keyspaceMetadata.getTable(tableName);
+ return tableMetadata.getColumns().stream().map(ColumnMetadata::getName).collect(toList());
</code_context>
<issue_to_address>
**issue (bug_risk):** `getColumnNames` uses `session` without ensuring it is initialized, risking an NPE.
`ensureConnected()` now handles session initialization, but this method only checks `tableExists`, which uses `cluster` and never initializes `session`. If `getColumnNames` is called before anything that triggers `ensureConnected()`, `session` will be `null` and `session.getCluster()` will throw. Please either call `ensureConnected()` at the start of this method or access metadata via `cluster` as in `tableExists`.
</issue_to_address>
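The lazy-connection guard this comment describes can be sketched independently of the driver. Here `LazyExecutor`, `connect()`, and the `String` session are all hypothetical stand-ins: the point is that every public method touching the lazily created resource calls the guard first, so no call order can observe a null session.

```java
public class LazyExecutor {
    private String session; // stand-in for the driver's Session

    private void ensureConnected() {
        if (session == null) {
            session = connect(); // would be cluster.connect() with the real driver
        }
    }

    private String connect() {
        return "connected-session";
    }

    // Calling the guard at the top of every public accessor keeps the
    // lazy initialization invisible to callers.
    public String getSession() {
        ensureConnected();
        return session;
    }
}
```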
### Comment 4
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="59-66" />
<code_context>
+ .put(DataType.cfloat(), JDBCType.REAL)
+ .put(DataType.cint(), JDBCType.INTEGER)
+ .put(DataType.smallint(), JDBCType.SMALLINT)
+ //.put(DataType.text(), JDBCType.NVARCHAR)
+ .put(DataType.time(), JDBCType.TIME)
+ .put(DataType.timestamp(), JDBCType.TIMESTAMP)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Dropping the mapping for `DataType.text()` may cause unsupported-type failures for `text` columns.
The old mapping handled `text` via `DataTypes.TEXT`, but the new version removes `DataType.text()` entirely. Even though Cassandra treats `text` and `varchar` as aliases, schemas may still declare `text`, and the driver may treat `DataType.text()` separately from `DataType.varchar()`. In that case, `text` columns would fail with `TypeNotSupportedException`. Consider mapping both `DataType.text()` and `DataType.varchar()` to the same JDBC type (e.g., `JDBCType.VARCHAR` or `NVARCHAR`, depending on the intent) rather than dropping `text` support.
```suggestion
.put(DataType.cint(), JDBCType.INTEGER)
.put(DataType.smallint(), JDBCType.SMALLINT)
.put(DataType.text(), JDBCType.VARCHAR)
.put(DataType.time(), JDBCType.TIME)
.put(DataType.timestamp(), JDBCType.TIMESTAMP)
.put(DataType.tinyint(), JDBCType.TINYINT)
.put(DataType.varchar(), JDBCType.VARCHAR)
.build();
```
</issue_to_address>
### Comment 5
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="114-117" />
<code_context>
}
- public CqlSession getSession()
+ public Session getSession()
{
return session;
</code_context>
<issue_to_address>
**suggestion (bug_risk):** `getSession()` may now return `null` due to lazy initialization.
With the previous eager construction, `getSession()` was guaranteed non-null; now it can return `null` unless `ensureConnected()` has been called first. If external callers depend on a non-null session, either call `ensureConnected()` from `getSession()` or clearly document/enforce that callers must use higher-level methods (e.g., `executeQuery()`) instead of `getSession()` directly.
```suggestion
public Session getSession()
{
ensureConnected();
return session;
}
```
</issue_to_address>
This PR reverts the upgrade until Presto changes are ready, as this is blocking other Presto PRs that need a tempto upgrade.