Revert "Upgrade to Cassandra Java Driver 4.x" #301
Conversation
This reverts commit 0da8dcf.
Reviewer's Guide

Reverts the previous upgrade to the Cassandra Java Driver 4.x, restoring the 3.x driver API usage across the Cassandra query executor and batch loader, and aligning build, configuration, and Docker image versions back to those compatible with the older driver.

Class diagram for the reverted Cassandra 3.x executor and batch loader:

classDiagram
    class CassandraQueryExecutor {
        - Map typeMapping
        - Cluster cluster
        - Session session
        + CassandraQueryExecutor(Configuration configuration)
        + QueryResult executeQuery(String sql)
        + Session getSession()
        + List~String~ getColumnNames(String keySpace, String tableName)
        + boolean tableExists(String keySpace, String tableName)
        + List~String~ getTableNames(String keySpace)
        + void close()
        - void ensureConnected()
        - static JDBCType getJDBCType(DataType type)
    }
    class TypeNotSupportedException {
        + TypeNotSupportedException(DataType type)
    }
    class CassandraBatchLoader {
        - Session session
        - String insertQuery
        - int columnsCount
        - int batchRowsCount
        + CassandraBatchLoader(Session session, String tableName, List~String~ columnNames, int batchRowsCount)
        + void load(Iterator~List~Object~~ rows)
        - static BatchStatement createBatchStatement()
    }
    CassandraQueryExecutor ..> Cluster : uses
    CassandraQueryExecutor ..> Session : manages
    CassandraQueryExecutor ..> ResultSet : uses
    CassandraQueryExecutor ..> KeyspaceMetadata : uses
    CassandraQueryExecutor ..> TableMetadata : uses
    CassandraQueryExecutor ..> ColumnMetadata : uses
    CassandraQueryExecutor ..> TypeNotSupportedException : throws
    CassandraBatchLoader ..> Session : uses
    CassandraBatchLoader ..> PreparedStatement : uses
    CassandraBatchLoader ..> BatchStatement : uses
    TypeNotSupportedException --|> RuntimeException
Architecture diagram for CassandraQueryExecutor with the Cassandra 3.x driver:

flowchart LR
    Config[Configuration<br/>databases.cassandra.host<br/>databases.cassandra.port] --> CQE[CassandraQueryExecutor]
    CQE --> ClusterObj[Cluster]
    ClusterObj --> SessionObj[Session]
    SessionObj --> CassandraNode[Cassandra container<br/>image cassandra:2.1.15]
    subgraph DockerEnvironment
        CassandraNode
    end
    CQE -->|executes CQL| SessionObj
    CassandraBatchLoader -->|inserts batched rows| SessionObj
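The configuration keys shown in the diagram would typically be supplied through tempto's YAML configuration. A minimal fragment might look like this (the host and port values are illustrative, not taken from this PR):

```yaml
databases:
  cassandra:
    host: localhost
    port: 9042
```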
Hey - I've found 5 issues, and left some high level feedback:
- In `getJDBCType`, the null check was changed from `jdbcType == null` to `type == null`, which will now skip throwing `TypeNotSupportedException` for unmapped types and instead only fail if the input type itself is null; consider restoring the original check so unsupported Cassandra types are still surfaced.
- The change from `row.getObject(i)` to `row.getToken(i).getValue()` in `executeQuery` significantly alters the semantics of result values (returning token values instead of actual cell contents) and will likely break existing expectations; consider keeping `getObject` or another accessor that returns the cell value.
- With the introduction of `ensureConnected`, the `close()` method now only closes the `Cluster` and not the `Session` directly; verify that session lifecycle is correctly handled by the driver when closing the cluster or consider explicitly closing the session to avoid resource leaks.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `getJDBCType`, the null check was changed from `jdbcType == null` to `type == null`, which will now skip throwing `TypeNotSupportedException` for unmapped types and instead only fail if the input type itself is null; consider restoring the original check so unsupported Cassandra types are still surfaced.
- The change from `row.getObject(i)` to `row.getToken(i).getValue()` in `executeQuery` significantly alters the semantics of result values (returning token values instead of actual cell contents) and will likely break existing expectations; consider keeping `getObject` or another accessor that returns the cell value.
- With the introduction of `ensureConnected`, the `close()` method now only closes the `Cluster` and not the `Session` directly; verify that session lifecycle is correctly handled by the driver when closing the cluster or consider explicitly closing the session to avoid resource leaks.
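The explicit-close alternative mentioned in the last point can be sketched in isolation. This is a hypothetical `ConnectionHolder` with plain `AutoCloseable` fields standing in for the driver's `Session` and `Cluster`, not the actual tempto code: close the session before the cluster, and null both fields so a second `close()` is a no-op rather than a double-close.

```java
public class ConnectionHolder implements AutoCloseable {
    // Stand-ins for the driver's Session and Cluster.
    private AutoCloseable session;
    private AutoCloseable cluster;

    public ConnectionHolder(AutoCloseable session, AutoCloseable cluster) {
        this.session = session;
        this.cluster = cluster;
    }

    @Override
    public void close() throws Exception {
        // Close the session first, then the cluster; nulling each field
        // makes repeated close() calls idempotent.
        if (session != null) {
            session.close();
            session = null;
        }
        if (cluster != null) {
            cluster.close();
            cluster = null;
        }
    }
}
```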
## Individual Comments
### Comment 1
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="106" />
<code_context>
List<Object> builderRow = newArrayList();
for (int i = 0; i < types.size(); ++i) {
- builderRow.add(row.getObject(i));
+ builderRow.add(row.getToken(i).getValue());
}
resultBuilder.addRow(builderRow);
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `row.getToken(i)` instead of `row.getObject(i)` is likely incorrect for fetching column values.
`getToken(i)` only returns partitioner tokens for partition key columns and does not represent the actual column values, especially for non-key columns. This changes the semantics of the query result versus `getObject(i)` and will produce incorrect data unless you explicitly intend to expose partition tokens. Please revert to `row.getObject(i)` (or a type-specific accessor) for returning row values.
</issue_to_address>
### Comment 2
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="166" />
<code_context>
{
JDBCType jdbcType = typeMapping.get(type);
- if (jdbcType == null) {
+ if (type == null) {
throw new TypeNotSupportedException(type);
}
</code_context>
<issue_to_address>
**issue (bug_risk):** The null check in `getJDBCType` is applied to the wrong variable.
The original logic correctly used `jdbcType == null` to detect unsupported Cassandra `DataType`s. Changing the condition to `type == null` means unsupported but valid `DataType`s now return a null `jdbcType` instead of throwing, which can lead to NPEs or incorrect type handling downstream. Please restore the check to `if (jdbcType == null)`.
</issue_to_address>
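The fix this comment asks for can be shown with a standalone sketch (a hypothetical `TypeMapper` class with `String` keys standing in for the driver's `DataType`, and `IllegalArgumentException` standing in for `TypeNotSupportedException`): the guard must test the *result* of the map lookup, not the input key, so that unmapped but non-null types still throw.

```java
import java.sql.JDBCType;
import java.util.Map;

public class TypeMapper {
    // Stand-in for the DataType -> JDBCType mapping in the executor.
    private static final Map<String, JDBCType> TYPE_MAPPING = Map.of(
            "int", JDBCType.INTEGER,
            "varchar", JDBCType.VARCHAR);

    public static JDBCType getJDBCType(String type) {
        JDBCType jdbcType = TYPE_MAPPING.get(type);
        // Check the lookup result: with `type == null` instead, an unmapped
        // type would silently fall through and return null.
        if (jdbcType == null) {
            throw new IllegalArgumentException("Type not supported: " + type);
        }
        return jdbcType;
    }
}
```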
### Comment 3
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="122" />
<code_context>
- .map(Object::toString)
- .collect(toList());
+ checkState(tableExists(keySpace, tableName), "table %s.%s does not exist", keySpace, tableName);
+ KeyspaceMetadata keyspaceMetadata = session.getCluster().getMetadata().getKeyspace(keySpace);
+ TableMetadata tableMetadata = keyspaceMetadata.getTable(tableName);
+ return tableMetadata.getColumns().stream().map(ColumnMetadata::getName).collect(toList());
</code_context>
<issue_to_address>
**issue (bug_risk):** `getColumnNames` uses `session` without ensuring it is initialized, risking an NPE.
`ensureConnected()` now handles session initialization, but this method only checks `tableExists`, which uses `cluster` and never initializes `session`. If `getColumnNames` is called before anything that triggers `ensureConnected()`, `session` will be `null` and `session.getCluster()` will throw. Please either call `ensureConnected()` at the start of this method or access metadata via `cluster` as in `tableExists`.
</issue_to_address>
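The lazy-connection guard this comment describes can be sketched independently of the driver. Here `LazyExecutor`, `connect()`, and the `String` session are all hypothetical stand-ins: the point is that every public method touching the lazily created resource calls the guard first, so no call order can observe a null session.

```java
public class LazyExecutor {
    private String session; // stand-in for the driver's Session

    private void ensureConnected() {
        if (session == null) {
            session = connect(); // would be cluster.connect() with the real driver
        }
    }

    private String connect() {
        return "connected-session";
    }

    // Calling the guard at the top of every public accessor keeps the
    // lazy initialization invisible to callers.
    public String getSession() {
        ensureConnected();
        return session;
    }
}
```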
### Comment 4
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="59-66" />
<code_context>
+ .put(DataType.cfloat(), JDBCType.REAL)
+ .put(DataType.cint(), JDBCType.INTEGER)
+ .put(DataType.smallint(), JDBCType.SMALLINT)
+ //.put(DataType.text(), JDBCType.NVARCHAR)
+ .put(DataType.time(), JDBCType.TIME)
+ .put(DataType.timestamp(), JDBCType.TIMESTAMP)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Dropping the mapping for `DataType.text()` may cause unsupported-type failures for `text` columns.
The old mapping handled `text` via `DataTypes.TEXT`, but the new version removes `DataType.text()` entirely. Even though Cassandra treats `text` and `varchar` as aliases, schemas may still declare `text`, and the driver may treat `DataType.text()` separately from `DataType.varchar()`. In that case, `text` columns would fail with `TypeNotSupportedException`. Consider mapping both `DataType.text()` and `DataType.varchar()` to the same JDBC type (e.g., `JDBCType.VARCHAR` or `NVARCHAR`, depending on the intent) rather than dropping `text` support.
```suggestion
.put(DataType.cint(), JDBCType.INTEGER)
.put(DataType.smallint(), JDBCType.SMALLINT)
.put(DataType.text(), JDBCType.VARCHAR)
.put(DataType.time(), JDBCType.TIME)
.put(DataType.timestamp(), JDBCType.TIMESTAMP)
.put(DataType.tinyint(), JDBCType.TINYINT)
.put(DataType.varchar(), JDBCType.VARCHAR)
.build();
```
</issue_to_address>
### Comment 5
<location path="tempto-core/src/main/java/io/prestodb/tempto/internal/query/CassandraQueryExecutor.java" line_range="114-117" />
<code_context>
}
- public CqlSession getSession()
+ public Session getSession()
{
return session;
</code_context>
<issue_to_address>
**suggestion (bug_risk):** `getSession()` may now return `null` due to lazy initialization.
With the previous eager construction, `getSession()` was guaranteed non-null; now it can return `null` unless `ensureConnected()` has been called first. If external callers depend on a non-null session, either call `ensureConnected()` from `getSession()` or clearly document/enforce that callers must use higher-level methods (e.g., `executeQuery()`) instead of `getSession()` directly.
```suggestion
public Session getSession()
{
ensureConnected();
return session;
}
```
</issue_to_address>
This PR reverts the upgrade until Presto changes are ready, as this is blocking other Presto PRs that need a tempto upgrade.