Add Teradata connector #26574
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,329 @@ | ||
| # Teradata connector | ||
|
|
||
| ```{raw} html | ||
| <img src="../_static/img/teradata.png" class="connector-logo"> | ||
| ``` | ||
|
|
||
| The Teradata connector allows querying and creating tables in an external Teradata database. | ||
|
Member
Wrap at 80 characters. Same for other places.
Author
addressed in new PR #26731
||
| This can be used to join data between different systems like Teradata and Hive, or between different Teradata instances. | ||
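A rough sketch of such a cross-system join follows. The catalog names `teradata` and `hive`, the schemas `sales` and `web`, and every table and column name are assumptions made up for illustration; they are not part of the connector.

```sql
-- Hypothetical catalogs, schemas, tables, and columns; adjust to your setup.
-- Joins order rows stored in Teradata with click data stored in Hive.
SELECT o.orderkey, o.totalprice, c.clicks
FROM teradata.sales.orders AS o
JOIN hive.web.clickstream AS c
  ON o.custkey = c.custkey;
```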
|
|
||
| ## Requirements | ||
|
|
||
| To connect to Teradata, you need: | ||
|
|
||
| - Teradata Database | ||
| - Network access from the Trino coordinator and workers to Teradata. Port 1025 is the default port. | ||
|
|
||
| ## Configuration | ||
|
|
||
| To configure the Teradata connector, create a catalog properties file in `etc/catalog` named, for example, `teradata.properties`, to mount the Teradata connector as the `teradata` | ||
| catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup: | ||
|
|
||
| ```properties | ||
| connector.name=teradata | ||
| connection-url=jdbc:teradata://example.teradata.com/CHARSET=UTF8,TMODE=ANSI,LOGMECH=TD2 | ||
| connection-user=*** | ||
| connection-password=*** | ||
| ``` | ||
|
|
||
| The `connection-url` defines the connection information and parameters to pass to the Teradata JDBC driver. The supported parameters for the URL are available in | ||
| the [Teradata JDBC documentation](https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#BABJIHBJ). | ||
|
|
||
| For example, the following `connection-url` configures character encoding, transaction mode, and authentication. | ||
|
|
||
| ```properties | ||
| connection-url=jdbc:teradata://example.teradata.com/CHARSET=UTF8,TMODE=ANSI,LOGMECH=TD2 | ||
| ``` | ||
|
|
||
| The `connection-user` and `connection-password` are typically required and determine the user credentials for the connection, often a service user. | ||
|
|
||
| ### Connection security | ||
|
|
||
| If you have TLS configured with a globally-trusted certificate installed on your data source, you can enable TLS between your cluster and the data source by appending parameters to | ||
| the JDBC connection string set in the `connection-url` catalog configuration property. | ||
|
|
||
| For example, to specify SSLMODE: | ||
|
|
||
| ```properties | ||
| connection-url=jdbc:teradata://example.teradata.com/SSLMODE=REQUIRE | ||
| ``` | ||
|
|
||
| For more information on TLS configuration options, see the | ||
| Teradata [JDBC documentation](https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#URL_SSLMODE_). | ||
|
|
||
| ```{include} jdbc-authentication.fragment | ||
| ``` | ||
|
|
||
| ### Multiple Teradata databases | ||
|
|
||
| You can have as many catalogs as you need. If you have additional Teradata databases, add another properties file to `etc/catalog` with a different name, making sure it | ||
| ends in `.properties`. For example, if you name the property file `sales.properties`, Trino creates a catalog named `sales` using the configured connector. | ||
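As a quick check, assuming both `teradata.properties` and `sales.properties` exist in `etc/catalog` and the cluster has been restarted, the two catalogs can be inspected separately. The catalog names here simply follow the file names used as examples above.

```sql
-- List all configured catalogs; both teradata and sales should appear.
SHOW CATALOGS;
-- Browse the second Teradata system through its own catalog.
SHOW SCHEMAS FROM sales;
```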
|
|
||
| ## Type mapping | ||
|
|
||
| Because Trino and Teradata each support types that the other does not, this | ||
| connector {ref}`modifies some types <type-mapping-overview>` when reading data. | ||
| Refer to the following sections for type mapping when reading data from Teradata | ||
| to Trino. | ||
|
|
||
| ### Teradata type to Trino type mapping | ||
|
|
||
| The connector maps Teradata types to the corresponding Trino types following | ||
| this table: | ||
|
|
||
| :::{list-table} Teradata type to Trino type mapping | ||
| :widths: 30, 30, 40 | ||
| :header-rows: 1 | ||
|
|
||
| * | ||
| - Teradata type | ||
| - Trino type | ||
| - Notes | ||
| * | ||
| - `TINYINT` | ||
| - `TINYINT` | ||
| - | ||
|
|
||
|
Member
Remove redundant empty lines and fix indentation:
Author
addressed in new PR #26731
||
| * | ||
| - `SMALLINT` | ||
|
|
||
| - `SMALLINT` | ||
| - | ||
|
|
||
| * | ||
| - `INTEGER` | ||
|
|
||
| - `INTEGER` | ||
| - | ||
|
|
||
| * | ||
| - `BIGINT` | ||
|
|
||
| - `BIGINT` | ||
| - | ||
|
|
||
| * | ||
| - `REAL` | ||
|
|
||
| - `DOUBLE` | ||
| - | ||
|
|
||
| * | ||
| - `DOUBLE` | ||
|
|
||
| - `DOUBLE` | ||
| - | ||
|
|
||
| * | ||
| - `FLOAT` | ||
|
|
||
| - `DOUBLE` | ||
| - | ||
|
|
||
| * | ||
| - `NUMBER(p, s)` | ||
|
|
||
| - `DECIMAL(p, s)` | ||
| - `DECIMAL(p, s)` is an alias of `NUMERIC(p, s)`. See | ||
| [](postgresql-decimal-type-handling) for more information. | ||
|
|
||
| * | ||
| - `NUMERIC(p, s)` | ||
|
|
||
| - `DECIMAL(p, s)` | ||
| - `DECIMAL(p, s)` is an alias of `NUMERIC(p, s)`. See | ||
| [](postgresql-decimal-type-handling) for more information. | ||
|
|
||
| * | ||
| - `DECIMAL(p, s)` | ||
|
|
||
| - `DECIMAL(p, s)` | ||
| - `DECIMAL(p, s)` is an alias of `NUMERIC(p, s)`. See | ||
| [](postgresql-decimal-type-handling) for more information. | ||
|
|
||
| * | ||
| - `CHAR(n)` | ||
|
|
||
| - `CHAR(n)` | ||
| - | ||
|
|
||
| * | ||
| - `CHARACTER(n)` | ||
|
|
||
| - `CHAR(n)` | ||
| - | ||
|
|
||
| * | ||
| - `VARCHAR(n)` | ||
|
|
||
| - `VARCHAR(n)` | ||
| - | ||
|
|
||
| * | ||
| - `BINARY` | ||
|
|
||
| - `VARBINARY` | ||
| - | ||
|
|
||
| * | ||
| - `VARBINARY` | ||
|
|
||
| - `VARBINARY` | ||
| - | ||
|
|
||
| * | ||
| - `BLOB` | ||
|
|
||
| - `VARBINARY` | ||
| - | ||
|
|
||
| * | ||
| - `DATE` | ||
|
|
||
| - `DATE` | ||
| - | ||
|
|
||
| * | ||
| - `TIME(n)` | ||
|
|
||
| - `TIME(n)` | ||
| - | ||
|
|
||
| * | ||
| - `TIMESTAMP(n)` | ||
|
|
||
| - `TIMESTAMP(n)` | ||
| - | ||
|
|
||
| * | ||
| - `TIMESTAMP(n) WITH TIME ZONE` | ||
|
|
||
| - `TIMESTAMP(n) WITH TIME ZONE` | ||
| - | ||
|
|
||
| * | ||
| - `TIME(n) WITH TIME ZONE` | ||
|
|
||
| - `TIME(n) WITH TIME ZONE` | ||
| - | ||
|
|
||
| * | ||
| - `JSON` | ||
|
|
||
| - `JSON` | ||
| - | ||
|
|
||
| ::: | ||
|
|
||
| No other types are supported. | ||
|
|
||
| ```{include} jdbc-type-mapping.fragment | ||
| ``` | ||
|
|
||
| ## Querying Teradata | ||
|
|
||
| The Teradata connector provides a schema for every Teradata database. You can see the available Teradata databases by running `SHOW SCHEMAS`: | ||
|
|
||
| ``` | ||
| SHOW SCHEMAS FROM teradata; | ||
| ``` | ||
|
|
||
| If you have a Teradata database named `sales`, you can view the tables in this database by running `SHOW TABLES`: | ||
|
|
||
| ``` | ||
| SHOW TABLES FROM teradata.sales; | ||
| ``` | ||
|
|
||
| You can see a list of the columns in the `orders` table in the `sales` database using either of the following: | ||
|
|
||
| ``` | ||
| DESCRIBE teradata.sales.orders; | ||
| SHOW COLUMNS FROM teradata.sales.orders; | ||
| ``` | ||
|
|
||
| Finally, you can access the `orders` table in the `sales` database: | ||
|
|
||
| ``` | ||
| SELECT * FROM teradata.sales.orders; | ||
| ``` | ||
|
|
||
| ## SQL support | ||
|
|
||
| The connector provides read access to data and metadata in the Teradata database. It supports | ||
| the [globally available](https://trino.io/docs/current/language/sql-support.html#globally-available-statements) | ||
| and [read operation](https://trino.io/docs/current/language/sql-support.html#read-operations) statements. | ||
|
|
||
| ## Performance | ||
|
|
||
| The connector includes a number of performance improvements, detailed in the following sections. | ||
|
|
||
| ### Table statistics | ||
|
|
||
| The Teradata connector can use [table and column statistics](https://trino.io/docs/current/optimizer/statistics.html) | ||
| for [cost based optimizations](https://trino.io/docs/current/optimizer/cost-based-optimizations.html), to improve query processing performance based on the actual data in the data | ||
| source. | ||
| The statistics are collected by Teradata and retrieved by the connector. The table and column statistics are based on Teradata's Data Dictionary views. | ||
|
|
||
| You can update statistics in Teradata by running: | ||
|
|
||
| ``` | ||
| COLLECT STATISTICS COLUMN (regionkey), COLUMN (name) ON trino_test_teradatajdbcconnect.nation; | ||
| ``` | ||
|
|
||
| Refer to [Statistics](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Data-Definition-Language-Syntax-and-Examples/Statistics-Statements) for more information | ||
| on table statistics. | ||
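To see which statistics Trino receives from the connector after collection, the Trino `SHOW STATS` statement can be used. The table name below reuses the `teradata.sales.orders` example from the querying section and is only an assumption about your environment.

```sql
-- Shows per-column and table-level statistics as seen by Trino.
-- Values that the connector cannot provide appear as NULL.
SHOW STATS FOR teradata.sales.orders;
```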
|
|
||
| ### Pushdown | ||
|
|
||
| The connector supports pushdown for a number of operations: | ||
|
|
||
| - {ref}`join-pushdown` | ||
| - {ref}`limit-pushdown` | ||
| - {ref}`topn-pushdown` | ||
|
|
||
| {ref}`Aggregate pushdown <aggregation-pushdown>` for the following functions: | ||
|
|
||
| - {func}`avg` | ||
| - {func}`count` | ||
| - {func}`max` | ||
| - {func}`min` | ||
| - {func}`sum` | ||
| - {func}`stddev` | ||
| - {func}`stddev_pop` | ||
| - {func}`stddev_samp` | ||
| - {func}`variance` | ||
| - {func}`var_pop` | ||
| - {func}`var_samp` | ||
| - {func}`covar_pop` | ||
| - {func}`covar_samp` | ||
| - {func}`corr` | ||
| - {func}`regr_intercept` | ||
| - {func}`regr_slope` | ||
|
|
||
| ```{include} join-pushdown-enabled-true.fragment | ||
| ``` | ||
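One way to check whether an operation is pushed down is to inspect the query plan with `EXPLAIN`. The sketch below assumes a hypothetical `nation` table in a `sales` schema of the `teradata` catalog. When aggregate pushdown succeeds, the plan generally shows no separate aggregation operator above the table scan.

```sql
-- Hypothetical table; inspect the plan for a remaining Aggregate operator.
EXPLAIN
SELECT regionkey, count(*)
FROM teradata.sales.nation
GROUP BY regionkey;
```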
|
|
||
| ### Predicate pushdown support | ||
|
|
||
| Predicates are pushed down for most types, including `UUID` and temporal | ||
| types, such as `DATE`. | ||
|
|
||
| The connector does not support pushdown of range predicates, such as `>`, | ||
| `<`, or `BETWEEN`, on columns with {ref}`character string types | ||
| <string-data-types>` like `CHAR` or `VARCHAR`. Equality predicates, such as | ||
| `IN` or `=`, and inequality predicates, such as `!=`, on columns with | ||
| textual types are pushed down. This ensures correctness of results since the | ||
| remote data source may sort strings differently than Trino. | ||
|
|
||
| In the following example, the predicate of the first query is not pushed down | ||
| since `name` is a column of type `VARCHAR` and `>` is a range predicate. | ||
| The other queries are pushed down. | ||
|
|
||
| ```sql | ||
| -- Not pushed down | ||
| SELECT * FROM nation WHERE name > 'CANADA'; | ||
| -- Pushed down | ||
| SELECT * FROM nation WHERE name != 'CANADA'; | ||
| SELECT * FROM nation WHERE name = 'CANADA'; | ||
| ``` | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| # Teradata Connector Developer Notes | ||
|
|
||
| The Teradata connector module has both unit tests and integration tests. | ||
| The integration tests require access to a [Teradata ClearScape Analytics™ Experience](https://clearscape.teradata.com/sign-in). | ||
| You can follow the steps below to run the integration tests locally. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| #### 1. Create a new ClearScape Analytics™ Experience account | ||
|
|
||
| If you don't already have one, sign up at: | ||
|
|
||
| [Teradata ClearScape Analytics™ Experience](https://www.teradata.com/getting-started/demos/clearscape-analytics) | ||
|
|
||
| #### 2. Log in | ||
|
|
||
| Sign in with your new account at: | ||
|
|
||
| [ClearScape Analytics™ Experience Login](https://clearscape.teradata.com/sign-in) | ||
|
|
||
| #### 3. Collect the API Token | ||
|
|
||
| Use the **Copy API Token** button in the UI to retrieve your token. | ||
|
|
||
| #### 4. Define the following environment variables | ||
|
|
||
| ⚠️ **Note:** The Teradata database password must be **at least 8 characters long**. | ||
|
|
||
| ``` | ||
| export CLEARSCAPE_TOKEN=<API Token> | ||
| export CLEARSCAPE_PASSWORD=<Password for Teradata database (min 8 chars)> | ||
| ``` | ||
|
|
||
| ## Running Integration Tests | ||
|
|
||
| Once the environment variables are set, run the integration tests with: | ||
|
|
||
| ⚠️ **Note:** Run the following command from the Trino parent directory. | ||
|
|
||
| ``` | ||
| ./mvnw clean install -pl :trino-teradata | ||
| ``` |
Please update ci.yml so we can run tests on CI. You can ask maintainers to add secrets to this repository in Slack.