
feat: Hive / Kyuubi / Spark Thrift Server connector #30

Merged
shirshanka merged 2 commits into main from feat/hive-connector
May 7, 2026

Conversation

@shirshanka
Contributor

Summary

Implements the Hive engine support requested in #13, following the isolated-connector architecture introduced in #24.

  • New analytics-agent-connector-hive package (connectors/hive/) — runs as an MCP subprocess, keeping heavy Thrift/SASL deps out of the base install. Uses PyPI-stable releases (pyhive[hive], pure-sasl, thrift-sasl) instead of git-pinned dependencies.
  • Supports all common auth modes: NONE, NOSASL, LDAP, PLAIN, KERBEROS
  • Hive registered in _CONNECTOR_MAP — the UI "Install connector" flow and env-var wiring work automatically, same as Snowflake/BigQuery
  • connect_args passthrough in SQLAlchemyQueryEngine — a generic improvement taken from #13 ("Supports hive engine for connecting to hiveserver2/kyuubi/spark thrift server") that benefits all SQLAlchemy-based connections
  • Frontend plugin — Hive/Kyuubi/Spark appears in the data source picker

What changed from #13

| #13 approach | This PR |
| --- | --- |
| Deps added to base wheel (pyhive @ git+kyuubi, kerberos, thrift) | Isolated connector package, zero impact on base install |
| Git-pinned, non-reproducible deps | PyPI releases only |
| sql_allow_limit exposed as LLM tool param | Removed — _apply_row_limit already skips if LIMIT present |
| Inline hive SQLAlchemyQueryEngine | hive runs as an MCP subprocess (same pattern as Snowflake) |

Test plan

  • uv tool install analytics-agent-connector-hive installs cleanly
  • Add Hive data source in UI — "Install connector" flow works
  • NONE auth connects to a local HiveServer2 / Kyuubi instance
  • LDAP auth (username + password) works
  • execute_sql, list_tables, get_schema, preview_table return correct results

@wForget — this PR addresses your request from #13. Could you give it a try against your Kyuubi setup and report back? The connector package will need to be installed separately (uv tool install analytics-agent-connector-hive) until it lands on PyPI — for now you can install from the repo:

uv tool install ./connectors/hive

Any feedback on the auth config or field names would be welcome.
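For reference, a connection config for a Kerberized Kyuubi endpoint might look like the following, using the field keys from the UI plugin. The exact shape and env-var wiring are the app's, so treat this as illustrative only:

```json
{
  "host": "kyuubi-host",
  "port": 10000,
  "database": "default",
  "auth": "KERBEROS",
  "kerberos_service_name": "hive"
}
```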

🤖 Generated with Claude Code

Adds support for HiveServer2-compatible engines (Apache Hive, Kyuubi,
Spark Thrift Server) following the existing isolated-connector architecture.

- connectors/hive/ — new analytics-agent-connector-hive package:
  uses PyPI-stable pyhive[hive], pure-sasl, thrift-sasl (no git deps);
  supports NONE, NOSASL, LDAP, PLAIN, and KERBEROS auth modes
- factory.py — registers "hive" in _CONNECTOR_MAP so the UI install
  flow and env-var wiring work automatically
- sqlalchemy/engine.py — passes connect_args from connection config to
  create_engine(), enabling dialect-specific driver options for all
  SQLAlchemy-based connections (contributed by @wForget in #13)
- frontend — adds Hive/Kyuubi/Spark plugin to the data source picker

Closes #13

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
{ key: "host", label: "Host", type: "mono", placeholder: "kyuubi-host or localhost", required: true },
{ key: "port", label: "Port", type: "mono", placeholder: "10000" },
{ key: "database", label: "Database", type: "mono", placeholder: "default" },
{ key: "auth", label: "Auth", type: "mono", placeholder: "NONE (or NOSASL, LDAP, KERBEROS)" },
Contributor


Missing kerberos_service_name

    { key: "kerberos_service_name",     label: "Kerberos Service Name",     type: "mono", placeholder: "hive" },

Comment thread connectors/hive/pyproject.toml Outdated
requires-python = ">=3.10"
dependencies = [
"mcp>=1.0.0",
"pyhive[hive]>=0.6.5",
Contributor


For Python 3.11+, we might need pyhive[hive_pure_sasl], see: https://github.com/apache/kyuubi/tree/master/python#requirements

    "pyhive[hive_pure_sasl]>=0.7.0",

Note: the 'pyhive[hive]' extra uses sasl, which doesn't support Python 3.11 (see cloudera/python-sasl#30). Hence PyHive also supports pure-sasl via the additional extra 'pyhive[hive_pure_sasl]', which does support Python 3.11.

Furthermore, it seems that the kerberos package is required in a Kerberos environment.

    "kerberos>=1.3.0",

@wForget
Contributor

wForget commented May 6, 2026

Thanks @shirshanka, after making the two changes above, I was able to successfully submit queries to our kyuubi server.


…os_service_name UI field

- Switch from pyhive[hive]>=0.6.5 to pyhive[hive_pure_sasl]>=0.7.0 — the sasl
  extra relies on the `sasl` C library which doesn't build on Python 3.11+
- Add kerberos_service_name field to the Hive connection UI so Kerberos auth
  can be configured without manually editing env vars

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
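Taken together with the context shown earlier in the thread, the dependency block in connectors/hive/pyproject.toml after this commit would look roughly like the following (reconstructed from the review suggestions; the file's exact contents may differ):

```toml
[project]
requires-python = ">=3.10"
dependencies = [
    "mcp>=1.0.0",
    "pyhive[hive_pure_sasl]>=0.7.0",  # pure-sasl builds on Python 3.11+
    "kerberos>=1.3.0",                # needed for the KERBEROS auth mode
]
```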
@shirshanka
Contributor Author

Thanks @wForget — both fixes are now in: switched to pyhive[hive_pure_sasl]>=0.7.0 for Python 3.11+ compatibility and added the kerberos_service_name field to the UI. Great catch on both, and glad it's working on Kyuubi!

Contributor

@wForget left a comment


Thanks @shirshanka

@shirshanka shirshanka merged commit 7aaa12a into main May 7, 2026
5 checks passed