# feat: Hive / Kyuubi / Spark Thrift Server connector #30
**Status:** Merged · 2 commits
**README.md** (new file, +30)

````markdown
# analytics-agent-connector-hive

Hive / Apache Kyuubi / Spark Thrift Server MCP connector for [Analytics Agent](https://github.com/datahub-project/analytics-agent).

Installed automatically when you add a Hive data source in the Analytics Agent UI. Can also be installed manually:

```bash
uv tool install analytics-agent-connector-hive
```

## Configuration

All configuration is read from environment variables set by the analytics-agent core when it launches the connector subprocess.

| Variable | Default | Description |
|---|---|---|
| `HIVE_HOST` | *(required)* | HiveServer2 / Kyuubi host |
| `HIVE_PORT` | `10000` | HiveServer2 port |
| `HIVE_DATABASE` | `default` | Default database |
| `HIVE_AUTH` | `NONE` | Auth mode: `NONE`, `NOSASL`, `LDAP`, `PLAIN`, `KERBEROS` |
| `HIVE_USER` | | Username (required for LDAP/PLAIN, recommended for KERBEROS) |
| `HIVE_PASSWORD` | | Password (LDAP/PLAIN only) |
| `HIVE_KERBEROS_SERVICE_NAME` | `hive` | Kerberos service principal prefix |
| `SQL_ROW_LIMIT` | `500` | Maximum rows returned per query |

## Auth modes

- **NONE / NOSASL** — no credentials needed; typical for local or trusted-network deployments
- **LDAP / PLAIN** — username + password
- **KERBEROS** — requires `kerberos` system library (`brew install krb5` / `apt-get install libkrb5-dev`)
````
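For a manual smoke test outside the Analytics Agent UI, the variables from the configuration table can be exported before launching the connector by hand. A sketch assuming a local LDAP-authenticated deployment; the host, user, and password values are placeholders, not part of this PR:

```bash
# Hypothetical values — substitute your own deployment details.
export HIVE_HOST=hive.internal.example.com
export HIVE_PORT=10000
export HIVE_DATABASE=default
export HIVE_AUTH=LDAP
export HIVE_USER=analytics_user
export HIVE_PASSWORD='...'   # LDAP/PLAIN only
export SQL_ROW_LIMIT=500

# Start the MCP server over stdio, as the core would via uvx
analytics-agent-connector-hive
```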
*(An empty file was also added; its path is not shown in this view.)*
**connectors/hive/analytics_agent_connector_hive/server.py** (new file, +175)
```python
"""Hive MCP connector for Analytics Agent.

Runs as a subprocess launched by the analytics-agent core via:
    uvx analytics-agent-connector-hive

Reads all config from environment variables. Exposes 4 tools:
    execute_sql, list_tables, get_schema, preview_table

Supported auth modes (HIVE_AUTH):
    NONE     — no authentication (default)
    NOSASL   — binary transport, no SASL wrapping
    LDAP     — username + password over SASL PLAIN
    PLAIN    — same as LDAP
    KERBEROS — Kerberos/GSSAPI (requires kerberos system library)
"""

from __future__ import annotations

import logging
import os
from typing import Any

import orjson
from mcp.server.fastmcp import FastMCP

logger = logging.getLogger(__name__)

SQL_ROW_LIMIT = int(os.environ.get("SQL_ROW_LIMIT", "500"))

mcp = FastMCP("hive-connector")

# ── Connection ─────────────────────────────────────────────────────────────

_conn: Any = None


def _get_connection():
    global _conn
    if _conn is None:
        from pyhive import hive

        host = os.environ.get("HIVE_HOST", "")
        if not host:
            raise RuntimeError("HIVE_HOST is not configured.")

        kwargs: dict[str, Any] = {
            "host": host,
            "port": int(os.environ.get("HIVE_PORT", "10000")),
            "database": os.environ.get("HIVE_DATABASE", "default"),
            "auth": os.environ.get("HIVE_AUTH", "NONE").upper(),
        }

        user = os.environ.get("HIVE_USER", "")
        password = os.environ.get("HIVE_PASSWORD", "")

        if user:
            kwargs["username"] = user
        if password:
            kwargs["password"] = password

        kerberos_service = os.environ.get("HIVE_KERBEROS_SERVICE_NAME", "hive")
        if kwargs["auth"] == "KERBEROS":
            kwargs["kerberos_service_name"] = kerberos_service

        _conn = hive.Connection(**kwargs)
    return _conn


# ── SQL helpers ────────────────────────────────────────────────────────────


def _coerce(v: Any) -> Any:
    import datetime
    from decimal import Decimal

    if isinstance(v, Decimal):
        return float(v) if v % 1 else int(v)
    if isinstance(v, (datetime.datetime, datetime.date)):
        return v.isoformat()
    if isinstance(v, bytes):
        return v.hex()
    return v


def _apply_limit(sql: str, limit: int) -> str:
    effective = sql.strip().rstrip(";")
    if effective.lstrip().upper().startswith("SELECT") and "LIMIT" not in effective.upper():
        return f"{effective} LIMIT {limit}"
    return effective


def _run_query(sql: str, limit: int | None = None) -> dict:
    effective_limit = limit or SQL_ROW_LIMIT
    try:
        conn = _get_connection()
    except Exception as e:
        return {"error": str(e), "columns": [], "rows": [], "truncated": False}

    effective_sql = _apply_limit(sql, effective_limit)
    try:
        cursor = conn.cursor()
        cursor.execute(effective_sql)
        columns = [desc[0] for desc in cursor.description] if cursor.description else []
        rows = cursor.fetchall()
        truncated = len(rows) >= effective_limit
        coerced = [
            {c: _coerce(v) for c, v in zip(columns, row, strict=False)} for row in rows
        ]
        return {"columns": columns, "rows": coerced, "truncated": truncated}
    except Exception as e:
        return {"error": str(e), "columns": [], "rows": [], "truncated": False}


# ── MCP tools ──────────────────────────────────────────────────────────────


@mcp.tool()
def execute_sql(sql: str) -> str:
    """Execute a SQL query against the connected Hive/Kyuubi/Spark warehouse. Returns JSON with columns and rows."""
    return orjson.dumps(_run_query(sql, SQL_ROW_LIMIT)).decode()


@mcp.tool()
def list_tables(schema: str = "") -> str:
    """List tables in the Hive database. Optionally filter by schema (database) name."""
    try:
        conn = _get_connection()
        cursor = conn.cursor()
        if schema:
            cursor.execute(f"SHOW TABLES IN {schema}")
        else:
            cursor.execute("SHOW TABLES")
        rows = cursor.fetchall()
        # pyhive SHOW TABLES returns (database, tableName, isTemporary) in some versions
        # and just (tableName,) in others — normalise both.
        tables = []
        for row in rows:
            if len(row) >= 2:
                tables.append({"schema": row[0], "name": row[1]})
            else:
                tables.append({"name": row[0]})
        return orjson.dumps(tables).decode()
    except Exception as e:
        return orjson.dumps({"error": str(e)}).decode()


@mcp.tool()
def get_schema(table: str) -> str:
    """Get the column schema for a Hive table. Use db.table notation for cross-database lookup."""
    try:
        conn = _get_connection()
        cursor = conn.cursor()
        cursor.execute(f"DESCRIBE {table}")
        rows = cursor.fetchall()
        # DESCRIBE returns (col_name, data_type, comment)
        columns = [
            {"name": row[0], "type": row[1], "comment": row[2] if len(row) > 2 else ""}
            for row in rows
            if row[0] and not row[0].startswith("#")  # skip partition/detail sections
        ]
        return orjson.dumps(columns).decode()
    except Exception as e:
        return orjson.dumps({"error": str(e)}).decode()


@mcp.tool()
def preview_table(table: str, limit: int = 10) -> str:
    """Preview the first N rows of a Hive table."""
    return orjson.dumps(_run_query(f"SELECT * FROM {table}", limit=limit)).decode()


def main() -> None:
    mcp.run()


if __name__ == "__main__":
    main()
```
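The two pure SQL helpers in `server.py` need no warehouse to verify. A quick standalone check of the same limit-injection and value-coercion logic, re-declared here (as `apply_limit` and `coerce`) so it runs without the connector installed:

```python
import datetime
from decimal import Decimal
from typing import Any


def apply_limit(sql: str, limit: int) -> str:
    # Mirror of server._apply_limit: append LIMIT to bare SELECTs only.
    effective = sql.strip().rstrip(";")
    if effective.lstrip().upper().startswith("SELECT") and "LIMIT" not in effective.upper():
        return f"{effective} LIMIT {limit}"
    return effective


def coerce(v: Any) -> Any:
    # Mirror of server._coerce: make Hive column values JSON-serialisable.
    if isinstance(v, Decimal):
        return float(v) if v % 1 else int(v)
    if isinstance(v, (datetime.datetime, datetime.date)):
        return v.isoformat()
    if isinstance(v, bytes):
        return v.hex()
    return v


print(apply_limit("SELECT * FROM sales;", 500))  # SELECT * FROM sales LIMIT 500
print(apply_limit("SHOW TABLES", 500))           # unchanged: SHOW TABLES
print(coerce(Decimal("3.50")))                   # 3.5
print(coerce(Decimal("4")))                      # 4 (whole Decimals become int)
print(coerce(datetime.date(2024, 1, 2)))         # 2024-01-02
print(coerce(b"\x0f"))                           # 0f
```

Note that the `LIMIT` check is a plain substring test, so a query that merely contains the word `LIMIT` (for example in a string literal or column alias) will not get a limit appended; the `truncated` flag in `_run_query` compensates by flagging any result that fills the row budget.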
**pyproject.toml** (new file, +20)
```toml
[project]
name = "analytics-agent-connector-hive"
version = "0.1.0"
description = "Hive / Kyuubi / Spark Thrift Server MCP connector for Analytics Agent"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "mcp>=1.0.0",
    "pyhive[hive_pure_sasl]>=0.7.0",
    "pure-sasl>=0.6.2",
    "thrift-sasl>=0.4.3",
    "orjson>=3.10.0",
]

[project.scripts]
analytics-agent-connector-hive = "analytics_agent_connector_hive.server:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```
**frontend/src/components/Settings/connections/plugins/hive.tsx** (new file, +18)
```tsx
import { createSimplePlugin } from "../helpers";

export const hivePlugin = createSimplePlugin({
  id: "hive",
  serviceId: "hive",
  label: "Hive / Kyuubi / Spark",
  category: "engine",
  description: "Connect to HiveServer2, Apache Kyuubi, or Spark Thrift Server",
  fields: [
    { key: "host", label: "Host", type: "mono", placeholder: "kyuubi-host or localhost", required: true },
    { key: "port", label: "Port", type: "mono", placeholder: "10000" },
    { key: "database", label: "Database", type: "mono", placeholder: "default" },
    { key: "auth", label: "Auth", type: "mono", placeholder: "NONE (or NOSASL, LDAP, KERBEROS)" },
    { key: "user", label: "Username", type: "mono", placeholder: "analytics_user" },
    { key: "password", label: "Password", type: "password", placeholder: "LDAP/PLAIN only" },
    { key: "kerberos_service_name", label: "Kerberos Service Name", type: "mono", placeholder: "hive" },
  ],
});
```
> **Review comment:** Missing `kerberos_service_name`