diff --git a/docs/holmes-investigation.md b/docs/holmes-investigation.md new file mode 100644 index 00000000000..c104c4378b2 --- /dev/null +++ b/docs/holmes-investigation.md @@ -0,0 +1,108 @@ +# Holmes Investigation API + +The Holmes investigation API is an admin endpoint that runs [HolmesGPT](https://github.com/robusta-dev/holmesgpt) diagnostic investigations on ARO clusters. It creates a short-lived pod on the Hive AKS cluster that connects to the target cluster, runs diagnostic queries, and streams the results back to the caller. + +**Endpoint:** `POST /admin/subscriptions/{subscriptionId}/resourcegroups/{resourceGroup}/providers/Microsoft.RedHatOpenShift/openShiftClusters/{clusterName}/investigate` + +## Configuration Reference + +| Config | Env Var | Key Vault Secret (prod) | Default | Required | +|--------|---------|------------------------|---------|----------| +| Azure OpenAI API key | `HOLMES_AZURE_API_KEY` | `holmes-azure-api-key` | — | Yes | +| Azure OpenAI endpoint | `HOLMES_AZURE_API_BASE` | `holmes-azure-api-base` | — | Yes | +| HolmesGPT container image | `HOLMES_IMAGE` | — | `quay.io/haoran/holmesgpt:latest` | No | +| Azure OpenAI API version | `HOLMES_AZURE_API_VERSION` | — | `2025-04-01-preview` | No | +| LLM model name | `HOLMES_MODEL` | — | `azure/gpt-5.2` | No | +| Pod timeout (seconds) | `HOLMES_DEFAULT_TIMEOUT` | — | `600` | No | +| Max concurrent investigations per RP | `HOLMES_MAX_CONCURRENT` | — | `20` | No | + +## Config Loading + +Configuration is loaded once at RP startup in `NewFrontend` (`pkg/frontend/frontend.go`). + +**Development mode** (`RP_MODE=development`): All values are read from environment variables via `NewHolmesConfigFromEnv()`. + +**Production mode**: Sensitive values (API key, API base) are read from the service Key Vault (`{KEYVAULT_PREFIX}-svc`). Non-secret values (image, model, timeout, concurrency) are read from environment variables. This uses `NewHolmesConfig(ctx, serviceKeyvault)`. 
+**Soft-load behavior**: If loading fails (e.g., Key Vault secrets not provisioned), the RP logs a warning and starts normally. Only the investigate endpoint returns an error ("Holmes investigation is not configured"). This allows the RP to operate without Holmes configured.
+
+The loaded config is stored on the `frontend` struct as `holmesConfig *holmes.HolmesConfig` and reused for all investigation requests.
+
+## How Config Reaches the Pod
+
+When an investigation request arrives, the RP creates three Kubernetes resources in the cluster's Hive namespace:
+
+1. **Secret** (`holmes-kubeconfig-{id}`) — Contains:
+   - `config`: Short-lived (1h) kubeconfig for the `system:aro-diagnostics` identity
+   - `azure-api-key`: From `holmesConfig.AzureAPIKey`
+   - `azure-api-base`: From `holmesConfig.AzureAPIBase`
+   - `azure-api-version`: From `holmesConfig.AzureAPIVersion`
+
+2. **ConfigMap** (`holmes-config-{id}`) — Embedded toolset config from `pkg/hive/staticresources/holmes-config.yaml` (defines which kubectl commands Holmes can use)
+
+3. **Pod** (`holmes-investigate-{id}`) — Runs:
+   ```
+   python holmes_cli.py ask "<question>" -n <namespace> --model=<model> --config=/etc/holmes/config.yaml
+   ```
+   - Image from `holmesConfig.Image`
+   - `ActiveDeadlineSeconds` from `holmesConfig.DefaultTimeout`
+   - Azure credentials injected as environment variables from the Secret
+   - Kubeconfig mounted at `/etc/kubeconfig/config`
+
+All three resources are cleaned up after the investigation completes (or fails).
+
+## Development Setup
+
+1. Ensure prerequisites: VPN connected, `secrets/env` generated, `aks.kubeconfig` generated
+
+2. Export Holmes environment variables:
+   ```bash
+   source env && source secrets/env
+   export HIVE_KUBE_CONFIG_PATH=$(realpath aks.kubeconfig)
+   export ARO_INSTALL_VIA_HIVE=true
+   export ARO_ADOPT_BY_HIVE=true
+   export HOLMES_IMAGE="quay.io/haoran/holmesgpt:latest"
+   export HOLMES_AZURE_API_KEY="<your-azure-openai-api-key>"
+   export HOLMES_AZURE_API_BASE="<your-azure-openai-endpoint>"
+   ```
+
+3. Start the local RP: `make runlocal-rp`
+
+4. 
Run an investigation:
+   ```bash
+   ./hack/test-holmes-investigate.sh <cluster-name> "what is the cluster health status?"
+   ```
+
+## Key Vault Provisioning (Staging/Production)
+
+Create the following secrets in the service Key Vault (`{KEYVAULT_PREFIX}-svc`):
+
+| Secret Name | Value |
+|-------------|-------|
+| `holmes-azure-api-key` | Azure OpenAI API key |
+| `holmes-azure-api-base` | Azure OpenAI endpoint URL (e.g., `https://<resource-name>.openai.azure.com`) |
+
+Non-secret config (`HOLMES_IMAGE`, `HOLMES_MODEL`, etc.) is set via ARM deployment parameters in `pkg/deploy/generator/resources_rp.go` when added to the deployment template.
+
+## Security
+
+- **Cluster access**: Investigation pods use a `system:aro-diagnostics` identity with read-only RBAC (get/list/watch only). The kubeconfig certificate expires after 1 hour.
+- **Pod security**: Runs as non-root (UID 1000), no privilege escalation, all capabilities dropped, service account token not mounted.
+- **Toolset restrictions**: Destructive commands (`kubectl delete`, `kubectl apply`, `kubectl exec`, `rm`) are blocked in the Holmes toolset config.
+- **Rate limiting**: Per-RP-instance atomic counter limits concurrent investigations (default 20).
+- **Input validation**: Question limited to 1000 characters, control characters rejected, model name validated against safe character pattern.
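The input-validation and rate-limiting rules listed above can be sketched in Go. This is an illustrative condensation, not the handler's actual code (which lives in `pkg/frontend/admin_openshiftcluster_investigate.go`); `validateQuestion` and `tryAcquire` are hypothetical helper names.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// validateQuestion condenses the input rules: non-empty, at most 1000
// characters, and no control characters that could affect CLI parsing.
func validateQuestion(q string) error {
	if q == "" {
		return errors.New("the question parameter is required")
	}
	if len(q) > 1000 {
		return errors.New("the question must not exceed 1000 characters")
	}
	for _, ch := range q {
		if ch < 0x20 {
			return errors.New("the question must not contain control characters")
		}
	}
	return nil
}

// tryAcquire sketches the per-RP-instance limiter: a compare-and-swap
// loop on an atomic counter, so rejected requests never inflate it.
func tryAcquire(active *int64, max int64) bool {
	for {
		current := atomic.LoadInt64(active)
		if current >= max {
			return false
		}
		if atomic.CompareAndSwapInt64(active, current, current+1) {
			return true
		}
	}
}

func main() {
	fmt.Println(validateQuestion("what is the cluster health status?")) // nil: passes all checks
	fmt.Println(validateQuestion("bad\x00question") != nil)             // control character rejected

	var active int64
	fmt.Println(tryAcquire(&active, 1)) // first caller admitted
	fmt.Println(tryAcquire(&active, 1)) // rejected while the slot is held
	atomic.AddInt64(&active, -1)        // release, as the handler's defer does
}
```

Note the CAS loop mirrors the handler's design choice: a plain increment-then-check would let a burst of rejected requests temporarily push the counter over the limit.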
+
+## Code Locations
+
+| Component | File |
+|-----------|------|
+| Config struct and loaders | `pkg/util/holmes/config.go` |
+| Config loading at startup | `pkg/frontend/frontend.go` (search `holmesConfig`) |
+| Admin API handler | `pkg/frontend/admin_openshiftcluster_investigate.go` |
+| Kubeconfig generation | `pkg/frontend/admin_openshiftcluster_investigate_kubeconfig.go` |
+| Pod creation and streaming | `pkg/hive/investigate.go` |
+| Kubeconfig transformation (dev) | `pkg/util/holmes/kubeconfig.go` |
+| Holmes toolset config | `pkg/hive/staticresources/holmes-config.yaml` |
+| RBAC ClusterRole | `pkg/operator/controllers/rbac/staticresources/clusterrole-diagnostics.yaml` |
+| RBAC ClusterRoleBinding | `pkg/operator/controllers/rbac/staticresources/clusterrolebinding-diagnostics.yaml` |
+| E2E test script | `hack/test-holmes-investigate.sh` |
diff --git a/hack/test-holmes-investigate.sh b/hack/test-holmes-investigate.sh
new file mode 100755
index 00000000000..1cb1948f3d9
--- /dev/null
+++ b/hack/test-holmes-investigate.sh
@@ -0,0 +1,91 @@
+#!/bin/bash
+# Test script for the Holmes investigation admin API endpoint.
+#
+# Prerequisites:
+# 1. VPN connected to the dev environment
+# 2. secrets/ folder generated: SECRET_SA_ACCOUNT_NAME=rharosecretsdev make secrets
+# 3. AKS kubeconfig generated: make aks.kubeconfig
+# 4. A test cluster created via: CLUSTER=<name> go run ./hack/cluster create
+# 5. Local RP running with Hive enabled (see below)
+#
+# Usage:
+# ./hack/test-holmes-investigate.sh <cluster-name> [question]
+#
+# Examples:
+# ./hack/test-holmes-investigate.sh haowang-holmes-test
+# ./hack/test-holmes-investigate.sh haowang-holmes-test "why is pod X crashing?"
+# ./hack/test-holmes-investigate.sh haowang-holmes-test "check node memory usage"
+#
+# To start the local RP with Hive + Holmes enabled:
+#
+# source env && source secrets/env
+# export HIVE_KUBE_CONFIG_PATH=$(realpath aks.kubeconfig)
+# export ARO_INSTALL_VIA_HIVE=true
+# export ARO_ADOPT_BY_HIVE=true
+# export ARO_PODMAN_SOCKET="unix://$(podman machine inspect --format '{{.ConnectionInfo.PodmanSocket.Path}}')"
+# export HOLMES_IMAGE="quay.io/haoran/holmesgpt:latest"
+# export HOLMES_AZURE_API_KEY="<your-azure-openai-api-key>"
+# export HOLMES_AZURE_API_BASE="<your-azure-openai-endpoint>"
+# export HOLMES_AZURE_API_VERSION="2025-04-01-preview"
+# export HOLMES_MODEL="azure/gpt-5.2"
+# make runlocal-rp
+
+set -euo pipefail
+
+CLUSTER_NAME="${1:-}"
+QUESTION="${2:-what is the cluster health status?}"
+
+if [[ -z "$CLUSTER_NAME" ]]; then
+  echo "Usage: $0 <cluster-name> [question]"
+  echo ""
+  echo "Examples:"
+  echo "  $0 haowang-holmes-test"
+  echo "  $0 haowang-holmes-test 'why is pod X crashing?'"
+  exit 1
+fi
+
+# Source env if not already loaded
+if [[ -z "${AZURE_SUBSCRIPTION_ID:-}" ]]; then
+  if [[ -f env ]] && [[ -f secrets/env ]]; then
+    source env
+    source secrets/env
+  else
+    echo "Error: AZURE_SUBSCRIPTION_ID not set and env files not found."
+    echo "Run from the repo root, or source env && source secrets/env first."
+    exit 1
+  fi
+fi
+
+RESOURCEGROUP="${RESOURCEGROUP:-v4-eastus}"
+RP_URL="https://localhost:8443"
+API_PATH="/admin/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourcegroups/${RESOURCEGROUP}/providers/Microsoft.RedHatOpenShift/openShiftClusters/${CLUSTER_NAME}/investigate"
+
+echo "============================================"
+echo " Holmes Investigation Test"
+echo "============================================"
+echo " Cluster: ${CLUSTER_NAME}"
+echo " RG: ${RESOURCEGROUP}"
+echo " Question: ${QUESTION}"
+echo " Endpoint: POST ${RP_URL}${API_PATH}"
+echo "============================================"
+echo ""
+
+# Check RP is running
+if ! 
curl -sk -o /dev/null -w '' "${RP_URL}/healthz" 2>/dev/null; then + echo "Error: Local RP is not running at ${RP_URL}" + echo "Start it with: make runlocal-rp (see header comments for full env setup)" + exit 1 +fi + +echo "Sending investigation request..." +echo "Streaming results (this may take 1-5 minutes):" +echo "--------------------------------------------" + +curl -sk --no-buffer -X POST \ + "${RP_URL}${API_PATH}" \ + -H "Content-Type: application/json" \ + -d "$(jq -n --arg q "${QUESTION}" '{question: $q}')" + +echo "" +echo "--------------------------------------------" +echo "Investigation complete." diff --git a/pkg/cluster/kubeconfig.go b/pkg/cluster/kubeconfig.go index 69e14eaef80..e87342dc272 100644 --- a/pkg/cluster/kubeconfig.go +++ b/pkg/cluster/kubeconfig.go @@ -25,13 +25,13 @@ import ( // kubeconfig for the ARO service, based on the admin kubeconfig found in the // graph. func (m *manager) generateAROServiceKubeconfig(pg graph.PersistedGraph) ([]byte, error) { - return generateKubeconfig(pg, "system:aro-service", []string{"system:masters"}, installer.TenYears, true) + return GenerateKubeconfig(pg, "system:aro-service", []string{"system:masters"}, installer.TenYears, true) } // generateAROSREKubeconfig generates additional admin credentials and a // kubeconfig for ARO SREs, based on the admin kubeconfig found in the graph. func (m *manager) generateAROSREKubeconfig(pg graph.PersistedGraph) ([]byte, error) { - return generateKubeconfig(pg, "system:aro-sre", nil, installer.TenYears, true) + return GenerateKubeconfig(pg, "system:aro-sre", nil, installer.TenYears, true) } // checkUserAdminKubeconfigUpdated checks if the user kubeconfig is @@ -82,7 +82,7 @@ func (m *manager) checkUserAdminKubeconfigUpdated() bool { // generateUserAdminKubeconfig generates additional admin credentials and a // kubeconfig for ARO User, based on the admin kubeconfig found in the graph. 
func (m *manager) generateUserAdminKubeconfig(pg graph.PersistedGraph) ([]byte, error) { - return generateKubeconfig(pg, "system:admin", nil, installer.OneYear, false) + return GenerateKubeconfig(pg, "system:admin", nil, installer.OneYear, false) } func (m *manager) generateKubeconfigs(ctx context.Context) error { @@ -127,7 +127,8 @@ func (m *manager) generateKubeconfigs(ctx context.Context) error { return err } -func generateKubeconfig(pg graph.PersistedGraph, commonName string, organization []string, validity time.Duration, internal bool) ([]byte, error) { +// GenerateKubeconfig generates a kubeconfig with a client certificate signed by the cluster CA. +func GenerateKubeconfig(pg graph.PersistedGraph, commonName string, organization []string, validity time.Duration, internal bool) ([]byte, error) { var ca *installer.AdminKubeConfigSignerCertKey var adminInternalClient *installer.AdminInternalClient err := pg.GetByName(false, "*tls.AdminKubeConfigSignerCertKey", &ca) diff --git a/pkg/cluster/kubeconfig_test.go b/pkg/cluster/kubeconfig_test.go index c006908dfc7..cf0f88890ba 100644 --- a/pkg/cluster/kubeconfig_test.go +++ b/pkg/cluster/kubeconfig_test.go @@ -143,6 +143,136 @@ func TestGenerateAROServiceKubeconfig(t *testing.T) { } } +func TestGenerateDiagnosticsKubeconfig(t *testing.T) { + validCaKey, validCaCerts, err := utiltls.GenerateKeyAndCertificate("validca", nil, nil, true, false) + if err != nil { + t.Fatal(err) + } + encodedKey, err := utilpem.Encode(validCaKey) + if err != nil { + t.Fatal(err) + } + encodedCert, err := utilpem.Encode(validCaCerts[0]) + if err != nil { + t.Fatal(err) + } + ca := &installer.AdminKubeConfigSignerCertKey{ + SelfSignedCertKey: installer.SelfSignedCertKey{ + CertKey: installer.CertKey{ + CertRaw: encodedCert, + KeyRaw: encodedKey, + }, + }, + } + + apiserverURL := "https://api-int.hash.rg.mydomain:6443" + clusterName := "api-hash-rg-mydomain:6443" + diagnosticsName := "system:aro-diagnostics" + + adminInternalClient := 
&installer.AdminInternalClient{} + adminInternalClient.Config = &clientcmdv1.Config{ + Clusters: []clientcmdv1.NamedCluster{ + { + Name: clusterName, + Cluster: clientcmdv1.Cluster{ + Server: apiserverURL, + CertificateAuthorityData: []byte("internal API Cert"), + }, + }, + }, + AuthInfos: []clientcmdv1.NamedAuthInfo{}, + Contexts: []clientcmdv1.NamedContext{ + { + Name: diagnosticsName, + Context: clientcmdv1.Context{ + Cluster: clusterName, + AuthInfo: diagnosticsName, + }, + }, + }, + CurrentContext: diagnosticsName, + } + + pg := graph.PersistedGraph{} + + caData, err := json.Marshal(ca) + if err != nil { + t.Fatal(err) + } + clientData, err := json.Marshal(adminInternalClient) + if err != nil { + t.Fatal(err) + } + pg["*kubeconfig.AdminInternalClient"] = clientData + pg["*tls.AdminKubeConfigSignerCertKey"] = caData + + // Generate a 1-hour kubeconfig for system:aro-diagnostics + aroDiagnosticsClient, err := GenerateKubeconfig(pg, diagnosticsName, nil, time.Hour, true) + if err != nil { + t.Fatal(err) + } + + var got *clientcmdv1.Config + err = yaml.Unmarshal(aroDiagnosticsClient, &got) + if err != nil { + t.Fatal(err) + } + + innerpem := string(got.AuthInfos[0].AuthInfo.ClientCertificateData) + string(got.AuthInfos[0].AuthInfo.ClientKeyData) + _, innercert, err := utilpem.Parse([]byte(innerpem)) + if err != nil { + t.Fatal(err) + } + + err = innercert[0].CheckSignatureFrom(validCaCerts[0]) + if err != nil { + t.Fatal(err) + } + + issuer := innercert[0].Issuer.String() + if issuer != "CN=validca" { + t.Error(issuer) + } + + subject := innercert[0].Subject.String() + if subject != "CN=system:aro-diagnostics" { + t.Error(subject) + } + + // Verify no organization (no system:masters group) + if len(innercert[0].Subject.Organization) != 0 { + t.Errorf("expected no organization, got %v", innercert[0].Subject.Organization) + } + + // Verify ~1 hour validity (not 10 years) + expectedExpiry := time.Now().Add(time.Hour) + if 
innercert[0].NotAfter.After(expectedExpiry.Add(5 * time.Minute)) { + t.Errorf("certificate expires too far in the future: %v", innercert[0].NotAfter) + } + if innercert[0].NotAfter.Before(expectedExpiry.Add(-5 * time.Minute)) { + t.Errorf("certificate expires too soon: %v", innercert[0].NotAfter) + } + + keyUsage := innercert[0].KeyUsage + expectedKeyUsage := x509.KeyUsageKeyEncipherment | x509.KeyUsageDigitalSignature + if keyUsage != expectedKeyUsage { + t.Error("Invalid keyUsage.") + } + + // Verify internal URL is preserved (not rewritten to external) + if got.Clusters[0].Cluster.Server != apiserverURL { + t.Errorf("expected server %s, got %s", apiserverURL, got.Clusters[0].Cluster.Server) + } + + // validate the rest of the struct + got.AuthInfos = []clientcmdv1.NamedAuthInfo{} + want := adminInternalClient.Config + + if !reflect.DeepEqual(got, want) { + t.Fatal(cmp.Diff(got, want)) + } +} + func TestGenerateUserAdminKubeconfig(t *testing.T) { validCaKey, validCaCerts, err := utiltls.GenerateKeyAndCertificate("validca", nil, nil, true, false) if err != nil { diff --git a/pkg/frontend/admin_openshiftcluster_investigate.go b/pkg/frontend/admin_openshiftcluster_investigate.go new file mode 100644 index 00000000000..da48d7fa0a3 --- /dev/null +++ b/pkg/frontend/admin_openshiftcluster_investigate.go @@ -0,0 +1,159 @@ +package frontend + +// Copyright (c) Microsoft Corporation. +// Licensed under the Apache License 2.0. + +import ( + "context" + "encoding/json" + "fmt" + "net/http" + "path/filepath" + "strings" + "sync/atomic" + + "github.com/go-chi/chi/v5" + "github.com/sirupsen/logrus" + + "github.com/Azure/ARO-RP/pkg/api" + "github.com/Azure/ARO-RP/pkg/database/cosmosdb" + "github.com/Azure/ARO-RP/pkg/frontend/middleware" +) + +type investigateRequest struct { + Question string `json:"question"` +} + +// trackingResponseWriter wraps http.ResponseWriter to track whether any bytes +// have been written. 
This is used to avoid calling adminReply (which writes +// JSON) after streaming has already started (which writes text/plain). +type trackingResponseWriter struct { + http.ResponseWriter + written int64 +} + +func (tw *trackingResponseWriter) Write(b []byte) (int, error) { + n, err := tw.ResponseWriter.Write(b) + atomic.AddInt64(&tw.written, int64(n)) + return n, err +} + +func (tw *trackingResponseWriter) Flush() { + if flusher, ok := tw.ResponseWriter.(http.Flusher); ok { + flusher.Flush() + } +} + +func (f *frontend) postAdminOpenShiftClusterInvestigate(w http.ResponseWriter, r *http.Request) { + ctx := r.Context() + log := ctx.Value(middleware.ContextKeyLog).(*logrus.Entry) + r.URL.Path = filepath.Dir(r.URL.Path) + + tw := &trackingResponseWriter{ResponseWriter: w} + err := f._postAdminOpenShiftClusterInvestigate(ctx, r, log, tw) + if err != nil { + if atomic.LoadInt64(&tw.written) > 0 { + // Streaming already started — can't send a JSON error response. + // Log the error server-side instead. + log.WithError(err).Warn("investigation failed after streaming started") + return + } + adminReply(log, tw, nil, nil, err) + } +} + +func (f *frontend) _postAdminOpenShiftClusterInvestigate(ctx context.Context, r *http.Request, log *logrus.Entry, w http.ResponseWriter) error { + resType, resName, resGroupName := chi.URLParam(r, "resourceType"), chi.URLParam(r, "resourceName"), chi.URLParam(r, "resourceGroupName") + + // Parse request body from context (middleware buffers the body). 
+ body := r.Context().Value(middleware.ContextKeyBody).([]byte) + var req investigateRequest + err := json.Unmarshal(body, &req) + if err != nil { + return api.NewCloudError(http.StatusBadRequest, api.CloudErrorCodeInvalidRequestContent, "", fmt.Sprintf("The request body could not be parsed: %v.", err)) + } + + if req.Question == "" { + return api.NewCloudError(http.StatusBadRequest, api.CloudErrorCodeInvalidParameter, "question", "The question parameter is required and must be non-empty.") + } + + const maxQuestionLength = 1000 + if len(req.Question) > maxQuestionLength { + return api.NewCloudError(http.StatusBadRequest, api.CloudErrorCodeInvalidParameter, "question", fmt.Sprintf("The question must not exceed %d characters.", maxQuestionLength)) + } + + // Reject control characters that could affect CLI argument parsing. + for _, ch := range req.Question { + if ch < 0x20 && ch != ' ' { + return api.NewCloudError(http.StatusBadRequest, api.CloudErrorCodeInvalidParameter, "question", "The question must not contain control characters.") + } + } + + if f.holmesConfig == nil { + return api.NewCloudError(http.StatusInternalServerError, api.CloudErrorCodeInternalServerError, "", "Holmes investigation is not configured") + } + + // Rate limit: reject if too many concurrent investigations are running. + // Use CAS loop so rejected requests don't temporarily inflate the counter. + // NOTE: This limit is per-RP-instance (in-memory atomic counter). With N + // replicas, the effective global limit is N * MaxConcurrentInvestigations. + // A distributed limiter (e.g., CosmosDB-backed) can be added if global + // quota enforcement is needed. + maxConcurrent := int64(f.holmesConfig.MaxConcurrentInvestigations) + for { + current := atomic.LoadInt64(&f.activeInvestigations) + if current >= maxConcurrent { + return api.NewCloudError(http.StatusTooManyRequests, api.CloudErrorCodeThrottlingLimitExceeded, "", fmt.Sprintf("Too many concurrent investigations (%d). 
Please try again later.", f.holmesConfig.MaxConcurrentInvestigations)) + } + if atomic.CompareAndSwapInt64(&f.activeInvestigations, current, current+1) { + break + } + } + defer atomic.AddInt64(&f.activeInvestigations, -1) + + resourceID := strings.TrimPrefix(r.URL.Path, "/admin") + + dbOpenShiftClusters, err := f.dbGroup.OpenShiftClusters() + if err != nil { + return api.NewCloudError(http.StatusInternalServerError, api.CloudErrorCodeInternalServerError, "", err.Error()) + } + + doc, err := dbOpenShiftClusters.Get(ctx, resourceID) + switch { + case cosmosdb.IsErrorStatusCode(err, http.StatusNotFound): + return api.NewCloudError(http.StatusNotFound, api.CloudErrorCodeResourceNotFound, "", fmt.Sprintf("The Resource '%s/%s' under resource group '%s' was not found.", resType, resName, resGroupName)) + case err != nil: + return err + } + + if f.hiveClusterManager == nil { + return api.NewCloudError(http.StatusInternalServerError, api.CloudErrorCodeInternalServerError, "", "hive is not enabled") + } + + hiveNamespace := doc.OpenShiftCluster.Properties.HiveProfile.Namespace + if hiveNamespace == "" { + return api.NewCloudError(http.StatusInternalServerError, api.CloudErrorCodeInternalServerError, "", "cluster does not have a Hive namespace configured") + } + + // Generate a short-lived (1h) read-only kubeconfig for the diagnostics identity. + // This uses the cluster CA from the persisted graph to sign a fresh client cert. + // In development mode, the endpoint is rewritten from api-int.* to api.* since + // the Hive cluster cannot resolve private DNS there. + kubeconfig, err := f.generateDiagnosticsKubeconfig(ctx, log, doc) + if err != nil { + return fmt.Errorf("failed to generate diagnostics kubeconfig: %w", err) + } + + log.Infof("starting Holmes investigation for cluster %s (question_length=%d)", resourceID, len(req.Question)) + + // Set Content-Type before streaming begins. 
Once bytes are written to w, + // the response is committed and errors cannot be reported via adminReply. + w.Header().Set("Content-Type", "text/plain") + + err = f.hiveClusterManager.InvestigateCluster(ctx, hiveNamespace, kubeconfig, f.holmesConfig, req.Question, w) + if err != nil { + return fmt.Errorf("failed to investigate cluster: %w", err) + } + + return nil +} diff --git a/pkg/frontend/admin_openshiftcluster_investigate_kubeconfig.go b/pkg/frontend/admin_openshiftcluster_investigate_kubeconfig.go new file mode 100644 index 00000000000..c2ca492b949 --- /dev/null +++ b/pkg/frontend/admin_openshiftcluster_investigate_kubeconfig.go @@ -0,0 +1,80 @@ +package frontend + +// Copyright (c) Microsoft Corporation. +// Licensed under the Apache License 2.0. + +import ( + "context" + "fmt" + "time" + + "github.com/sirupsen/logrus" + + "github.com/Azure/ARO-RP/pkg/api" + "github.com/Azure/ARO-RP/pkg/cluster" + "github.com/Azure/ARO-RP/pkg/cluster/graph" + "github.com/Azure/ARO-RP/pkg/env" + "github.com/Azure/ARO-RP/pkg/util/encryption" + "github.com/Azure/ARO-RP/pkg/util/holmes" + "github.com/Azure/ARO-RP/pkg/util/storage" + "github.com/Azure/ARO-RP/pkg/util/stringutils" +) + +// generateDiagnosticsKubeconfig creates a short-lived (1 hour) kubeconfig for +// the system:aro-diagnostics identity. The kubeconfig is generated on each +// request using the cluster's CA from the persisted graph, so no long-lived +// credentials are stored in CosmosDB. 
+func (f *frontend) generateDiagnosticsKubeconfig(ctx context.Context, log *logrus.Entry, doc *api.OpenShiftClusterDocument) ([]byte, error) { + subscriptionDoc, err := f.getSubscriptionDocument(ctx, doc.Key) + if err != nil { + return nil, fmt.Errorf("failed to get subscription document: %w", err) + } + + credential, err := f.env.FPNewClientCertificateCredential(subscriptionDoc.Subscription.Properties.TenantID, nil) + if err != nil { + return nil, fmt.Errorf("failed to create FP credential: %w", err) + } + + options := f.env.Environment().ArmClientOptions() + storageManager, err := storage.NewManager( + subscriptionDoc.ID, + f.env.Environment().StorageEndpointSuffix, + credential, + doc.OpenShiftCluster.UsesWorkloadIdentity(), + options, + ) + if err != nil { + return nil, fmt.Errorf("failed to create storage manager: %w", err) + } + + clusterAead, err := encryption.NewMulti(ctx, f.env.ServiceKeyvault(), env.EncryptionSecretV2Name, env.EncryptionSecretName) + if err != nil { + return nil, fmt.Errorf("failed to create encryption client: %w", err) + } + + graphManager := graph.NewManager(f.env, log, clusterAead, storageManager) + resourceGroup := stringutils.LastTokenByte(doc.OpenShiftCluster.Properties.ClusterProfile.ResourceGroupID, '/') + account := "cluster" + doc.OpenShiftCluster.Properties.StorageSuffix + + pg, err := graphManager.LoadPersisted(ctx, resourceGroup, account) + if err != nil { + return nil, fmt.Errorf("failed to load persisted graph: %w", err) + } + + kubeconfig, err := cluster.GenerateKubeconfig(pg, "system:aro-diagnostics", nil, time.Hour, true) + if err != nil { + return nil, fmt.Errorf("failed to generate diagnostics kubeconfig: %w", err) + } + + // In development mode, the Hive cluster cannot resolve api-int.* private DNS + // names, so we rewrite to the external api.* endpoint. In production, the + // Hive cluster has proper network connectivity and should use api-int.* directly. 
+ if f.env.IsLocalDevelopmentMode() { + kubeconfig, err = holmes.MakeExternalKubeconfig(kubeconfig) + if err != nil { + return nil, fmt.Errorf("failed to convert to external kubeconfig: %w", err) + } + } + + return kubeconfig, nil +} diff --git a/pkg/frontend/admin_openshiftcluster_investigate_test.go b/pkg/frontend/admin_openshiftcluster_investigate_test.go new file mode 100644 index 00000000000..34c8f31855a --- /dev/null +++ b/pkg/frontend/admin_openshiftcluster_investigate_test.go @@ -0,0 +1,256 @@ +package frontend + +// Copyright (c) Microsoft Corporation. +// Licensed under the Apache License 2.0. + +import ( + "context" + "encoding/json" + "fmt" + "io" + "net/http" + "net/http/httptest" + "strings" + "testing" + + "github.com/go-chi/chi/v5" + "github.com/sirupsen/logrus" + "github.com/stretchr/testify/require" + "go.uber.org/mock/gomock" + + "github.com/Azure/ARO-RP/pkg/api" + "github.com/Azure/ARO-RP/pkg/frontend/middleware" + "github.com/Azure/ARO-RP/pkg/metrics/noop" + "github.com/Azure/ARO-RP/pkg/util/holmes" + mock_hive "github.com/Azure/ARO-RP/pkg/util/mocks/hive" + testdatabase "github.com/Azure/ARO-RP/test/database" +) + +const ( + mockInvestigateSubID = "00000000-0000-0000-0000-000000000001" + mockInvestigateTenantID = "00000000-0000-0000-0000-000000000002" +) + +var testHolmesConfig = &holmes.HolmesConfig{ + Image: "quay.io/test/holmesgpt:latest", + AzureAPIKey: "test-key", + AzureAPIBase: "https://test.openai.azure.com", + AzureAPIVersion: "2025-04-01-preview", + Model: "azure/gpt-4o", + DefaultTimeout: 600, + MaxConcurrentInvestigations: 20, +} + +func investigateDatabaseFixture(dbFixture *testdatabase.Fixture) { + dbFixture.AddOpenShiftClusterDocuments(&api.OpenShiftClusterDocument{ + Key: strings.ToLower(testdatabase.GetResourcePath(mockInvestigateSubID, "testCluster")), + OpenShiftCluster: &api.OpenShiftCluster{ + ID: strings.ToLower(testdatabase.GetResourcePath(mockInvestigateSubID, "testCluster")), + Properties: 
api.OpenShiftClusterProperties{ + ClusterProfile: api.ClusterProfile{ + ResourceGroupID: fmt.Sprintf("/subscriptions/%s/resourceGroups/test-cluster", mockInvestigateSubID), + }, + HiveProfile: api.HiveProfile{ + Namespace: "aro-00000000-0000-0000-0000-000000000001", + }, + StorageSuffix: "abcdef", + }, + }, + }) + + dbFixture.AddSubscriptionDocuments(&api.SubscriptionDocument{ + ID: mockInvestigateSubID, + Subscription: &api.Subscription{ + State: api.SubscriptionStateRegistered, + Properties: &api.SubscriptionProperties{ + TenantID: mockInvestigateTenantID, + }, + }, + }) +} + +func investigateDatabaseFixtureNoHiveNamespace(dbFixture *testdatabase.Fixture) { + dbFixture.AddOpenShiftClusterDocuments(&api.OpenShiftClusterDocument{ + Key: strings.ToLower(testdatabase.GetResourcePath(mockInvestigateSubID, "testCluster")), + OpenShiftCluster: &api.OpenShiftCluster{ + ID: strings.ToLower(testdatabase.GetResourcePath(mockInvestigateSubID, "testCluster")), + Properties: api.OpenShiftClusterProperties{ + ClusterProfile: api.ClusterProfile{ + ResourceGroupID: fmt.Sprintf("/subscriptions/%s/resourceGroups/test-cluster", mockInvestigateSubID), + }, + }, + }, + }) + + dbFixture.AddSubscriptionDocuments(&api.SubscriptionDocument{ + ID: mockInvestigateSubID, + Subscription: &api.Subscription{ + State: api.SubscriptionStateRegistered, + Properties: &api.SubscriptionProperties{ + TenantID: mockInvestigateTenantID, + }, + }, + }) +} + +func TestPostAdminOpenShiftClusterInvestigate(t *testing.T) { + resourceID := strings.ToLower(testdatabase.GetResourcePath(mockInvestigateSubID, "testCluster")) + + tests := []struct { + name string + body string + resourceID string + fixture func(*testdatabase.Fixture) + hiveEnabled bool + holmesConfig *holmes.HolmesConfig + mocks func(*mock_hive.MockClusterManager) + wantStatusCode int + wantError string + }{ + { + name: "empty body returns bad request", + body: "", + resourceID: resourceID, + fixture: investigateDatabaseFixture, + hiveEnabled: 
true, + holmesConfig: testHolmesConfig, + wantStatusCode: http.StatusBadRequest, + wantError: "The request body could not be parsed", + }, + { + name: "empty question returns bad request", + body: `{"question":""}`, + resourceID: resourceID, + fixture: investigateDatabaseFixture, + hiveEnabled: true, + holmesConfig: testHolmesConfig, + wantStatusCode: http.StatusBadRequest, + wantError: "The question parameter is required", + }, + { + name: "question with control characters returns bad request", + body: `{"question":"what is\nthe status?"}`, + resourceID: resourceID, + fixture: investigateDatabaseFixture, + hiveEnabled: true, + holmesConfig: testHolmesConfig, + wantStatusCode: http.StatusBadRequest, + wantError: "must not contain control characters", + }, + { + name: "question too long returns bad request", + body: `{"question":"` + strings.Repeat("a", 1001) + `"}`, + resourceID: resourceID, + fixture: investigateDatabaseFixture, + hiveEnabled: true, + holmesConfig: testHolmesConfig, + wantStatusCode: http.StatusBadRequest, + wantError: "The question must not exceed 1000 characters", + }, + { + name: "holmes not configured returns internal error", + body: `{"question":"what is wrong?"}`, + resourceID: resourceID, + fixture: investigateDatabaseFixture, + hiveEnabled: true, + holmesConfig: nil, + wantStatusCode: http.StatusInternalServerError, + wantError: "Holmes investigation is not configured", + }, + { + name: "cluster not found returns not found", + body: `{"question":"what is wrong?"}`, + resourceID: strings.ToLower(testdatabase.GetResourcePath(mockInvestigateSubID, "nonexistent")), + fixture: investigateDatabaseFixture, + hiveEnabled: true, + holmesConfig: testHolmesConfig, + wantStatusCode: http.StatusNotFound, + wantError: "was not found", + }, + { + name: "hive not enabled returns internal error", + body: `{"question":"what is wrong?"}`, + resourceID: resourceID, + fixture: investigateDatabaseFixture, + hiveEnabled: false, + holmesConfig: testHolmesConfig, 
+ wantStatusCode: http.StatusInternalServerError, + wantError: "hive is not enabled", + }, + { + name: "no hive namespace returns internal error", + body: `{"question":"what is wrong?"}`, + resourceID: resourceID, + fixture: investigateDatabaseFixtureNoHiveNamespace, + hiveEnabled: true, + holmesConfig: testHolmesConfig, + wantStatusCode: http.StatusInternalServerError, + wantError: "cluster does not have a Hive namespace configured", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + ti := newTestInfra(t).WithOpenShiftClusters().WithSubscriptions() + defer ti.done() + + err := ti.buildFixtures(tt.fixture) + if err != nil { + t.Fatal(err) + } + + var f *frontend + + if tt.hiveEnabled { + controller := gomock.NewController(t) + defer controller.Finish() + clusterManager := mock_hive.NewMockClusterManager(controller) + if tt.mocks != nil { + tt.mocks(clusterManager) + } + f, err = NewFrontend(context.Background(), ti.auditLog, ti.log, ti.otelAudit, ti.env, ti.dbGroup, api.APIs, &noop.Noop{}, &noop.Noop{}, nil, clusterManager, nil, nil, nil, nil, nil) + } else { + f, err = NewFrontend(context.Background(), ti.auditLog, ti.log, ti.otelAudit, ti.env, ti.dbGroup, api.APIs, &noop.Noop{}, &noop.Noop{}, nil, nil, nil, nil, nil, nil, nil) + } + if err != nil { + t.Fatal(err) + } + + // Override holmesConfig — NewFrontend soft-loads it (may be nil in test env). + f.holmesConfig = tt.holmesConfig + + recorder := httptest.NewRecorder() + // The URL must include /investigate — the outer handler strips it via filepath.Dir. 
+ request := httptest.NewRequest(http.MethodPost, "/admin"+tt.resourceID+"/investigate", nil) + + ctx := context.WithValue(request.Context(), middleware.ContextKeyLog, logrus.NewEntry(logrus.StandardLogger())) + ctx = context.WithValue(ctx, middleware.ContextKeyBody, []byte(tt.body)) + ctx = context.WithValue(ctx, chi.RouteCtxKey, &chi.Context{ + URLParams: chi.RouteParams{ + Keys: []string{"resourceType", "resourceName", "resourceGroupName"}, + Values: []string{"openshiftcluster", "testCluster", "resourceGroup"}, + }, + }) + request = request.WithContext(ctx) + + f.postAdminOpenShiftClusterInvestigate(recorder, request) + + response := recorder.Result() + require.Equal(t, tt.wantStatusCode, response.StatusCode) + + if tt.wantError != "" { + bodyBytes, err := io.ReadAll(response.Body) + require.NoError(t, err) + + var cloudErr struct { + Error struct { + Message string `json:"message"` + } `json:"error"` + } + err = json.Unmarshal(bodyBytes, &cloudErr) + require.NoError(t, err) + require.Contains(t, cloudErr.Error.Message, tt.wantError) + } + }) + } +} diff --git a/pkg/frontend/frontend.go b/pkg/frontend/frontend.go index 7e2adfcf0bf..d0ae3e0130f 100644 --- a/pkg/frontend/frontend.go +++ b/pkg/frontend/frontend.go @@ -36,6 +36,7 @@ import ( "github.com/Azure/ARO-RP/pkg/util/clusterdata" "github.com/Azure/ARO-RP/pkg/util/encryption" "github.com/Azure/ARO-RP/pkg/util/heartbeat" + "github.com/Azure/ARO-RP/pkg/util/holmes" utillog "github.com/Azure/ARO-RP/pkg/util/log" "github.com/Azure/ARO-RP/pkg/util/log/audit" "github.com/Azure/ARO-RP/pkg/util/recover" @@ -94,6 +95,8 @@ type frontend struct { hiveClusterManager hive.ClusterManager hiveSyncSetManager hive.SyncSetManager + holmesConfig *holmes.HolmesConfig + activeInvestigations int64 kubeActionsFactory kubeActionsFactory azureActionsFactory azureActionsFactory appLensActionsFactory appLensActionsFactory @@ -202,6 +205,18 @@ func NewFrontend(ctx context.Context, streamResponder: defaultResponder{}, } + // Load Holmes 
config: secrets from Key Vault in prod, env vars in dev. + var holmesErr error + if _env.IsLocalDevelopmentMode() { + f.holmesConfig, holmesErr = holmes.NewHolmesConfigFromEnv() + } else { + f.holmesConfig, holmesErr = holmes.NewHolmesConfig(ctx, _env.ServiceKeyvault()) + } + if holmesErr != nil { + baseLog.WithError(holmesErr).Warning("Holmes config not available; investigations will be disabled") + f.holmesConfig = nil + } + l, err := f.env.Listen() if err != nil { return nil, err @@ -406,6 +421,8 @@ func (f *frontend) chiAuthenticatedRoutes(router chi.Router) { }) }) r.Get("/selectors", f.getAdminOpenShiftClusterSelectors) + + r.Post("/investigate", f.postAdminOpenShiftClusterInvestigate) }) }) diff --git a/pkg/frontend/security_test.go b/pkg/frontend/security_test.go index 60821da67c2..7fa4baf9e40 100644 --- a/pkg/frontend/security_test.go +++ b/pkg/frontend/security_test.go @@ -8,6 +8,7 @@ import ( "crypto/rsa" "crypto/tls" "crypto/x509" + "fmt" "net/http" "testing" "time" @@ -65,6 +66,7 @@ func TestSecurity(t *testing.T) { keyvault := mock_azsecrets.NewMockClient(controller) keyvault.EXPECT().GetSecret(gomock.Any(), env.RPServerSecretName, "", nil).AnyTimes().Return(azsecrets.GetSecretResponse{Secret: azsecrets.Secret{Value: pointerutils.ToPtr(string(serverPki))}}, nil) + keyvault.EXPECT().GetSecret(gomock.Any(), gomock.Not(gomock.Eq(env.RPServerSecretName)), gomock.Any(), gomock.Any()).AnyTimes().Return(azsecrets.GetSecretResponse{}, fmt.Errorf("secret not found")) _env := mock_env.NewMockInterface(controller) _env.EXPECT().IsLocalDevelopmentMode().AnyTimes().Return(false) diff --git a/pkg/frontend/shared_test.go b/pkg/frontend/shared_test.go index 120356e8fd2..d368943e6a0 100644 --- a/pkg/frontend/shared_test.go +++ b/pkg/frontend/shared_test.go @@ -114,6 +114,8 @@ func newTestInfraWithFeatures(t *testing.T, features map[env.Feature]bool) *test keyvault := mock_azsecrets.NewMockClient(controller) keyvault.EXPECT().GetSecret(gomock.Any(), 
env.RPServerSecretName, "", nil).AnyTimes().Return(azsecrets.GetSecretResponse{Secret: azsecrets.Secret{Value: pointerutils.ToPtr(string(serverPki))}}, nil) + // Return "not found" for any secret other than RPServerSecretName (e.g., Holmes config secrets). + keyvault.EXPECT().GetSecret(gomock.Any(), gomock.Not(gomock.Eq(env.RPServerSecretName)), gomock.Any(), gomock.Any()).AnyTimes().Return(azsecrets.GetSecretResponse{}, fmt.Errorf("secret not found")) log := logrus.NewEntry(logrus.StandardLogger()) diff --git a/pkg/hive/investigate.go b/pkg/hive/investigate.go new file mode 100644 index 00000000000..58cae1ff455 --- /dev/null +++ b/pkg/hive/investigate.go @@ -0,0 +1,320 @@ +package hive + +// Copyright (c) Microsoft Corporation. +// Licensed under the Apache License 2.0. + +import ( + "context" + "fmt" + "io" + "time" + + _ "embed" + + "github.com/google/uuid" + + corev1 "k8s.io/api/core/v1" + "k8s.io/apimachinery/pkg/api/resource" + metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" + "k8s.io/apimachinery/pkg/util/wait" + + "github.com/Azure/ARO-RP/pkg/util/holmes" + "github.com/Azure/ARO-RP/pkg/util/pointerutils" +) + +//go:embed staticresources/holmes-config.yaml +var holmesConfigYAML string + +// InvestigateCluster creates an investigation pod on the Hive cluster, streams its logs, and cleans up. +// It accepts kubeconfig bytes, creates a temporary secret to hold them, and removes +// the secret (along with the pod and configmap) when the investigation completes. +func (hr *clusterManager) InvestigateCluster(ctx context.Context, hiveNamespace string, kubeconfig []byte, holmesConfig *holmes.HolmesConfig, question string, w io.Writer) error { + id := uuid.New().String()[:8] + configMapName := "holmes-config-" + id + podName := "holmes-investigate-" + id + kubeconfigSecretName := "holmes-kubeconfig-" + id + + hr.log.Infof("starting Holmes investigation %s in namespace %s", id, hiveNamespace) + + // Ensure cleanup of the secret, ConfigMap, and pod on exit. 
+ defer func() { + cleanupCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second) + defer cancel() + + hr.log.Infof("cleaning up investigation pod %s", podName) + err := hr.kubernetescli.CoreV1().Pods(hiveNamespace).Delete(cleanupCtx, podName, metav1.DeleteOptions{}) + if err != nil { + hr.log.Warningf("failed to delete investigation pod %s: %v", podName, err) + } + + hr.log.Infof("cleaning up investigation configmap %s", configMapName) + err = hr.kubernetescli.CoreV1().ConfigMaps(hiveNamespace).Delete(cleanupCtx, configMapName, metav1.DeleteOptions{}) + if err != nil { + hr.log.Warningf("failed to delete investigation configmap %s: %v", configMapName, err) + } + + hr.log.Infof("cleaning up investigation secret %s", kubeconfigSecretName) + err = hr.kubernetescli.CoreV1().Secrets(hiveNamespace).Delete(cleanupCtx, kubeconfigSecretName, metav1.DeleteOptions{}) + if err != nil { + hr.log.Warningf("failed to delete investigation secret %s: %v", kubeconfigSecretName, err) + } + }() + + // 0. Create the temporary secret holding the kubeconfig. + kubeconfigSecret := &corev1.Secret{ + ObjectMeta: metav1.ObjectMeta{ + Name: kubeconfigSecretName, + Namespace: hiveNamespace, + }, + Data: map[string][]byte{ + "config": kubeconfig, + "azure-api-key": []byte(holmesConfig.AzureAPIKey), + "azure-api-base": []byte(holmesConfig.AzureAPIBase), + "azure-api-version": []byte(holmesConfig.AzureAPIVersion), + }, + } + + _, err := hr.kubernetescli.CoreV1().Secrets(hiveNamespace).Create(ctx, kubeconfigSecret, metav1.CreateOptions{}) + if err != nil { + return fmt.Errorf("failed to create investigation kubeconfig secret: %w", err) + } + + // 1. Create the ConfigMap with Holmes toolsets config. 
+ configMap := &corev1.ConfigMap{ + ObjectMeta: metav1.ObjectMeta{ + Name: configMapName, + Namespace: hiveNamespace, + }, + Data: map[string]string{ + "config.yaml": holmesConfigYAML, + }, + } + + _, err = hr.kubernetescli.CoreV1().ConfigMaps(hiveNamespace).Create(ctx, configMap, metav1.CreateOptions{}) + if err != nil { + return fmt.Errorf("failed to create investigation configmap: %w", err) + } + + // 2. Create the investigation pod. + activeDeadlineSeconds := int64(holmesConfig.DefaultTimeout) + runAsUser := int64(1000) + pod := &corev1.Pod{ + ObjectMeta: metav1.ObjectMeta{ + Name: podName, + Namespace: hiveNamespace, + }, + Spec: corev1.PodSpec{ + AutomountServiceAccountToken: pointerutils.ToPtr(false), + ActiveDeadlineSeconds: &activeDeadlineSeconds, + RestartPolicy: corev1.RestartPolicyNever, + Containers: []corev1.Container{ + { + Name: "holmes", + Image: holmesConfig.Image, + ImagePullPolicy: corev1.PullAlways, + Command: []string{"python", "holmes_cli.py"}, + Args: []string{"ask", question, "-n", "--model=" + holmesConfig.Model, "--config=/etc/holmes/config.yaml"}, + Env: []corev1.EnvVar{ + { + Name: "AZURE_API_KEY", + ValueFrom: &corev1.EnvVarSource{ + SecretKeyRef: &corev1.SecretKeySelector{ + LocalObjectReference: corev1.LocalObjectReference{Name: kubeconfigSecretName}, + Key: "azure-api-key", + }, + }, + }, + { + Name: "AZURE_API_BASE", + ValueFrom: &corev1.EnvVarSource{ + SecretKeyRef: &corev1.SecretKeySelector{ + LocalObjectReference: corev1.LocalObjectReference{Name: kubeconfigSecretName}, + Key: "azure-api-base", + }, + }, + }, + { + Name: "AZURE_API_VERSION", + ValueFrom: &corev1.EnvVarSource{ + SecretKeyRef: &corev1.SecretKeySelector{ + LocalObjectReference: corev1.LocalObjectReference{Name: kubeconfigSecretName}, + Key: "azure-api-version", + }, + }, + }, + { + Name: "KUBECONFIG", + Value: "/etc/kubeconfig/config", + }, + }, + VolumeMounts: []corev1.VolumeMount{ + { + Name: "kubeconfig", + MountPath: "/etc/kubeconfig", + ReadOnly: true, + }, + 
{ + Name: "holmes-config", + MountPath: "/etc/holmes/config.yaml", + SubPath: "config.yaml", + ReadOnly: true, + }, + { + Name: "tmp", + MountPath: "/tmp", + }, + { + Name: "holmes-cache", + MountPath: "/.holmes", + }, + }, + SecurityContext: &corev1.SecurityContext{ + RunAsUser: &runAsUser, + RunAsNonRoot: pointerutils.ToPtr(true), + AllowPrivilegeEscalation: pointerutils.ToPtr(false), + Capabilities: &corev1.Capabilities{ + Drop: []corev1.Capability{"ALL"}, + }, + }, + Resources: corev1.ResourceRequirements{ + Requests: corev1.ResourceList{ + corev1.ResourceCPU: resource.MustParse("100m"), + corev1.ResourceMemory: resource.MustParse("256Mi"), + }, + Limits: corev1.ResourceList{ + corev1.ResourceCPU: resource.MustParse("1"), + corev1.ResourceMemory: resource.MustParse("2Gi"), + }, + }, + }, + }, + Volumes: []corev1.Volume{ + { + Name: "kubeconfig", + VolumeSource: corev1.VolumeSource{ + Secret: &corev1.SecretVolumeSource{ + SecretName: kubeconfigSecretName, + Items: []corev1.KeyToPath{ + { + Key: "config", + Path: "config", + }, + }, + }, + }, + }, + { + Name: "holmes-config", + VolumeSource: corev1.VolumeSource{ + ConfigMap: &corev1.ConfigMapVolumeSource{ + LocalObjectReference: corev1.LocalObjectReference{ + Name: configMapName, + }, + }, + }, + }, + { + Name: "tmp", + VolumeSource: corev1.VolumeSource{ + EmptyDir: &corev1.EmptyDirVolumeSource{}, + }, + }, + { + Name: "holmes-cache", + VolumeSource: corev1.VolumeSource{ + EmptyDir: &corev1.EmptyDirVolumeSource{}, + }, + }, + }, + }, + } + + _, err = hr.kubernetescli.CoreV1().Pods(hiveNamespace).Create(ctx, pod, metav1.CreateOptions{}) + if err != nil { + return fmt.Errorf("failed to create investigation pod: %w", err) + } + + // 3. Wait for the pod to be running. + err = hr.waitForPodRunning(ctx, hiveNamespace, podName, 60*time.Second) + if err != nil { + return fmt.Errorf("failed waiting for investigation pod to start: %w", err) + } + + // 4. Stream pod logs. 
+ err = hr.streamPodLogs(ctx, hiveNamespace, podName, w) + if err != nil { + return fmt.Errorf("failed to stream investigation pod logs: %w", err) + } + + return nil +} + +func (hr *clusterManager) waitForPodRunning(ctx context.Context, namespace, name string, timeout time.Duration) error { + timeoutCtx, cancel := context.WithTimeout(ctx, timeout) + defer cancel() + + return wait.PollImmediateUntil(2*time.Second, func() (bool, error) { + pod, err := hr.kubernetescli.CoreV1().Pods(namespace).Get(timeoutCtx, name, metav1.GetOptions{}) + if err != nil { + return false, fmt.Errorf("failed to get pod %s: %w", name, err) + } + + switch pod.Status.Phase { + case corev1.PodRunning, corev1.PodSucceeded: + return true, nil + case corev1.PodFailed: + reason := pod.Status.Reason + message := pod.Status.Message + if len(pod.Status.ContainerStatuses) > 0 { + cs := pod.Status.ContainerStatuses[0] + if cs.State.Terminated != nil { + reason = cs.State.Terminated.Reason + message = cs.State.Terminated.Message + } else if cs.State.Waiting != nil { + reason = cs.State.Waiting.Reason + message = cs.State.Waiting.Message + } + } + return false, fmt.Errorf("pod %s failed: reason=%s message=%s", name, reason, message) + } + + return false, nil + }, timeoutCtx.Done()) +} + +func (hr *clusterManager) streamPodLogs(ctx context.Context, namespace, name string, w io.Writer) error { + req := hr.kubernetescli.CoreV1().Pods(namespace).GetLogs(name, &corev1.PodLogOptions{ + Follow: true, + }) + + stream, err := req.Stream(ctx) + if err != nil { + return fmt.Errorf("failed to open log stream for pod %s: %w", name, err) + } + defer stream.Close() + + // Read the log stream in chunks and flush after each write so the client + // sees output in real-time instead of only when the pod exits. 
+ flusher, canFlush := w.(interface{ Flush() }) + buf := make([]byte, 4096) + for { + n, readErr := stream.Read(buf) + if n > 0 { + _, writeErr := w.Write(buf[:n]) + if writeErr != nil { + return fmt.Errorf("failed to write log stream for pod %s: %w", name, writeErr) + } + if canFlush { + flusher.Flush() + } + } + if readErr != nil { + if readErr == io.EOF { + break + } + return fmt.Errorf("failed to read log stream for pod %s: %w", name, readErr) + } + } + + return nil +} diff --git a/pkg/hive/manager.go b/pkg/hive/manager.go index 673692afc79..6200ce6bd72 100644 --- a/pkg/hive/manager.go +++ b/pkg/hive/manager.go @@ -7,6 +7,7 @@ import ( "context" "errors" "fmt" + "io" "reflect" "sort" "strings" @@ -31,6 +32,7 @@ import ( "github.com/Azure/ARO-RP/pkg/env" "github.com/Azure/ARO-RP/pkg/hive/failure" "github.com/Azure/ARO-RP/pkg/util/dynamichelper" + "github.com/Azure/ARO-RP/pkg/util/holmes" utillog "github.com/Azure/ARO-RP/pkg/util/log" ) @@ -53,6 +55,7 @@ type ClusterManager interface { GetClusterSync(ctx context.Context, oc *api.OpenShiftCluster) (*hivev1alpha1.ClusterSync, error) ListHiveK8sObjects(ctx context.Context, resource, namespace string) ([]byte, error) GetHiveK8sObject(ctx context.Context, resource, namespace, name string) ([]byte, error) + InvestigateCluster(ctx context.Context, hiveNamespace string, kubeconfig []byte, holmesConfig *holmes.HolmesConfig, question string, w io.Writer) error } type clusterManager struct { diff --git a/pkg/hive/staticresources/holmes-config.yaml b/pkg/hive/staticresources/holmes-config.yaml new file mode 100644 index 00000000000..b921e634508 --- /dev/null +++ b/pkg/hive/staticresources/holmes-config.yaml @@ -0,0 +1,128 @@ +toolsets: + # ========== ENABLED ========== + kubectl-run: + enabled: true + kubernetes/kube-prometheus-stack: + enabled: true + kubernetes/logs: + enabled: true + kubernetes/core: + enabled: true + kubernetes/live-metrics: + enabled: true + bash: + enabled: true + config: + builtin_allowlist: extended 
+ allow: + - "kubectl get" + - "kubectl describe" + - "kubectl logs" + - "kubectl top" + - "kubectl cluster-info" + - "kubectl explain" + - "kubectl api-resources" + - "kubectl version" + - "egrep" + deny: + - "kubectl delete" + - "kubectl apply" + - "kubectl create" + - "kubectl edit" + - "kubectl exec" + - "kubectl patch" + - "kubectl scale" + - "kubectl drain" + - "kubectl cordon" + - "kubectl taint" + - "kubectl debug" + - "rm" + - "oc" + + # ========== DISABLED ========== + core_investigation: + enabled: false + openshift/core: + enabled: false + openshift/logs: + enabled: false + openshift/security: + enabled: false + openshift/live-metrics: + enabled: false + runbook: + enabled: false + internet: + enabled: false + connectivity_check: + enabled: false + robusta: + enabled: false + kubernetes/krew-extras: + enabled: false + kubernetes/kube-lineage-extras: + enabled: false + aks/core: + enabled: false + aks/node-health: + enabled: false + argocd/core: + enabled: false + cilium/core: + enabled: false + hubble/observability: + enabled: false + docker/core: + enabled: false + helm/core: + enabled: false + grafana/dashboards: + enabled: false + grafana/loki: + enabled: false + grafana/tempo: + enabled: false + prometheus/metrics: + enabled: false + datadog/logs: + enabled: false + datadog/general: + enabled: false + datadog/metrics: + enabled: false + datadog/traces: + enabled: false + elasticsearch/data: + enabled: false + elasticsearch/cluster: + enabled: false + opensearch/query_assist: + enabled: false + coralogix: + enabled: false + newrelic: + enabled: false + kafka/admin: + enabled: false + rabbitmq/core: + enabled: false + notion: + enabled: false + confluence: + enabled: false + slab: + enabled: false + servicenow/tables: + enabled: false + azure/sql: + enabled: false + MongoDBAtlas: + enabled: false + database/sql: + enabled: false + inspektor-gadget/node: + enabled: false + inspektor-gadget/tcpdump: + enabled: false + kubevela/core: + enabled: false 
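The `bash` toolset hunk above ends with explicit `allow`/`deny` command lists. As a rough sketch of the semantics such a configuration implies — this is illustrative only and not HolmesGPT's actual enforcement code, which may match differently — deny entries take precedence over allow entries, and both match by command prefix:

```go
package main

import (
	"fmt"
	"strings"
)

// allowed reports whether cmd matches some allow prefix and no deny prefix.
// Deny entries are checked first, so a deny always wins over an allow.
func allowed(cmd string, allow, deny []string) bool {
	for _, d := range deny {
		if strings.HasPrefix(cmd, d) {
			return false
		}
	}
	for _, a := range allow {
		if strings.HasPrefix(cmd, a) {
			return true
		}
	}
	// Commands matching neither list are rejected by default.
	return false
}

func main() {
	allow := []string{"kubectl get", "kubectl describe", "kubectl logs"}
	deny := []string{"kubectl delete", "kubectl exec", "rm", "oc"}

	fmt.Println(allowed("kubectl get pods -A", allow, deny))  // true
	fmt.Println(allowed("kubectl delete pod x", allow, deny)) // false
	fmt.Println(allowed("rm -rf /tmp/scratch", allow, deny))  // false
}
```

Under this reading, the read-only `kubectl get`/`describe`/`logs`/`top` invocations the investigation needs pass through, while anything mutating (`kubectl delete`, `kubectl exec`, bare `rm`, or any `oc` call) is rejected before it reaches the cluster.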
diff --git a/pkg/operator/controllers/rbac/bindata.go b/pkg/operator/controllers/rbac/bindata.go index e4f8ca6b9b3..4fc4562c6c9 100644 --- a/pkg/operator/controllers/rbac/bindata.go +++ b/pkg/operator/controllers/rbac/bindata.go @@ -1,6 +1,8 @@ // Code generated for package rbac by go-bindata DO NOT EDIT. (@generated) // sources: +// staticresources/clusterrole-diagnostics.yaml // staticresources/clusterrole.yaml +// staticresources/clusterrolebinding-diagnostics.yaml // staticresources/clusterrolebinding.yaml package rbac @@ -78,6 +80,26 @@ func (fi bindataFileInfo) Sys() interface{} { return nil } +var _clusterroleDiagnosticsYaml = []byte("\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\xa4\x56\x3d\x6f\xf3\x46\x0c\xde\xfd\x2b\x84\x77\x09\x50\x20\x0e\xba\x15\x5e\x3b\x74\xe9\x14\xa0\xdd\xe9\x13\x2d\xb1\x3e\x1d\xaf\x24\x4f\x81\xf3\xeb\x0b\xe9\x28\x3b\x8e\xe3\xfa\x85\xb2\x91\x3c\xea\xf8\x3c\xfc\x3a\x41\xa6\xbf\x51\x94\x38\xed\x1a\xd9\x43\xd8\x42\xb1\x9e\x85\xde\xc1\x88\xd3\xf6\xf8\x9b\x6e\x89\x5f\xc6\x5f\x37\x47\x4a\xed\xae\xf9\x3d\x16\x35\x94\x57\x8e\xb8\x19\xd0\xa0\x05\x83\xdd\xa6\x69\x12\x0c\xb8\x6b\xf4\xa4\x86\xc3\x0e\x84\x9f\x5b\x82\x2e\xb1\x1a\x05\xdd\x48\x89\xa8\xbb\xcd\x73\x03\x99\xfe\x10\x2e\x59\xa7\x4f\x9e\x9b\x1f\x3f\x36\x4d\x23\xa8\x5c\x24\xa0\xdb\x32\xb7\x7a\x16\x5e\x22\x77\xb3\x92\xb8\xc5\x6a\x56\x94\x91\x82\x2b\x98\xda\xcc\x94\xac\x6a\x81\xd3\x81\xba\x01\xb2\x1f\x8e\xb8\x9c\x4c\xe8\x34\xc3\xf2\x59\x9e\x08\xab\x61\xb2\x91\x63\x19\x30\x44\xa0\xa1\x1e\x09\xe6\x48\x61\xe6\x1e\x38\x99\x70\x8c\x28\xcb\x51\x05\xfa\x6f\x61\x83\x2b\x30\x10\x02\x97\x25\x56\xa4\x81\x4c\x20\x75\x73\xb0\x11\x65\xef\xcc\x3a\x34\x77\xd0\x2a\xbc\x81\x85\xfe\x36\x2b\x90\x67\x02\x9f\xf2\xd2\x62\x8e\x7c\x1a\xce\x94\x5a\xc0\x81\x93\xa2\xab\x6a\x60\x78\x28\xf1\x6c\x70\x22\xae\xaf\x80\xb1\x9f\xcd\x37\x38\xfe\xe1\xbd\x67\x5b\x38\xb9\xb2\xe2\xf6\x84\xf6\xc6\x72\xa4\xd4\x79\x8b\xdd\x46\x72\x97\xcc\x91\x02\x79\xe5\x28\x75\x82\xaa\x6b\x53\x3b\xdf\x75\xfa\xb2\xe9\x5a\x52\x29\x79\xaa\xfb\xbe\xb4\xdd\xea\x
ac\xa9\xb1\x40\x87\x77\x49\xf9\x79\x88\xe0\x2c\x6e\xbb\xb1\x5a\xab\x0c\x66\x10\xfa\x4b\xd5\x83\xd2\x65\x14\x82\x52\x2b\x34\xd6\xfe\x5c\xd3\x68\xc5\x58\x03\x44\x4a\xdd\x2d\xd0\x79\x07\x70\x32\x88\x99\xdb\xc5\x73\x75\xa8\xbb\x9b\xe5\x36\xb0\x70\x74\x7e\x93\xb4\xa7\xd4\x52\xea\x9c\x70\x5d\x3e\x17\x8f\x0f\x86\x0f\x8e\x2b\x67\x6e\xcb\x19\x93\xf6\x74\xb0\x2f\x71\x5d\x06\xb0\xee\x99\xb5\x99\xe0\x62\xf8\x20\xd4\xec\xb3\xf2\xfe\x01\x42\x4f\xe9\x51\x04\xf7\xd2\x2b\x65\x59\x1d\xae\xf7\x08\xd1\xfa\xd0\x63\x38\xae\xc4\x52\x13\xf5\x00\x8a\xd7\x70\xac\x8f\xd0\x55\x5d\x39\xa3\x80\xb1\x2c\xb3\x7f\x10\x50\x93\x12\xac\xc8\x37\xd3\x53\x91\x15\xa9\xad\xf8\x53\xb9\xba\x54\xfd\x93\x29\x33\xc7\xb5\x68\x38\x91\xb1\x4c\x5b\x30\xb0\x20\xeb\x36\xf0\xf0\xc5\x7a\x12\x1e\xd0\x7a\x2c\x3a\xbf\xa4\x1f\x9f\x1e\xbf\xa1\xda\xa6\x01\xb5\x01\x12\x74\xab\x07\x75\x49\xf9\x83\x9c\x3c\xfd\xf2\xf4\xbd\xfb\xf5\x7f\x09\x6b\xd9\x6b\x10\x9a\xd7\xf1\x55\x47\x38\xeb\xab\x66\xa1\xa4\x06\x31\xe6\x08\x6e\x58\x62\x74\x73\xdc\x95\x95\x41\x13\x0a\x7a\xff\x71\x3a\x6f\x61\xff\x61\xb9\x17\x63\x72\x4d\xaf\xfe\xf1\x5f\xaf\x7f\xba\xcf\x4b\x9d\xae\xf7\xaa\x44\x1a\xd1\x45\x41\x68\x4f\x2e\x3b\xcd\xaa\x38\xa2\xdb\x50\xff\x05\x00\x00\xff\xff\x1d\xa1\x45\xf4\xc2\x09\x00\x00") + +func clusterroleDiagnosticsYamlBytes() ([]byte, error) { + return bindataRead( + _clusterroleDiagnosticsYaml, + "clusterrole-diagnostics.yaml", + ) +} + +func clusterroleDiagnosticsYaml() (*asset, error) { + bytes, err := clusterroleDiagnosticsYamlBytes() + if err != nil { + return nil, err + } + + info := bindataFileInfo{name: "clusterrole-diagnostics.yaml", size: 0, mode: os.FileMode(0), modTime: time.Unix(0, 0)} + a := &asset{bytes: bytes, info: info} + return a, nil +} + var _clusterroleYaml = 
[]byte("\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\xbc\x5a\x5f\x8f\xe3\xb8\x0d\x7f\x9f\x4f\x61\x5c\x1f\x0e\x28\x30\x59\xf4\xad\x98\x3e\xde\x15\x45\x81\xa2\x07\x2c\xb6\x7d\x67\x64\xc6\xe6\x45\x16\xb5\x14\x95\xd9\xf4\xd3\x17\x92\x65\xc7\xce\x3f\x67\x9c\xdb\x7d\x1a\x9b\x92\xf9\x23\x29\x8a\xff\x32\x7f\xaa\x7e\xe1\x1a\xab\x06\x1d\x0a\x28\xd6\xd5\xf6\x58\xb5\x60\xf6\x9f\x1a\x74\x35\x05\xc3\x07\x94\xa3\x01\xd3\xe2\xdf\xaa\x5f\x7f\xab\xfe\xfd\xdb\x97\xea\xef\xbf\xfe\xf3\xcb\xe6\x05\x3c\xfd\x17\x25\x10\xbb\xb7\x4a\xb6\x60\x36\x10\xb5\x65\xa1\xff\x81\x12\xbb\xcd\xfe\xaf\x61\x43\xfc\xe9\xf0\x97\x97\x3d\xb9\xfa\xad\xfa\xc5\xc6\xa0\x28\x9f\xd9\xe2\x4b\x87\x0a\x35\x28\xbc\xbd\x54\x95\x11\xcc\x1f\x7c\xa1\x0e\x83\x42\xe7\xdf\x2a\x17\xad\x7d\xa9\x2a\x07\x1d\xbe\x55\xe1\x18\x14\xbb\x37\x10\x7e\x0d\x82\x2f\x12\x2d\x86\xb7\x97\xd7\x0a\x3c\xfd\x43\x38\xfa\x90\x98\xbc\x56\x3f\xfd\xf4\x52\x55\x82\x81\xa3\x18\x2c\x34\xc3\x9d\x67\x87\x4e\x83\x82\xc6\x80\xe1\xa5\xaa\x0e\x28\xdb\xb2\xdc\xa0\xe6\xbf\x96\x82\x3e\xca\xd0\xed\xa8\xe9\xc0\x87\xfc\x8a\xae\xf6\x4c\x4e\xcb\xdb\x01\x87\x47\x4b\x1d\xa9\x80\x6b\xb0\x7f\x4f\x9a\x04\x0f\x66\x78\xe5\xba\x3c\xf9\x64\xc0\xa0\xe8\xf4\xc0\x36\x76\x68\x2c\x50\x77\x7d\xa9\x50\xb9\x1e\x1f\x14\x3b\x6f\x41\xcb\x8a\xa0\xb7\x64\xb2\x29\x0d\x3b\x15\xb6\x16\x65\x58\xea\xb5\xf8\x1a\x59\xa1\x27\x05\x94\x03\x19\x04\x63\x38\x0e\x52\x17\xda\x3d\x2b\xa5\x87\x77\x50\xd3\x3e\x66\xaf\x24\xed\x27\xcb\xcd\x25\xc7\x8b\xcf\xa1\xee\x28\x24\x67\x12\x6c\x28\xa8\x4c\x9d\xe8\x92\x71\x17\x15\x94\x5c\xf3\x8e\xdb\x96\x79\xdf\x9f\x4b\xec\x3f\xea\x95\x39\x80\xa5\xfa\xee\x9e\x15\x3a\x82\x27\xfc\xa6\xe8\x92\x9c\xe1\xa6\x70\x26\x06\xe5\x6e\x20\xd6\xb8\x23\x47\xcf\x81\x3e\x64\x13\xf0\xf4\xdc\x09\x16\x06\x28\x1b\xf6\xe8\x42\x4b\x3b\xbd\x05\x24\xf8\x35\x62\xd0\xd1\x79\x56\xa1\xe5\x5b\x74\x79\xc3\x8a\xeb\x0a\x1e\x28\x8c\xc7\x59\x03\x76\xec\x02\x16\x57\xad\xd1\x5b\x3e\x76\xe3\x85\x2b\xce\x3f\xae\xa7\x0b\x8f\xbb\x68\x0b\x61\xa5\x78\x0b\x76\x38\x09\xd1\xfb\xd6\x0f\x44\x7a\xf4\x52\x09\x2f\x70\x36\x7d\x54
\x5e\x2b\x7a\xd4\x16\x9d\x96\xb0\x73\xd3\x33\x95\xf7\xe8\xd2\x79\xe2\xfb\x19\x50\x0e\xfe\x78\x9d\xf1\x79\x2a\xb9\xe4\x1b\xd0\xee\x42\xdc\xfe\x8e\x46\xc1\x18\x0c\xe1\x84\x31\x5b\xcc\x39\x63\xb6\x76\xfd\xa3\x0f\x0b\xf6\x90\x6d\x85\x2d\x6e\xc9\xd5\xe4\x9a\x70\x4e\x2f\xde\x7b\xbe\x63\x58\x7a\x38\x59\x7d\x44\xac\xe1\xf5\x8a\xc9\x7e\x88\x59\x26\xda\x0a\x06\x15\x32\xcf\x04\xc7\xa8\x1c\x0c\x58\x72\xcd\x25\x52\x16\x89\x9d\x82\xf5\x5c\x0f\x3b\x9f\x71\xf6\x01\xea\xb1\x83\x9f\x23\xbe\x56\x1d\x98\x96\x1c\x3e\x2d\xc8\x36\x93\x2f\x51\x85\xdd\xef\xbc\xed\xb1\xca\xc3\x1a\xee\x91\x6c\xbd\xa0\x60\xde\x73\x0a\x7a\x85\xf0\xbd\x01\x1f\x8d\x7a\x06\x45\x69\x97\x82\x12\xde\x49\xd2\x93\x4d\xd4\xb8\xec\x8c\x39\xa7\xad\xd4\xc2\x58\x8e\xf5\xc6\xa1\xbe\xb3\xec\x17\xfd\x83\x63\xed\x85\x0e\xa0\x48\xfe\xa9\xe4\x91\x59\x19\xc1\x3a\x85\x61\xb0\x4b\xc0\xe3\xc6\xf0\xa4\xb6\x59\xe6\xe5\x3a\xa1\xaf\x27\xc2\x95\x64\x71\xe6\x37\xe3\xa5\x61\x9f\xda\x10\x96\x19\xf1\xd0\xf7\x19\x61\xc0\x0e\x63\xe8\xac\x5d\x28\x4f\x3b\x04\x8d\x82\xcd\x58\x10\x53\x07\x0d\xa6\x7a\x02\x9d\x7a\xb6\x64\x68\xba\x50\x1e\xdd\x4e\x20\xa8\x44\x93\xbe\x1d\x68\x29\x2e\x0d\x6c\xcb\x89\x9e\x17\xee\x9c\xb4\x29\x8f\x45\xe2\x36\x96\xbb\xe7\x85\x53\x18\x1d\x5f\xbe\x0d\xc0\xc1\xb4\x58\xc7\xf5\x17\xbf\x68\xbe\x74\xc8\xfd\x2e\x63\xa9\xe6\x77\x67\x19\xea\x99\xdd\x52\x01\x2b\x0e\xac\xe5\xc6\x92\xdb\xcf\xd6\x2e\x08\x8e\xcb\x25\x39\xb7\xbe\xb7\xb1\xa1\x39\xe9\x6b\x24\xb3\x0f\x0a\xa2\x33\xf2\x11\x3a\x1b\xa0\xf3\xf7\x33\xda\x7d\xad\x53\x49\xe8\x2d\xb8\xac\x7a\x36\xf6\x82\x0d\x3c\xd7\xe5\xe0\x0c\x3b\x87\x46\xe9\x40\x7a\x34\x2d\x9a\xfd\x6a\x29\x58\x6a\x72\xf7\x0b\x12\x8b\x70\xbf\xcb\xbc\x03\x30\xf6\xda\x37\xb9\x8f\xed\xa6\x5d\x5f\xe3\xf7\x4d\xea\x6d\x88\xa1\x87\x5d\xc1\x7a\x67\xf9\xbd\x9c\xd5\xe6\xd4\x4a\xdc\x42\x4a\xbb\xd3\x7d\xe8\x60\xb8\x27\xc4\x42\x7a\xb4\x78\x40\xfb\x47\xf4\x6a\x2d\xda\x6e\xc1\x4b\xd2\x16\xd3\x82\xa8\xa0\xe7\x40\xca\x32\xdc\xd4\x72\x85\x6f\x6d\x58\x21\x4e\x0e\x3a\x0b\xf2\x4c\x03\x53\x7e\x54\x41\xe8\xbe\x3b\x60\x46\x19\xb1\x97\xb2\xeb
\x07\xf9\x2a\x34\x13\x8d\xca\xdb\xc3\x75\x6d\xfe\xa8\x74\xbf\xc7\x47\x2f\xff\xb4\x30\xc9\x0c\xbc\x44\xb7\x3a\xe8\x96\x74\xf0\x28\x78\xed\x82\xa0\x61\x59\x5b\x0d\xa5\xfb\x62\x1c\x6d\x8c\x33\xbb\xab\x00\x25\xae\xbd\x82\x2a\x98\x36\xf5\x84\xaf\x4f\x8f\x18\x12\x28\x1f\xdc\x86\xe5\x4a\x15\x8d\x59\xff\x1d\x09\xbe\x83\xb5\x61\x42\x23\x3f\x7d\xfb\xca\xab\x63\x5f\xa9\x8c\x17\x4c\x5b\x76\xb5\x08\x56\xdb\x31\x96\x8f\xf4\xf9\x0b\x9e\xb2\xd0\x98\x3d\xe6\x6b\xeb\xe5\x9c\x85\xa7\x65\x67\x54\x20\x87\x22\xd1\x29\x75\x38\x75\xce\xd3\xb0\x63\x4a\xdd\xc7\x2d\x5a\xd4\x29\x69\x86\xeb\x99\xed\x15\xf2\x5a\x95\x30\x75\x60\xb7\x33\xc2\x64\x56\xc9\x77\x7d\xfa\x92\x33\x35\x0b\x13\xab\xa0\x2c\x39\x50\x8c\x35\x5b\xa1\x94\x7a\x6f\xe4\xb0\x56\x37\x76\x39\x66\xbb\x66\x63\x58\x90\xc3\xc6\x70\x77\xa5\x4e\xb5\x28\xda\x81\x4b\xa1\x66\x6a\xf5\x29\x7d\x34\x41\xe1\x39\xa6\x89\xed\x29\x63\x74\xa8\x2d\xc6\x70\x41\xc8\xe3\x87\x5e\xbd\x7e\x3e\x37\xe3\xa1\x2d\x38\xce\x7b\xd6\x46\xa8\x49\xd3\xf1\x48\x84\xea\x6f\xab\x70\xd4\x41\xad\xe1\x43\xbf\xa7\xe7\x44\x48\x96\xbe\x75\xd8\x25\x8e\x1a\x0b\x63\x6d\x7d\xb5\xd2\x9e\x94\xea\x6b\x04\xe1\x1a\x6f\x8a\x30\x5c\xc1\x51\x84\x15\x00\x0f\x1a\xf9\x5a\xb3\x73\xd6\xab\xcd\x3a\x1c\x13\xa8\x16\x1a\x7b\xa5\xb3\x28\x71\xea\x77\x4c\xa0\xe0\xc0\x87\x96\xf5\x7c\xca\x7f\x6a\x85\x50\x4d\x7d\xd9\x03\xf5\xf2\xcd\x3b\xa1\x72\x26\x67\x9c\x52\x04\x3a\x6b\xde\x12\xe9\xb4\x6d\x76\x29\xd2\xd2\xac\xb3\x29\xa4\x2b\x57\x79\xf0\xf9\x59\x53\x35\x1a\xf2\x0c\x72\xa4\xdf\xc0\x2d\x97\xc9\xc0\x2c\x74\x3c\x79\xac\xe1\x6e\xa4\x30\xa0\x60\xb9\x29\xb4\xe9\xf9\x15\x61\x66\x8d\x2a\xb9\xa0\x60\x73\xea\x29\x1a\xd9\x6e\x7a\xb0\x03\xa6\x61\x57\xd3\xc9\x4d\x06\x72\x93\xc5\x9b\x4b\xd7\x6b\x1a\xb7\xc1\x08\xf9\x27\x22\xa3\x07\xb3\x4f\xc6\xda\x3c\xa6\x77\xd9\xde\x81\xa3\xdd\xc2\xd0\xe0\x12\x0a\x65\xc7\xe9\xe4\xcc\x52\x7e\x9f\xec\xf4\xc2\x3b\x5a\xdd\x31\x66\x1f\x3f\x5e\xed\x0a\x6b\x0a\x12\xb3\xe5\xb6\xb1\x6e\x86\x22\x21\xa5\x36\x34\x31\xf5\x1f\xcf\x85\x1f\xdf\x4f\x4d\x37\xcb\x83\xf7\xb2\xb3\x4c\x2d\x56\xc3\xe5\x36\x65
\x11\x2a\xef\xba\x3e\xf1\xb9\x71\x68\x1f\x61\xbc\x52\xf8\xfc\x6b\xe4\xe2\x08\xc9\x5b\xc2\x7a\x98\x98\x9f\xff\x8e\xf9\xb0\x13\x3e\x82\xf5\x51\x90\x3b\xaa\xdd\xfc\x39\xfc\x47\xfe\x48\x70\x4f\xbe\x94\xfc\x17\xc7\xf4\x51\xd7\xf2\x2f\xf3\x9e\x8f\x37\x6c\x6b\xb0\xfa\xd4\x73\xaf\xf4\x18\x66\x0b\x4f\x25\xfe\x21\x40\x6c\xc8\xf5\x03\xb4\x25\xfb\x81\x6b\x10\xac\x65\xf3\x4c\x09\x3b\xa2\x7e\x18\xec\xf4\x6d\xce\xff\xdf\x52\x1a\x0d\x2a\x40\xab\x07\x3c\x43\xe1\xb1\x29\xa9\xf6\xa6\xbd\xcb\x3f\x2c\x0c\x75\xca\xa4\xe2\x3b\x5b\x29\x85\xc9\x95\xa5\xb5\x22\x2e\x48\x76\x5e\x66\x05\x3a\xf5\x37\xa9\xb4\xea\x3f\x37\xe0\xc1\x90\xd2\xbc\x29\xb9\xd4\xe3\xd4\x82\xaf\x14\x77\xf8\x0f\x8e\xa5\x5f\x3f\x84\xf7\x28\xc3\xe6\x5c\x56\xb8\xa1\xfe\xb8\x4f\x5d\x2b\x57\x74\xb8\xf4\x93\xcc\x24\x45\x97\x0f\x56\x82\x45\xdf\x08\xd4\xb8\xe9\x8b\xbb\x25\xd8\xb2\xfb\xa9\x90\x11\xc3\xe2\xff\x38\x4c\x8a\x2f\xca\xf5\xfa\xe8\x0d\xe9\xe3\xef\x87\x9b\xb6\x14\xc0\x63\x07\xde\x97\x68\xbf\x34\x94\x7b\x6f\x51\x10\xb6\x1c\x75\x61\x7a\x44\xfe\x34\x3e\xe0\x03\x8a\xed\x31\x72\xfc\x20\x2f\x98\xea\xd8\x0f\xc5\x2b\xc7\xee\x73\x81\xf8\xcf\xe7\x7f\x95\xdd\x3f\xff\xf9\xe7\xcb\xcf\xff\x1f\x00\x00\xff\xff\xc3\x85\xb5\xcb\x69\x26\x00\x00") func clusterroleYamlBytes() ([]byte, error) { @@ -98,6 +120,26 @@ func clusterroleYaml() (*asset, error) { return a, nil } +var _clusterrolebindingDiagnosticsYaml = []byte("\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\x84\xcd\xb1\xaa\xc2\x50\x0c\x80\xe1\xfd\x3c\x45\x5e\xa0\xbd\xdc\x4d\xce\xa8\x83\x7b\x41\xf7\xb4\x27\xd6\xd8\x36\x29\x49\x8e\xa0\x4f\x2f\x82\x9b\x60\xf7\xff\xe7\xc3\x95\xcf\x64\xce\x2a\x19\xac\xc7\xa1\xc5\x1a\x57\x35\x7e\x62\xb0\x4a\x3b\xed\xbc\x65\xfd\xbb\xff\xa7\x89\xa5\x64\x38\xcc\xd5\x83\xac\xd3\x99\xf6\x2c\x85\x65\x4c\x0b\x05\x16\x0c\xcc\x09\x40\x70\xa1\x0c\xfe\xf0\xa0\x25\xa3\x69\x53\x18\x47\x51\x0f\x1e\x3c\x99\xce\xd4\xd1\xe5\xdd\xe1\xca\x47\xd3\xba\xfe\x30\x13\xc0\x17\xb9\x25\x78\xed\x6f\x34\x84\xe7\xd4\x7c\xe6\x93\x93\x6d\x5d\xaf\x00\x00\x00\xff\xff\xe6\x2c\x81\x7a\x03\x01\x00\x00") 
+ +func clusterrolebindingDiagnosticsYamlBytes() ([]byte, error) { + return bindataRead( + _clusterrolebindingDiagnosticsYaml, + "clusterrolebinding-diagnostics.yaml", + ) +} + +func clusterrolebindingDiagnosticsYaml() (*asset, error) { + bytes, err := clusterrolebindingDiagnosticsYamlBytes() + if err != nil { + return nil, err + } + + info := bindataFileInfo{name: "clusterrolebinding-diagnostics.yaml", size: 0, mode: os.FileMode(0), modTime: time.Unix(0, 0)} + a := &asset{bytes: bytes, info: info} + return a, nil +} + var _clusterrolebindingYaml = []byte("\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff\x7c\xcd\xbd\x0a\xc2\x40\x0c\x07\xf0\xfd\x9e\x22\x2f\xd0\x8a\x9b\xdc\xa8\x83\x7b\x41\xf7\xb4\x8d\x1a\xdb\x26\x47\x92\x13\xf4\xe9\x45\x70\x93\x3a\xff\x3f\x7e\x58\xf8\x4c\xe6\xac\x92\xc1\x7a\x1c\x5a\xac\x71\x53\xe3\x17\x06\xab\xb4\xd3\xce\x5b\xd6\xcd\x63\x9b\x26\x96\x31\xc3\x61\xae\x1e\x64\x9d\xce\xb4\x67\x19\x59\xae\x69\xa1\xc0\x11\x03\x73\x02\x10\x5c\x28\x83\x3f\x3d\x68\xc9\x68\xda\xb8\x51\x32\x9d\xa9\xa3\xcb\x27\xc7\xc2\x47\xd3\x5a\xfe\x58\x09\xe0\x87\x5a\x7b\xf6\xda\xdf\x69\x08\xcf\xa9\xf9\x8e\x4e\x4e\xb6\xd6\x7e\x07\x00\x00\xff\xff\xc4\xb6\x1b\x05\xeb\x00\x00\x00") func clusterrolebindingYamlBytes() ([]byte, error) { @@ -170,8 +212,10 @@ func AssetNames() []string { // _bindata is a table, holding each asset generator, mapped to its name. 
var _bindata = map[string]func() (*asset, error){
-	"clusterrole.yaml":        clusterroleYaml,
-	"clusterrolebinding.yaml": clusterrolebindingYaml,
+	"clusterrole-diagnostics.yaml":        clusterroleDiagnosticsYaml,
+	"clusterrole.yaml":                    clusterroleYaml,
+	"clusterrolebinding-diagnostics.yaml": clusterrolebindingDiagnosticsYaml,
+	"clusterrolebinding.yaml":             clusterrolebindingYaml,
 }
 
 // AssetDir returns the file names below a certain
@@ -217,8 +261,10 @@ type bintree struct {
 }
 
 var _bintree = &bintree{nil, map[string]*bintree{
-	"clusterrole.yaml":        {clusterroleYaml, map[string]*bintree{}},
-	"clusterrolebinding.yaml": {clusterrolebindingYaml, map[string]*bintree{}},
+	"clusterrole-diagnostics.yaml":        {clusterroleDiagnosticsYaml, map[string]*bintree{}},
+	"clusterrole.yaml":                    {clusterroleYaml, map[string]*bintree{}},
+	"clusterrolebinding-diagnostics.yaml": {clusterrolebindingDiagnosticsYaml, map[string]*bintree{}},
+	"clusterrolebinding.yaml":             {clusterrolebindingYaml, map[string]*bintree{}},
 }}
 
 // RestoreAsset restores an asset under the given directory
@@ -231,7 +277,7 @@ func RestoreAsset(dir, name string) error {
 	if err != nil {
 		return err
 	}
-	err = os.MkdirAll(_filePath(dir, filepath.Dir(name)), os.FileMode(0o755))
+	err = os.MkdirAll(_filePath(dir, filepath.Dir(name)), os.FileMode(0755))
 	if err != nil {
 		return err
 	}
diff --git a/pkg/operator/controllers/rbac/staticresources/clusterrole-diagnostics.yaml b/pkg/operator/controllers/rbac/staticresources/clusterrole-diagnostics.yaml
new file mode 100644
index 00000000000..44f5993ba58
--- /dev/null
+++ b/pkg/operator/controllers/rbac/staticresources/clusterrole-diagnostics.yaml
@@ -0,0 +1,183 @@
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: system:aro-diagnostics
+rules:
+- apiGroups:
+  - ""
+  resources:
+  - pods
+  - pods/log
+  - nodes
+  - services
+  - endpoints
+  - configmaps
+  - events
+  - namespaces
+  - persistentvolumeclaims
+  - replicationcontrollers
+  - resourcequotas
+  - serviceaccounts
+  - limitranges
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - apps
+  resources:
+  - deployments
+  - daemonsets
+  - statefulsets
+  - replicasets
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - batch
+  resources:
+  - jobs
+  - cronjobs
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - networking.k8s.io
+  resources:
+  - networkpolicies
+  - ingresses
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - policy
+  resources:
+  - poddisruptionbudgets
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - storage.k8s.io
+  resources:
+  - storageclasses
+  - persistentvolumes
+  - volumeattachments
+  - csinodes
+  - csidrivers
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - autoscaling
+  resources:
+  - horizontalpodautoscalers
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - rbac.authorization.k8s.io
+  resources:
+  - roles
+  - rolebindings
+  - clusterroles
+  - clusterrolebindings
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - apps.openshift.io
+  resources:
+  - deploymentconfigs
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - route.openshift.io
+  resources:
+  - routes
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - machine.openshift.io
+  resources:
+  - machines
+  - machinesets
+  - machinehealthchecks
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - config.openshift.io
+  resources:
+  - clusterversions
+  - clusteroperators
+  - infrastructures
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - machineconfiguration.openshift.io
+  resources:
+  - machineconfigs
+  - machineconfigpools
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - monitoring.coreos.com
+  resources:
+  - prometheusrules
+  - servicemonitors
+  - alertmanagers
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - operator.openshift.io
+  resources:
+  - '*'
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - operators.coreos.com
+  resources:
+  - subscriptions
+  - clusterserviceversions
+  - installplans
+  - operatorgroups
+  verbs:
+  - get
+  - list
+  - watch
+- apiGroups:
+  - metrics.k8s.io
+  resources:
+  - nodes
+  - pods
+  verbs:
+  - get
+  - list
+- nonResourceURLs:
+  - /healthz
+  - /livez
+  - /readyz
+  - /version
+  - /metrics
+  verbs:
+  - get
diff --git a/pkg/operator/controllers/rbac/staticresources/clusterrolebinding-diagnostics.yaml b/pkg/operator/controllers/rbac/staticresources/clusterrolebinding-diagnostics.yaml
new file mode 100644
index 00000000000..cbe674395bb
--- /dev/null
+++ b/pkg/operator/controllers/rbac/staticresources/clusterrolebinding-diagnostics.yaml
@@ -0,0 +1,11 @@
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: system:aro-diagnostics
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: system:aro-diagnostics
+subjects:
+- kind: User
+  name: system:aro-diagnostics
diff --git a/pkg/util/holmes/config.go b/pkg/util/holmes/config.go
new file mode 100644
index 00000000000..75fd6df73be
--- /dev/null
+++ b/pkg/util/holmes/config.go
@@ -0,0 +1,142 @@
+package holmes
+
+// Copyright (c) Microsoft Corporation.
+// Licensed under the Apache License 2.0.
+
+import (
+	"context"
+	"fmt"
+	"os"
+	"regexp"
+	"strconv"
+
+	"github.com/Azure/ARO-RP/pkg/util/azureclient/azuresdk/azsecrets"
+)
+
+// modelPattern validates the model name contains only safe characters
+// (alphanumeric, slashes, dots, colons, hyphens, underscores).
+var modelPattern = regexp.MustCompile(`^[a-zA-Z0-9/.:_-]+$`)
+
+const (
+	// Key Vault secret names for Holmes configuration.
+	holmesAzureAPIKeySecretName  = "holmes-azure-api-key"
+	holmesAzureAPIBaseSecretName = "holmes-azure-api-base"
+)
+
+// HolmesConfig holds configuration for HolmesGPT investigation pods.
+type HolmesConfig struct {
+	Image                       string
+	AzureAPIKey                 string
+	AzureAPIBase                string
+	AzureAPIVersion             string
+	Model                       string
+	DefaultTimeout              int
+	MaxConcurrentInvestigations int
+}
+
+// NewHolmesConfigFromEnv loads all config from environment variables.
+// Used in local development mode (RP_MODE=development).
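The `modelPattern` guard defined above exists because the model string is later interpolated into the investigation pod's command line; anything outside the allow-list is rejected before it can reach a shell. A standalone sketch of that check (same pattern as in `config.go`; the sample inputs are illustrative):

```go
package main

import (
	"fmt"
	"regexp"
)

// Mirrors modelPattern from config.go: only alphanumerics, slashes,
// dots, colons, hyphens and underscores are allowed.
var modelPattern = regexp.MustCompile(`^[a-zA-Z0-9/.:_-]+$`)

func main() {
	for _, m := range []string{"azure/gpt-5.2", "azure/gpt; rm -rf /"} {
		fmt.Printf("%q allowed=%v\n", m, modelPattern.MatchString(m))
	}
	// "azure/gpt-5.2" allowed=true
	// "azure/gpt; rm -rf /" allowed=false
}
```

Note that the anchored `^…+$` also rejects the empty string, which is why `Validate` needs no separate empty-model check.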
+func NewHolmesConfigFromEnv() (*HolmesConfig, error) {
+	c, err := newHolmesConfigBase()
+	if err != nil {
+		return nil, err
+	}
+	c.AzureAPIKey = os.Getenv("HOLMES_AZURE_API_KEY")
+	c.AzureAPIBase = os.Getenv("HOLMES_AZURE_API_BASE")
+	if err := c.Validate(); err != nil {
+		return nil, err
+	}
+	return c, nil
+}
+
+// NewHolmesConfig loads non-secret config from env vars and secrets from Key Vault.
+// Used in production mode.
+func NewHolmesConfig(ctx context.Context, serviceKeyvault azsecrets.Client) (*HolmesConfig, error) {
+	apiKeyResp, err := serviceKeyvault.GetSecret(ctx, holmesAzureAPIKeySecretName, "", nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get %s from keyvault: %w", holmesAzureAPIKeySecretName, err)
+	}
+	if apiKeyResp.Value == nil {
+		return nil, fmt.Errorf("keyvault secret %s has no value", holmesAzureAPIKeySecretName)
+	}
+
+	apiBaseResp, err := serviceKeyvault.GetSecret(ctx, holmesAzureAPIBaseSecretName, "", nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get %s from keyvault: %w", holmesAzureAPIBaseSecretName, err)
+	}
+	if apiBaseResp.Value == nil {
+		return nil, fmt.Errorf("keyvault secret %s has no value", holmesAzureAPIBaseSecretName)
+	}
+
+	c, err := newHolmesConfigBase()
+	if err != nil {
+		return nil, err
+	}
+	c.AzureAPIKey = *apiKeyResp.Value
+	c.AzureAPIBase = *apiBaseResp.Value
+	if err := c.Validate(); err != nil {
+		return nil, err
+	}
+	return c, nil
+}
+
+// newHolmesConfigBase loads the non-secret configuration from environment variables.
+func newHolmesConfigBase() (*HolmesConfig, error) {
+	defaultTimeout, err := envOrDefaultInt("HOLMES_DEFAULT_TIMEOUT", 600)
+	if err != nil {
+		return nil, err
+	}
+	maxConcurrent, err := envOrDefaultInt("HOLMES_MAX_CONCURRENT", 20)
+	if err != nil {
+		return nil, err
+	}
+	return &HolmesConfig{
+		Image:                       envOrDefault("HOLMES_IMAGE", "quay.io/haoran/holmesgpt:latest"),
+		AzureAPIVersion:             envOrDefault("HOLMES_AZURE_API_VERSION", "2025-04-01-preview"),
+		Model:                       envOrDefault("HOLMES_MODEL", "azure/gpt-5.2"),
+		DefaultTimeout:              defaultTimeout,
+		MaxConcurrentInvestigations: maxConcurrent,
+	}, nil
+}
+
+// Validate checks that required configuration values are set.
+func (c *HolmesConfig) Validate() error {
+	if c.AzureAPIKey == "" {
+		return fmt.Errorf("holmes Azure API key is required")
+	}
+	if c.AzureAPIBase == "" {
+		return fmt.Errorf("holmes Azure API base is required")
+	}
+	if c.Image == "" {
+		return fmt.Errorf("holmes image is required")
+	}
+	if !modelPattern.MatchString(c.Model) {
+		return fmt.Errorf("holmes model name contains invalid characters")
+	}
+	if c.DefaultTimeout <= 0 {
+		return fmt.Errorf("holmes default timeout must be greater than 0")
+	}
+	if c.MaxConcurrentInvestigations <= 0 {
+		return fmt.Errorf("holmes max concurrent investigations must be greater than 0")
+	}
+	return nil
+}
+
+func envOrDefault(key, defaultValue string) string {
+	if v := os.Getenv(key); v != "" {
+		return v
+	}
+	return defaultValue
+}
+
+func envOrDefaultInt(key string, defaultValue int) (int, error) {
+	v := os.Getenv(key)
+	if v == "" {
+		return defaultValue, nil
+	}
+	i, err := strconv.Atoi(v)
+	if err != nil {
+		return 0, fmt.Errorf("invalid integer value for %s: %w", key, err)
+	}
+	return i, nil
+}
diff --git a/pkg/util/holmes/config_test.go b/pkg/util/holmes/config_test.go
new file mode 100644
index 00000000000..79569ac5860
--- /dev/null
+++ b/pkg/util/holmes/config_test.go
@@ -0,0 +1,166 @@
+package holmes
+
+// Copyright (c) Microsoft Corporation.
+// Licensed under the Apache License 2.0.
+
+import (
+	"context"
+	"fmt"
+	"testing"
+
+	"github.com/stretchr/testify/require"
+	"go.uber.org/mock/gomock"
+
+	"github.com/Azure/azure-sdk-for-go/sdk/security/keyvault/azsecrets"
+
+	mock_azsecrets "github.com/Azure/ARO-RP/pkg/util/mocks/azureclient/azuresdk/azsecrets"
+)
+
+func TestNewHolmesConfigFromEnv(t *testing.T) {
+	tests := []struct {
+		name    string
+		envVars map[string]string
+		wantErr bool
+	}{
+		{
+			name: "valid config with all required env vars",
+			envVars: map[string]string{
+				"HOLMES_AZURE_API_KEY":  "test-key",
+				"HOLMES_AZURE_API_BASE": "https://test.openai.azure.com",
+			},
+		},
+		{
+			name: "missing API key returns error",
+			envVars: map[string]string{
+				"HOLMES_AZURE_API_BASE": "https://test.openai.azure.com",
+			},
+			wantErr: true,
+		},
+		{
+			name: "missing API base returns error",
+			envVars: map[string]string{
+				"HOLMES_AZURE_API_KEY": "test-key",
+			},
+			wantErr: true,
+		},
+		{
+			name: "custom values override defaults",
+			envVars: map[string]string{
+				"HOLMES_AZURE_API_KEY":     "custom-key",
+				"HOLMES_AZURE_API_BASE":    "https://custom.openai.azure.com",
+				"HOLMES_IMAGE":             "custom-image:v1",
+				"HOLMES_MODEL":             "azure/gpt-4o",
+				"HOLMES_DEFAULT_TIMEOUT":   "300",
+				"HOLMES_MAX_CONCURRENT":    "5",
+				"HOLMES_AZURE_API_VERSION": "2024-01-01",
+			},
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			// Clear all Holmes env vars, then set test values.
+			for _, key := range []string{
+				"HOLMES_AZURE_API_KEY", "HOLMES_AZURE_API_BASE", "HOLMES_IMAGE",
+				"HOLMES_MODEL", "HOLMES_DEFAULT_TIMEOUT", "HOLMES_MAX_CONCURRENT",
+				"HOLMES_AZURE_API_VERSION",
+			} {
+				t.Setenv(key, "")
+			}
+			for k, v := range tt.envVars {
+				t.Setenv(k, v)
+			}
+
+			cfg, err := NewHolmesConfigFromEnv()
+			if tt.wantErr {
+				require.Error(t, err)
+				return
+			}
+			require.NoError(t, err)
+			require.Equal(t, tt.envVars["HOLMES_AZURE_API_KEY"], cfg.AzureAPIKey)
+			require.Equal(t, tt.envVars["HOLMES_AZURE_API_BASE"], cfg.AzureAPIBase)
+
+			if tt.envVars["HOLMES_IMAGE"] != "" {
+				require.Equal(t, tt.envVars["HOLMES_IMAGE"], cfg.Image)
+			}
+			if tt.envVars["HOLMES_MODEL"] != "" {
+				require.Equal(t, tt.envVars["HOLMES_MODEL"], cfg.Model)
+			}
+			if tt.envVars["HOLMES_DEFAULT_TIMEOUT"] != "" {
+				require.Equal(t, 300, cfg.DefaultTimeout)
+			}
+			if tt.envVars["HOLMES_MAX_CONCURRENT"] != "" {
+				require.Equal(t, 5, cfg.MaxConcurrentInvestigations)
+			}
+		})
+	}
+}
+
+func TestNewHolmesConfig(t *testing.T) {
+	ctx := context.Background()
+
+	apiKey := "keyvault-api-key"
+	apiBase := "https://keyvault.openai.azure.com"
+
+	tests := []struct {
+		name    string
+		mocks   func(*mock_azsecrets.MockClient)
+		wantErr bool
+	}{
+		{
+			name: "reads secrets from keyvault",
+			mocks: func(m *mock_azsecrets.MockClient) {
+				m.EXPECT().GetSecret(ctx, holmesAzureAPIKeySecretName, "", nil).
+					Return(azsecrets.GetSecretResponse{
+						Secret: azsecrets.Secret{Value: &apiKey},
+					}, nil)
+				m.EXPECT().GetSecret(ctx, holmesAzureAPIBaseSecretName, "", nil).
+					Return(azsecrets.GetSecretResponse{
+						Secret: azsecrets.Secret{Value: &apiBase},
+					}, nil)
+			},
+		},
+		{
+			name: "API key not found in keyvault returns error",
+			mocks: func(m *mock_azsecrets.MockClient) {
+				m.EXPECT().GetSecret(ctx, holmesAzureAPIKeySecretName, "", nil).
+					Return(azsecrets.GetSecretResponse{}, fmt.Errorf("secret not found"))
+			},
+			wantErr: true,
+		},
+		{
+			name: "API base not found in keyvault returns error",
+			mocks: func(m *mock_azsecrets.MockClient) {
+				m.EXPECT().GetSecret(ctx, holmesAzureAPIKeySecretName, "", nil).
+					Return(azsecrets.GetSecretResponse{
+						Secret: azsecrets.Secret{Value: &apiKey},
+					}, nil)
+				m.EXPECT().GetSecret(ctx, holmesAzureAPIBaseSecretName, "", nil).
+					Return(azsecrets.GetSecretResponse{}, fmt.Errorf("secret not found"))
+			},
+			wantErr: true,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			controller := gomock.NewController(t)
+			defer controller.Finish()
+
+			mockKV := mock_azsecrets.NewMockClient(controller)
+			tt.mocks(mockKV)
+
+			cfg, err := NewHolmesConfig(ctx, mockKV)
+			if tt.wantErr {
+				require.Error(t, err)
+				return
+			}
+			require.NoError(t, err)
+			require.Equal(t, apiKey, cfg.AzureAPIKey)
+			require.Equal(t, apiBase, cfg.AzureAPIBase)
+			// Non-secret values should still come from env/defaults
+			require.NotEmpty(t, cfg.Image)
+			require.NotEmpty(t, cfg.Model)
+		})
+	}
+}
diff --git a/pkg/util/holmes/kubeconfig.go b/pkg/util/holmes/kubeconfig.go
new file mode 100644
index 00000000000..1041a053493
--- /dev/null
+++ b/pkg/util/holmes/kubeconfig.go
@@ -0,0 +1,41 @@
+package holmes
+
+// Copyright (c) Microsoft Corporation.
+// Licensed under the Apache License 2.0.
+
+import (
+	"fmt"
+	"strings"
+
+	clientcmdv1 "k8s.io/client-go/tools/clientcmd/api/v1"
+
+	"sigs.k8s.io/yaml"
+)
+
+// MakeExternalKubeconfig takes an internal kubeconfig (api-int.*) and converts
+// it to use the external API endpoint (api.*) with insecure-skip-tls-verify.
+// This is needed because the Hive AKS cluster cannot resolve api-int.* DNS
+// names (Azure Private DNS is only linked to the cluster's VNet).
+func MakeExternalKubeconfig(internalKubeconfig []byte) ([]byte, error) {
+	var cfg clientcmdv1.Config
+	err := yaml.Unmarshal(internalKubeconfig, &cfg)
+	if err != nil {
+		return nil, fmt.Errorf("failed to unmarshal kubeconfig: %w", err)
+	}
+
+	for i := range cfg.Clusters {
+		originalServer := cfg.Clusters[i].Cluster.Server
+		rewrittenServer := strings.Replace(originalServer, "https://api-int.", "https://api.", 1)
+		cfg.Clusters[i].Cluster.Server = rewrittenServer
+
+		if rewrittenServer != originalServer {
+			// The self-signed CA does not cover the external endpoint's cert,
+			// so skip TLS verification. The client cert is still used for
+			// authentication (mTLS for identity, not for server verification).
+			cfg.Clusters[i].Cluster.InsecureSkipTLSVerify = true
+			cfg.Clusters[i].Cluster.CertificateAuthorityData = nil
+		}
+	}
+
+	return yaml.Marshal(cfg)
+}
diff --git a/pkg/util/holmes/kubeconfig_test.go b/pkg/util/holmes/kubeconfig_test.go
new file mode 100644
index 00000000000..e1770063748
--- /dev/null
+++ b/pkg/util/holmes/kubeconfig_test.go
@@ -0,0 +1,98 @@
+package holmes
+
+// Copyright (c) Microsoft Corporation.
+// Licensed under the Apache License 2.0.
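The core of `MakeExternalKubeconfig` above is a single anchored prefix replacement: the TLS stripping only happens when the replacement actually changed the server URL, so already-external kubeconfigs pass through untouched. A standalone sketch of that rewrite on a bare server URL (sample host; not the function's full kubeconfig handling):

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteServer swaps the internal API prefix for the external one, the
// same way MakeExternalKubeconfig does for each cluster entry. The
// `changed` result is what gates the insecure-skip-tls-verify fallback.
func rewriteServer(server string) (rewritten string, changed bool) {
	rewritten = strings.Replace(server, "https://api-int.", "https://api.", 1)
	return rewritten, rewritten != server
}

func main() {
	s, changed := rewriteServer("https://api-int.test.example.com:6443")
	fmt.Println(s, changed) // https://api.test.example.com:6443 true

	s, changed = rewriteServer("https://api.test.example.com:6443")
	fmt.Println(s, changed) // https://api.test.example.com:6443 false
}
```

Because the replacement is anchored to the `https://api-int.` prefix with a count of 1, a hostname that merely contains `api-int` elsewhere is left alone.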
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/require"
+
+	clientcmdv1 "k8s.io/client-go/tools/clientcmd/api/v1"
+
+	"sigs.k8s.io/yaml"
+)
+
+func TestMakeExternalKubeconfig(t *testing.T) {
+	internalConfig := &clientcmdv1.Config{
+		Clusters: []clientcmdv1.NamedCluster{
+			{
+				Name: "test-cluster",
+				Cluster: clientcmdv1.Cluster{
+					Server:                   "https://api-int.test.example.com:6443",
+					CertificateAuthorityData: []byte("some-ca-data"),
+				},
+			},
+		},
+		AuthInfos: []clientcmdv1.NamedAuthInfo{
+			{
+				Name: "system:aro-diagnostics",
+				AuthInfo: clientcmdv1.AuthInfo{
+					ClientCertificateData: []byte("cert-data"),
+					ClientKeyData:         []byte("key-data"),
+				},
+			},
+		},
+		Contexts: []clientcmdv1.NamedContext{
+			{
+				Name: "system:aro-diagnostics",
+				Context: clientcmdv1.Context{
+					Cluster:  "test-cluster",
+					AuthInfo: "system:aro-diagnostics",
+				},
+			},
+		},
+		CurrentContext: "system:aro-diagnostics",
+	}
+
+	internalKubeconfig, err := yaml.Marshal(internalConfig)
+	require.NoError(t, err)
+
+	externalKubeconfig, err := MakeExternalKubeconfig(internalKubeconfig)
+	require.NoError(t, err)
+
+	var got clientcmdv1.Config
+	err = yaml.Unmarshal(externalKubeconfig, &got)
+	require.NoError(t, err)
+
+	// Server should be rewritten from api-int.* to api.*
+	require.Equal(t, "https://api.test.example.com:6443", got.Clusters[0].Cluster.Server)
+
+	// CA data should be stripped
+	require.Nil(t, got.Clusters[0].Cluster.CertificateAuthorityData)
+
+	// InsecureSkipTLSVerify should be set
+	require.True(t, got.Clusters[0].Cluster.InsecureSkipTLSVerify)
+
+	// Client credentials should be preserved
+	require.Equal(t, []byte("cert-data"), got.AuthInfos[0].AuthInfo.ClientCertificateData)
+	require.Equal(t, []byte("key-data"), got.AuthInfos[0].AuthInfo.ClientKeyData)
+}
+
+func TestMakeExternalKubeconfigNoRewriteNeeded(t *testing.T) {
+	// If the server already uses api.* (not api-int.*), it should not be changed
+	config := &clientcmdv1.Config{
+		Clusters: []clientcmdv1.NamedCluster{
+			{
+				Name: "test-cluster",
+				Cluster: clientcmdv1.Cluster{
+					Server:                   "https://api.test.example.com:6443",
+					CertificateAuthorityData: []byte("some-ca-data"),
+				},
+			},
+		},
+	}
+
+	kubeconfig, err := yaml.Marshal(config)
+	require.NoError(t, err)
+
+	result, err := MakeExternalKubeconfig(kubeconfig)
+	require.NoError(t, err)
+
+	var got clientcmdv1.Config
+	err = yaml.Unmarshal(result, &got)
+	require.NoError(t, err)
+
+	// Server should remain unchanged
+	require.Equal(t, "https://api.test.example.com:6443", got.Clusters[0].Cluster.Server)
+}
diff --git a/pkg/util/mocks/hive/hive.go b/pkg/util/mocks/hive/hive.go
index 58cd5325359..ceb8189833e 100644
--- a/pkg/util/mocks/hive/hive.go
+++ b/pkg/util/mocks/hive/hive.go
@@ -11,6 +11,7 @@ package mock_hive
 import (
 	context "context"
+	io "io"
 	reflect "reflect"
 
 	gomock "go.uber.org/mock/gomock"
@@ -22,6 +23,7 @@ import (
 	v1alpha1 "github.com/openshift/hive/apis/hiveinternal/v1alpha1"
 
 	api "github.com/Azure/ARO-RP/pkg/api"
+	holmes "github.com/Azure/ARO-RP/pkg/util/holmes"
 )
 
 // MockClusterManager is a mock of ClusterManager interface.
@@ -150,6 +152,20 @@ func (mr *MockClusterManagerMockRecorder) Install(ctx, sub, doc, version, custom
 	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "Install", reflect.TypeOf((*MockClusterManager)(nil).Install), ctx, sub, doc, version, customManifests)
 }
 
+// InvestigateCluster mocks base method.
+func (m *MockClusterManager) InvestigateCluster(ctx context.Context, hiveNamespace string, kubeconfig []byte, holmesConfig *holmes.HolmesConfig, question string, w io.Writer) error {
+	m.ctrl.T.Helper()
+	ret := m.ctrl.Call(m, "InvestigateCluster", ctx, hiveNamespace, kubeconfig, holmesConfig, question, w)
+	ret0, _ := ret[0].(error)
+	return ret0
+}
+
+// InvestigateCluster indicates an expected call of InvestigateCluster.
+func (mr *MockClusterManagerMockRecorder) InvestigateCluster(ctx, hiveNamespace, kubeconfig, holmesConfig, question, w any) *gomock.Call {
+	mr.mock.ctrl.T.Helper()
+	return mr.mock.ctrl.RecordCallWithMethodType(mr.mock, "InvestigateCluster", reflect.TypeOf((*MockClusterManager)(nil).InvestigateCluster), ctx, hiveNamespace, kubeconfig, holmesConfig, question, w)
+}
+
 // IsClusterDeploymentReady mocks base method.
 func (m *MockClusterManager) IsClusterDeploymentReady(ctx context.Context, doc *api.OpenShiftClusterDocument) (bool, error) {
 	m.ctrl.T.Helper()