Add databricks doctor command#4730
Conversation
|
Commit: b2e9d7a
18 interesting tests: 9 SKIP, 6 RECOVERED, 3 flaky
Top 20 slowest tests (at least 2 minutes):
|
shreyas-goenka
left a comment
There was a problem hiding this comment.
Note: This review was posted by Claude (AI assistant). Shreyas will do a separate, more thorough review pass.
Priority: HIGH — Config resolution diverges from real CLI auth path
MAJOR: resolveConfig diverges from real CLI auth
The resolveConfig function in databricks doctor constructs its own config resolution path instead of going through the standard SDK/CLI authentication flow. This means the doctor command could report "config is fine" while the real CLI fails (or vice versa). If the goal is to diagnose auth issues, it should use the same code path the CLI uses.
MEDIUM: Network check bypasses SDK HTTP client
The connectivity check uses http.DefaultClient directly instead of going through the SDK's configured HTTP client. In enterprise environments with proxies or custom TLS, this will give misleading results — the check might fail even though the SDK would succeed (or vice versa).
Other Observations
- Good idea for a diagnostic command overall
- The step-by-step output format is user-friendly
- Missing test coverage for the core diagnostic logic
4f9375e to
b2e9d7a
Compare
Approval status: pending
|
Adds a new top-level 'databricks doctor' command that runs diagnostic checks against the user's CLI setup and reports a checklist of results (text) or a JSON array. Checks: - CLI version - Updates (queries the GitHub releases API, skipped for dev builds) - Toolchain (git, python3, uv, terraform versions) - Proxy/TLS environment variables (HTTPS_PROXY, NO_PROXY, etc., with credentials masked) - Log file path - Config file readability and profile count - Current profile source - Authentication validity and auth type - Identity via CurrentUser.Me (skipped for account-level profiles) - Network reachability Design follows go-code-structure.md: check functions take context and primitives, Cobra RunE is a thin adapter, rendering is a pure function, and external dependencies (exec, HTTP client) are injected for tests. SPOG / unified-host account profiles are correctly classified via the existing auth.ResolveConfigType helper, so the SDK's ConfigType() returning InvalidConfig for those hosts no longer causes a misclassification. Also fixes a latent bug in libs/env/loader.go: Set was replaced with SetS so that non-string env-backed config attributes (e.g. ints, bools) are parsed correctly. Co-authored-by: Isaac
061f67a to
d598ec4
Compare
- checkIdentity now does a real API call for account-level profiles (a.Workspaces.List), matching what auth describe does. Previously account-level profiles skipped identity entirely, so invalid account PAT/OAuth could report Authentication: OK with no failing check. - maskProxyValue now masks the full userinfo segment whenever present, not only when a password is set. Protects against token-only proxy URLs like http://TOKEN@proxy from leaking the token in diagnostics. - checkToolchain distinguishes 'not found' (exec.ErrNotFound) from non-zero exit (*exec.ExitError) from other errors (permission denied, etc.), so users can tell 'install this' apart from 'this is broken'. Tests updated to cover all three cases.
Per feedback, the command is moved behind the 'experimental' subtree: databricks doctor -> databricks experimental doctor - Source: cmd/doctor/ -> experimental/doctor/cmd/ (matches the aitools convention under experimental/) - Entry point renamed New() -> NewDoctorCmd() - Hidden: true; no GroupID since it's no longer a top-level command - Registration moves from cmd/cmd.go to cmd/experimental/experimental.go - Acceptance test: acceptance/cmd/doctor -> acceptance/cmd/experimental/doctor and its script invokes 'experimental doctor' - Help output no longer lists doctor under Developer Tools - NEXT_CHANGELOG wording updated to flag the command as experimental
Experimental commands don't go into the release notes until they graduate. No CI check requires an entry, so this file matches main now.
- Wrap JSON output in DoctorReport{Results: ...} so we can add fields
later without breaking callers of the array shape.
- Type the status enum (type status string) so misuse is a compile
error, while keeping the string JSON wire contract.
- Make CheckResult.Detail any so future structured details don't
require a breaking change.
- Add omitempty to Name/Status/Message so empty results render as {}.
- Inline runChecks into a composite literal to avoid append resizes.
- Replace if/return chain in checkCurrentProfile with a switch.
Co-authored-by: Isaac
…build # Conflicts: # cmd/experimental/experimental.go
- Reword isAccountLevelConfig godoc to "can target" so callers don't read it as "account-exclusive". - Drop the withCheckTimeout wrapper and inline context.WithTimeout at call sites; the wrapper was not adding readability. - Move config-resolution error handling out of checkAuth and into runChecks, so checkAuth is only responsible for authenticating a resolved config. Co-authored-by: Isaac
… simonfaltum/doctor-rebuild
WithProfiler was removed as dead code on main (PR #4974). Match the rest of the doctor command by injecting the profiler directly into checkConfigFile, the same way exec and http.Client are injected into the other checks.
Why
Users debugging CLI setup issues (auth failures, config problems, network issues, proxy/TLS, missing toolchain) have no single command to diagnose their environment. They must manually run separate commands to check each layer.
Changes
Before: Users had to manually check auth, config, connectivity, and toolchain separately.
Now: A new experimental command
databricks experimental doctorruns all diagnostic checks and reports results as a checklist:git,python3,uv,terraform; missing binaries are flaggednot found, tools that ran and errored are flaggedexited N/error: <msg>HTTPS_PROXY,NO_PROXY,SSL_CERT_FILE,REQUESTS_CA_BUNDLE, etc. are set; userinfo in proxy URLs is replaced with***(including token-only forms likeTOKEN@proxy)DATABRICKS_LOG_FILE)CurrentUser.Me; account-level profiles callWorkspaces.Listso invalid account creds fail instead of being skippedText output uses colored status icons. JSON output (
--output json) returns a structured array. Auth failures are reported as check results, not command errors.Command placement
The command is registered under the
experimentalsubtree, so it is accessed as:It is hidden from top-level help, following the convention used by
experimental aitools. Source lives underexperimental/doctor/cmd/.Example output
Failure mode (no profile, missing credentials):
JSON output (
--output json) emits the same checks as a structured array.Key design decisions
go-code-structure.md, check functions takecontext.Contextand primitives (profile string, fromFlag bool), not*cobra.Command. The CobraRunEis a thin adapter. Rendering is a purerender(w, results, outputType)function. External dependencies (exec for toolchain, HTTP client for updates) are injected, so tests don't shell out or hit the network.Workspaces.Listcall, matching the patternauth describeuses, so invalid account PAT/OAuth tokens fail the identity check rather than being silently skipped.maskProxyValuereplaces the full userinfo segment with***whenever present, covering bothuser:pass@hostand token-onlyTOKEN@hostURLs.exec.ErrNotFoundrenders asnot found(install it),*exec.ExitErrorrenders asexited N(it's broken), other errors render with the message.auth.ResolveConfigTypehelper rather than rolling its own so SPOG profiles are classified correctly (the SDK's ownConfigType()returnsInvalidConfigfor them in v0.127.0+).libs/env/loader.go:Set→SetSso non-string env-backed config attributes (ints, bools) parse correctly.Changelog
No
NEXT_CHANGELOG.mdentry: experimental commands are omitted from release notes until they graduate out ofexperimental. The PR template makes NEXT_CHANGELOG additions opt-in.Open item
experimentaland so no longer collides with the auto-generated command namespace.Test plan
Workspaces.List, fail on 401)acceptance/cmd/experimental/doctor/(toolchain line masked for machine-independent output)make lintfullpassesmake checkspassesmake fmtfullclean