
Add SetDefaultHostMetadataResolverFactory#1636

Merged
simonfaltum merged 5 commits into main from simonfaltum/default-host-metadata-resolver
Apr 20, 2026

Conversation


@simonfaltum simonfaltum commented Apr 17, 2026

Changes

PR #1572 added Config.HostMetadataResolver so callers could override the SDK's /.well-known/databricks-config fetch on a per-Config basis. That covers "I have one Config and I want to wrap it."

The gap: programs that construct many Configs across their command surface (e.g. the Databricks CLI) end up copying the same cfg.HostMetadataResolver = ... assignment at every construction site (in the CLI, roughly 10 sites across 7 files), plus a guardrail test to catch drift.

This PR adds a package-level default consulted when a Config has no explicit resolver set. Callers set a factory once during startup; every subsequent Config gets the same resolver without per-site wiring. The Config-level field still takes precedence, so PR #1572's contract is unchanged.

API

// config/host_metadata.go
var DefaultHostMetadataResolverFactory func(*Config) HostMetadataResolver

Plain public variable, set once at init. Matches the stdlib pattern for single-default hooks: http.DefaultClient, http.DefaultTransport, log.Default. Callers needing per-Config or dynamic behavior should use Config.HostMetadataResolver instead.

Resolution order inside Config.EnsureResolved

  1. If Config.HostMetadataResolver is set, use it.
  2. Else, if DefaultHostMetadataResolverFactory is non-nil, invoke it with the resolving Config and use its return value. If it returns nil, fall through.
  3. Else, SDK's default HTTP fetch (unchanged behavior for all existing callers).
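The three-step order can be sketched in standalone Go. Everything below is a hypothetical stand-in for illustration (the real types live in the SDK's config package; resolve and defaultHTTPFetch are invented names, not the actual EnsureResolved implementation):

```go
package main

import "fmt"

// Hypothetical stand-ins for the SDK types, for illustration only.
type HostMetadata struct{ AuthMode string }
type HostMetadataResolver func(host string) *HostMetadata

type Config struct {
	Host                 string
	HostMetadataResolver HostMetadataResolver
}

// Package-level default factory, consulted only when the Config
// carries no explicit resolver.
var DefaultHostMetadataResolverFactory func(*Config) HostMetadataResolver

// defaultHTTPFetch stands in for the SDK's /.well-known/databricks-config fetch.
func defaultHTTPFetch(host string) *HostMetadata {
	return &HostMetadata{AuthMode: "fetched"}
}

func resolve(cfg *Config) *HostMetadata {
	// 1. Config-level resolver wins.
	if cfg.HostMetadataResolver != nil {
		return cfg.HostMetadataResolver(cfg.Host)
	}
	// 2. Package-level factory, if set; a nil resolver falls through.
	if DefaultHostMetadataResolverFactory != nil {
		if r := DefaultHostMetadataResolverFactory(cfg); r != nil {
			return r(cfg.Host)
		}
	}
	// 3. Default HTTP fetch, unchanged for existing callers.
	return defaultHTTPFetch(cfg.Host)
}

func main() {
	DefaultHostMetadataResolverFactory = func(*Config) HostMetadataResolver {
		return func(string) *HostMetadata { return &HostMetadata{AuthMode: "factory"} }
	}
	fmt.Println(resolve(&Config{Host: "https://example"}).AuthMode) // factory
}
```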

How the Databricks CLI will use this

The canonical Go idiom for "library A registers itself with library B" is a blank import that triggers an init() in A. This is how database/sql drivers (_ "github.com/lib/pq"), image codecs (_ "image/png"), and encoding formats register themselves.

After this PR lands and is bumped into the CLI, CLI PR #5011 will collapse from ~10 wired-in hostmetadata.Attach(cfg) calls + a guardrail test down to two small pieces:

repos/cli/libs/hostmetadata/resolver.go — set the caching factory at package init:

func init() {
    config.DefaultHostMetadataResolverFactory = func(cfg *config.Config) config.HostMetadataResolver {
        return NewResolver(cfg.DefaultHostMetadataResolver())
    }
}

repos/cli/cmd/databricks/main.go — one blank import to pull the package in at startup:

import (
    // Registers a disk-cached HostMetadataResolver with the SDK so every
    // Config the CLI constructs reuses the cached /.well-known lookup.
    _ "github.com/databricks/cli/libs/hostmetadata"
)

That's the full integration. Every Config the CLI creates, now and from any command a developer adds in the future, automatically gets caching. There is no per-site Attach call to remember, no guardrail test to maintain, and no mechanism a new developer has to learn about to benefit from it.

Experimental

Marked experimental to match the existing HostMetadataResolver field. No default behavior change for callers that never set DefaultHostMetadataResolverFactory.

Tests

Three new tests in config/config_test.go, each using a small withDefaultHostMetadataResolverFactory(t, factory) helper that captures and restores the prior value, so tests never clobber each other via the package-level default:

  • Factory is invoked when Config has no resolver; back-fill works end-to-end.

  • Config-level resolver takes precedence (factory not consulted).

  • Factory returning nil falls through to the SDK's HTTP fetch.

  • make fmt test lint: clean

  • go test ./config/... -count=1 -race: clean
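A minimal sketch of such a save-and-restore helper. The types and the swapDefaultFactory name here are hypothetical stand-ins; the real helper additionally takes a *testing.T and registers the restore via t.Cleanup:

```go
package main

import "fmt"

// Hypothetical stand-ins for the SDK's types, for illustration only.
type Config struct{}
type HostMetadataResolver interface{}

var DefaultHostMetadataResolverFactory func(*Config) HostMetadataResolver

// swapDefaultFactory installs factory as the package-level default and
// returns a func that restores the previous registration. In a test the
// returned restore func would be handed to t.Cleanup, so package-level
// state never leaks between tests.
func swapDefaultFactory(factory func(*Config) HostMetadataResolver) (restore func()) {
	prev := DefaultHostMetadataResolverFactory
	DefaultHostMetadataResolverFactory = factory
	return func() { DefaultHostMetadataResolverFactory = prev }
}

func main() {
	restore := swapDefaultFactory(func(*Config) HostMetadataResolver { return "stub" })
	fmt.Println(DefaultHostMetadataResolverFactory(nil)) // stub
	restore()
	fmt.Println(DefaultHostMetadataResolverFactory == nil) // true
}
```

Capturing and restoring the prior value (rather than resetting to nil) is what lets tests in different files coexist even if one of them registered its own factory first.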

Signed-off-by: simon <simon.faltum@databricks.com>

## Changes

PR #1572 added `Config.HostMetadataResolver` so callers could override the
SDK's `/.well-known/databricks-config` fetch on a per-Config basis. That
covers the "I have one Config and I want to wrap it" case, but a program
that constructs many Configs (e.g. the Databricks CLI across its command
surface) ends up calling the same assignment at every construction site
and needs a lint-style guardrail to keep it from drifting.

This adds a package-level default consulted when a Config has no explicit
resolver set. Callers assign it once during startup (typically from an
init() block in a package that the main binary blank-imports); every
subsequent Config picks up the same resolver without per-site wiring.
Config-level resolvers still take precedence, so PR #1572's contract is
unchanged.

Exposed as a plain variable (not a setter function), matching the existing
`Config.HostMetadataResolver` field and stdlib precedents like
`http.DefaultClient` and `http.DefaultTransport`. Not safe for concurrent
modification after initialization — documented as such.

**What's new:**
- `config.DefaultHostMetadataResolverFactory` — experimental; set to nil to clear.

**What stays the same:**
- Default behavior (no factory set) is unchanged: the SDK performs its own HTTP fetch.
- `Config.HostMetadataResolver` takes precedence over the factory.
- A factory returning nil falls through to the default HTTP fetch.

## Tests

Three new tests in `config/config_test.go`:
- Factory is invoked when Config has no resolver; backfill works.
- Config-level resolver wins when both are set (factory not called).
- Factory returning nil falls through to the SDK's HTTP fetch.

Each test clears the factory via `t.Cleanup(func() { DefaultHostMetadataResolverFactory = nil })`
so package-level state does not leak between tests. `make fmt test lint`
clean, `-race` clean.

Signed-off-by: simon <simon.faltum@databricks.com>
@simonfaltum simonfaltum force-pushed the simonfaltum/default-host-metadata-resolver branch from ac18d2c to 9f7e4b4 Compare April 17, 2026 16:01
@simonfaltum simonfaltum changed the title Add SetDefaultHostMetadataResolverFactory Add DefaultHostMetadataResolverFactory Apr 17, 2026
Codex review flagged the exposed-variable approach: a plain global
function value is a Go data race under concurrent read/write, even if
documented as init-only. Wraps the storage in an unexported var guarded
by a sync.RWMutex, exposing SetDefaultHostMetadataResolverFactory. Also
keeps the representation private so future iteration (atomic.Pointer,
validation, observer hooks) stays non-breaking while the API is marked
experimental.

Tests use a withDefaultHostMetadataResolverFactory helper that captures
and restores the prior registration instead of resetting to nil, so
tests never clobber each other via the package-level default.

Signed-off-by: simon <simon.faltum@databricks.com>
@simonfaltum simonfaltum changed the title Add DefaultHostMetadataResolverFactory Add SetDefaultHostMetadataResolverFactory Apr 17, 2026
Signed-off-by: simon <simon.faltum@databricks.com>
Comment thread config/host_metadata.go Outdated
Comment on lines +42 to +45
var (
defaultHostMetadataResolverFactoryMu sync.RWMutex
defaultHostMetadataResolverFactory func(*Config) HostMetadataResolver
)
Contributor

@renaudhartert-db renaudhartert-db Apr 20, 2026


[optional] If users are expected to set this at init time, then you don't need the mutex. See http and log as references from the standard library.

var DefaultHostMetadataResolverFactory func(*Config) HostMetadataResolver = nil

Users who need to use different resolvers within their code should use Config.HostMetadataResolver instead.

Member Author


Switched to a plain var DefaultHostMetadataResolverFactory, dropped the mutex and the sync import. Matches the stdlib precedent you called out (http.DefaultClient, http.DefaultTransport, log.Default).

Contributor

@renaudhartert-db renaudhartert-db left a comment


LGTM, modulo comments.

Co-authored-by: Renaud Hartert <renaud.hartert@databricks.com>
Renaud noted that for an init-time default the stdlib pattern is a plain
public variable, not a setter/getter pair guarded by a mutex (see
http.DefaultClient, http.DefaultTransport, log.Default). Callers who
need dynamic or per-Config resolvers use Config.HostMetadataResolver
instead, so there's no real concurrent-write scenario to guard against.

Simplifies the API surface and drops the sync import.

Signed-off-by: simon <simon.faltum@databricks.com>
@github-actions

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-go

Inputs:

  • PR number: 1636
  • Commit SHA: d79be27791d4cd5f01a0add1a9c8c6d8959cb77b

Checks will be approved automatically on success.

@simonfaltum simonfaltum added this pull request to the merge queue Apr 20, 2026
Merged via the queue into main with commit 55d5200 Apr 20, 2026
12 of 14 checks passed
@simonfaltum simonfaltum deleted the simonfaltum/default-host-metadata-resolver branch April 20, 2026 07:39
simonfaltum added a commit to databricks/cli that referenced this pull request Apr 21, 2026
SDK v0.128.0 (databricks/databricks-sdk-go#1636) adds
config.DefaultHostMetadataResolverFactory so a package can install a
single hook that every Config picks up on EnsureResolved, without
per-site wiring.

Replaces ten hostmetadata.Attach(cfg) call sites across seven files
and the injection guardrail test with two pieces:

- libs/hostmetadata/resolver.go: init() sets
  config.DefaultHostMetadataResolverFactory to wrap cfg.DefaultHostMetadataResolver()
  in the caching resolver.
- main.go: blank import of libs/hostmetadata triggers that init() at
  startup so every *config.Config the CLI constructs picks up the
  cached lookup automatically.

Co-authored-by: Isaac
andrewnester pushed a commit to bernardo-rodriguez/b-cli that referenced this pull request Apr 22, 2026

## Why

Every CLI command (`databricks auth profiles`, `bundle validate`, every
workspace or account call) goes through `Config.EnsureResolved`, which
triggers an unauthenticated GET to
`{host}/.well-known/databricks-config` to populate host metadata. That
round trip is ~700ms against production and gets paid on every
invocation, doubling the latency of otherwise single-request commands.

## Changes

**Before:** every CLI invocation hits the well-known endpoint once (or
more when multiple configs get constructed).
**Now:** the first invocation populates a local disk cache under
`~/.cache/databricks/<version>/host-metadata/`; subsequent invocations
read from it. Failures are negatively cached for 60s (except for
`context.Canceled` / `context.DeadlineExceeded`, which are transient and
never cached).

The integration hooks into SDK `v0.128.0`'s
`config.DefaultHostMetadataResolverFactory` (added in
databricks/databricks-sdk-go#1636) via two pieces:

- `libs/hostmetadata/resolver.go`: `init()` registers a factory that
wraps `cfg.DefaultHostMetadataResolver()` in the caching resolver.
`NewResolver(fetch)` remains the unit-testable primary API.
- `main.go`: a blank import of `libs/hostmetadata` triggers that
`init()` at startup, so every `*config.Config` the CLI constructs now
and in the future picks up the cached lookup automatically. No per-site
wiring, no guardrail test.

Positive cache wraps the miss path, so the hit path is a single disk
read; the negative cache is consulted only when the positive cache misses.

`internal/testutil/env.go` pins `DATABRICKS_CACHE_DIR` to a temp dir in
test cleanup so tests don't leak cache files into `HOME`.

### Collateral cleanups

- `libs/cache/file_cache.go`: drop the `failed to stat cache file` debug
log when the file is simply missing (`fs.ErrNotExist`). It was pure
noise (the next line, `cache miss, computing`, conveys the same info)
and its OS-specific error text diverged between Unix (`no such file or
directory`) and Windows (`The system cannot find the file specified.`),
breaking cross-platform acceptance goldens. Genuine stat failures
(permission, corruption) still log.
- `libs/testdiff/replacement.go`: `devVersionRegex` now accepts either
`+SHA` or `-SHA` after `0.0.0-dev`. `build.GetSanitizedVersion()` swaps
`+` to `-` for filesystem safety when the version is used in cache
paths, and the old regex only covered the `+` form.

## Test plan

- [x] `make checks` clean
- [x] `make lint` clean (0 issues)
- [x] `go test ./libs/hostmetadata/... -race` clean (factory-installed
assertion + cache hit + fetch error + cancellation-not-cached + host
isolation + end-to-end integration)
- [x] `go test ./cmd/root/... ./bundle/config/... ./cmd/auth/...
./libs/auth/... -race` clean
- [x] End-to-end acceptance test `acceptance/auth/host-metadata-cache/`
asserts exactly ONE `/.well-known/databricks-config` GET across two
`auth profiles` invocations sharing a `DATABRICKS_CACHE_DIR`
- [x] Existing acceptance tests regenerated: fewer well-known GETs in
`out.requests.txt` (caching works), new `[Local Cache]` debug lines in
cache/telemetry tests, two `Warn: Failed to resolve host metadata` lines
removed (intentional: the resolver returns `(nil, nil)` on fetch errors,
which is how the SDK interprets "no metadata available"), stat-not-found
lines removed (see Collateral cleanups)

### Live validation against dogfood (from previous push)

Built locally and ran `databricks -p e2-dogfood current-user me` with
and without a warm cache:

| Scenario | Elapsed well-known time | Cache log output |
|---|---|---|
| Cold cache (fresh `DATABRICKS_CACHE_DIR`) | ~713ms fetch | `cache miss, computing` -> `GET /.well-known/databricks-config` -> `computed and stored result` |
| Warm cache (second invocation) | ~1ms | single `[Local Cache] cache hit` line |

Net per-command savings: ~700ms, matching the Why.
