Skip to content

hub: stop sharing the embedded descriptor map across registries#632

Merged
tamalsaha merged 1 commit into
masterfrom
fix/registry-shared-map-race
May 19, 2026
Merged

hub: stop sharing the embedded descriptor map across registries#632
tamalsaha merged 1 commit into
masterfrom
fix/registry-shared-map-race

Conversation

@tamalsaha
Copy link
Copy Markdown
Contributor

Summary

  • NewRegistryOfKnownResources and NewKVLocal both wrapped resourcedescriptors.KnownDescriptors() — the package-global map — directly into a KVMap. Each Registry held its own RWMutex, but KVMap.Set wrote into the same underlying Go map under different locks.
  • With PoolSize = 1024 clusters, a CRD discovery from cluster A and one from cluster B can hit Set simultaneously and trigger Go's runtime fatal error: concurrent map writes — a process crash, not silent corruption.
  • Introduce NewKVMapFromKnown() which copies the map before wrapping, and use it from both in-tree construction sites. KnownDescriptors() is unchanged so out-of-tree consumers keep working; any pool factory that constructs a KVMap directly from KnownDescriptors() should migrate to this helper.

Test plan

  • make unit-tests
  • go test -race ./hub/... — exercise the registry concurrently via a small in-package test that spins multiple goroutines calling Set on different registries built from NewRegistryOfKnownResources. (Optional follow-up if not yet present.)
  • Smoke-test kube-ui-server with two clusters discovering distinct CRDs simultaneously.

🤖 Generated with Claude Code

NewRegistryOfKnownResources and NewKVLocal wrapped
resourcedescriptors.KnownDescriptors() — the package-global map —
directly into a KVMap. Different Registry instances therefore held
their own RWMutex but wrote into the same underlying Go map on
KVMap.Set. The pool spins up one registry per cluster UID
(PoolSize=1024); when several clusters concurrently discovered CRDs
that weren't in the embedded set, those Set calls raced on the shared
map. In Go that is a "fatal error: concurrent map writes" — not a
silent corruption but a process crash.

Introduce NewKVMapFromKnown(), which copies the known-descriptors map
into a fresh map before wrapping it in a KVMap, and use it from both
construction paths. KnownDescriptors() itself is unchanged; out-of-tree
pool factories that share the global map should migrate to this
helper.

Signed-off-by: Tamal Saha <tamal@appscode.com>
@tamalsaha tamalsaha mentioned this pull request May 19, 2026
2 tasks
@tamalsaha tamalsaha merged commit 0f2dd6e into master May 19, 2026
6 checks passed
@tamalsaha tamalsaha deleted the fix/registry-shared-map-race branch May 19, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant