[ARO-25464] Genericise the Monitor bucketing to work for MIMO as well #4718

Open
hawkowl wants to merge 20 commits into master from hawkowl/mimo-bucketing-from-monitor
hawkowl commented Mar 27, 2026

Which issue this PR addresses:

https://redhat.atlassian.net/browse/ARO-25464

Fixes some of the problems we noticed in a recent incident, where MIMO worked against clusters it didn't need to.

What this PR does / why we need it:

Genericises the Monitor logic for master and bucket allocation, moves it to another name, and then reuses that in MIMO.

Test plan for issue:

CI, E2E

Is there any documentation that needs to be updated for this PR?

Possibly wrt the Monitor dbs?

How do you know this will function as expected in production?

E2E, hopefully :)

Copilot AI review requested due to automatic review settings March 27, 2026 05:56
hawkowl added the next-release, go, and skippy labels Mar 27, 2026
Copilot AI left a comment

Pull request overview

This PR replaces the monitor-specific master/worker bucket allocation with a generic “pool worker” leasing + bucket-balancing mechanism, then reuses it for MIMO (scheduler/actuator) so instances only act on buckets they own.

Changes:

  • Introduces PoolWorkerDocument + database.PoolWorkers and a shared bucket balancer loop in pkg/util/buckets.
  • Migrates pkg/monitor bucket coordination from Monitors DB to the new PoolWorkers DB.
  • Updates MIMO scheduler/actuator to use bucket coordination instead of hostname-based static partitioning, and adds Cosmos container + trigger deployments for PoolWorkers.
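The leasing and balancing idea summarised above can be sketched roughly as follows. This is a minimal illustration, not the code in pkg/util/buckets: the `worker` type, its fields, and both function names are hypothetical, lease expiry is assumed to be stored as Unix seconds, and plain round-robin assignment stands in for whatever balancing strategy the PR actually uses.

```go
package main

import "sort"

// worker is a hypothetical stand-in for a pool worker lease record.
type worker struct {
	ID           string
	LeaseExpires int64 // Unix seconds (assumed representation)
}

// liveWorkers returns the sorted IDs of workers whose lease is still
// valid at time now. Workers with expired leases are treated as gone.
func liveWorkers(workers []worker, now int64) []string {
	ids := make([]string, 0, len(workers))
	for _, w := range workers {
		if w.LeaseExpires > now {
			ids = append(ids, w.ID)
		}
	}
	sort.Strings(ids) // deterministic order so every caller computes the same result
	return ids
}

// balanceBuckets deals buckets 0..bucketCount-1 round-robin across ids,
// so each worker owns either floor(n/w) or ceil(n/w) buckets.
// Round-robin is an assumption made for this sketch.
func balanceBuckets(ids []string, bucketCount int) map[int]string {
	owners := make(map[int]string)
	if len(ids) == 0 {
		return owners // nobody alive: leave every bucket unowned
	}
	for b := 0; b < bucketCount; b++ {
		owners[b] = ids[b%len(ids)]
	}
	return owners
}
```

Calling `balanceBuckets(liveWorkers(all, now), n)` from whichever instance holds the master lease would give a fresh assignment each cycle. A production balancer would probably also try to minimise bucket movement between cycles, which round-robin does not attempt.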

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `pkg/util/buckets/balancer.go` | New generic bucket coordination loop + balancing logic for pool workers. |
| `pkg/util/buckets/cache.go` / `pkg/util/buckets/buckets.go` | Bucket worker behavior tweaks (e.g., bucket -1 served by all) and bucket change handling. |
| `pkg/database/poolworkers.go` | New PoolWorkers DB wrapper and Cosmos queries for leasing/workers/buckets. |
| `pkg/api/poolworker*.go` | New API types for PoolWorker documents and worker types. |
| `pkg/monitor/*` | Migrates monitor bucket ownership to PoolWorkers and wires into generic loop. |
| `pkg/mimo/scheduler/*` | Switches scheduler to coordinated buckets; adds bucket selector data and filtering. |
| `pkg/mimo/actuator/*` | Switches actuator to coordinated buckets; removes hostname partitioning. |
| `cmd/aro/*` | Wires PoolWorkers DB into monitor / mimo services. |
| `pkg/deploy/*` | Adds PoolWorkers Cosmos container + renewLease trigger to generator and baked assets. |
| `test/database/*` | Adds fake PoolWorkers client wiring for tests. |
| `pkg/util/buckets/balancer_test.go` | Moves/updates balancing tests to validate PoolWorker bucket balancing. |
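To illustrate the "bucket -1 served by all" behaviour and the ownership filtering mentioned in the summary, here is a small sketch. All names (`allWorkersBucket`, `bucketSet`, `cluster`, `filterOwned`) are hypothetical and chosen for this example; the real types in pkg/util/buckets and pkg/mimo may look quite different.

```go
package main

// allWorkersBucket is the sentinel bucket that every worker serves,
// per the "bucket -1 served by all" note in the review summary.
const allWorkersBucket = -1

// bucketSet is a hypothetical set of bucket numbers owned by this worker.
type bucketSet map[int]struct{}

// serves reports whether this worker should act on the given bucket.
func (s bucketSet) serves(bucket int) bool {
	if bucket == allWorkersBucket {
		return true
	}
	_, ok := s[bucket]
	return ok
}

// cluster is a hypothetical, minimal view of a cluster document.
type cluster struct {
	ID     string
	Bucket int
}

// filterOwned keeps only the clusters whose bucket this worker serves,
// preserving input order.
func filterOwned(clusters []cluster, owned bucketSet) []cluster {
	var out []cluster
	for _, c := range clusters {
		if owned.serves(c.Bucket) {
			out = append(out, c)
		}
	}
	return out
}
```

With this shape, a scheduler or actuator instance would skip any cluster outside its owned buckets instead of deciding by hostname, which is the change the summary describes.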


Copilot AI review requested due to automatic review settings March 27, 2026 06:10
Copilot AI left a comment

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 1 comment.


Copilot AI review requested due to automatic review settings March 30, 2026 01:30
hawkowl force-pushed the hawkowl/mimo-bucketing-from-monitor branch from 27c9366 to 9db8036 Mar 30, 2026 01:30
hawkowl changed the title from "Genericise the Monitor bucketing to work for MIMO as well" to "[ARO-25464] Genericise the Monitor bucketing to work for MIMO as well" Mar 30, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 38 out of 38 changed files in this pull request and generated 3 comments.


Copilot AI review requested due to automatic review settings March 30, 2026 05:13
hawkowl force-pushed the hawkowl/mimo-bucketing-from-monitor branch from 99b1b64 to 0b6e7fe Mar 30, 2026 05:13
Copilot AI left a comment

Pull request overview

Copilot reviewed 38 out of 38 changed files in this pull request and generated 2 comments.


Comment on lines +146 to +148

    if err != nil || doc == nil || doc.PoolWorker == nil {
        return nil, err
    }

Copilot AI Mar 30, 2026

ListBuckets returns (nil, nil) when the master document doesn't exist yet or when doc.PoolWorker is still nil. Callers treat a nil error as success, which can lead to confusing log spam/behavior during bootstrap. Consider returning a non-nil error (e.g., "bucket allocation not initialized") when doc == nil or doc.PoolWorker == nil, so the caller can handle it explicitly.

Suggested change

    - if err != nil || doc == nil || doc.PoolWorker == nil {
    -     return nil, err
    - }
    + if err != nil {
    +     return nil, err
    + }
    + if doc == nil || doc.PoolWorker == nil {
    +     return nil, fmt.Errorf("bucket allocation not initialized")
    + }
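
One possible variation on this suggestion, sketched under assumptions: a package-level sentinel error lets callers distinguish "not bootstrapped yet" from a real database failure with `errors.Is`, rather than matching on the message string. The type names, the `Buckets` field, and the `listBuckets` signature below are hypothetical stand-ins, not the actual code in pkg/database/poolworkers.go.

```go
package main

import (
	"errors"
	"fmt"
)

// errBucketsNotInitialized is a hypothetical sentinel that callers can
// test for with errors.Is during bootstrap.
var errBucketsNotInitialized = errors.New("bucket allocation not initialized")

// poolWorker and poolWorkerDocument are simplified stand-ins for the
// PR's document types; the Buckets field is an assumption.
type poolWorker struct {
	Buckets []string
}

type poolWorkerDocument struct {
	PoolWorker *poolWorker
}

// listBuckets takes the result of fetching the master document and
// separates three cases: fetch failure, not yet initialized, success.
func listBuckets(doc *poolWorkerDocument, fetchErr error) ([]string, error) {
	if fetchErr != nil {
		// Wrap so the underlying error stays reachable via errors.Is/As.
		return nil, fmt.Errorf("fetching master document: %w", fetchErr)
	}
	if doc == nil || doc.PoolWorker == nil {
		return nil, errBucketsNotInitialized
	}
	return doc.PoolWorker.Buckets, nil
}
```

A caller could then check `errors.Is(err, errBucketsNotInitialized)` and log quietly or retry during bootstrap, while still surfacing other errors as failures.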

hawkowl removed the next-release label Mar 30, 2026
Labels

go, ready-for-review, skippy
