Attempt at fixing flakey tests by hlipsig · Pull Request #4700 · Azure/ARO-RP

hlipsig · 2026-03-19T22:10:08Z

Which issue this PR addresses:

ARO-25407 Flakes for timeouts in MUO test.

What this PR does / why we need it:

Improves test resilience by increasing the MUO test's timeout from 2 to 3 minutes.

Test plan for issue:

Local run looked good. Green e2e signal is what's needed.

Is there any documentation that needs to be updated for this PR?

How do you know this will function as expected in production?

Only fixes tests.

Copilot

Pull request overview

Improves e2e test resilience by waiting for Deployments to be ready (not just present) and by skipping MUO tests when MUO isn’t enabled/managed, addressing flakes caused by timeout/race conditions around reconciliation readiness.

Changes:

Add Eventually-based helpers to wait for MUO/Gatekeeper Deployments to become ready.
Update MUO restore test to skip when MUO isn’t enabled/managed and to wait for readiness after reconciliation.
Update Guardrails Gatekeeper restore tests to wait for readiness after reconciliation.
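
To illustrate the "ready, not just present" distinction in the overview above, here is a minimal Go sketch. It is not the helper added in this PR: the `deployment` struct and `isReady` function are hypothetical stand-ins that mirror a few fields of a Kubernetes Deployment (generation, desired replicas, and status counters), so the example runs without any cluster or client library. The readiness rule shown is one common convention and is an assumption here.

```go
package main

import "fmt"

// deployment is a simplified, hypothetical stand-in for the handful of
// Deployment fields a readiness check typically inspects.
type deployment struct {
	Generation         int64 // metadata.generation
	DesiredReplicas    int32 // spec.replicas
	ObservedGeneration int64 // status.observedGeneration
	UpdatedReplicas    int32 // status.updatedReplicas
	AvailableReplicas  int32 // status.availableReplicas
}

// isReady reports whether the controller has observed the latest spec and
// every desired replica is both updated and available. An object that merely
// exists (a successful Get) can still fail this check.
func isReady(d deployment) bool {
	return d.ObservedGeneration >= d.Generation &&
		d.UpdatedReplicas == d.DesiredReplicas &&
		d.AvailableReplicas == d.DesiredReplicas
}

func main() {
	// Present in the API server, but no replicas rolled out yet.
	justCreated := deployment{Generation: 1, DesiredReplicas: 2}

	// Fully rolled out.
	rolledOut := deployment{
		Generation:         1,
		DesiredReplicas:    2,
		ObservedGeneration: 1,
		UpdatedReplicas:    2,
		AvailableReplicas:  2,
	}

	fmt.Println(isReady(justCreated), isReady(rolledOut)) // prints "false true"
}
```

A test that only checks for the object's presence would pass for `justCreated`, which is the kind of race the overview describes.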

test/e2e/operator.go

Copilot · 2026-03-20T00:16:07Z

@hlipsig I've opened a new pull request, #4701, to work on those changes. Once the pull request is ready, I'll request review from you.

Co-authored-by: hlipsig <8000786+hlipsig@users.noreply.github.com>

Revert flakey test changes and increase MUO SpecTimeout to 3 minutes

cadenmarchese · 2026-03-20T12:51:04Z

test/e2e/operator.go

 		By("waiting for the MUO deployment to be reconciled")
 		GetK8sObjectWithRetry(ctx, getFunc, managedUpgradeOperatorDeployment, metav1.GetOptions{})
-	}, SpecTimeout(2*time.Minute))
+	}, SpecTimeout(3*time.Minute))


Rather than specifying a timeout for this specific test, can we use the Eventually() pattern like other tests, with the DefaultEventuallyTimeout defined in test/e2e/helpers.go, which is 5 minutes?

cadenmarchese · 2026-03-20T12:52:17Z

.bingo/go.mod

If the .bingo change is required, can we leave it to a new PR since this one is meant to fix the e2e flake?

Yeah. I'll do that.

… explicit goversion from bingo's go.mod

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

test/e2e/operator.go:432

Describe container bodies in this test suite appear to consistently use func() (with ctx context.Context only on It/hooks). Changing this Describe to func(ctx context.Context) is likely an invalid signature for Ginkgo container nodes and will break compilation. Please revert this Describe body back to func() and keep context usage within It/BeforeEach/AfterEach where supported.

var _ = Describe("ARO Operator - MUO Deployment", func() {

Copilot · 2026-04-03T15:44:04Z

test/e2e/operator.go

 		By("waiting for the MUO deployment to be reconciled")
-		GetK8sObjectWithRetry(ctx, getFunc, managedUpgradeOperatorDeployment, metav1.GetOptions{})
-	}, SpecTimeout(2*time.Minute))
+		Eventually(func(g Gomega, ctx context.Context) {
+			_, err := getFunc(ctx, managedUpgradeOperatorDeployment, metav1.GetOptions{})
+			g.Expect(err).NotTo(HaveOccurred())
+		}).WithContext(ctx).WithTimeout(DefaultEventuallyTimeout).Should(Succeed())
+	})


This spec previously had a spec-level timeout and the PR description/linked context indicates the intent is to increase it to ~3 minutes to reduce flakes. The current change removes the SpecTimeout(...) entirely and instead relies on DefaultEventuallyTimeout (currently 5 minutes) for this wait, which doesn’t match the stated goal and changes the overall bounding behavior of the spec. Please reintroduce an explicit spec-level timeout (e.g., 3 minutes) or set this Eventually timeout to the intended value and keep a spec-level bound for the whole test.

Copilot · 2026-04-03T15:44:04Z

test/e2e/operator.go

+		Eventually(func(g Gomega, ctx context.Context) {
+			_, err := getFunc(ctx, managedUpgradeOperatorDeployment, metav1.GetOptions{})
+			g.Expect(err).NotTo(HaveOccurred())
+		}).WithContext(ctx).WithTimeout(DefaultEventuallyTimeout).Should(Succeed())


This Eventually block duplicates the existing GetK8sObjectWithRetry helper (used throughout this file) and also changes retry semantics (helper uses DefaultTimeout and PollingInterval, while this uses DefaultEventuallyTimeout and the suite’s default polling interval). To keep retries/timeouts consistent across the e2e suite and avoid future tuning in multiple places, prefer using the shared helper here (or explicitly match its timeout/polling settings).

Suggested change

-		Eventually(func(g Gomega, ctx context.Context) {
-			_, err := getFunc(ctx, managedUpgradeOperatorDeployment, metav1.GetOptions{})
-			g.Expect(err).NotTo(HaveOccurred())
-		}).WithContext(ctx).WithTimeout(DefaultEventuallyTimeout).Should(Succeed())
+		GetK8sObjectWithRetry(ctx, getFunc, managedUpgradeOperatorDeployment, metav1.GetOptions{})

Copilot · 2026-04-03T15:44:04Z

test/e2e/operator.go

+		Eventually(func(g Gomega, ctx context.Context) {
+			_, err := getFunc(ctx, gkControllerManagerDeployment, metav1.GetOptions{})
+			g.Expect(err).NotTo(HaveOccurred())
+		}).WithContext(ctx).WithTimeout(DefaultEventuallyTimeout).Should(Succeed())


This reconciliation wait was changed from GetK8sObjectWithRetry to a custom Eventually with different timeout/polling defaults. Unless there’s a specific behavior change needed here, consider using the shared helper for consistency (or explicitly set polling/timeout to match it) to avoid subtly different retry behavior across similar tests.

Copilot · 2026-04-03T15:44:05Z

test/e2e/operator.go

+		Eventually(func(g Gomega, ctx context.Context) {
+			_, err := getFunc(ctx, gkAuditDeployment, metav1.GetOptions{})
+			g.Expect(err).NotTo(HaveOccurred())
+		}).WithContext(ctx).WithTimeout(DefaultEventuallyTimeout).Should(Succeed())


This reconciliation wait was changed from GetK8sObjectWithRetry to a custom Eventually with different timeout/polling defaults. Unless there’s a specific behavior change needed here, consider using the shared helper for consistency (or explicitly set polling/timeout to match it) to avoid subtly different retry behavior across similar tests.

Suggested change

-		Eventually(func(g Gomega, ctx context.Context) {
-			_, err := getFunc(ctx, gkAuditDeployment, metav1.GetOptions{})
-			g.Expect(err).NotTo(HaveOccurred())
-		}).WithContext(ctx).WithTimeout(DefaultEventuallyTimeout).Should(Succeed())
+		GetK8sObjectWithRetry(ctx, getFunc, gkAuditDeployment, metav1.GetOptions{})

Attempt at fixing flakey tests

deccbff

Copilot AI review requested due to automatic review settings March 19, 2026 22:10

hlipsig requested review from alcasim, bennerv, cadenmarchese, hawkowl, jharrington22, kimorris27, mociarain, mrWinston, rogbas, sankur-codes, tiguelu, tsatam, tuxerrante, ventifus, wanghaoran1988 and yjst2012 as code owners March 19, 2026 22:10

Copilot AI reviewed Mar 19, 2026

Copilot started reviewing on behalf of hlipsig March 19, 2026 22:19 View session

Initial plan

37bacd3

Copilot AI mentioned this pull request Mar 20, 2026

Revert flakey test changes and increase MUO SpecTimeout to 3 minutes #4701

Merged

Copilot AI and others added 3 commits March 20, 2026 00:25

Revert flakey test changes and increase SpecTimeout to 3 minutes

7543a0a

Co-authored-by: hlipsig <8000786+hlipsig@users.noreply.github.com>

Update .bingo/go.mod

cf97afa

Merge pull request #4701 from Azure/copilot/sub-pr-4700

b8451e7

Revert flakey test changes and increase MUO SpecTimeout to 3 minutes

cadenmarchese reviewed Mar 20, 2026

update both tests to use the eventually time out functions. Clear the…

01d2b1c

… explicit goversion from bingo's go.mod

Copilot AI review requested due to automatic review settings April 3, 2026 15:40

hlipsig requested a review from kevinobriendotca as a code owner April 3, 2026 15:40

Copilot started reviewing on behalf of hlipsig April 3, 2026 15:41 View session

remove unneeded context in func()

4332e70

Copilot AI reviewed Apr 3, 2026

hlipsig wants to merge 7 commits into master from hlipsig/ARO-25407/fix-muo-gatekeeper-e2e-flake

4 participants
