Skip to content

🌱 Prevent duplicate log watchers in E2E tests#3024

Open
lentzi90 wants to merge 1 commit intometal3-io:mainfrom
Nordix:lentzi90/duplicate-log-watcher
Open

🌱 Prevent duplicate log watchers in E2E tests#3024
lentzi90 wants to merge 1 commit intometal3-io:mainfrom
Nordix:lentzi90/duplicate-log-watcher

Conversation

@lentzi90
Copy link
Copy Markdown
Member

@lentzi90 lentzi90 commented Feb 26, 2026

What this PR does / why we need it:
We sometimes use BuildAndApplyKustomization multiple times, e.g. inside
FlakeAttempt. This is an issue for the log watchers, since we create a
new watcher each time. Each watcher writes logs to the files so we end
up with the logs duplicated. To fix this, keep track of which watchers
we have and skip setting up duplicates. Additionally, clean up the
watchers after the test is done, to avoid leaking and spamming.

I think we use FlakeAttempts only during upgrade tests, so don't expect to see a difference in the normal tests.

Part of #2927

Checklist:

  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • E2E tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@metal3-io-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lentzi90 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 26, 2026
@lentzi90
Copy link
Copy Markdown
Member Author

/test metal3-bmo-e2e-test-optional-pull

@lentzi90
Copy link
Copy Markdown
Member Author

/hold
BMO and Ironic logs are missing here. I must have missed something.

@metal3-io-bot metal3-io-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 27, 2026
@lentzi90 lentzi90 force-pushed the lentzi90/duplicate-log-watcher branch from b292e97 to 2dddfc9 Compare March 3, 2026 08:15
@metal3-io-bot metal3-io-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed needs-rebase Indicates that a PR cannot be merged because it has merge conflicts with HEAD. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 3, 2026
@lentzi90
Copy link
Copy Markdown
Member Author

lentzi90 commented Mar 3, 2026

Now the logs look correct
/hold cancel

@metal3-io-bot metal3-io-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2026
@lentzi90
Copy link
Copy Markdown
Member Author

lentzi90 commented Mar 3, 2026

I forgot to run upgrade tests...
/hold
/test metal3-bmo-e2e-test-optional-pull

@metal3-io-bot metal3-io-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2026
@lentzi90 lentzi90 force-pushed the lentzi90/duplicate-log-watcher branch from 2dddfc9 to 9bc7756 Compare March 4, 2026 08:35
@lentzi90
Copy link
Copy Markdown
Member Author

lentzi90 commented Mar 4, 2026

/test metal3-bmo-e2e-test-optional-pull

@lentzi90
Copy link
Copy Markdown
Member Author

lentzi90 commented Mar 4, 2026

Log duplication fixes are working. There are more duplications still that we cannot fix here though. See #2927 (comment).
The test failure is unrelated. The issue behind that is fixed in #3033.

@lentzi90
Copy link
Copy Markdown
Member Author

lentzi90 commented Mar 4, 2026

/hold cancel

@metal3-io-bot metal3-io-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2026
@lentzi90 lentzi90 force-pushed the lentzi90/duplicate-log-watcher branch from 9bc7756 to 6fa012a Compare March 6, 2026 13:08
@lentzi90
Copy link
Copy Markdown
Member Author

lentzi90 commented Mar 6, 2026

/test metal3-bmo-e2e-test-optional-pull
Still expected to fail often, until #3033 is merged.

@lentzi90 lentzi90 force-pushed the lentzi90/duplicate-log-watcher branch from 6fa012a to 72f2906 Compare March 10, 2026 08:32
@lentzi90
Copy link
Copy Markdown
Member Author

/test metal3-bmo-e2e-test-optional-pull

Comment thread test/e2e/common.go
@@ -621,13 +669,15 @@ func BuildAndApplyKustomization(ctx context.Context, input *BuildAndApplyKustomi
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be make simpler if we just return if WaitForDeploymentsAvailable fails,
and call WatchDeploymentLogsByName only when Deployment is successful.
What do you think?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure I understand. I think WaitForDeploymentAvailable already fails the test if it fails, so in that case we do not continue.
The thing I am trying to fix here is when we use FlakeAttempts, here:

err := FlakeAttempt(2, func() error {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does FlakeAttempt always run twice even if first run is successful?
I think it would be better to use Eventually here. In Eventually as soon as there is success it will not try again.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it should only run 2 times if the first fails. Hmm maybe I could remove this actually. I think we added it when IPA downloader was very flaky. It may not be needed anymore.
/hold
Let me check this

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we do need FlakeAttempts still. I think it is not really flaky, rather the first fails every time. Maybe we could clean that up in some other way, but for now I think this log watcher fix is ok.
/hold cancel

@metal3-io-bot metal3-io-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 13, 2026
We sometimes use BuildAndApplyKustomization multiple times, e.g. inside
FlakeAttempt. This is an issue for the log watchers, since we create a
new watcher each time. Each watcher writes logs to the files so we end
up with the logs duplicated. To fix this, keep track of which watchers
we have and skip setting up duplicates. Additionally, clean up the
watchers after the test is done, to avoid leaking and spamming.

Signed-off-by: Lennart Jern <lennart.jern@est.tech>
@lentzi90 lentzi90 force-pushed the lentzi90/duplicate-log-watcher branch from 72f2906 to 060947b Compare March 25, 2026 13:55
@metal3-io-bot metal3-io-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 25, 2026
@lentzi90
Copy link
Copy Markdown
Member Author

/test metal3-bmo-e2e-test-optional-pull

@lentzi90
Copy link
Copy Markdown
Member Author

Ok it looks like we still need the FlakeAttempts. From what I understand we need it at least when applying BMO, because we create namespaces together with resources in the namespaces, so things fail the first time because the namespaces are missing.

@lentzi90 lentzi90 force-pushed the lentzi90/duplicate-log-watcher branch from 060947b to 72e2b10 Compare March 25, 2026 15:32
@metal3-io-bot metal3-io-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 25, 2026
@lentzi90
Copy link
Copy Markdown
Member Author

/test metal3-bmo-e2e-test-optional-pull

@lentzi90
Copy link
Copy Markdown
Member Author

/ok-to-test
Comon... run the tests

@metal3-io-bot metal3-io-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Mar 25, 2026
@lentzi90
Copy link
Copy Markdown
Member Author

/hold
Discussed with @adilGhaffarDev . We will hold for now so we can get the CAPI fix in first. Then it will be easier to say what duplication is coming from where.

@metal3-io-bot metal3-io-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants