Guard metrics collector with mutex #186

Open
ghdrope wants to merge 4 commits into kcp-dev:main from ghdrope:160-guard-MetricsCollector-with-mutex

ghdrope commented Mar 28, 2026

Summary

Make MetricsCollector thread-safe and prevent metrics from briefly disappearing during Prometheus scrapes by guarding it with a sync.RWMutex and adding a Collect() method.

Note: This PR also adds defer ticker.Stop() and code comments.
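
As a rough illustration of the approach described in the summary (not the code in this PR), a standard-library-only sketch might look like the following. The gaugeSet type, its Update method, and the "workspaces" metric name are hypothetical; only the use of sync.RWMutex and a Collect() method comes from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// gaugeSet is a hypothetical stand-in for the collector in this PR.
// It holds the most recent values keyed by metric name.
type gaugeSet struct {
	mu     sync.RWMutex
	values map[string]float64
}

// Update swaps in a complete new value set under the write lock, so a
// concurrent reader sees either the old set or the new one, never a
// partially rebuilt map.
func (g *gaugeSet) Update(next map[string]float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.values = next
}

// Collect copies the current values under the read lock, so callers
// cannot mutate the guarded map.
func (g *gaugeSet) Collect() map[string]float64 {
	g.mu.RLock()
	defer g.mu.RUnlock()
	out := make(map[string]float64, len(g.values))
	for k, v := range g.values {
		out[k] = v
	}
	return out
}

func main() {
	g := &gaugeSet{}
	g.Update(map[string]float64{"workspaces": 1})

	var wg sync.WaitGroup
	gaps := 0

	// Writer: simulates the periodic refresh loop.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 2; i <= 1000; i++ {
			g.Update(map[string]float64{"workspaces": float64(i)})
		}
	}()

	// Reader: simulates scrapes and counts any that miss the metric.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			if _, ok := g.Collect()["workspaces"]; !ok {
				gaps++
			}
		}
	}()

	wg.Wait()
	fmt.Println("scrapes that missed the metric:", gaps)
}
```

In this sketch the writer builds the new map before taking the lock and only swaps the reference while holding it, which is one way to keep the write-lock window short.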

What Type of PR Is This?

/kind bug
/kind cleanup
/kind documentation

Related Issue(s)

Fixes #160

Release Notes

MetricsCollector is now thread-safe.

kcp-ci-bot added the kind/bug, release-note, kind/cleanup, dco-signoff: yes, and kind/documentation labels Mar 28, 2026
@kcp-ci-bot
Contributor

Hi @ghdrope. Thanks for your PR.

I'm waiting for a kcp-dev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kcp-ci-bot added the needs-ok-to-test and size/M labels Mar 28, 2026
ghdrope force-pushed the 160-guard-MetricsCollector-with-mutex branch from 9dc680a to 08a7ba4 March 28, 2026 13:25
Member

ntnn left a comment

/lgtm

Thank you!

/assign @xrstf

kcp-ci-bot added the lgtm label Mar 29, 2026
@kcp-ci-bot
Contributor

LGTM label has been added.

Git tree hash: d32980f2f59971dadc8e8dbf6189ef948b725759

@ntnn
Member

ntnn commented Mar 29, 2026

/ok-to-test

kcp-ci-bot added the ok-to-test label and removed the needs-ok-to-test label Mar 29, 2026
ntnn added this to tbd Mar 29, 2026
github-project-automation (bot) moved this to Backlog in tbd Mar 29, 2026
@ntnn
Member

ntnn commented Mar 29, 2026

/retest

infra failure

 failed to connect to the docker API at unix:///var/run/docker.sock; check if the path is correct and if the daemon is running: dial unix /var/run/docker.sock: connect: no such file or directory 

@ghdrope
Author

ghdrope commented Mar 30, 2026

/retest

Comment thread internal/metrics/collector.go Outdated
Comment thread internal/metrics/collector.go Outdated
ghdrope force-pushed the 160-guard-MetricsCollector-with-mutex branch from 08a7ba4 to dc60979 March 30, 2026 22:36
kcp-ci-bot removed the lgtm label Mar 30, 2026
kcp-ci-bot requested review from ntnn and xrstf March 30, 2026 22:36
@kcp-ci-bot
Contributor

New changes are detected. LGTM label has been removed.

ntnn moved this from Backlog to Reviewing in tbd Mar 31, 2026
@xrstf
Contributor

xrstf commented Apr 1, 2026

/retest

@ghdrope
Author

ghdrope commented Apr 2, 2026

/retest

1 similar comment
@mjudeikis
Contributor

/retest

kcp-ci-bot added the needs-rebase label Apr 8, 2026
@ntnn
Member

ntnn commented Apr 21, 2026

@ghdrope Could you rebase the PR? Thanks!

xrstf changed the title from "160 guard metrics collector with mutex" to "Guard metrics collector with mutex" Apr 21, 2026
@xrstf
Contributor

xrstf commented Apr 21, 2026

/remove-kind documentation

kcp-ci-bot removed the kind/documentation label Apr 21, 2026
@ghdrope
Author

ghdrope commented Apr 21, 2026

I lost track of this one, my bad. Sure, I can rebase it, @ntnn. I’ll do it later today after work hours.

@kcp-ci-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ntnn. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kcp-ci-bot added the size/XXL label and removed the size/M label Apr 21, 2026
ghdrope and others added 3 commits April 22, 2026 00:19
Signed-off-by: ghdrope <0coasts-gearing@icloud.com>
Signed-off-by: ghdrope <0coasts-gearing@icloud.com>
* implement initial support for external virtual workspaces

On-behalf-of: @SAP christoph.mewes@sap.com

* use unrealistically low resource requests to help squeeze more out of our CI nodes

* PR feedback from Copilot
ghdrope force-pushed the 160-guard-MetricsCollector-with-mutex branch from 4236a41 to 6dd1750 April 21, 2026 22:30
kcp-ci-bot added the size/M label and removed the needs-rebase and size/XXL labels Apr 21, 2026
Signed-off-by: ghdrope <0coasts-gearing@icloud.com>
ghdrope force-pushed the 160-guard-MetricsCollector-with-mutex branch from 6dd1750 to f9034fc April 22, 2026 07:09
@ghdrope
Author

ghdrope commented Apr 22, 2026

Tests passed 😃 @ntnn @xrstf please take a look when you have time, and let me know if any changes are needed.
Thank you!

@xrstf
Contributor

xrstf commented Apr 22, 2026

Now that I look at it, are we even sure the Collect() function is called? From what I can see in main(), each individual metric is registered with the default registry, and our MetricsCollector is purely a goroutine to update those metrics. I do not think the mutex will actually do anything right now, unless we use our MetricsCollector as a "native" Prometheus collector, no?
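
To make the concern above concrete, here is a small standard-library-only sketch. It is a simplified model, not the Prometheus client API and not this repository's code: the collector interface, registry, gauge, and updater types are all hypothetical stand-ins. It only shows that a registry calls Collect() on whatever was registered with it, so a Collect() method on an unregistered updater is never invoked by a scrape.

```go
package main

import "fmt"

// collector is a simplified, hypothetical stand-in for a registry-facing
// collector interface; the real Prometheus interface looks different.
type collector interface {
	Collect() float64
}

// gauge stands in for a metric that is registered individually.
type gauge struct{ value float64 }

func (g *gauge) Set(v float64)    { g.value = v }
func (g *gauge) Collect() float64 { return g.value }

// registry calls Collect only on the collectors registered with it.
type registry struct{ collectors []collector }

func (r *registry) Register(c collector) {
	r.collectors = append(r.collectors, c)
}

// Gather models a scrape: it asks every registered collector for a value.
func (r *registry) Gather() []float64 {
	out := make([]float64, 0, len(r.collectors))
	for _, c := range r.collectors {
		out = append(out, c.Collect())
	}
	return out
}

// updater mirrors the role described in the comment: it only pushes
// values into an already registered gauge.
type updater struct {
	target       *gauge
	collectCalls int // counts how often Collect is actually invoked
}

func (u *updater) Refresh(v float64) { u.target.Set(v) }

// Collect would be the place for locking, but it only runs if the
// updater itself is registered with the registry.
func (u *updater) Collect() float64 {
	u.collectCalls++
	return u.target.Collect()
}

func main() {
	g := &gauge{}
	u := &updater{target: g}
	reg := &registry{}

	reg.Register(g) // only the gauge is registered, not the updater
	u.Refresh(42)

	fmt.Println("scraped values:", reg.Gather())
	fmt.Println("updater.Collect calls:", u.collectCalls)
}
```

In this model the scrape still sees the refreshed value, but the updater's call counter stays at zero. Registering the updater itself in place of the individual gauge would route scrapes through its Collect() method, which is where a lock would then take effect.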

Labels

dco-signoff: yes - Indicates the PR's author has signed the DCO.
kind/bug - Categorizes issue or PR as related to a bug.
kind/cleanup - Categorizes issue or PR as related to cleaning up code, process, or technical debt.
ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
release-note - Denotes a PR that will be considered when it comes time to generate release notes.
size/M - Denotes a PR that changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

metrics: guard MetricsCollector with a mutex to prevent transient scrape gaps

5 participants