
add HTTP client request metrics for scaler metric requests #7644

Open
aliaqel-stripe wants to merge 25 commits into kedacore:main from aliaqel-stripe:feat/http-client-metrics

Conversation

@aliaqel-stripe
Contributor

@aliaqel-stripe aliaqel-stripe commented Apr 13, 2026

Add HTTP client request metrics for scaler metric collection (keda_scaler_http_requests_total / keda_scaler_http_request_duration_seconds for Prometheus, keda.scaler.http.requests.count / keda.scaler.http.request.duration.seconds for OTel). Recording is gated on all five scaler context keys being present in the request context, so only metric collection calls emitted through GetMetricsAndActivityForScaler are counted — not scaler-initialization requests.

The request counter metric (keda_scaler_http_requests_total) is labelled by namespace, scaled_resource, scaler, trigger_name, metric_name, status_code.

The latency histogram (keda_scaler_http_request_duration_seconds) uses only scaler and status_code to keep metric time series (MTS) cardinality low, since latency by scaler type is what matters most here.

status_code is the numeric HTTP status, or "error" for transport-level failures with no HTTP response.
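As a rough illustration of the status_code rule above, the following sketch maps a round-trip outcome to a label value. The function name statusCodeLabel and its signature are assumptions made for this example; the PR's own helper (httpStatusCodeLabel) may differ in shape.

```go
package main

import (
	"fmt"
	"strconv"
)

// statusCodeLabel is a hypothetical stand-in for the PR's helper.
// It returns the numeric HTTP status as a string, or "error" when the
// round trip failed before any HTTP response was received.
func statusCodeLabel(statusCode int, isError bool) string {
	if isError {
		return "error"
	}
	return strconv.Itoa(statusCode)
}

func main() {
	fmt.Println(statusCodeLabel(200, false)) // 200
	fmt.Println(statusCodeLabel(503, false)) // 503
	fmt.Println(statusCodeLabel(0, true))    // error
}
```

Keeping the numeric code rather than a status class (2xx, 5xx) preserves the distinction between, say, 400 and 422 at the cost of a few more label values.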

Changes

  • pkg/util/http_roundtripper.go (new): InstrumentedRoundTripper wraps any http.RoundTripper and records metrics after each round-trip when all scaler context keys are present.
  • pkg/util/http.go: CreateHTTPTransportWithTLSConfig now wraps the transport with NewInstrumentedRoundTripper, so all scalers using these helpers gain instrumentation automatically.
  • pkg/scaling/cache/scalers_cache.go: New buildScalerRequestCtx injects the five context keys before calling GetMetricsAndActivity. The retry path re-fetches the ScalerBuilder after refresh so labels reflect the current scaler config.
  • pkg/scalers/aws/aws_sigv4.go: Transport is now initialised once in NewSigV4RoundTripper rather than per-request, eliminating double instrumentation and restoring connection pooling.
  • pkg/metricscollector/: New RecordHTTPClientRequest in the interface and both backends; httpStatusCodeLabel moved to the shared file; OTel error log message corrected.
  • Tests: New unit test files for http_roundtripper and prommetrics; e2e assertions added to both Prometheus and OTel sequential test suites.
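The gating behaviour described for InstrumentedRoundTripper can be sketched roughly as follows. This is not the PR's code: the key type, the five key names, the recordFunc callback, and the fake transport are all assumptions chosen so the example runs with the standard library alone.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// ctxKey and requiredKeys are hypothetical; the PR's real key names are not shown.
type ctxKey string

var requiredKeys = []ctxKey{"namespace", "scaled_resource", "scaler", "trigger_name", "metric_name"}

// recordFunc stands in for a metrics collector call.
type recordFunc func(labels map[ctxKey]string, status string, elapsed time.Duration)

// instrumentedRoundTripper wraps next and records only requests whose
// context carries every required key.
type instrumentedRoundTripper struct {
	next   http.RoundTripper
	record recordFunc
}

// scalerLabels returns the label values, or false if any key is missing.
func scalerLabels(ctx context.Context) (map[ctxKey]string, bool) {
	labels := make(map[ctxKey]string, len(requiredKeys))
	for _, k := range requiredKeys {
		v, ok := ctx.Value(k).(string)
		if !ok || v == "" {
			return nil, false
		}
		labels[k] = v
	}
	return labels, true
}

func (rt *instrumentedRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	start := time.Now()
	resp, err := rt.next.RoundTrip(req)
	labels, ok := scalerLabels(req.Context())
	if !ok {
		return resp, err // e.g. a scaler-initialization request: not recorded
	}
	status := "error" // transport-level failure, no HTTP response
	if err == nil && resp != nil {
		status = strconv.Itoa(resp.StatusCode)
	}
	rt.record(labels, status, time.Since(start))
	return resp, err
}

// roundTripperFunc lets a plain function act as a fake transport (no network).
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(req *http.Request) (*http.Response, error) { return f(req) }

// runDemo sends one request without scaler context and one with it,
// and returns what was recorded.
func runDemo() []string {
	var recorded []string
	fake := roundTripperFunc(func(*http.Request) (*http.Response, error) {
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody}, nil
	})
	rt := &instrumentedRoundTripper{next: fake, record: func(l map[ctxKey]string, status string, _ time.Duration) {
		recorded = append(recorded, l["scaler"]+":"+status)
	}}
	ctx := context.Background()
	for _, k := range requiredKeys {
		ctx = context.WithValue(ctx, k, "demo-"+string(k))
	}
	for _, c := range []context.Context{context.Background(), ctx} {
		req, err := http.NewRequestWithContext(c, http.MethodGet, "http://example.invalid/", nil)
		if err != nil {
			panic(err)
		}
		if resp, err := rt.RoundTrip(req); err == nil {
			resp.Body.Close()
		}
	}
	return recorded
}

func main() {
	fmt.Println(runDemo()) // [demo-scaler:200]
}
```

Only the second request is recorded, which mirrors the stated intent that requests made outside the metric-collection path stay uncounted.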

Checklist

Fixes #6600

@aliaqel-stripe aliaqel-stripe requested a review from a team as a code owner April 13, 2026 01:30
@snyk-io

snyk-io Bot commented Apr 13, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
| --- | --- | --- | --- | --- | --- | --- |
| | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |


@github-actions

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

@keda-automation keda-automation requested a review from a team April 13, 2026 01:30
@aliaqel-stripe aliaqel-stripe force-pushed the feat/http-client-metrics branch from 77741df to 82d1237 Compare April 13, 2026 01:37
@aliaqel-stripe aliaqel-stripe changed the title from "feat: add HTTP client request metrics for scaler metric fetches" to "add HTTP client request metrics for scaler metric requests" Apr 13, 2026
…quests

Instruments all outbound HTTP calls made by KEDA scalers with
keda_http_client_requests_total and keda_http_client_request_duration_seconds
(Prometheus) and keda.http.client.requests.count /
keda.http.client.request.duration.seconds (OTel) via an InstrumentedRoundTripper
injected in CreateHTTPTransportWithTLSConfig.

Closes kedacore#6600

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
…tion and double instrumentation

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
…after refresh

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
…o remove cross-file coupling

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
…bel precedence, and histogram recording

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
…ey-value pairs

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
…g metric fetches

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
…on failures

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
The duration histogram keda_scaler_http_request_duration_seconds previously
carried 6 label dimensions (namespace, scaled_resource, scaler, trigger_name,
metric_name, status_code), creating high MTS cardinality. Latency by scaler
type is what matters, so the histogram is reduced to 2 labels: scaler and
status_code. The counter retains all 6 labels.

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
@aliaqel-stripe aliaqel-stripe force-pushed the feat/http-client-metrics branch from fdb37a6 to 9c79fce Compare April 13, 2026 16:05
Contributor

Copilot AI left a comment


Pull request overview

Adds outbound HTTP client request metrics for scaler metric collection, enabling better observability of scaler-to-upstream HTTP success/error rates and latencies across both Prometheus and OpenTelemetry backends.

Changes:

  • Introduces an instrumented http.RoundTripper and wires it into shared HTTP client/transport helpers, with metric recording gated on scaler-context keys.
  • Injects scaler context into metric-collection request contexts (including retry/refresh path) so outbound requests can be labeled correctly.
  • Adds Prometheus + OTel metric instruments/recording paths, plus unit and sequential/e2e-style assertions.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/sequential/prometheus_metrics/prometheus_metrics_test.go | Adds sequential test assertions for new Prometheus HTTP client metrics. |
| tests/sequential/opentelemetry_metrics/opentelemetry_metrics_test.go | Adds sequential test assertions for OTel-exported HTTP client metrics. |
| pkg/util/http_roundtripper.go | New InstrumentedRoundTripper that records per-request count + duration when scaler context is present. |
| pkg/util/http_roundtripper_test.go | Unit tests for round-tripper wrapping behavior and helper instrumentation. |
| pkg/util/http.go | Wraps shared transports with instrumentation; changes helper return types to http.RoundTripper. |
| pkg/scaling/cache/scalers_cache.go | Injects scaler context keys into metric-collection context; updates retry path label sourcing. |
| pkg/scaling/cache/scalers_cache_test.go | Unit test verifying scaler request context key injection. |
| pkg/scalers/aws/aws_sigv4.go | Reuses a single transport in SigV4 RT to restore pooling and avoid repeated wrapping. |
| pkg/metricscollector/metricscollectors.go | Extends metrics collector interface with RecordHTTPClientRequest and shared status-code labeling. |
| pkg/metricscollector/prommetrics.go | Adds Prometheus counter + histogram and implements RecordHTTPClientRequest. |
| pkg/metricscollector/prommetrics_test.go | Unit tests for status code label logic and Prometheus recording path. |
| pkg/metricscollector/opentelemetry.go | Adds OTel instruments and implements RecordHTTPClientRequest; adjusts an error log message. |
| pkg/metricscollector/opentelemetry_test.go | Adds unit test coverage for OTel HTTP client request recording. |
| CHANGELOG.md | Documents addition of scaler HTTP request metrics. |


Comment thread pkg/util/http_roundtripper_test.go Outdated
Comment thread pkg/scaling/cache/scalers_cache.go
Comment thread pkg/metricscollector/opentelemetry.go Outdated
Comment thread pkg/metricscollector/metricscollectors.go Outdated
Comment thread pkg/util/http.go Outdated
Comment thread pkg/metricscollector/http_roundtripper.go
Member

@wozniakjan wozniakjan left a comment


Thanks, great addition! Just please see the copilot review above and my nits below

Comment thread pkg/util/http.go Outdated
Comment thread pkg/util/http.go Outdated
- Rename CreateHTTPTransport/CreateHTTPTransportWithTLSConfig to CreateRT/CreateRTWithTLSConfig to reflect they return http.RoundTripper
- Fix OTel metric description: "labeled by status class" → "labeled by HTTP status code"
- Fix doc comments on RecordHTTPClientRequest to accurately describe that context keys are extracted by InstrumentedRoundTripper, not the collector
- Fix test comments and rename TestInstrumentedRoundTripper_ScalerContextKey to TestInstrumentedRoundTripper_AllContextKeys with all five required context keys set

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
@keda-automation keda-automation requested a review from a team April 22, 2026 04:01
The return type change (*http.Transport -> http.RoundTripper) is non-breaking
for all callers since http.Client.Transport and all struct fields receiving
the value are already typed as http.RoundTripper. No caller accesses
*http.Transport-specific fields, so the rename to CreateRT* is unnecessary
churn.

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
…ipper return type

CreateHTTPTransport and CreateHTTPTransportWithTLSConfig now return
http.RoundTripper (wrapping InstrumentedRoundTripper), so the *Transport
suffix is misleading. Rename to CreateRT / CreateRTWithTLSConfig.

All callers assign to http.Client.Transport or struct fields typed as
http.RoundTripper, so the return type change is non-breaking.

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
@wozniakjan
Member

wozniakjan commented Apr 22, 2026

/run-e2e
Update: You can check the progress here

@wozniakjan wozniakjan added the Awaiting/2nd-approval (This PR needs one more approval review) label Apr 22, 2026
@JorTurFer
Member

Hello
Although I like the idea, I would personally prefer not to maintain this metric ourselves, and instead instrument the HTTP client using an existing solution such as
https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp#pkg-examples

The potential cardinality that we can generate if we record metrics with all these labels is huge

durationSeconds float64, statusCode int, isError bool, scaler, triggerName, metricName, namespace, scaledResource string)

@wozniakjan wozniakjan added the needs-discussion label and removed the Awaiting/2nd-approval (This PR needs one more approval review) label Apr 22, 2026
@wozniakjan
Member

wozniakjan commented Apr 22, 2026

Worth discussing: with third-party tools we would lose granularity and per-SO / per-trigger information, but perhaps that is acceptable? There are imho two points where we currently lack consensus:

A) gauge vs. histogram

  1. gauge pros
    • the metric samples are few, by default mostly exactly one per scraping period
    • smaller memory footprint
    • KEDA convention
  2. histogram pros
    • when aggregated across SOs, allows advanced analysis
    • prometheus convention for latency metrics

B) custom HTTP metrics vs. third-party package

  1. custom pros
    • fine granularity with KEDA-relevant labels
  2. third-party package pros
    • industry standard and users' familiarity with it

any other arguments worth adding?

> The potential cardinality that we can generate if we record metrics with all these labels is huge

theoretical cardinality yes, also amplified by buckets for sure. But practically, I think the estimates might be less dramatic and frequently similar to other KEDA metrics

  • statusCode - for queries, Prometheus typically responds with 200, 400, 422, or 503; the query APIs of other scalers are likely also bounded by a reasonably small set of status codes
  • scaler, triggerName, metricName - these are not orthogonal, there will be correlation, so it's not a cartesian product but rather the max of the three; the other two serve the purpose of UX and discoverability
  • namespace, scaledResource - other metrics such as keda_scaler_metrics_value already have these labels so the metrics included here will grow proportionally to the size of the environment, similarly to some already existing metrics
  • imho the biggest question mark remains histogram vs gauge and how much it affects metric growth
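To put rough numbers on the cardinality discussion above, here is a back-of-the-envelope sketch. All inputs (500 triggers, 4 status codes, 10 scaler types, 11 explicit buckets) are made-up figures for illustration, not measurements from KEDA. The one fixed rule used is that a classic Prometheus histogram exposes one _bucket series per explicit bucket, one for +Inf, plus _sum and _count for each label combination.

```go
package main

import "fmt"

// seriesPerHistogram returns the series a classic Prometheus histogram
// exposes per label combination: explicit buckets, +Inf, _sum and _count.
func seriesPerHistogram(explicitBuckets int) int {
	return explicitBuckets + 3
}

// counterSeries is an upper bound for the six-label counter, assuming (as
// argued above) that scaler, trigger_name and metric_name are correlated, so
// label combinations scale with the number of triggers times status codes.
func counterSeries(triggers, statusCodes int) int {
	return triggers * statusCodes
}

// histogramSeries is an upper bound for a histogram whose label
// combinations number `combos`.
func histogramSeries(combos, explicitBuckets int) int {
	return combos * seriesPerHistogram(explicitBuckets)
}

func main() {
	// Hypothetical environment sizes, chosen only for illustration.
	triggers, statusCodes, scalerTypes, buckets := 500, 4, 10, 11

	fmt.Println(counterSeries(triggers, statusCodes))                // 2000
	fmt.Println(histogramSeries(scalerTypes*statusCodes, buckets))   // 560
	fmt.Println(histogramSeries(triggers*statusCodes, buckets))      // 28000
}
```

Under these assumptions, a histogram carrying all six labels would be about fifty times larger than the two-label one, while the counter grows in line with existing per-trigger metrics. These are upper bounds, since a series only exists once its label combination has actually been observed.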

Signed-off-by: Ali Aqel <aliaqel@stripe.com>
@keda-automation keda-automation requested a review from a team April 23, 2026 04:58
aliaqel-stripe and others added 6 commits April 23, 2026 05:09
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: Ali Aqel <aliaqel@stripe.com>
Signed-off-by: aliaqel-stripe <120822631+aliaqel-stripe@users.noreply.github.com>
@aliaqel-stripe
Contributor Author

@JorTurFer @wozniakjan ready for second review

Comment thread pkg/metricscollector/opentelemetry.go

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Metrics on outbound requests from keda-operator to scalers

4 participants