fix(otel): honor parent span sampling decisions #51
Add newSampler() that reads OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG. When unset, defaults to parentbased_traceidratio so child spans always honor the parent sampling flag, fixing broken trace continuity.

Add tracesSampler field to chart values.yaml and render OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG env vars in the deployment template.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Issue: LFXV2-1734
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
Walkthrough
This PR extends OpenTelemetry trace sampling configuration by reading the sampler mode and its argument from environment variables, falling back to the config-based ratio. The Helm values and deployment template conditionally inject the sampler environment variables, while the Go implementation processes them via a new sampler function.

Changes
OpenTelemetry Sampler Configuration
🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/utils/otel.go`:
- Around line 333-335: The switch's default branch silently falls back to
trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio)) when
OTEL_TRACES_SAMPLER is empty or unknown; change this so that if the raw
OTEL_TRACES_SAMPLER value is non-empty but not recognized, you emit a warning
via the package's logger before returning the ParentBased(...) fallback. Locate
the code that reads/handles OTEL_TRACES_SAMPLER (the switch that now returns
trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))) and add a
conditional to check the original sampler string: if it's non-empty and
unmatched, call the existing logger.Warn/Warnf with a concise message including
the unrecognized value, then proceed to return the existing ParentBased
fallback.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 094fa028-46e1-4c6b-b9c6-e87c478566c9
📒 Files selected for processing (4)
- charts/lfx-v2-query-service/templates/deployment.yaml
- charts/lfx-v2-query-service/values.yaml
- pkg/utils/otel.go
- pkg/utils/otel_test.go
	default: // empty/unknown → parent-based with configured ratio
		return trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))
	}
Warn when OTEL_TRACES_SAMPLER is unknown instead of silently defaulting.
Right now unknown values and empty values share the same fallback path, which can mask misconfiguration in production. Add a warning for non-empty unknown values before falling back.
Suggested change
 	default: // empty/unknown → parent-based with configured ratio
+		if sampler != "" {
+			slog.Warn("unknown OTEL_TRACES_SAMPLER, falling back to parentbased_traceidratio",
+				"value", sampler,
+				"fallback-ratio", cfg.TracesSampleRatio,
+			)
+		}
 		return trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))
 	}
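The behavior requested in this comment (silent fallback for an empty value, warning for a non-empty unknown one) can be sketched in isolation as follows. This is a minimal illustration using only the standard library, not the code in `pkg/utils/otel.go`: `resolveSamplerName`, `resolveAndCapture`, `knownSamplers`, and `fallbackSampler` are hypothetical names, and the sketch resolves a sampler name instead of building a real `trace.Sampler`.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// fallbackSampler is the name used when OTEL_TRACES_SAMPLER is empty or unknown.
const fallbackSampler = "parentbased_traceidratio"

// knownSamplers lists the sampler names handled by the switch in newSampler.
var knownSamplers = map[string]bool{
	"always_on":                true,
	"always_off":               true,
	"traceidratio":             true,
	"parentbased_always_on":    true,
	"parentbased_always_off":   true,
	"parentbased_traceidratio": true,
}

// resolveSamplerName (hypothetical helper) returns the sampler name to use.
// An empty value falls back silently; a non-empty unknown value falls back
// and logs a warning so the misconfiguration is visible.
func resolveSamplerName(raw string, logger *slog.Logger) string {
	if knownSamplers[raw] {
		return raw
	}
	if raw != "" {
		logger.Warn("unknown OTEL_TRACES_SAMPLER, falling back to "+fallbackSampler,
			"value", raw)
	}
	return fallbackSampler
}

// resolveAndCapture runs resolveSamplerName with a logger that writes to an
// in-memory buffer and returns the resolved name plus whatever was logged.
func resolveAndCapture(raw string) (name, logged string) {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, nil))
	name = resolveSamplerName(raw, logger)
	return name, buf.String()
}

func main() {
	for _, raw := range []string{"", "always_on", "bogus"} {
		name, logged := resolveAndCapture(raw)
		fmt.Printf("raw=%q resolved=%q warned=%t\n", raw, name, logged != "")
	}
}
```

Passing the logger in as a parameter is a design choice made here so that a test can capture the warning; the suggestion above calls the package-level `slog.Warn` directly.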
Pull request overview
This PR updates the service’s OpenTelemetry tracing configuration to honor upstream sampling decisions (e.g., from incoming traceparent headers) by switching to a parent-based sampler by default, improving trace continuity in Datadog.
Changes:
- Introduces `newSampler(cfg)` to select an OTEL sampler based on `OTEL_TRACES_SAMPLER` / `OTEL_TRACES_SAMPLER_ARG`, defaulting to `parentbased_traceidratio`.
- Updates tracer provider initialization to use the new sampler selection logic.
- Adds Helm values/rendering for `OTEL_TRACES_SAMPLER` (+ arg) and adds unit tests for sampler creation.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/utils/otel.go | Adds environment-driven sampler selection and applies it to the tracer provider to honor parent sampling. |
| pkg/utils/otel_test.go | Adds tests for sampler creation and invalid sampler arg handling. |
| charts/lfx-v2-query-service/templates/deployment.yaml | Optionally renders OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG from Helm values. |
| charts/lfx-v2-query-service/values.yaml | Adds app.otel.tracesSampler value to control sampler rendering. |
Comments suppressed due to low confidence (1)
pkg/utils/otel_test.go:369
- TestNewSampler_InvalidArg claims to verify fallback to cfg.TracesSampleRatio, but it only asserts the sampler is non-nil. To actually test the fallback, assert sampling behavior driven by cfg.TracesSampleRatio (e.g., set cfg ratio to 0.0 or 1.0 and verify ShouldSample result for a root span) when OTEL_TRACES_SAMPLER_ARG is invalid.
// TestNewSampler_InvalidArg verifies that an invalid OTEL_TRACES_SAMPLER_ARG
// falls back to cfg.TracesSampleRatio without panicking.
func TestNewSampler_InvalidArg(t *testing.T) {
	cfg := OTelConfig{TracesSampleRatio: 0.5}
	t.Setenv("OTEL_TRACES_SAMPLER", "parentbased_traceidratio")
	t.Setenv("OTEL_TRACES_SAMPLER_ARG", "invalid")
	s := newSampler(cfg)
	if s == nil {
		t.Error("newSampler returned nil for invalid OTEL_TRACES_SAMPLER_ARG")
	}
}
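One way to make the fallback assertable, sketched here under the assumption that the ratio parsing is pulled out of the closure into a pure function, is to return the chosen ratio directly so a test can compare values instead of checking for a non-nil sampler. `parseRatioArg` is a hypothetical helper and is not part of this PR; it mirrors the validation in the `parseRatio` closure but omits the warning log.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseRatioArg (hypothetical helper) mirrors the parseRatio closure in
// newSampler: it returns the parsed ratio when arg is a float in [0, 1],
// and otherwise returns fallback. ok reports whether arg was used.
func parseRatioArg(arg string, fallback float64) (ratio float64, ok bool) {
	if arg == "" {
		return fallback, false
	}
	r, err := strconv.ParseFloat(arg, 64)
	// The range check is written as a positive condition so that NaN,
	// which strconv.ParseFloat accepts, is rejected as well.
	if err == nil && r >= 0.0 && r <= 1.0 {
		return r, true
	}
	return fallback, false
}

func main() {
	const fallback = 0.5
	for _, arg := range []string{"", "0.25", "invalid", "1.5", "NaN"} {
		ratio, ok := parseRatioArg(arg, fallback)
		fmt.Printf("arg=%q ratio=%v used_arg=%t\n", arg, ratio, ok)
	}
}
```

With a helper of this shape, a test for the invalid-argument case can assert that the returned ratio equals `cfg.TracesSampleRatio`, which is the property the test name claims to verify.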
// newSampler creates a trace.Sampler from OTEL_TRACES_SAMPLER and
// OTEL_TRACES_SAMPLER_ARG environment variables, falling back to
// parentbased_traceidratio with cfg.TracesSampleRatio when unset.
// This ensures parent span sampling decisions are always honored.
func newSampler(cfg OTelConfig) trace.Sampler {
	sampler := os.Getenv("OTEL_TRACES_SAMPLER")
	arg := os.Getenv("OTEL_TRACES_SAMPLER_ARG")

	parseRatio := func() float64 {
		if arg != "" {
			r, err := strconv.ParseFloat(arg, 64)
			if err == nil && r >= 0.0 && r <= 1.0 {
				return r
			}
			slog.Warn("invalid OTEL_TRACES_SAMPLER_ARG, using TracesSampleRatio", "value", arg)
		}
		return cfg.TracesSampleRatio
	}

	switch sampler {
	case "always_on":
		return trace.AlwaysSample()
	case "always_off":
		return trace.NeverSample()
	case "traceidratio":
		return trace.TraceIDRatioBased(parseRatio())
	case "parentbased_always_on":
		return trace.ParentBased(trace.AlwaysSample())
	case "parentbased_always_off":
		return trace.ParentBased(trace.NeverSample())
	case "parentbased_traceidratio":
		return trace.ParentBased(trace.TraceIDRatioBased(parseRatio()))
	default: // empty/unknown → parent-based with configured ratio
		return trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))
	}
// TestNewSampler verifies that newSampler returns a non-nil sampler for all
// supported OTEL_TRACES_SAMPLER values, including the default (empty) case.
func TestNewSampler(t *testing.T) {
	cfg := OTelConfig{TracesSampleRatio: 0.5}

	tests := []struct {
		name    string
		sampler string
		arg     string
	}{
		{"default (empty)", "", ""},
		{"always_on", "always_on", ""},
		{"always_off", "always_off", ""},
		{"traceidratio", "traceidratio", "0.5"},
		{"parentbased_always_on", "parentbased_always_on", ""},
		{"parentbased_always_off", "parentbased_always_off", ""},
		{"parentbased_traceidratio", "parentbased_traceidratio", "0.5"},
		{"unknown", "unknown", ""},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			t.Setenv("OTEL_TRACES_SAMPLER", tt.sampler)
			t.Setenv("OTEL_TRACES_SAMPLER_ARG", tt.arg)

			s := newSampler(cfg)
			if s == nil {
				t.Errorf("newSampler(%q) returned nil", tt.sampler)
			}
		})
	}
}
Summary
Fixes LFXV2-1734: all Go service spans have `parentid: 0` in Datadog because `TraceIDRatioBased` makes independent sampling decisions, ignoring incoming `traceparent` headers.

Changes
- `pkg/utils/otel.go`: Replace bare `trace.TraceIDRatioBased(ratio)` with a new `newSampler(cfg)` function that:
  - Reads `OTEL_TRACES_SAMPLER` / `OTEL_TRACES_SAMPLER_ARG` (standard env vars)
  - Defaults to `parentbased_traceidratio` when unset, which fixes trace continuity immediately with no config change required
  - Supports `always_on`, `always_off`, `traceidratio`, `parentbased_always_on`, `parentbased_always_off`, `parentbased_traceidratio`
  - Falls back to `cfg.TracesSampleRatio` (from `OTEL_TRACES_SAMPLE_RATIO`) when `OTEL_TRACES_SAMPLER_ARG` is unset or invalid, preserving backward compatibility
- `pkg/utils/otel_test.go`: Added `TestNewSampler` and `TestNewSampler_InvalidArg`
- `charts/lfx-v2-query-service/templates/deployment.yaml`: Renders `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG` env vars when `app.otel.tracesSampler` is set
- `charts/lfx-v2-query-service/values.yaml`: Added `tracesSampler: ""` to `app.otel`

Why
`trace.TraceIDRatioBased` re-decides sampling per span regardless of the parent's sampling flag. When a UI trace propagates a sampled `traceparent` header to a Go backend, the backend was ignoring it and applying its own ratio (0.2 in dev/staging), dropping 80% of child spans. Wrapping with `trace.ParentBased` (or using `parentbased_traceidratio`) ensures child spans always follow the parent's decision.

Issue: LFXV2-1734
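The difference between the two sampling strategies can be illustrated with a deliberately simplified model. The sketch below does not use the OpenTelemetry SDK: `parentState`, `decider`, `ratioOnly`, and `parentBased` are hypothetical names, the trace ID is reduced to a single `uint64`, and the threshold comparison is an assumption chosen for illustration. It only shows why a bare ratio sampler can drop the child of a sampled parent while a parent-based wrapper cannot.

```go
package main

import "fmt"

// parentState is a simplified stand-in for the incoming span context.
type parentState struct {
	valid   bool // true when a traceparent header was received
	sampled bool // the parent's sampled flag
}

// decider reports whether a span should be sampled.
type decider func(parent parentState, traceID uint64) bool

// ratioOnly models a bare ratio sampler: it looks only at the trace ID
// and never consults the parent's sampled flag.
func ratioOnly(ratio float64) decider {
	return func(_ parentState, traceID uint64) bool {
		switch {
		case ratio >= 1:
			return true
		case ratio <= 0:
			return false
		}
		// Illustrative threshold: keep trace IDs in the lowest `ratio` share.
		bound := uint64(ratio * float64(uint64(1)<<63))
		return traceID>>1 < bound
	}
}

// parentBased models wrapping a root decider: when a parent exists its
// sampled flag wins, and the root decider is used only for root spans.
func parentBased(root decider) decider {
	return func(parent parentState, traceID uint64) bool {
		if parent.valid {
			return parent.sampled
		}
		return root(parent, traceID)
	}
}

func main() {
	sampledParent := parentState{valid: true, sampled: true}
	// A trace ID chosen to fall outside a 0.2 ratio in this model.
	const traceID = ^uint64(0)

	bare := ratioOnly(0.2)
	wrapped := parentBased(ratioOnly(0.2))

	fmt.Println("bare ratio keeps child of sampled parent:", bare(sampledParent, traceID))
	fmt.Println("parent-based keeps child of sampled parent:", wrapped(sampledParent, traceID))
}
```

In this model the bare ratio decider drops the span even though the parent was sampled, while the parent-based wrapper keeps it, which corresponds to the broken and fixed trace continuity described above.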