fix(otel): honor parent span sampling decisions#47
Conversation
Add newSampler() that reads OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG. When unset, defaults to parentbased_traceidratio so child spans always honor the parent sampling flag, fixing broken trace continuity. Add tracesSampler field to chart values.yaml and render OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG env vars in the deployment template. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Issue: LFXV2-1734 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR updates the service’s OpenTelemetry tracing setup to honor upstream/parent span sampling decisions (fixing broken trace continuity where spans appear with parentid: 0 in Datadog) by defaulting to parent-based sampling, while still supporting explicit sampler configuration via standard OTEL env vars.
Changes:
- Added
newSampler(cfg)to build atrace.SamplerfromOTEL_TRACES_SAMPLER/OTEL_TRACES_SAMPLER_ARG, defaulting toparentbased_traceidratiowith the configured ratio. - Updated trace provider construction to use
newSampler(cfg)instead of a bareTraceIDRatioBased. - Extended Helm chart values/template to optionally render
OTEL_TRACES_SAMPLER(+ arg), and added unit tests for the new sampler selection logic.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/utils/otel.go | Introduces env-driven sampler selection and switches tracer provider sampling to parent-based by default. |
| pkg/utils/otel_test.go | Adds tests covering newSampler supported values and invalid-arg handling. |
| charts/lfx-v2-auth-service/templates/deployment.yaml | Optionally injects OTEL sampler env vars into the Deployment when configured via values. |
| charts/lfx-v2-auth-service/values.yaml | Adds app.otel.tracesSampler chart value and documents supported sampler strings. |
Comments suppressed due to low confidence (1)
pkg/utils/otel_test.go:566
TestNewSampler_InvalidArgonly checks for non-nil, so it doesn’t actually verify the fallback behavior described in the comment (usingcfg.TracesSampleRatio). Consider asserting the sampler’s effective ratio by exercisingShouldSamplewith a root span (no parent) and checking the decision matches whatcfg.TracesSampleRatiowould produce.
// TestNewSampler_InvalidArg verifies that an invalid OTEL_TRACES_SAMPLER_ARG
// falls back to cfg.TracesSampleRatio without panicking.
func TestNewSampler_InvalidArg(t *testing.T) {
cfg := OTelConfig{TracesSampleRatio: 0.5}
t.Setenv("OTEL_TRACES_SAMPLER", "parentbased_traceidratio")
t.Setenv("OTEL_TRACES_SAMPLER_ARG", "invalid")
s := newSampler(cfg)
if s == nil {
t.Error("newSampler returned nil for invalid OTEL_TRACES_SAMPLER_ARG")
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // newSampler creates a trace.Sampler from OTEL_TRACES_SAMPLER and | ||
| // OTEL_TRACES_SAMPLER_ARG environment variables, falling back to | ||
| // parentbased_traceidratio with cfg.TracesSampleRatio when unset. | ||
| // This ensures parent span sampling decisions are always honored. | ||
| func newSampler(cfg OTelConfig) trace.Sampler { | ||
| sampler := os.Getenv("OTEL_TRACES_SAMPLER") | ||
| arg := os.Getenv("OTEL_TRACES_SAMPLER_ARG") | ||
|
|
||
| parseRatio := func() float64 { | ||
| if arg != "" { | ||
| r, err := strconv.ParseFloat(arg, 64) | ||
| if err == nil && r >= 0.0 && r <= 1.0 { | ||
| return r | ||
| } | ||
| slog.Warn("invalid OTEL_TRACES_SAMPLER_ARG, using TracesSampleRatio", "value", arg) | ||
| } | ||
| return cfg.TracesSampleRatio | ||
| } | ||
|
|
||
| switch sampler { | ||
| case "always_on": | ||
| return trace.AlwaysSample() | ||
| case "always_off": | ||
| return trace.NeverSample() | ||
| case "traceidratio": | ||
| return trace.TraceIDRatioBased(parseRatio()) | ||
| case "parentbased_always_on": | ||
| return trace.ParentBased(trace.AlwaysSample()) | ||
| case "parentbased_always_off": | ||
| return trace.ParentBased(trace.NeverSample()) | ||
| case "parentbased_traceidratio": | ||
| return trace.ParentBased(trace.TraceIDRatioBased(parseRatio())) | ||
| default: // empty/unknown → parent-based with configured ratio | ||
| return trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio)) | ||
| } |
| if err == nil && r >= 0.0 && r <= 1.0 { | ||
| return r | ||
| } | ||
| slog.Warn("invalid OTEL_TRACES_SAMPLER_ARG, using TracesSampleRatio", "value", arg) |
| for _, tt := range tests { | ||
| t.Run(tt.name, func(t *testing.T) { | ||
| t.Setenv("OTEL_TRACES_SAMPLER", tt.sampler) | ||
| t.Setenv("OTEL_TRACES_SAMPLER_ARG", tt.arg) | ||
|
|
||
| s := newSampler(cfg) | ||
| if s == nil { | ||
| t.Errorf("newSampler(%q) returned nil", tt.sampler) | ||
| } |
| - name: OTEL_TRACES_SAMPLER | ||
| value: {{ $otelTracesSampler | quote }} | ||
| {{- if ne $otelTracesSampleRatio "" }} |
Summary
Fixes LFXV2-1734 — all Go service spans have
parentid: 0in Datadog becauseTraceIDRatioBasedmakes independent sampling decisions, ignoring incomingtraceparentheaders.Changes
pkg/utils/otel.go— Replace baretrace.TraceIDRatioBased(ratio)with a newnewSampler(cfg)function that:OTEL_TRACES_SAMPLER/OTEL_TRACES_SAMPLER_ARG(standard env vars)parentbased_traceidratiowhen unset — fixes trace continuity immediately with no config change requiredcfg.TracesSampleRatiowhenOTEL_TRACES_SAMPLER_ARGis unset, preserving backward compatibilitypkg/utils/otel_test.go— AddedTestNewSamplerandTestNewSampler_InvalidArgcharts/lfx-v2-auth-service/templates/deployment.yaml— RendersOTEL_TRACES_SAMPLERandOTEL_TRACES_SAMPLER_ARGenv vars whenapp.otel.tracesSampleris setcharts/lfx-v2-auth-service/values.yaml— AddedtracesSampler: ""toapp.otelWhy
trace.TraceIDRatioBasedre-decides sampling per span regardless of the parent's sampling flag. Wrapping withparentbased_traceidratioensures child spans always follow the parent's decision, restoring end-to-end trace continuity.Issue: LFXV2-1734