fix(otel): honor parent span sampling decisions #59
Add newSampler() that reads OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG. When unset, defaults to parentbased_traceidratio so child spans always honor the parent sampling flag, fixing broken trace continuity.

Add tracesSampler field to chart values.yaml and render OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG env vars in the deployment template.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Issue: LFXV2-1734
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
Walkthrough

This PR adds configurable OpenTelemetry trace sampling to the LFX v2 indexer service. A new

Changes

OpenTelemetry Trace Sampler Configuration
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 5 passed
🧹 Nitpick comments (1)
pkg/utils/otel.go (1)
306-335: ⚡ Quick win: Warn on unknown sampler values before fallback.

At Line 333, unknown non-empty `OTEL_TRACES_SAMPLER` values silently fall back, which can hide config typos in production.

Proposed change

    -func newSampler(cfg OTelConfig) trace.Sampler {
    -    sampler := os.Getenv("OTEL_TRACES_SAMPLER")
    +func newSampler(cfg OTelConfig) trace.Sampler {
    +    sampler := strings.ToLower(strings.TrimSpace(os.Getenv("OTEL_TRACES_SAMPLER")))
         arg := os.Getenv("OTEL_TRACES_SAMPLER_ARG")
    @@
    -    default: // empty/unknown → parent-based with configured ratio
    +    default: // empty/unknown → parent-based with configured ratio
    +        if sampler != "" {
    +            slog.Warn("unknown OTEL_TRACES_SAMPLER, using parentbased_traceidratio", "value", sampler)
    +        }
             return trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))
         }
     }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/utils/otel.go` around lines 306 - 335, The switch over OTEL_TRACES_SAMPLER silently falls back for unknown non-empty values; update the logic in pkg/utils/otel.go (around the parseRatio func and the switch) to emit a warning when sampler is non-empty and not one of the handled cases: log the unrecognized sampler value and that you are falling back to parent-based TraceIDRatioBased with cfg.TracesSampleRatio (use the same slog.Warn style as the parseRatio warning) before returning the default trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio)).
📒 Files selected for processing (4)
- charts/lfx-v2-indexer-service/templates/deployment.yaml
- charts/lfx-v2-indexer-service/values.yaml
- pkg/utils/otel.go
- pkg/utils/otel_test.go
Pull request overview
This PR fixes broken trace continuity by switching the Go SDK sampler from a standalone TraceIDRatioBased sampler to a parent-based sampler that honors upstream traceparent sampling decisions, and adds configuration hooks (code + Helm) to select OpenTelemetry sampler types via standard OTEL_TRACES_SAMPLER* env vars.
Changes:
- Add `newSampler(cfg)` that supports the 6 standard OTEL sampler types and defaults to `parentbased_traceidratio`.
- Update tracer provider initialization to use `newSampler(cfg)` instead of `TraceIDRatioBased(cfg.TracesSampleRatio)`.
- Add unit tests for `newSampler` and expose `OTEL_TRACES_SAMPLER` / `OTEL_TRACES_SAMPLER_ARG` via Helm values + deployment template.
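To make the name handling concrete, here is a minimal, SDK-free sketch of how a raw `OTEL_TRACES_SAMPLER` value might be resolved to one of the six sampler names, with a fallback to `parentbased_traceidratio`. The helper `resolveSamplerName`, the `knownSamplers` set, and the `unknown` flag are hypothetical names for illustration, and the trimming and lowercasing follow the review suggestion rather than code confirmed to be in the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultSampler is the fallback named in the PR description.
const defaultSampler = "parentbased_traceidratio"

// knownSamplers lists the six sampler names exercised by the PR's tests.
var knownSamplers = map[string]bool{
	"always_on":                true,
	"always_off":               true,
	"traceidratio":             true,
	"parentbased_always_on":    true,
	"parentbased_always_off":   true,
	"parentbased_traceidratio": true,
}

// resolveSamplerName (hypothetical) normalizes raw and returns the sampler
// name to use. unknown is true only when raw was non-empty but unrecognized,
// which is the case a caller would want to log a warning for.
func resolveSamplerName(raw string) (name string, unknown bool) {
	n := strings.ToLower(strings.TrimSpace(raw))
	if knownSamplers[n] {
		return n, false
	}
	return defaultSampler, n != ""
}

func main() {
	for _, raw := range []string{"", "always_on", " ParentBased_Always_Off ", "alway_on"} {
		name, unknown := resolveSamplerName(raw)
		fmt.Printf("%q -> %s (unknown=%v)\n", raw, name, unknown)
	}
}
```

Returning the `unknown` flag instead of logging inside the helper keeps the function pure, so a table-driven test can assert on the resolved name as well as on whether a warning would be emitted.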
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/utils/otel.go | Introduces newSampler and wires it into tracer provider creation to honor parent sampling decisions. |
| pkg/utils/otel_test.go | Adds basic tests covering supported sampler env values and invalid arg handling. |
| charts/lfx-v2-indexer-service/templates/deployment.yaml | Adds optional rendering of OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG env vars. |
| charts/lfx-v2-indexer-service/values.yaml | Adds app.otel.tracesSampler value with documentation of supported sampler types. |
    func newSampler(cfg OTelConfig) trace.Sampler {
        sampler := os.Getenv("OTEL_TRACES_SAMPLER")
        arg := os.Getenv("OTEL_TRACES_SAMPLER_ARG")

            r, err := strconv.ParseFloat(arg, 64)
            if err == nil && r >= 0.0 && r <= 1.0 {
                return r
            }
            slog.Warn("invalid OTEL_TRACES_SAMPLER_ARG, using TracesSampleRatio", "value", arg)
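The excerpt above shows only the body of the ratio-parsing logic. As a rough, self-contained sketch, a complete helper could look like the following. The signature `parseRatio(arg, fallback)` and the silent early return for an empty argument are assumptions, since the PR shows neither the declaration nor how the unset case is handled.

```go
package main

import (
	"fmt"
	"log/slog"
	"strconv"
)

// parseRatio (assumed signature) converts OTEL_TRACES_SAMPLER_ARG into a
// sampling ratio in [0, 1]. An empty arg returns fallback without a warning
// (an assumption); an invalid or out-of-range arg returns fallback and warns.
func parseRatio(arg string, fallback float64) float64 {
	if arg == "" {
		return fallback
	}
	r, err := strconv.ParseFloat(arg, 64)
	// NaN and ±Inf parse without error but fail the range check below.
	if err == nil && r >= 0.0 && r <= 1.0 {
		return r
	}
	slog.Warn("invalid OTEL_TRACES_SAMPLER_ARG, using TracesSampleRatio", "value", arg)
	return fallback
}

func main() {
	for _, arg := range []string{"", "0.25", "1.5", "NaN", "invalid"} {
		fmt.Printf("%q -> %v\n", arg, parseRatio(arg, 0.5))
	}
}
```

Because `strconv.ParseFloat` accepts strings such as `NaN` and `Inf`, the explicit range check is what rejects them here, not the error return.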
    // TestNewSampler verifies that newSampler returns a non-nil sampler for all
    // supported OTEL_TRACES_SAMPLER values, including the default (empty) case.
    func TestNewSampler(t *testing.T) {
        cfg := OTelConfig{TracesSampleRatio: 0.5}

        tests := []struct {
            name    string
            sampler string
            arg     string
        }{
            {"default (empty)", "", ""},
            {"always_on", "always_on", ""},
            {"always_off", "always_off", ""},
            {"traceidratio", "traceidratio", "0.5"},
            {"parentbased_always_on", "parentbased_always_on", ""},
            {"parentbased_always_off", "parentbased_always_off", ""},
            {"parentbased_traceidratio", "parentbased_traceidratio", "0.5"},
            {"unknown", "unknown", ""},
        }

        for _, tt := range tests {
            t.Run(tt.name, func(t *testing.T) {
                t.Setenv("OTEL_TRACES_SAMPLER", tt.sampler)
                t.Setenv("OTEL_TRACES_SAMPLER_ARG", tt.arg)

                s := newSampler(cfg)
                if s == nil {
                    t.Errorf("newSampler(%q) returned nil", tt.sampler)
                }
            })
        }
    }

    // TestNewSampler_InvalidArg verifies that an invalid OTEL_TRACES_SAMPLER_ARG
    // falls back to cfg.TracesSampleRatio without panicking.
    func TestNewSampler_InvalidArg(t *testing.T) {
        cfg := OTelConfig{TracesSampleRatio: 0.5}
        t.Setenv("OTEL_TRACES_SAMPLER", "parentbased_traceidratio")
        t.Setenv("OTEL_TRACES_SAMPLER_ARG", "invalid")

        s := newSampler(cfg)
        if s == nil {
            t.Error("newSampler returned nil for invalid OTEL_TRACES_SAMPLER_ARG")
        }
    }
    {{- if ne $otelTracesSampler "" }}
    - name: OTEL_TRACES_SAMPLER
      value: {{ $otelTracesSampler | quote }}
    {{- if ne $otelTracesSampleRatio "" }}
Summary

Fixes LFXV2-1734: all Go service spans have `parentid: 0` in Datadog because `TraceIDRatioBased` makes independent sampling decisions, ignoring incoming `traceparent` headers.

Changes

- `pkg/utils/otel.go`: Replace bare `trace.TraceIDRatioBased(ratio)` with a new `newSampler(cfg)` function that:
  - Reads `OTEL_TRACES_SAMPLER` / `OTEL_TRACES_SAMPLER_ARG` (standard env vars)
  - Defaults to `parentbased_traceidratio` when unset, which fixes trace continuity immediately with no config change required
  - Falls back to `cfg.TracesSampleRatio` when `OTEL_TRACES_SAMPLER_ARG` is unset, preserving backward compatibility
- `pkg/utils/otel_test.go`: Added `TestNewSampler` and `TestNewSampler_InvalidArg`
- `charts/lfx-v2-indexer-service/templates/deployment.yaml`: Renders `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG` env vars when `app.otel.tracesSampler` is set
- `charts/lfx-v2-indexer-service/values.yaml`: Added `tracesSampler: ""` to `app.otel`

Why

`trace.TraceIDRatioBased` decides sampling for each span from the trace ID alone, regardless of the parent's sampling flag. Wrapping it in a parent-based sampler (`parentbased_traceidratio`) ensures child spans always follow the parent's decision, restoring end-to-end trace continuity.

Issue: LFXV2-1734
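The difference between the two strategies can be illustrated with a small, dependency-free model. This is a sketch, not the OpenTelemetry SDK: `parentInfo`, `ratioDecision`, and `parentBasedDecision` are hypothetical names, and the threshold comparison is only loosely modeled on how a trace-ID-ratio sampler typically works.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// parentInfo is a hypothetical stand-in for the span context extracted from
// an incoming traceparent header.
type parentInfo struct {
	Valid   bool // a parent span context was present
	Sampled bool // the parent's sampled flag
}

// ratioDecision looks only at the trace ID: it samples when the low 8 bytes,
// read as a 63-bit number, fall below ratio * 2^63. The parent is ignored.
func ratioDecision(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < uint64(ratio*(1<<63))
}

// parentBasedDecision follows the parent's sampled flag when a parent exists
// and delegates to the ratio decision only for root spans.
func parentBasedDecision(p parentInfo, traceID [16]byte, ratio float64) bool {
	if p.Valid {
		return p.Sampled
	}
	return ratioDecision(traceID, ratio)
}

func main() {
	var id [16]byte
	for i := range id {
		id[i] = 0xFF // a trace ID that a 10% ratio sampler rejects
	}
	upstream := parentInfo{Valid: true, Sampled: true}

	fmt.Println("ratio only:  ", ratioDecision(id, 0.1))
	fmt.Println("parent-based:", parentBasedDecision(upstream, id, 0.1))
}
```

In this model an upstream service that sampled the trace still has its child span dropped by the ratio-only decision, which is the kind of broken continuity described above. The parent-based wrapper keeps the child whenever the parent was sampled.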