Skip to content

fix(otel): honor parent span sampling decisions#59

Open
bramwelt wants to merge 1 commit into
mainfrom
fix/LFXV2-1734-parent-based-sampling
Open

fix(otel): honor parent span sampling decisions#59
bramwelt wants to merge 1 commit into
mainfrom
fix/LFXV2-1734-parent-based-sampling

Conversation

@bramwelt
Copy link
Copy Markdown
Contributor

Summary

Fixes LFXV2-1734 — all Go service spans have parentid: 0 in Datadog because TraceIDRatioBased makes independent sampling decisions, ignoring incoming traceparent headers.

Changes

pkg/utils/otel.go — Replace bare trace.TraceIDRatioBased(ratio) with a new newSampler(cfg) function that:

  • Reads OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG (standard env vars)
  • Defaults to parentbased_traceidratio when unset — fixes trace continuity immediately with no config change required
  • Supports all 6 OTEL sampler types
  • Falls back to cfg.TracesSampleRatio when OTEL_TRACES_SAMPLER_ARG is unset, preserving backward compatibility

pkg/utils/otel_test.go — Added TestNewSampler and TestNewSampler_InvalidArg

charts/lfx-v2-indexer-service/templates/deployment.yaml — Renders OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG env vars when app.otel.tracesSampler is set

charts/lfx-v2-indexer-service/values.yaml — Added tracesSampler: "" to app.otel

Why

trace.TraceIDRatioBased re-decides sampling per span regardless of the parent's sampling flag. Wrapping with parentbased_traceidratio ensures child spans always follow the parent's decision, restoring end-to-end trace continuity.

Issue: LFXV2-1734

Add newSampler() that reads OTEL_TRACES_SAMPLER and
OTEL_TRACES_SAMPLER_ARG. When unset, defaults to
parentbased_traceidratio so child spans always honor
the parent sampling flag, fixing broken trace continuity.

Add tracesSampler field to chart values.yaml and render
OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG env vars
in the deployment template.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Issue: LFXV2-1734
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
Copilot AI review requested due to automatic review settings May 13, 2026 16:09
@bramwelt bramwelt requested a review from a team as a code owner May 13, 2026 16:09
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

Review Change Stack

Walkthrough

This PR adds configurable OpenTelemetry trace sampling to the LFX v2 indexer service. A new newSampler() function in Go derives OTEL trace samplers from environment variables and configuration with validation and warnings for invalid settings. The trace provider is updated to use this sampler, and Helm configuration plus deployment templates are modified to inject sampler settings as environment variables.

Changes

OpenTelemetry Trace Sampler Configuration

Layer / File(s) Summary
Trace sampler implementation and validation
pkg/utils/otel.go, pkg/utils/otel_test.go
Added newSampler(cfg) to construct trace samplers from OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG environment variables, with ratio validation (0.0–1.0), warning logs for invalid config, and parent-based traceidratio fallback. Comprehensive tests verify sampler creation for known OTEL sampler values and confirm graceful handling of invalid arguments without panics.
Trace provider sampler integration
pkg/utils/otel.go
Modified newTraceProvider() to use newSampler(cfg) instead of hardcoded trace.TraceIDRatioBased(cfg.TracesSampleRatio), enabling environment-configured sampling modes and parent-based sampling logic.
Helm configuration and deployment
charts/lfx-v2-indexer-service/values.yaml, charts/lfx-v2-indexer-service/templates/deployment.yaml
Added app.otel.tracesSampler Helm value (empty by default) with documentation of supported sampler types; Deployment template now conditionally injects OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG environment variables.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: fixing parent span sampling decisions in OpenTelemetry configuration.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the issue, changes, and rationale.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/LFXV2-1734-parent-based-sampling

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/utils/otel.go (1)

306-335: ⚡ Quick win

Warn on unknown sampler values before fallback.

At Line 333, unknown non-empty OTEL_TRACES_SAMPLER values silently fall back, which can hide config typos in production.

Proposed change
-func newSampler(cfg OTelConfig) trace.Sampler {
-	sampler := os.Getenv("OTEL_TRACES_SAMPLER")
+func newSampler(cfg OTelConfig) trace.Sampler {
+	sampler := strings.ToLower(strings.TrimSpace(os.Getenv("OTEL_TRACES_SAMPLER")))
 	arg := os.Getenv("OTEL_TRACES_SAMPLER_ARG")
@@
-	default: // empty/unknown → parent-based with configured ratio
+	default: // empty/unknown → parent-based with configured ratio
+		if sampler != "" {
+			slog.Warn("unknown OTEL_TRACES_SAMPLER, using parentbased_traceidratio", "value", sampler)
+		}
 		return trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/utils/otel.go` around lines 306 - 335, The switch over
OTEL_TRACES_SAMPLER silently falls back for unknown non-empty values; update the
logic in pkg/utils/otel.go (around the parseRatio func and the switch) to emit a
warning when sampler is non-empty and not one of the handled cases: log the
unrecognized sampler value and that you are falling back to parent-based
TraceIDRatioBased with cfg.TracesSampleRatio (use the same slog.Warn style as
the parseRatio warning) before returning the default
trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio)).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/utils/otel.go`:
- Around line 306-335: The switch over OTEL_TRACES_SAMPLER silently falls back
for unknown non-empty values; update the logic in pkg/utils/otel.go (around the
parseRatio func and the switch) to emit a warning when sampler is non-empty and
not one of the handled cases: log the unrecognized sampler value and that you
are falling back to parent-based TraceIDRatioBased with cfg.TracesSampleRatio
(use the same slog.Warn style as the parseRatio warning) before returning the
default trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio)).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 87838ac1-18fc-4472-8072-be7604485493

📥 Commits

Reviewing files that changed from the base of the PR and between b24e82e and c634579.

📒 Files selected for processing (4)
  • charts/lfx-v2-indexer-service/templates/deployment.yaml
  • charts/lfx-v2-indexer-service/values.yaml
  • pkg/utils/otel.go
  • pkg/utils/otel_test.go

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes broken trace continuity by switching the Go SDK sampler from a standalone TraceIDRatioBased sampler to a parent-based sampler that honors upstream traceparent sampling decisions, and adds configuration hooks (code + Helm) to select OpenTelemetry sampler types via standard OTEL_TRACES_SAMPLER* env vars.

Changes:

  • Add newSampler(cfg) that supports the 6 standard OTEL sampler types and defaults to parentbased_traceidratio.
  • Update tracer provider initialization to use newSampler(cfg) instead of TraceIDRatioBased(cfg.TracesSampleRatio).
  • Add unit tests for newSampler and expose OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG via Helm values + deployment template.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
pkg/utils/otel.go Introduces newSampler and wires it into tracer provider creation to honor parent sampling decisions.
pkg/utils/otel_test.go Adds basic tests covering supported sampler env values and invalid arg handling.
charts/lfx-v2-indexer-service/templates/deployment.yaml Adds optional rendering of OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG env vars.
charts/lfx-v2-indexer-service/values.yaml Adds app.otel.tracesSampler value with documentation of supported sampler types.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/utils/otel.go
Comment on lines +305 to +308
func newSampler(cfg OTelConfig) trace.Sampler {
sampler := os.Getenv("OTEL_TRACES_SAMPLER")
arg := os.Getenv("OTEL_TRACES_SAMPLER_ARG")

Comment thread pkg/utils/otel.go
Comment on lines +311 to +315
r, err := strconv.ParseFloat(arg, 64)
if err == nil && r >= 0.0 && r <= 1.0 {
return r
}
slog.Warn("invalid OTEL_TRACES_SAMPLER_ARG, using TracesSampleRatio", "value", arg)
Comment thread pkg/utils/otel_test.go
Comment on lines +327 to +371
// TestNewSampler verifies that newSampler returns a non-nil sampler for all
// supported OTEL_TRACES_SAMPLER values, including the default (empty) case.
func TestNewSampler(t *testing.T) {
cfg := OTelConfig{TracesSampleRatio: 0.5}

tests := []struct {
name string
sampler string
arg string
}{
{"default (empty)", "", ""},
{"always_on", "always_on", ""},
{"always_off", "always_off", ""},
{"traceidratio", "traceidratio", "0.5"},
{"parentbased_always_on", "parentbased_always_on", ""},
{"parentbased_always_off", "parentbased_always_off", ""},
{"parentbased_traceidratio", "parentbased_traceidratio", "0.5"},
{"unknown", "unknown", ""},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Setenv("OTEL_TRACES_SAMPLER", tt.sampler)
t.Setenv("OTEL_TRACES_SAMPLER_ARG", tt.arg)

s := newSampler(cfg)
if s == nil {
t.Errorf("newSampler(%q) returned nil", tt.sampler)
}
})
}
}

// TestNewSampler_InvalidArg verifies that an invalid OTEL_TRACES_SAMPLER_ARG
// falls back to cfg.TracesSampleRatio without panicking.
func TestNewSampler_InvalidArg(t *testing.T) {
cfg := OTelConfig{TracesSampleRatio: 0.5}
t.Setenv("OTEL_TRACES_SAMPLER", "parentbased_traceidratio")
t.Setenv("OTEL_TRACES_SAMPLER_ARG", "invalid")

s := newSampler(cfg)
if s == nil {
t.Error("newSampler returned nil for invalid OTEL_TRACES_SAMPLER_ARG")
}
}
{{- if ne $otelTracesSampler "" }}
- name: OTEL_TRACES_SAMPLER
value: {{ $otelTracesSampler | quote }}
{{- if ne $otelTracesSampleRatio "" }}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants