Skip to content

fix(otel): honor parent span sampling decisions#51

Open
bramwelt wants to merge 1 commit into
mainfrom
fix/LFXV2-1734-parent-based-sampling
Open

fix(otel): honor parent span sampling decisions#51
bramwelt wants to merge 1 commit into
mainfrom
fix/LFXV2-1734-parent-based-sampling

Conversation

@bramwelt
Copy link
Copy Markdown
Contributor

Summary

Fixes LFXV2-1734 — all Go service spans have parentid: 0 in Datadog because TraceIDRatioBased makes independent sampling decisions, ignoring incoming traceparent headers.

Changes

pkg/utils/otel.go — Replace bare trace.TraceIDRatioBased(ratio) with a new newSampler(cfg) function that:

  • Reads OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG (standard env vars)
  • Defaults to parentbased_traceidratio when unset — fixes trace continuity immediately with no config change required
  • Supports all 6 OTEL sampler types: always_on, always_off, traceidratio, parentbased_always_on, parentbased_always_off, parentbased_traceidratio
  • Falls back to cfg.TracesSampleRatio (from OTEL_TRACES_SAMPLE_RATIO) when OTEL_TRACES_SAMPLER_ARG is unset, preserving backward compatibility

pkg/utils/otel_test.go — Added TestNewSampler and TestNewSampler_InvalidArg

charts/lfx-v2-query-service/templates/deployment.yaml — Renders OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG env vars when app.otel.tracesSampler is set

charts/lfx-v2-query-service/values.yaml — Added tracesSampler: "" to app.otel

Why

trace.TraceIDRatioBased re-decides sampling per span regardless of the parent's sampling flag. When a UI trace propagates a sampled traceparent header to a Go backend, the backend was ignoring it and applying its own ratio (0.2 in dev/staging), dropping 80% of child spans. Wrapping with trace.ParentBased (or using parentbased_traceidratio) ensures child spans always follow the parent's decision.

Issue: LFXV2-1734

Add newSampler() that reads OTEL_TRACES_SAMPLER and
OTEL_TRACES_SAMPLER_ARG. When unset, defaults to
parentbased_traceidratio so child spans always honor
the parent sampling flag, fixing broken trace continuity.

Add tracesSampler field to chart values.yaml and render
OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG env vars
in the deployment template.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Issue: LFXV2-1734
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Trevor Bramwell <tbramwell@linuxfoundation.org>
@bramwelt bramwelt requested a review from a team as a code owner May 13, 2026 16:08
Copilot AI review requested due to automatic review settings May 13, 2026 16:08
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

Review Change Stack

Walkthrough

This PR extends OpenTelemetry trace sampling configuration by reading environment-driven sampler modes and arguments, with fallback to config-based ratios. Helm values and deployment template conditionally inject sampler environment variables, while the Go implementation processes these via a new newSampler function that replaces the hardcoded ratio-based sampler.

Changes

OpenTelemetry Sampler Configuration

Layer / File(s) Summary
Configuration schema and template injection
charts/lfx-v2-query-service/values.yaml, charts/lfx-v2-query-service/templates/deployment.yaml
Helm values introduce app.otel.tracesSampler (empty string default), and deployment template conditionally injects OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG environment variables when sampler config is set.
Sampler implementation and integration
pkg/utils/otel.go
New newSampler(cfg) function constructs trace.Sampler from OTEL_TRACES_SAMPLER/OTEL_TRACES_SAMPLER_ARG environment variables with fallback to config-based TracesSampleRatio; updated newTraceProvider to use environment-driven sampler instead of hardcoded ratio-based behavior.
Sampler validation tests
pkg/utils/otel_test.go
Two unit tests verify newSampler returns non-nil samplers across supported sampler types and argument configurations, and confirms graceful fallback on invalid arguments without panicking.

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: enabling parent span sampling decisions in OpenTelemetry configuration to fix trace continuity.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the issue, solution, and implementation details across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/LFXV2-1734-parent-based-sampling

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/utils/otel.go`:
- Around line 333-335: The switch's default branch silently falls back to
trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio)) when
OTEL_TRACES_SAMPLER is empty or unknown; change this so that if the raw
OTEL_TRACES_SAMPLER value is non-empty but not recognized, you emit a warning
via the package's logger before returning the ParentBased(...) fallback. Locate
the code that reads/handles OTEL_TRACES_SAMPLER (the switch that now returns
trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))) and add a
conditional to check the original sampler string: if it's non-empty and
unmatched, call the existing logger.Warn/Warnf with a concise message including
the unrecognized value, then proceed to return the existing ParentBased
fallback.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 094fa028-46e1-4c6b-b9c6-e87c478566c9

📥 Commits

Reviewing files that changed from the base of the PR and between 7575c2f and 32177d9.

📒 Files selected for processing (4)
  • charts/lfx-v2-query-service/templates/deployment.yaml
  • charts/lfx-v2-query-service/values.yaml
  • pkg/utils/otel.go
  • pkg/utils/otel_test.go

Comment thread pkg/utils/otel.go
Comment on lines +333 to +335
default: // empty/unknown → parent-based with configured ratio
return trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Warn when OTEL_TRACES_SAMPLER is unknown instead of silently defaulting.

Right now unknown values and empty values share the same fallback path, which can mask misconfiguration in production. Add a warning for non-empty unknown values before falling back.

Suggested change
 	default: // empty/unknown → parent-based with configured ratio
+		if sampler != "" {
+			slog.Warn("unknown OTEL_TRACES_SAMPLER, falling back to parentbased_traceidratio",
+				"value", sampler,
+				"fallback-ratio", cfg.TracesSampleRatio,
+			)
+		}
 		return trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
default: // empty/unknown → parent-based with configured ratio
return trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))
}
default: // empty/unknown → parent-based with configured ratio
if sampler != "" {
slog.Warn("unknown OTEL_TRACES_SAMPLER, falling back to parentbased_traceidratio",
"value", sampler,
"fallback-ratio", cfg.TracesSampleRatio,
)
}
return trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/utils/otel.go` around lines 333 - 335, The switch's default branch
silently falls back to
trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio)) when
OTEL_TRACES_SAMPLER is empty or unknown; change this so that if the raw
OTEL_TRACES_SAMPLER value is non-empty but not recognized, you emit a warning
via the package's logger before returning the ParentBased(...) fallback. Locate
the code that reads/handles OTEL_TRACES_SAMPLER (the switch that now returns
trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))) and add a
conditional to check the original sampler string: if it's non-empty and
unmatched, call the existing logger.Warn/Warnf with a concise message including
the unrecognized value, then proceed to return the existing ParentBased
fallback.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the service’s OpenTelemetry tracing configuration to honor upstream sampling decisions (e.g., from incoming traceparent headers) by switching to a parent-based sampler by default, improving trace continuity in Datadog.

Changes:

  • Introduces newSampler(cfg) to select an OTEL sampler based on OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG, defaulting to parentbased_traceidratio.
  • Updates tracer provider initialization to use the new sampler selection logic.
  • Adds Helm values/rendering for OTEL_TRACES_SAMPLER (+ arg) and adds unit tests for sampler creation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
pkg/utils/otel.go Adds environment-driven sampler selection and applies it to the tracer provider to honor parent sampling.
pkg/utils/otel_test.go Adds tests for sampler creation and invalid sampler arg handling.
charts/lfx-v2-query-service/templates/deployment.yaml Optionally renders OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG from Helm values.
charts/lfx-v2-query-service/values.yaml Adds app.otel.tracesSampler value to control sampler rendering.
Comments suppressed due to low confidence (1)

pkg/utils/otel_test.go:369

  • TestNewSampler_InvalidArg claims to verify fallback to cfg.TracesSampleRatio, but it only asserts the sampler is non-nil. To actually test the fallback, assert sampling behavior driven by cfg.TracesSampleRatio (e.g., set cfg ratio to 0.0 or 1.0 and verify ShouldSample result for a root span) when OTEL_TRACES_SAMPLER_ARG is invalid.
// TestNewSampler_InvalidArg verifies that an invalid OTEL_TRACES_SAMPLER_ARG
// falls back to cfg.TracesSampleRatio without panicking.
func TestNewSampler_InvalidArg(t *testing.T) {
	cfg := OTelConfig{TracesSampleRatio: 0.5}
	t.Setenv("OTEL_TRACES_SAMPLER", "parentbased_traceidratio")
	t.Setenv("OTEL_TRACES_SAMPLER_ARG", "invalid")

	s := newSampler(cfg)
	if s == nil {
		t.Error("newSampler returned nil for invalid OTEL_TRACES_SAMPLER_ARG")
	}
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/utils/otel.go
Comment on lines +301 to +335
// newSampler creates a trace.Sampler from OTEL_TRACES_SAMPLER and
// OTEL_TRACES_SAMPLER_ARG environment variables, falling back to
// parentbased_traceidratio with cfg.TracesSampleRatio when unset.
// This ensures parent span sampling decisions are always honored.
func newSampler(cfg OTelConfig) trace.Sampler {
sampler := os.Getenv("OTEL_TRACES_SAMPLER")
arg := os.Getenv("OTEL_TRACES_SAMPLER_ARG")

parseRatio := func() float64 {
if arg != "" {
r, err := strconv.ParseFloat(arg, 64)
if err == nil && r >= 0.0 && r <= 1.0 {
return r
}
slog.Warn("invalid OTEL_TRACES_SAMPLER_ARG, using TracesSampleRatio", "value", arg)
}
return cfg.TracesSampleRatio
}

switch sampler {
case "always_on":
return trace.AlwaysSample()
case "always_off":
return trace.NeverSample()
case "traceidratio":
return trace.TraceIDRatioBased(parseRatio())
case "parentbased_always_on":
return trace.ParentBased(trace.AlwaysSample())
case "parentbased_always_off":
return trace.ParentBased(trace.NeverSample())
case "parentbased_traceidratio":
return trace.ParentBased(trace.TraceIDRatioBased(parseRatio()))
default: // empty/unknown → parent-based with configured ratio
return trace.ParentBased(trace.TraceIDRatioBased(cfg.TracesSampleRatio))
}
Comment thread pkg/utils/otel_test.go
Comment on lines +325 to +356
// TestNewSampler verifies that newSampler returns a non-nil sampler for all
// supported OTEL_TRACES_SAMPLER values, including the default (empty) case.
func TestNewSampler(t *testing.T) {
cfg := OTelConfig{TracesSampleRatio: 0.5}

tests := []struct {
name string
sampler string
arg string
}{
{"default (empty)", "", ""},
{"always_on", "always_on", ""},
{"always_off", "always_off", ""},
{"traceidratio", "traceidratio", "0.5"},
{"parentbased_always_on", "parentbased_always_on", ""},
{"parentbased_always_off", "parentbased_always_off", ""},
{"parentbased_traceidratio", "parentbased_traceidratio", "0.5"},
{"unknown", "unknown", ""},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Setenv("OTEL_TRACES_SAMPLER", tt.sampler)
t.Setenv("OTEL_TRACES_SAMPLER_ARG", tt.arg)

s := newSampler(cfg)
if s == nil {
t.Errorf("newSampler(%q) returned nil", tt.sampler)
}
})
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants