Introduce load-balanced channel for OpenTelemetry exporters #2175
dkostyrev wants to merge 1 commit into
@dkostyrev Thank you for this awesome PR. I have finally had a bit of time to explore it and now I understand quite a bit. This is fantastic.
@amankrx when you get a chance, please help with the merge conflict here.
    .block_on(async {
        // The OTLP exporters need to run in a Tokio context.
-       spawn!("init tracing", async { init_tracing() })
+       spawn!("init tracing", async { init_tracing().await })
That coverage issue is interesting. |
Add client-side load balancing to OTLP gRPC connections using ginepro. When NL_OTEL_ENDPOINT is set, the telemetry system creates a load-balanced channel shared across log, trace, and metric exporters. This enables better distribution of telemetry traffic across multiple OTLP collector instances and improves overall system resilience.

- Add ginepro dependency for gRPC load balancing
- Upgrade OpenTelemetry dependencies from 0.29 to 0.30
- Change init_tracing() to async to support channel initialization
- Add NL_OTEL_ENDPOINT environment variable for configuration
- Update all OTLP exporters to use shared load-balanced channel

# Conflicts:
# Cargo.lock
// so concurrent writes would be a data race on Unix. The guard is always
// scoped so it drops before any `.await`, which satisfies
// `clippy::await_holding_lock`.
static ENV_LOCK: Mutex<()> = Mutex::new(());
We're already using serial_test, and using that instead would have the same effect. You only need to tag the env-editing test methods as #[serial] (or possibly #[serial(env)], just to mark them as a unified group).
    );
}

#[ignore = "requires DNS resolution (/etc/resolv.conf); not available in sandboxed builds"]
Shouldn't this get run in some cases, e.g. direct cargo runs?
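One way to let the test run wherever DNS is available, instead of ignoring it unconditionally, is a runtime check that skips only when the resolver configuration is missing. The following is a minimal sketch, assuming that the presence of /etc/resolv.conf is a good enough signal; `resolver_config_present` is a hypothetical helper and not part of this PR.

```rust
use std::path::Path;

/// Hypothetical helper (not from the PR): reports whether a resolver
/// configuration file exists at `path`. Symlinks are followed, which matters
/// because /etc/resolv.conf is often a symlink.
fn resolver_config_present(path: &Path) -> bool {
    path.is_file()
}

// Sketch of how a test body could use it, returning early instead of relying
// on `#[ignore]`:
//
//     if !resolver_config_present(Path::new("/etc/resolv.conf")) {
//         eprintln!("skipping: no resolver configuration found");
//         return Ok(());
//     }
```

Taking the path as a parameter keeps the check itself testable without depending on the host's real configuration.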
    reason = "`tokio::test` uses `tokio::runtime::Builder::new_multi_thread` and \
              `tokio::runtime::Runtime::block_on` internally"
)]
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
Please use nativelink_test. It'll accept arbitrary arguments to tokio::test internally.
}

#[nativelink_test]
async fn metrics_are_tracked() -> Result<(), Box<dyn core::error::Error>> {
This is an excellent start. Can you run it both with and without NL_OTEL_ENDPOINT so we can check both do the same thing?
Summary
This PR introduces client-side load balancing for OpenTelemetry (OTLP) gRPC connections using the ginepro library. When the NL_OTEL_ENDPOINT environment variable (name to be discussed; maybe a boolean flag?) is set, NativeLink creates a load-balanced channel for exporting logs, traces, and metrics, distributing requests across multiple backend endpoints resolved via DNS. This allows OTLP traffic to be distributed across multiple OTLP collector instances.
Changes
Load-balanced OTLP exports:
- Add ginepro dependency to provide client-side load balancing for gRPC channels used by OpenTelemetry exporters
- Add NL_OTEL_ENDPOINT environment variable to configure the OTLP endpoint for load-balanced connections
- Change init_tracing() from synchronous to async to support balanced channel initialization
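To make the two code paths concrete, here is a minimal sketch of how the presence of NL_OTEL_ENDPOINT could select between the default exporter channel and a load-balanced one. `ChannelMode`, `channel_mode`, and `parse_host_port` are hypothetical names, not the PR's actual code. The sketch assumes the variable holds a `host:port` endpoint with an optional scheme, and that a DNS-based balancer needs a hostname and port to resolve. Falling back to the default on an unparsable value is also an assumption, and IPv6 literals are not handled.

```rust
/// Hypothetical helper: split an endpoint such as "http://otel-collector:4317"
/// into the (host, port) pair a DNS-based load balancer would resolve.
fn parse_host_port(endpoint: &str) -> Option<(String, u16)> {
    // Drop an optional scheme prefix.
    let rest = endpoint
        .strip_prefix("http://")
        .or_else(|| endpoint.strip_prefix("https://"))
        .unwrap_or(endpoint);
    // Ignore any path component after the authority.
    let authority = rest.split('/').next()?;
    // Split on the last ':' so the port is whatever follows it.
    let (host, port) = authority.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    Some((host.to_string(), port.parse().ok()?))
}

/// Hypothetical description of which channel the exporters would share.
#[derive(Debug, PartialEq, Eq)]
enum ChannelMode {
    /// Variable unset (or unparsable, by assumption): keep default behavior.
    Default,
    /// Variable set: one load-balanced channel for logs, traces, and metrics.
    LoadBalanced { host: String, port: u16 },
}

/// Decide the mode from the (optional) value of NL_OTEL_ENDPOINT.
fn channel_mode(nl_otel_endpoint: Option<&str>) -> ChannelMode {
    match nl_otel_endpoint.and_then(parse_host_port) {
        Some((host, port)) => ChannelMode::LoadBalanced { host, port },
        None => ChannelMode::Default,
    }
}
```

Passing the value in as an `Option<&str>`, for example from `std::env::var("NL_OTEL_ENDPOINT").ok().as_deref()`, means both the set and unset cases can be exercised in tests without mutating process-wide environment state.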