Skip to content

chore: nearby voice chat | audio stream deduplication#8817

Draft
popuz wants to merge 24 commits into
devfrom
chore/nearby-voice-chat/audio-stream-deduplication
Draft

chore: nearby voice chat | audio stream deduplication#8817
popuz wants to merge 24 commits into
devfrom
chore/nearby-voice-chat/audio-stream-deduplication

Conversation

@popuz
Copy link
Copy Markdown
Collaborator

@popuz popuz commented May 19, 2026

Pull Request Description

What does this PR change?

Fixes the long-standing "ghost stream" pathology in Nearby Voice Chat — the case where LiveKit silently fails to deliver a track-unsubscribe event and the Explorer keeps stale audio sources around for the same participant. Until now multiple LivekitAudioSource instances could exist for one speaker, of which at most one carried real audio; the rest played silence but still consumed Unity audio voices, position-system updates, and spatialisation budget. Visible symptoms: another player sounds choppy, silent, or "missing" even though they are clearly speaking.

The fix turns the registry into the authoritative oracle for which sid is alive right now:

  1. LiveKit fork — each AudioStream exposes LastFrameReceivedAt (a monotonic tick written from inside the FFI frame callback). Ghosts never decode a frame, so their tick stays at the sentinel value.
  2. INearbyAudioStreamRegistry — new resolver: GetActiveSid(walletId) returns the sid with the freshest frame tick (single active candidate per participant); IsActiveSid(StreamKey) is the cleanup predicate. Wrap-safe arithmetic, allocation-free, no memoisation across ticks.
  3. ECS collapseNearbyAudioStreamerComponent drops the string[] snapshot in favour of a single string CurrentSid. NearbyAudioSourceComponent loses its AvatarEntity back-reference and now lives directly on the avatar entity (no separate audio entity, no cross-entity transform lookup, no second World.Create).
  4. SystemsNearbyLivekitBridgeSystem polls the resolver each tick (add / refresh / remove branches with an "all-zeros" guard so a freshly-reconnected speaker isn't dropped before the first frame). NearbyAudioBindingSystem reads CurrentSid only — no foreach over a sids array. NearbyAudioPositionSystem becomes a single same-entity query. NearbyAudioCleanupSystem ditches the "sid disappeared from snapshot" predicate, reaps demoted ghost sids via !IsActiveSid, and inlines the bulk-remove path into the per-entity query.
  5. Tests — full coverage rewrite for NearbyAudioStreamRegistryShould, NearbyAudioBindingSystemShould, NearbyAudioCleanupSystemShould, NearbyLivekitBridgeSystemShould against the new contract; performance harness updated for the 1:1 layout.

Test Instructions

metaforge explorer run 8817

Prerequisites

  • Two test accounts (use a second device, an alt browser profile, or metaforge account create --clear on a second machine).
  • Working microphone on the speaker account; headphones recommended on the listener side to catch overlap / phantom playback.
  • In-world voice chat enabled in settings on both sides (no scene-level NearbyVoiceChat restriction).

Test Steps

1. Baseline — two avatars speaking, no overlap

  1. Account A and account B teleport to the same parcel and stand within ~5 m of each other.
  2. A speaks for ~10 s, then B speaks for ~10 s, alternating.
  3. Expected: each voice is heard clearly with no echo, no phantom second-source overlap, and no choppy drop-outs. Spatial direction matches the speaker's position when you rotate the camera.

2. Reconnect — ghost-sid eradication (primary fix scenario)

  1. A and B stand within 5 m. A speaks — verify B hears A normally.
  2. Force-disconnect A from the room: kill the network on A's machine (toggle Wi-Fi, or use OS-level offline mode) for ~10 s, then bring it back. A's client will reconnect automatically.
  3. Within 2–3 s after reconnect, A starts speaking again.
  4. Expected: B hears A's voice on the new stream only — no overlapping playback of the pre-reconnect track, no doubled / robot-like timbre, no silent dead-air window longer than ~1 frame. (Repeat 3× — this race is non-deterministic.)

3. Range — audible region join / leave

  1. A stands still and speaks continuously (use a looped voice clip if available, or just talk).
  2. B walks slowly away from A past ~22 m, then back inside the range.
  3. Expected: voice silences smoothly when B crosses the outer threshold and resumes when B re-enters. No residual playback of the previous source after stepping out; no ghost source remains audible.

4. Block — mid-conversation user block

  1. A is speaking, B can hear them.
  2. B opens A's user profile from the passport / friends panel and blocks A.
  3. Expected: A's voice silences for B within ~1 frame. Unblock → voice resumes the next time A speaks.

5. Mute toggle per participant

  1. A is speaking, B can hear them.
  2. B opens the voice-chat side panel and toggles "mute" for A.
  3. Expected: A's voice silences for B immediately while A remains visible as a speaker. Toggle back — voice resumes.

6. Output device change

  1. A is speaking, B can hear them through Device 1 (e.g. speakers).
  2. B unplugs Device 1 and switches to Device 2 (e.g. headphones) via Windows / macOS sound settings.
  3. Expected: within ~1 s A's voice resumes through Device 2, no artifacts, no double-audio.

7. Suppression gates — call / loading / scene gate

  1. A is speaking, B can hear them.
  2. Trigger one of the suppression sources in turn:
    • B initiates a private call (call status active).
    • B teleports to a new scene (loading screen visible).
    • B enters a scene whose NearbyVoiceChatEnabled flag is false (use any test scene that disables it).
  3. Expected: A's voice silences for B while the gate is active and resumes immediately once the gate releases. No bleed-through during the suppressed window.

8. Crowd — many speakers nearby

  1. Use the multi-bot harness (metaforge) or coordinate 4+ accounts at the same parcel.
  2. Have at least 3 of them speak simultaneously for ~30 s.
  3. Expected: every speaker is heard, each spatialised to their world position; no participant becomes silent or "ghosted out"; no audio-source slot leak (frame-rate should be stable across the window).

9. Avatar leaves / reappears

  1. A is speaking near B.
  2. A teleports far away (out of B's parcel range), then teleports back close to B and speaks again.
  3. Expected: voice cleanly stops when A leaves, cleanly resumes when A returns. No phantom audio playing from A's previous location.

10. Self-suppression sanity

  1. A is in a room alone, speaks into the microphone.
  2. Expected: A does not hear their own voice echoed back (Explorer-side filter on local participant must keep working — this PR does not change that path, but verify it didn't regress).

Additional Testing Notes

  • The ghost-sid case (Test 2) is the headline fix — please run it multiple times under different network conditions (clean disconnect, flaky Wi-Fi, VPN drop).
  • If you observe a 1–2 frame silence on a freshly-reconnected speaker before their voice resumes, that is expected — the resolver waits for the first decoded frame on a new candidate before promoting it.

Quality Checklist

  • Changes have been tested locally
  • Documentation has been updated (N/A — internal refactor; spec lives in docs/superpowers/specs/)
  • Performance impact has been considered (1:1 mapping reduces per-tick entity count for crowded scenes; allocation-free hot paths preserved)
  • For SDK features: Test scene is included (N/A — not an SDK feature)

Code Review Reference

Please review our Branch & PR Standards before submitting.

popuz added 4 commits May 18, 2026 22:35
  Single active sid per participant — every ECS consumer now reads through
  the frame-activity resolver (slice 2) instead of the per-identity sid array:

  - NearbyAudioStreamerComponent: string[] StreamSidsSnapshot → string CurrentSid
  - NearbyLivekitBridgeSystem: Add / Refresh (ref-mutate) / Remove around
    GetActiveSid, with HasAudioStream as the all-zeros-window guard
  - NearbyAudioBindingSystem: iterates CurrentSid (one sid per avatar)
  - NearbyAudioCleanupSystem: reap predicate flipped to !IsActiveSid —
    reaps both evicted and demoted (ghost) sids
  - NearbyVoiceChatDebugSystem: audio-source counter via HasAudioStream
  - INearbyAudioStreamRegistry: deprecated GetAudioSidsArray / IsStreamGone
    removed; resolver is the only read surface

  EditMode tests migrated to the new API; T6–T9 from the spec land here:
  - Bridge attach on null → sid
  - Bridge in-place CurrentSid mutate on sid → sid'
  - Bridge wait in all-zeros window, drop on HasAudioStream=false
  - Cleanup reaps ghost sid losing the resolver pick
  Move per-entity World.TryGet / World.Has checks into archetype-level
  [All]/[None] filters so the source-generated queries do the gating
  instead of the body. Same observable behavior, fewer hash lookups on
  the hot path, more idiomatic ECS.

  NearbyAudioPositionSystem:
  - SyncPositionsAndSpatialAngles (one query + World.TryGet<InAudibleRangeTag>)
    splits into SyncActiveAudio ([All<InAudibleRangeTag>]) and
    SyncInactiveOutOfRangeAudio ([None<InAudibleRangeTag>]).
  - Shared Stop/diff-write logic extracted to static StopIfPlaying.

  NearbyAudioCleanupSystem:
  - CleanupDoomedSource (one query + two World.Has + three predicate checks)
    splits into three archetype-filtered queries:
    - ReapOrphanedSource     — [None<NearbyAudioStreamerComponent>]   (trigger #1)
    - ReapOutOfRangeSource   — [None<InAudibleRangeTag>]              (trigger #2)
    - ReapFilteredSource     — registry/blocking/ban check in body    (triggers #3/#4/#5)
  - Shared dispose+remove logic extracted to DisposeAndRemove.

  Triggers #1/#2 are now unconditional reaps on archetype filter — no
  per-entity Has lookup. Only the steady-state hot path (#3-#5) does
  registry/blocking/banned predicate checks.
@popuz popuz self-assigned this May 19, 2026
@popuz popuz added the force-build Used to trigger a build on draft PR label May 19, 2026
popuz and others added 6 commits May 19, 2026 14:25
…edup oracle

  Post-slice-4 the audio-source component lives directly on the avatar entity,
  and the per-entity binding query already filters on its absence. The
  HashSet<StreamKey> kept around as a parallel "what's bound" index was
  fully derived state — every Add was paired with World.Add and every
  Remove with World.Remove. The one-avatar-per-walletId invariant from the
  profile layer means the [None<NearbyAudioSourceComponent>] archetype
  filter also dedups by StreamKey, so the HashSet has no remaining job.

  - NearbyAudioBindingSystem: drop bindings field + ctor param, remove
    Contains-guard and Add. Drop the unused MAX_CREATIONS_PER_FRAME
    constant and the stale "throttled to a fixed budget per frame" XML
    doc (no throttling has been wired up since the rewrite).
  - NearbyAudioCleanupSystem: drop bindings field + ctor param.
    DisposeAndRemove no longer copies StreamKey before World.Remove
    (nothing to do with the key afterwards). DisposeDyingAvatarSources
    and the listening-gate / device-change bulk path lose their
    bindings.Clear / Remove maintenance calls. Drop the
    System.Collections.Generic using.
  - VoiceChatPlugin: drop the nearbyAudioBindings field, its
    HashSet<StreamKey>(32) allocation, both InjectToWorld arguments,
    and the no-longer-used LiveKit.Rooms.Streaming / Streaming.Audio /
    System.Collections.Generic usings.

  EditMode tests:
  - NearbyAudioBindingSystemShould / NearbyAudioCleanupSystemShould:
    drop bindings field, setup, teardown, both ctor params, and ~14
    bindings.Contains / bindings.Is.Empty assertions — observable
    behaviour stays covered by CountEntities / world.Has<>
    assertions already present in the same tests.
  - Cleanup test summary updated; binding-system class doc rewritten to
    drop the throttled-per-tick contract line that referenced the removed
    constant.

  PerformanceTests:
  - NearbyAudioFullCyclePerformanceTest / NearbyAudioFullCycleManualTest:
    drop bindings field, allocation, ctor params, teardown clear, and
    the HUD line that printed bindings.Count.
  - ComputeRampUpTicks now returns 2 (binding materializes every ready
    avatar in a single tick — no per-tick budget to amortise).
…ream-deduplication' into chore/nearby-voice-chat/audio-stream-deduplication
…s StreamKey to IsActiveSid

  - NearbyAudioCleanupSystem: fold World.Remove<NearbyAudioSourceComponent>
    into DisposeAllLiveSources via DisposeAndRemove; drop LIVE_AUDIO_QUERY.
  - INearbyAudioStreamRegistry.IsActiveSid: take StreamKey instead of
    (walletId, sid) — callsite already holds the key on the component.
  - ReapFilteredSource: invert keep→remove for direct reading.
  - Update registry impl + all mocks/fakes/asserts in EditMode and Performance
    tests to the new signature.
…ream-deduplication' into chore/nearby-voice-chat/audio-stream-deduplication
Comment thread Explorer/Packages/packages-lock.json Outdated
@popuz popuz marked this pull request as ready for review May 20, 2026 11:10
@popuz popuz requested review from a team as code owners May 20, 2026 11:10
@popuz popuz marked this pull request as draft May 20, 2026 11:10
@popuz popuz removed the force-build Used to trigger a build on draft PR label May 20, 2026
@popuz popuz changed the title Chore/nearby voice chat/audio stream deduplication chore: nearby voice chat | audio stream deduplication May 20, 2026
@popuz
Copy link
Copy Markdown
Collaborator Author

popuz commented May 20, 2026

@claude review

@claude

This comment has been minimized.

@popuz popuz added perf_test Run Performance Tests from the PR and removed perf_test Run Performance Tests from the PR labels May 20, 2026
popuz added a commit to decentraland/client-sdk-unity that referenced this pull request May 21, 2026
#69)

  Add a cheap monotonic timestamp on every AudioStream that ticks inside the FFI callback whenever a media frame is decoded. This is  the foundation for the NearbyVoiceChat single-active-sid resolver (stream deduplication): a stream that's alive is decoding frames; a ghost never decodes.

  - AudioStreamInternal: private `lastFrameReceivedAt` (init -1), Volatile.Read/Write around Environment.TickCount, set right before the buffer write under the existing channel/sampleRate guards.
  - AudioStream: pass-through `LastFrameReceivedAt`.
  - IAudioStreams.GetLastFrameReceivedAt(StreamKey): returns -1 when the stream is missing or has never emitted (sentinel chosen over 0 because TickCount is signed and can legitimately be 0/negative).

explorer consumer - decentraland/unity-explorer#8817
@popuz popuz marked this pull request as ready for review May 21, 2026 11:04
@claude

This comment has been minimized.

@popuz popuz requested review from NickKhalow and removed request for alejandro-jimenez-dcl May 21, 2026 11:05
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 21, 2026

Claude finished @popuz's task in 2m 58s —— View job


Review (after sync — add84dee3, 44d61b6df, 26ab3c395, 9e47387a8)

Re-reviewed the head of the branch against the previous round of feedback. Three of the four notes are resolved; one remains. Net delta since the previous review is small (LiveKit pointer flip + interface XML hardening + test ergonomics + a few comment edits).


Status of previous notes

1. manifest.json LiveKit pin — RESOLVED

Explorer/Packages/manifest.json:13

"com.decentraland.livekit-sdk": "https://github.com/decentraland/client-sdk-unity.git",

The mutable branch ref #chore/frame-activity-oracle-on-AudioStream is gone. The default-branch form now matches the convention used for other internal decentraland/ packages in this file (com.atteneder.gltfast, com.decentraland.unity-shared-dependencies, com.nickkhalow.chrome-devtool-protocol-unity, etc.). packages-lock.json:93 is now pinned to 9caeba7fb88547062f227b2e246b18b6927e4435 against the merged SDK default branch, so this checkout is reproducible. Same-publisher, lockfile-pinned — risk drops to LOW.

2. Main-thread call discipline on resolver — RESOLVED

INearbyAudioStreamRegistry.cs:28-30,40

The XML on GetActiveSid now opens with "Call-site discipline. Main-thread only — the multi-candidate branch performs one FFI hop per sid", and IsActiveSid explicitly inherits that contract ("Shares the cost profile and main-thread discipline of GetActiveSid"). That's exactly the documentation hardening I was asking for. The per-tick memoization side note still applies as a future-if-needed optimization — fine to leave deferred.

3. DisposeAndRemove does a structural change with ref comp in scope — STILL OPEN

NearbyAudioCleanupSystem.cs:112-116

private void DisposeAndRemove(Entity entity, ref NearbyAudioSourceComponent comp)
{
    sourceFactory.Dispose(comp.LivekitAudioSource);
    World.Remove<NearbyAudioSourceComponent>(entity);
}

The ordering (read LivekitAudioSource first, then World.Remove) is functionally safe, but the signature still takes ref while doing a structural change on the same entity. CLAUDE.md §5 says "NEVER perform structural changes … after obtaining a ref, in, or out reference to a component" — the rule applies to the signature, not just the body. The risk isn't this commit; it's the next reader who reorders the two lines and corrupts chunk storage with no obvious tell.

Cleanest fix is one of:

  • Drop ref: the method only reads comp.LivekitAudioSource, so in NearbyAudioSourceComponent comp is sufficient (and source-gen queries accept in for read-only access).
  • Or hoist the read explicitly: var src = comp.LivekitAudioSource; World.Remove<NearbyAudioSourceComponent>(entity); sourceFactory.Dispose(src); — invariant becomes self-documenting.

Fix this →

4. SeedBinding tuple ergonomics — RESOLVED

NearbyAudioCleanupSystemShould.cs:434

Helper now returns (Entity avatarEntity, LivekitAudioSource source). Call sites read cleanly.


Dependency review (delta only)

File Change Source Type Scope Risk Confidence
Packages/manifest.json:13 com.decentraland.livekit-sdk repointed from branch ref to default branch git (decentraland org) source runtime LOW HIGH
Packages/packages-lock.json:93 hash bumped 9c06f180…9caeba7fb8… git source runtime LOW HIGH

The repointed SDK ships from the same org as the project, capability is proportionate (read-only frame-activity counter accessor consumed by NearbyAudioStreamsRegistry.GetLastFrameReceivedAt), and the lockfile pins the resolved hash. No binaries, asmdef edits, install hooks, or scoped-registry changes. No transitive surprises in the lockfile delta.

DEPENDENCY_REVIEW: PASS


  • Read CLAUDE.md and load PR context
  • Diff against origin/dev to see changes since last review
  • Inspect dependency changes (manifest.json, packages-lock.json)
  • Re-verify the three review notes from the previous pass
  • Post updated review

Summary: notes 1, 2, 4 resolved; note 3 (DisposeAndRemove signature) still open and worth addressing for the policy reason rather than a behavior bug. Dependency review now passes — branch ref removed, lockfile pinned, internal publisher.
· Branch

Comment thread Explorer/Assets/DCL/VoiceChat/NearbyVoiceChat/Systems/NearbyAudioCleanupSystem.cs Outdated
popuz added 2 commits May 21, 2026 13:20
…ream-deduplication' into chore/nearby-voice-chat/audio-stream-deduplication
@claude

This comment has been minimized.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 21, 2026

Claude encountered an error after 3m 25s —— View job


Re-reviewing after sync…

  • Read CLAUDE.md and load PR context
  • Diff against origin/dev to see changes since last review
  • Re-verify the two blocking issues from previous review (DisposeAndRemove ref, LogAudioStreams forwarding)
  • Inspect dependency changes (manifest.json, packages-lock.json)
  • Post updated review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 21, 2026

Claude finished @popuz's task in 4m 8s —— View job


Review (sync — d34bec97)

  • Read CLAUDE.md and load PR context
  • Diff against origin/dev to see changes since last review
  • Verify blocking issue: LogAudioStreams.GetLastFrameReceivedAt delegation
  • Verify blocking issue: InteriorAudioStreams.GetLastFrameReceivedAt delegation (production path)
  • Post inline comments
  • Post updated review

Status of previous blocking issues

LogAudioStreams.GetLastFrameReceivedAt — REGRESSION ❌

LogVideoStreams.cs:66–69

The previous iteration had correctly added a stored IAudioStreams origin field and a GetLastFrameReceivedAt delegation. Commit d34bec97 ("rolled back LogAudioStreams") removed that field, making the class unable to forward the call. The interface default (Option.None) now silently fires again for every call through LogRoom. This is a step backward — needs to be reverted to the version that stored the typed origin.

InteriorAudioStreams.GetLastFrameReceivedAt — STILL OPEN ❌

InteriorVideoStreams.cs:52–89

Still no GetLastFrameReceivedAt override. Every other IAudioStreams method (ActiveStream, Release, Free, etc.) already delegates to assigned, but GetLastFrameReceivedAt does not.

Combined production impact: The complete call chain at runtime is:

NearbyAudioStreamsRegistry.getLastFrameReceivedAt
  → InteriorAudioStreams.GetLastFrameReceivedAt  (default → Option.None)
     [if fixed]: → LogAudioStreams.GetLastFrameReceivedAt (default → Option.None again)
        [if fixed]: → AudioStreams.GetLastFrameReceivedAt  (real frame tick ← correct)

Both layers must be fixed together. Fixing only one leaves the oracle dead in production.


Required fixes

  1. InteriorAudioStreams — add one delegation method (same file, InteriorVideoStreams.cs):

    public Option<int> GetLastFrameReceivedAt(StreamKey key) =>
        assigned.GetLastFrameReceivedAt(key);
  2. LogAudioStreams — restore the stored IAudioStreams origin field and delegation that d34bec97 removed:

    public class LogAudioStreams : LogStreams<AudioStream, AudioStreamInfo>, IAudioStreams
    {
        private readonly IAudioStreams origin;
    
        public LogAudioStreams(IAudioStreams origin) : base(origin, nameof(LogAudioStreams))
        {
            this.origin = origin;
        }
    
        public Option<int> GetLastFrameReceivedAt(StreamKey key) =>
            origin.GetLastFrameReceivedAt(key);
    }

Both open issues must be fixed before merge — together they mean the PR's headline fix (ghost-stream deduplication) is silently broken for the exact scenario it targets.

REVIEW_RESULT: FAIL ❌
COMPLEXITY: COMPLEX
COMPLEXITY_REASON: Touches core ECS systems, component structural changes, LiveKit audio stream wrapper classes, and the production multiplayer room path.
QA_REQUIRED: YES

· Branch ·

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 21, 2026

Claude encountered an error after 2m 58s —— View job


Re-reviewing after sync…

  • Read CLAUDE.md and load PR context
  • Diff against origin/dev to see changes since last review (d34bec970 rolled back LogAudioStreams)
  • Verify blocking issue: LogAudioStreams.GetLastFrameReceivedAt
  • Verify blocking issue: InteriorAudioStreams.GetLastFrameReceivedAt (production path)
  • Inspect dependency changes (manifest.json, packages-lock.json)
  • Post updated review
    · Branch

@m3taphysics
Copy link
Copy Markdown
Collaborator

PR #8817, run #26225373717

Builds: Windows change, Windows baseline, macOS change, macOS baseline

Framework 13 i7

Metric Change Baseline Delta Improvement
Samples 2700 2700
CPU average 33.3 ms 33.3 ms -0.0 ms 0.0%
CPU 1% worst 33.6 ms 33.6 ms -0.0 ms 0.1%
CPU 0.1% worst 34.7 ms 35.3 ms -0.6 ms 1.6%
GPU average 8.1 ms 8.0 ms 0.1 ms -1.0%
GPU 1% worst 19.5 ms 21.7 ms -2.2 ms 10.0% 🟢
GPU 0.1% worst 24.0 ms 26.7 ms -2.7 ms 10.0% 🟢

@popuz popuz marked this pull request as draft May 22, 2026 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants