[DX-1211] Address currently silent failures around missing modes #2236

Open
umair-ably wants to merge 11 commits into DX-1209/error-hints from DX-1211/silent-failure-hints

@umair-ably umair-ably commented Jun 2, 2026

Contributor

Targets #2233 - review that first please

Author's Note

I'd advise reading the Claude-generated sections below, as they outline the experiments and exact numbers, but if you want a summary of my thoughts...

This PR aims to address 2 silent failures we have today:

  • presence.get() on a channel without presence_subscribe returns an empty list.
  • channel.subscribe() on a channel without subscribe registers a listener that never fires.

The fixes themselves were easy: when we run into these issues, we surface an error paired with a hint. We also introduce strictMode, which lets us ship this as an additive change (log by default, throw only when strictMode is set) before promoting it to a breaking change in a later release (always throw). strictMode means people who don't wish to upgrade to the breaking release can still benefit from the improvements we've made here.

The experiments themselves were the tricky and time-consuming part.

  • It was actually difficult to get Opus or Sonnet to run into these issues as catastrophically as they have previously, e.g. as reported in Matt's gist. There could be a mixture of reasons: Opus with a single problem and a clear context window doesn't find this challenging to fix, or the training cycles since the gist have made this less of a problem.
  • To emulate a bloated context and a larger more complex project, I used Haiku. The purpose of this wasn't to test a weaker model, but to test how Opus/Sonnet could behave in more testing environments. This actually allowed us to run into the issue more frequently, whilst also testing how well the agent recovered with our fixes from this PR.
  • With Haiku, we went from 50% of agents running in circles unable to fix the silent issues, to only 10% running into the issues with the additive fix (i.e. the error/hint is logged and not thrown), to 0% running into the issue with strictMode enabled (i.e. the error/hint is thrown).
  • This shows that even the log alone does a lot of the heavy lifting. The fix itself is never the tricky part for agents; not knowing that something is wrong in the first place is what needs to be our focus.
  • Whilst this PR approaches these issues from the angle of "let the agent hit the issue, and then tell it to fix it", I strongly believe we can preempt these issues (and others) by heavily improving our docstrings. I have this conviction because all agents actually read the docstring info for strictMode, which implies agents are good at getting info upfront if that info actually exists. Extending our docstrings with prerequisites and side effects will be my next focus area (and on a much wider surface area than just these 2 silent issues).

The full experiment runs and details are at https://github.com/ably-labs/llm-eval/pull/10

What this PR does

Two SDK calls fail silently today:

  • presence.get() on a channel without presence_subscribe returns an empty list.
  • channel.subscribe() on a channel without subscribe registers a listener that never fires.

Both with no error and no warning. A developer or AI agent gets back "nothing", assumes it works, and
ships broken code.

This PR makes those calls speak up: a warning log by default, and — if you set
clientOptions.strictMode — a thrown error with a hint instead of the silent result. This is the v2.x
step of the A5/B5 rollout in DXRFC-022 (warn now; strictMode becomes the default in v3).
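
The warn-by-default, throw-under-strictMode behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the SDK's implementation: `checkModeGranted`, `ModeCheckOptions`, `HintedError`, and the message and hint wording are hypothetical names introduced here. Only `strictMode`, the mode names, and the two outcomes (log vs. throw) come from the description.

```typescript
// Hypothetical sketch of the warn-or-throw decision; not the SDK's actual code.
type ChannelMode = 'publish' | 'subscribe' | 'presence' | 'presence_subscribe';

interface ModeCheckOptions {
  strictMode?: boolean;           // mirrors clientOptions.strictMode (default: false)
  log: (message: string) => void; // stand-in for the SDK logger
}

interface HintedError extends Error {
  hint: string;
}

function makeHintedError(message: string, hint: string): HintedError {
  const err = new Error(message) as HintedError;
  err.hint = hint;
  return err;
}

// Returns true when the required mode was granted. Otherwise it throws in
// strict mode, or logs a warning and returns false in the default mode.
function checkModeGranted(
  granted: ReadonlyArray<ChannelMode>,
  required: ChannelMode,
  operation: string,
  options: ModeCheckOptions,
): boolean {
  if (granted.indexOf(required) !== -1) return true;

  const message = `${operation} called without the "${required}" channel mode`;
  const hint = `Include "${required}" in the channel modes when getting the channel.`;

  if (options.strictMode === true) {
    throw makeHintedError(message, hint);
  }
  options.log(`${message}. Hint: ${hint}`);
  return false;
}
```

In this sketch the default path keeps the previous return value intact and only adds a log line, which is what makes the change additive; the strict path is the opt-in breaking behaviour.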

How we tested it

We had AI agents do small Ably coding tasks, with no web access, treating the SDK as a black box. Each
task ran against three versions of the SDK:

  • baseline — today's published SDK (silent).
  • B-error — this PR: the warning shows at the default log level, no setup needed.
  • C (strictMode: true): the call throws instead of returning silently.

(A fourth, B-verbose, is the warning shown only if the developer raised the log level themselves.)

About 370 runs, 0 invalid, on Opus 4.8 and Sonnet 4.6, plus Haiku 4.5 on the last experiment. The
agents can't cheat by swapping in a better API key — the harness blocks it. Every result is compared
against the currently-published ably@2.21.0.

We labelled each run by what the agent did:

  • UNDETECTED — shipped the empty result as "it works." This is the failure we want to remove.
  • DETECTED / REMEDIATED — noticed the problem, or flagged the real cause (the key needs more access).
  • IDIOMATIC / HACKY — for fixable cases: a clean fix, vs. one that hides the symptom.

The main result

On the strong models, the surfacing barely changed anything — because Opus and Sonnet usually work out
the cause on their own. At baseline they shipped broken code only about 10% of the time, so there was
little left to fix.

But that's the easy case: a capable model, full attention, a tiny task. In real use a strong model
often has a full or long context window, or is working in a large unfamiliar codebase, and can't give
each empty result that much attention. We can't easily simulate "Opus with a full context window", so
we used Haiku as a stand-in for a strong model that can't fully focus — same task, less reasoning
brought to bear.

That is where the surfacing matters. On the keystone task:

| Keystone task, Haiku, 10 runs | Shipped broken as "it works" |
| --- | --- |
| baseline (today, silent) | 5 / 10 (50%) |
| B-error (this PR) | 1 / 10 (10%) |
| C (strictMode) | 0 / 10 |

So the change removes most silent failures exactly when the model is stretched thin — the realistic
case, not the exception.
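
For reference, the arithmetic behind the keystone figures is just a proportion per arm, plus the relative drop between arms. A small sketch using the counts reported above (the `ArmResult` type and both helper functions are hypothetical, introduced only for illustration):

```typescript
// Counts copied from the keystone results (Haiku, 10 runs per arm).
interface ArmResult {
  arm: string;
  shippedBroken: number;
  totalRuns: number;
}

const baselineArm: ArmResult = { arm: 'baseline', shippedBroken: 5, totalRuns: 10 };
const bErrorArm: ArmResult = { arm: 'B-error', shippedBroken: 1, totalRuns: 10 };
const strictArm: ArmResult = { arm: 'C (strictMode)', shippedBroken: 0, totalRuns: 10 };

// Percentage of runs that shipped the broken result as "it works".
function shippedBrokenPercent(result: ArmResult): number {
  if (result.totalRuns <= 0) throw new RangeError('totalRuns must be positive');
  return (result.shippedBroken * 100) / result.totalRuns;
}

// Relative drop in shipped-broken runs between two arms, as a percentage.
function relativeReductionPercent(before: ArmResult, after: ArmResult): number {
  if (before.shippedBroken === 0) throw new RangeError('no failures to reduce');
  return ((before.shippedBroken - after.shippedBroken) * 100) / before.shippedBroken;
}

const baselinePercent = shippedBrokenPercent(baselineArm); // 50
const bErrorPercent = shippedBrokenPercent(bErrorArm);     // 10
const strictPercent = shippedBrokenPercent(strictArm);     // 0

// The default-level warning alone removes 4 of the 5 baseline failures.
const warningOnlyReduction = relativeReductionPercent(baselineArm, bErrorArm); // 80
```

With 10 runs per arm, each single run moves the percentage by 10 points, so these figures are coarse.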

This is a stand-in, not a direct measurement: we did not run Opus with a deliberately full context.
Haiku is the proxy. Measuring a frontier model under real load is the obvious next test.

The seven experiments

  1. No-oracle subscribe (keystone). "Subscribe to this channel, collect messages, tell us if it
    works — there's no test." The key can't subscribe, so nothing arrives. Opus/Sonnet shipped broken
    2/20 at baseline, 1/20 with B-error, 0/10 with C. More telling: at baseline, 0 of 20 runs reached
    the diagnosis from the SDK (they guessed, or tripped a loud server error by probing); with B-error,
    20 of 20 did.
  2. Presence. "Print who's present" — the key can't read presence, so the list is empty. Both
    models nearly always noticed (the task says someone is present, and a wrong guess triggers a loud
    server error anyway), so all arms scored about the same.
  3. Presence, log-level question. Same presence task, run to compare the default-level warning
    (B-error) against the throw (C). Result: B-error ≈ C — the default warning does the job, and
    strictMode is the backstop for the rest.
  4. Debugging a reported bug. "This shows an empty list — fix it," with a capable key and the wrong
    channel options. Every model and every arm fixed it cleanly in about 6 steps. When the bug is
    reported and the cause is in plain sight, the warning adds nothing.
  5. Discoverability. "Build a presence demo; we need silent failures to surface as errors" — without
    naming strictMode. 100% of runs found strictMode in the type definitions and switched it on.
    Agents read the JSDoc.
  6. A realistic silent failure. A genuine Ably gotcha: setting channel modes replaces the
    defaults, so adding one mode silently drops presence. We buried this in a multi-file app and in a
    from-scratch build. Two-thirds of agents made the mistake — but on the strong models they all caught
    it themselves, so nothing shipped broken. The "wasted hours debugging" failures we'd heard about did
    not show up on the strong models at this scale.
  7. Haiku. We re-ran experiments 4 and 1 on Haiku. Debugging (4) stayed flat — even a weak model
    fixes a reported bug. The no-oracle task (1) is where the 50% → 10% → 0% result above came from.

Where it doesn't help

  • Reported bugs in small code — the fix is obvious without the warning.
  • Tasks that already signal the problem (e.g. a loud server error one step away).
  • Strong models with room to think — they self-diagnose. That is why the value shows up under load.

Good next step: clearer JSDoc

Experiment 5 showed agents reliably read the SDK's type definitions — 100% found strictMode there.
That makes the JSDoc a cheap, high-value place to stop these failures before they happen.

The clearest example is the bug in experiment 6. The JSDoc for ChannelOptions.modes just says "An
array of ChannelMode objects." It does not say that setting modes replaces the defaults instead of
adding to them — which is the exact cause of that silent failure. One added line ("setting modes
replaces the default set — list every mode you need") would stop agents making the mistake at all, in
the place they actually look.
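
To make the gotcha concrete, here is a small model of "replace, not add" semantics together with the kind of JSDoc line suggested above. It is only a sketch: `ChannelOptionsSketch`, `resolveRequestedModes`, and `DEFAULT_MODES` are hypothetical, and the default set shown is an assumption for illustration rather than a statement of what the server grants.

```typescript
type Mode = 'publish' | 'subscribe' | 'presence' | 'presence_subscribe';

// Assumed default set, for illustration only.
const DEFAULT_MODES: ReadonlyArray<Mode> = ['publish', 'subscribe', 'presence', 'presence_subscribe'];

interface ChannelOptionsSketch {
  /**
   * An array of channel modes. Setting modes replaces the default set:
   * list every mode you need.
   */
  modes?: ReadonlyArray<Mode>;
}

// Models the behaviour described above: an explicit list is used as-is,
// it is never merged with the defaults.
function resolveRequestedModes(options: ChannelOptionsSketch): ReadonlyArray<Mode> {
  return options.modes === undefined ? DEFAULT_MODES : options.modes;
}

const withDefaults = resolveRequestedModes({});
const onlySubscribe = resolveRequestedModes({ modes: ['subscribe'] });

// Asking for one mode drops everything else, including presence.
const stillHasPresence = onlySubscribe.indexOf('presence') !== -1; // false
```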

This PR makes failures speak up at runtime. Clearer JSDoc on modes (and related options) would prevent
some of them up front. The two together are the natural follow-up.

Summary by CodeRabbit

  • New Features

    • Added an optional strictMode setting (default: false) to convert certain previously silent failure cases into thrown errors with clearer, hint-rich messages.
  • Bug Fixes

    • Improved error handling when required channel modes are missing for subscribe() and presence.get(), including consistent error codes/hints and reduced repeated warnings (logging is limited unless strict mode is enabled).
  • Tests

    • Added test coverage for strictMode behavior for missing subscribe and presence_subscribe modes.

@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds ClientOptions.strictMode and uses it to convert certain legacy silent-failure paths into logged errors or thrown ErrorInfo with hints; tests cover strict and non-strict behavior.

Changes

Strict mode silent-failure handling

| Layer | File(s) | Summary |
| --- | --- | --- |
| Public option and logger suffix | ably.d.ts, src/common/lib/util/logger.ts | Adds ClientOptions.strictMode?: boolean with JSDoc and Logger.silentFailureLogSuffix() returning a standard silent-failure suffix. |
| RealtimeChannel subscribe validation | src/common/lib/client/realtimechannel.ts, test/realtime/channel.test.js | RealtimeChannel.subscribe() registers listeners and may attach; after attach it checks flags.SUBSCRIBE and either throws ErrorInfo when strictMode is true or logs the silent-failure once per attach cycle using _silentSubscribeWarned; resets on ATTACHED. Tests cover strict and non-strict modes. |
| RealtimeAnnotations subscribe check | src/common/lib/client/realtimeannotations.ts | When channel is attached and annotation_subscribe is missing, constructs and logs an ErrorInfo with an updated hint then throws; listener remains registered on this path. |
| Presence.get post-attach validation | src/common/lib/client/realtimepresence.ts, test/realtime/presence.test.js | After ensureAttached(), presence.get() checks flags.PRESENCE_SUBSCRIBE; logs and throws ErrorInfo(91008) when strictMode is enabled, otherwise continues. Tests cover strict and non-strict cases. |
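
The "once per attach cycle" logging described above for RealtimeChannel.subscribe() can be pictured with a small gate like the one below. This is a simplified sketch: the class and method names are hypothetical, and only the `_silentSubscribeWarned` flag name and its reset on ATTACHED are taken from the summary.

```typescript
// Hypothetical warning gate: logs at most once until the next ATTACHED event.
class SubscribeWarningGate {
  private _silentSubscribeWarned = false;

  constructor(private readonly log: (message: string) => void) {}

  // Call when subscribe() finds the channel attached without the subscribe mode.
  warnMissingSubscribeMode(): void {
    if (this._silentSubscribeWarned) return; // already warned in this attach cycle
    this._silentSubscribeWarned = true;
    this.log('subscribe() called but the "subscribe" mode was not granted');
  }

  // Call on each ATTACHED state change so the next attach cycle can warn again.
  onAttached(): void {
    this._silentSubscribeWarned = false;
  }
}
```

The gate keeps repeated subscribe() calls from flooding the log, while a reattach (which may change the granted modes) re-arms the warning.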

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • lawrence-forooghian
  • ttypic

@umair-ably umair-ably force-pushed the DX-1211/silent-failure-hints branch from 4f35b50 to 4746cfb Compare June 2, 2026 13:55
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/bundle-report June 2, 2026 13:56 Inactive
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/typedoc June 2, 2026 13:56 Inactive
@umair-ably umair-ably force-pushed the DX-1211/silent-failure-hints branch from 4746cfb to 36fc647 Compare June 3, 2026 09:18
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/bundle-report June 3, 2026 09:19 Inactive
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/typedoc June 3, 2026 09:20 Inactive
@umair-ably umair-ably changed the title Dx 1211/silent failure hints [DX-1211] Address currently silent failures around missing modes Jun 3, 2026
@umair-ably umair-ably force-pushed the DX-1211/silent-failure-hints branch from 36fc647 to daa4edc Compare June 3, 2026 09:39
@umair-ably umair-ably requested a review from Copilot June 3, 2026 09:39
@umair-ably

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions github-actions Bot temporarily deployed to staging/pull/2236/bundle-report June 3, 2026 09:40 Inactive
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/typedoc June 3, 2026 09:40 Inactive

Copilot AI left a comment

Pull request overview

This PR surfaces two previously silent realtime “missing channel modes” failures by emitting a default-level warning log (and, when clientOptions.strictMode: true, throwing an ErrorInfo with a remediation hint) for:

  • presence.get() without presence_subscribe
  • channel.subscribe() without subscribe

Changes:

  • Add runtime detection for missing presence_subscribe / subscribe modes and emit hintful warnings or throw based on strictMode.
  • Introduce Logger.silentFailureLogSuffix() to explain the “warn now, throw in next major” behavior in log output.
  • Add tests covering strict vs default behavior and add strictMode to ClientOptions typings.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| test/realtime/presence.test.js | Adds coverage for presence.get() with/without presence_subscribe under strict/default modes. |
| test/realtime/channel.test.js | Adds coverage for channel.subscribe() with/without subscribe under strict/default modes. |
| src/common/lib/util/logger.ts | Adds a reusable suffix for strictMode-off silent-failure warnings. |
| src/common/lib/client/realtimepresence.ts | Warn/throw on presence.get() when presence_subscribe is not granted. |
| src/common/lib/client/realtimechannel.ts | Warn/throw on subscribe() when subscribe mode is not granted; adds one-time warning gate. |
| src/common/lib/client/realtimeannotations.ts | Fixes mode-check precedence and adds a log before throwing for missing annotation_subscribe. |
| ably.d.ts | Documents ClientOptions.strictMode. |

Comment thread src/common/lib/client/realtimechannel.ts Outdated
Comment thread src/common/lib/client/realtimechannel.ts Outdated
Comment thread test/realtime/presence.test.js Outdated
Comment thread src/common/lib/client/realtimepresence.ts
Comment thread src/common/lib/client/realtimeannotations.ts

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/common/lib/client/realtimechannel.ts (1)

503-530: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid registering the listener before the strict-mode validation can fail.

Lines 503-507 add the subscription before the attach/mode check runs. When strictMode === true, subscribe() can reject on Line 529, but the listener remains registered. Retrying the call stacks duplicates, and a later reattach can unexpectedly activate callbacks from a failed subscribe() attempt.

Suggested fix
-    // Filtered
-    if (event && typeof event === 'object' && !Array.isArray(event)) {
-      this.client._FilteredSubscriptions.subscribeFilter(this, event, listener);
-    } else {
-      this.subscriptions.on(event, listener);
-    }
+    const registerListener = () => {
+      if (event && typeof event === 'object' && !Array.isArray(event)) {
+        this.client._FilteredSubscriptions.subscribeFilter(this, event, listener);
+      } else {
+        this.subscriptions.on(event, listener);
+      }
+    };

     // (RTL7g)
+    const strictMode = this.client.options.strictMode === true;
+    if (!strictMode) {
+      registerListener();
+    }
+
     if (this.channelOptions.attachOnSubscribe !== false) {
       const stateChange = await this.attach();
       if (this.state === 'attached' && (this._mode & flags.SUBSCRIBE) === 0) {
@@
-        if (this.client.options.strictMode === true) throw err;
+        if (strictMode) throw err;
       }
+      if (strictMode) {
+        registerListener();
+      }
       return stateChange;
     } else {
+      registerListener();
       return null;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/lib/client/realtimechannel.ts` around lines 503 - 530, The
listener is being registered before the attach/mode strict-mode check, causing
orphaned listeners if attach/mode validation fails; change subscribe() so it
first performs the attach() call and the mode check (the block that constructs
the ErrorInfo and checks this.client.options.strictMode and
this._silentSubscribeWarned) and only after that, if appropriate, register the
listener via this.client._FilteredSubscriptions.subscribeFilter(this, event,
listener) or this.subscriptions.on(event, listener); ensure that if attach/mode
validation throws (strictMode === true) the listener is not registered, and if
you intend to allow registration when strictMode === false, register only after
the validation path completes successfully or explicitly permits it.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/common/lib/client/realtimechannel.ts`:
- Around line 509-533: The missing-subscribe validation is currently inside the
attachOnSubscribe branch so it is skipped when channelOptions.attachOnSubscribe
=== false; move the check that constructs ErrorInfo and warns/throws (the block
using this.state === 'attached', (this._mode & flags.SUBSCRIBE) === 0,
Logger.logActionNoStrip, this._silentSubscribeWarned and
client.options.strictMode) out of that attachOnSubscribe conditional so it runs
whenever the channel ends up attached (after calling attach() or if already
attached), regardless of attachOnSubscribe; keep using attach() to get
stateChange and still return stateChange (or null when not calling attach), but
ensure the subscribe-mode validation executes unconditionally after determining
the attached state.

In `@src/common/lib/util/logger.ts`:
- Around line 120-127: Update the static method silentFailureLogSuffix in the
logger (silentFailureLogSuffix) to return a generic suffix that doesn't assume
the behavior is a silent return; replace the phrase "returns silently" with
wording that covers any silent/no-op behavior (e.g., "currently has no
observable effect or fails silently") and keep the rest of the guidance about
clientOptions.strictMode and future throwing intact so the message is accurate
for subscribe/listener and other silent-failure paths.

📥 Commits

Reviewing files that changed from the base of the PR and between 5f896cd and daa4edc.

📒 Files selected for processing (7)
  • ably.d.ts
  • src/common/lib/client/realtimeannotations.ts
  • src/common/lib/client/realtimechannel.ts
  • src/common/lib/client/realtimepresence.ts
  • src/common/lib/util/logger.ts
  • test/realtime/channel.test.js
  • test/realtime/presence.test.js

Comment thread src/common/lib/client/realtimechannel.ts Outdated
Comment thread src/common/lib/util/logger.ts Outdated
@umair-ably umair-ably force-pushed the DX-1211/silent-failure-hints branch from daa4edc to 8af10ac Compare June 3, 2026 11:52
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/bundle-report June 3, 2026 11:53 Inactive
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/typedoc June 3, 2026 11:54 Inactive
@umair-ably

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
test/realtime/presence.test.js (1)

2403-2431: ⚡ Quick win

Align the non-strict test claim with what it actually verifies.

The test name says it verifies warning logging, but assertions only check resolution to []. Either assert the warning output explicitly or rename the test to avoid over-claiming behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/realtime/presence.test.js` around lines 2403 - 2431, The test "with
strictMode disabled (default), logs a warning and resolves to []" currently only
asserts that channel.presence.get() resolves to []; update it to either (A)
explicitly assert the warning is emitted by capturing/logging the realtime or
helper logger during the test and checking that a warning containing
"strictMode" or similar text was logged when presence.get() ran (reference the
test's realtime instance, channel.presence.get(), and helper logger/monitoring
utilities), or (B) rename the test string to something accurate like "with
strictMode disabled (default), resolves to []" so it no longer claims to check
warning logging; ensure you change only the test description or add the log
assertion around the existing whenPromiseSettles callback where presence.get()
is validated.
test/realtime/channel.test.js (1)

2118-2153: ⚡ Quick win

Add coverage for attachOnSubscribe: false in this DX-1211 test block.

These tests cover the default attach-on-subscribe flow, but not the manual-attach flow (attachOnSubscribe: false). Add a case where the channel is explicitly attached without subscribe mode, then subscribe() is called, to lock in strict/non-strict behavior on that path too.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/realtime/channel.test.js` around lines 2118 - 2153, Add tests for the
manual-attach flow by creating channels with attachOnSubscribe: false (via
channels.get(..., { modes: ['publish'], attachOnSubscribe: false })) then
explicitly call channel.attach() before calling channel.subscribe() to exercise
the path where subscribe does not auto-attach; add two cases mirroring the
existing ones: one with realtime = helper.AblyRealtime({ strictMode: true })
asserting channel.subscribe() rejects with code 93003 and an appropriate hint,
and one with realtime = helper.AblyRealtime() (default strictMode false)
asserting channel.subscribe() resolves (returns null or a
ChannelStateChange-like object) and channel.state === 'attached'; reuse the same
naming pattern and helper.monitorConnectionThenCloseAndFinishAsync wrapper as
the other tests.
📥 Commits

Reviewing files that changed from the base of the PR and between daa4edc and 8af10ac.

📒 Files selected for processing (6)
  • src/common/lib/client/realtimeannotations.ts
  • src/common/lib/client/realtimechannel.ts
  • src/common/lib/client/realtimepresence.ts
  • src/common/lib/util/logger.ts
  • test/realtime/channel.test.js
  • test/realtime/presence.test.js

@umair-ably umair-ably requested a review from AndyTWF June 3, 2026 12:07
@umair-ably umair-ably marked this pull request as ready for review June 3, 2026 12:07
Comment thread src/common/lib/client/realtimechannel.ts Outdated
Comment thread src/common/lib/client/realtimechannel.ts Outdated
hint: 'Re-create the channel with subscribe in modes: realtime.channels.get(name, { modes: ["subscribe", ...] }). Your token/API-key capability must permit subscribe on this channel. If you have the Ably CLI installed, `ably auth keys list` shows your key\'s capabilities. Note: appending to channel.modes after attach() does not enable the mode server-side - the array reflects what the server granted, not what you requested.',
});
if (this.client.options.strictMode === true) {
// The call is about to throw, so undo the listener registration above to avoid leaking a handler.

Contributor

This is a breaking change.

The current behaviour of subscribe (with the implicit attach) is that the listener is always added, regardless of failure.

Contributor Author

Ahh, I see. This was a CodeRabbit suggestion from earlier. It still seems strange that the listener is added despite this erroring out, though.

I've undone this to align with existing behaviour though
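
The behaviour being preserved here can be shown with a toy model: the listener is registered first, and a later strict-mode failure does not remove it. `ToyChannel` and its members are hypothetical and far simpler than RealtimeChannel (the real subscribe() is asynchronous and attaches the channel); the sketch only illustrates the ordering discussed in this thread.

```typescript
type Listener = (message: string) => void;

// Hypothetical, synchronous stand-in for a channel; not the SDK's class.
class ToyChannel {
  readonly listeners: Listener[] = [];

  constructor(
    private readonly subscribeGranted: boolean,
    private readonly strictMode: boolean,
  ) {}

  subscribe(listener: Listener): void {
    // Existing behaviour: register before any validation runs.
    this.listeners.push(listener);

    if (!this.subscribeGranted && this.strictMode) {
      // The listener added above is intentionally left in place.
      throw new Error('subscribe mode not granted');
    }
  }
}
```

One consequence, raised in the earlier bot review, is that a caller who retries after a strict-mode failure registers the same listener again unless it unsubscribes first.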

Comment thread src/common/lib/client/realtimepresence.ts Outdated
Comment thread test/realtime/channel.test.js Outdated
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/bundle-report June 12, 2026 11:51 Inactive
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/typedoc June 12, 2026 11:51 Inactive
@umair-ably umair-ably requested a review from AndyTWF June 12, 2026 11:53
code: 93001,
statusCode: 400,
hint: 'Re-create the channel with annotation_subscribe in modes: realtime.channels.get(name, { modes: ["subscribe", "annotation_subscribe", ...] }). If the subsequent attach is rejected by the server, check that the channel namespace has "Message annotations, updates, and deletes" enabled in the Ably dashboard and that your API key has annotation-subscribe capability on this channel. If you have the Ably CLI installed, `ably apps rules list` shows which channel namespaces have Mutable Messages enabled, and `ably auth keys list` shows your key\'s capabilities. Note: appending to channel.modes after attach() does not enable the mode server-side - the array reflects what the server granted, not what you requested.',
hint: 'Include "annotation_subscribe" in the channel modes: realtime.channels.get(name, { modes: ["subscribe", "annotation_subscribe", ...] }), or call channel.setOptions({ modes: [...] }) on an existing channel (this triggers a reattach). If the subsequent attach is rejected by the server, check that the channel namespace has "Message annotations, updates, and deletes" enabled in the Ably dashboard and that your API key has annotation-subscribe capability on this channel. If you have the Ably CLI installed, `ably apps rules list` shows which channel namespaces have Mutable Messages enabled, and `ably auth keys list` shows your key\'s capabilities.',

Contributor

I'm wondering if this hint is a bit presumptuous? Are there other reasons the next attach could be rejected? Or should this hint stop at what you need to do to fix the immediate problem, and then let the next error hint at how to fix that? Do we risk confusing LLMs by throwing too many possibilities at them?

Contributor Author

Agree it was over-reaching — and it was partly wrong. The two things the old tail told you to check "if the attach is rejected" actually fail differently (confirmed by live sandbox traces + the canonical registry):

  • Missing namespace rule + an explicit annotation_subscribe mode → server rejects the attach (channel → failed, server code 93002 "…namespace with Mutable Messages enabled").
  • Missing annotation-subscribe capability → no rejection; the server silently drops the mode, the attach succeeds, and this same 93001 re-fires.

So "if the attach is rejected, check the namespace and the capability" conflated a reject path with a downgrade path. Reworded to split by the symptom you actually see:

…If the attach is then rejected, the channel namespace does not have "Message annotations, updates, and deletes" enabled (ably apps rules list …). If the attach succeeds but annotations still are not delivered, your API key lacks the annotation-subscribe capability and the server silently dropped the mode (ably auth keys list …).

On "let the next error hint at the fix" — I'd like to, but the downstream rejection is currently unhinted: server errors come through ErrorInfo.fromValues, which only adds an href, never a hint. So the cause has to live here for now. If we add a hint to the server's 93002 rejection (a separate change), we could then trim this branch to just the immediate fix — happy to file that follow-up.

— 🤖 Drafted with Claude Code (Claude Opus 4.8, 1M context)

Contributor

What's the status of getting hint or whatever we decide to call it into Realtime? That's a trivial PR when it comes to it, so would it not be better to solve it properly first time rather than adding technical debt we have to undo later?

Contributor Author

Added a ticket so we don't lose track of this: https://ably.atlassian.net/browse/DX-1460

'The channel was attached without the presence_subscribe mode, so the server has not delivered any members to this client.',
code: 93002,
statusCode: 400,
hint: 'Include "presence_subscribe" in the channel modes: realtime.channels.get(name, { modes: ["presence_subscribe", ...] }), or call channel.setOptions({ modes: [...] }) on an existing channel (this triggers a reattach). Alternatively, omit modes entirely and ensure your token/API-key capability permits presence-subscribe on this channel. If you have the Ably CLI installed, `ably auth keys list` shows your key\'s capabilities.',

Contributor

presence_subscribe isn't a capability we publish in our documentation. It exists on the server, but it's not something we expose, as it's a subset of subscribe, which for legacy reasons gives message + presence + objects.

So there's a judgement call to make here as to whether this should become a public capability or not.

In any case, telling an LLM to do this right now would only confuse it further: ably.d.ts doesn't list it as a valid capability, so the LLM would just see a TypeScript error.

Contributor Author

Good catch — fixed the capability, with one push-back on the mode.

The real bug (fixed): the hint named a presence-subscribe capability, which doesn't exist. Presence delivery is governed by the subscribe capability (the legacy "subscribe ⇒ messages + presence + objects" you describe), so it now reads "…capability permits subscribe", matching the sibling subscribe-mode hint.

On presence_subscribe the mode — I'd keep it: it's a valid ChannelMode in ably.d.ts (doc-commented "The client will receive presence messages"), so modes: ["presence_subscribe"] typechecks — no TS error. And it's required: modes map 1:1 to flag bits (encodeModesToFlags OR-sets one bit per mode, no supersetting), and the integration tests prove it — channel.test.js attaches modes: ['subscribe'] and asserts presence subscription is denied, and ['publish', 'presence_subscribe'] and asserts message subscribe is denied. So subscribe doesn't imply presence_subscribe; when a user has restricted modes it's the only client-side way to get the flag. The hint leads with "omit modes entirely" for the common case, so we only steer toward presence_subscribe when someone has explicitly restricted modes. Whether to document it more prominently is a fair product question, but the hint needs no new public surface.

Also reassigned the code: while checking this I found 93002 is the canonical server code for "namespace needs Mutable Messages" (ably-common errors.json + faqs.ably.com/error-code-93002), so reusing it client-side for presence was a collision. presence.get() now throws 91008 — the presence block, next to 91005 which get() already throws for the suspended case. Reserved in ably/ably-common#345.

— 🤖 Drafted with Claude Code (Claude Opus 4.8, 1M context)
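
To make the "modes map 1:1 to flag bits" point in the reply above concrete, here is a minimal sketch. The flag values and the `encodeModes` and `hasMode` helpers are illustrative assumptions rather than code copied from ably-js (the real helper is `encodeModesToFlags`); only the "OR one bit per mode, no supersetting" shape comes from the comment.

```typescript
// Illustrative flag bits for this sketch; the actual values in ably-js may differ.
type ChannelMode = 'presence' | 'publish' | 'subscribe' | 'presence_subscribe';

const MODE_FLAGS: Record<ChannelMode, number> = {
  presence: 1 << 16,
  publish: 1 << 17,
  subscribe: 1 << 18,
  presence_subscribe: 1 << 19,
};

// Hypothetical stand-in for encodeModesToFlags: OR exactly one bit per requested mode.
function encodeModes(modes: ChannelMode[]): number {
  return modes.reduce((flags, mode) => flags | MODE_FLAGS[mode], 0);
}

// True only when the bit for this specific mode is set.
function hasMode(flags: number, mode: ChannelMode): boolean {
  return (flags & MODE_FLAGS[mode]) !== 0;
}

// The two mode sets described for the integration tests in the comment above.
const subscribeOnly = encodeModes(['subscribe']);
const publishAndPresenceSub = encodeModes(['publish', 'presence_subscribe']);
```

Under this encoding `hasMode(subscribeOnly, 'presence_subscribe')` is false, which is the sense in which subscribe does not imply presence_subscribe.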

@umair-ably umair-ably force-pushed the DX-1209/error-hints branch from b2537c4 to 35f3492 Compare June 25, 2026 16:46
@umair-ably umair-ably force-pushed the DX-1211/silent-failure-hints branch from 729d8f0 to 8eb6588 Compare July 2, 2026 14:49
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/bundle-report July 2, 2026 14:50 Inactive
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/typedoc July 2, 2026 14:51 Inactive
@umair-ably umair-ably force-pushed the DX-1211/silent-failure-hints branch from 8eb6588 to d22cf9a Compare July 3, 2026 10:34
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/bundle-report July 3, 2026 10:35 Inactive
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/typedoc July 3, 2026 10:35 Inactive
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/bundle-report July 3, 2026 10:41 Inactive
@github-actions github-actions Bot temporarily deployed to staging/pull/2236/typedoc July 3, 2026 10:41 Inactive
umair-ably and others added 11 commits July 3, 2026 13:58
Declaration only — no runtime use yet. Documented silent-failure paths
will read this option in subsequent commits to gate a hint-carrying
throw; the default stays `false` in v2.x. Per DXRFC-022 work item B5
the default flips to `true` in v3 with no per-call opt-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The guard at line 73 parenthesised `(state === 'attached' && _mode & flag) === 0`
so the `=== 0` compared against `(boolean && number)`. It happened to behave
correctly — `false === 0` is `false` so the throw skipped when not attached,
and the bitwise result === 0 was correct when attached — but the comparison
was structural luck rather than the obvious reading of the predicate.
Re-parenthesise to `state === 'attached' && (_mode & flag) === 0`.

Also emit an always-on warning log adjacent to the throw so the diagnostic
fires in the SDK output even when the caller swallows the throw. No
silentFailureLogSuffix here because this throw is unconditional (pre-DXRFC-022)
and not strictMode-gated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
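
The parenthesisation point in the commit message above can be checked with a small sketch. The function names, the state list, and the `FLAG` value are assumptions made for illustration; only the two predicate shapes are taken from the commit message.

```typescript
type ChannelState = 'initialized' | 'attaching' | 'attached' | 'detached';

const FLAG = 1 << 19; // illustrative flag bit, an assumption for this sketch

// Original shape: `=== 0` is applied to the result of (boolean && number).
// When not attached the inner expression is `false`, and `false === 0` is false.
function guardBefore(state: ChannelState, mode: number): boolean {
  return (state === 'attached' && mode & FLAG) === 0;
}

// Re-parenthesised shape: the bitwise test is compared to 0 first,
// then combined with the state check.
function guardAfter(state: ChannelState, mode: number): boolean {
  return state === 'attached' && (mode & FLAG) === 0;
}

// Both shapes agree on every combination, which is why the old code worked.
const cases: Array<[ChannelState, number]> = [
  ['initialized', 0],
  ['attaching', FLAG],
  ['attached', 0],
  ['attached', FLAG],
  ['detached', 0],
];
const agree = cases.every(([s, m]) => guardBefore(s, m) === guardAfter(s, m));
```

The inner parentheses in `guardAfter` are required: `===` binds tighter than `&`, so `mode & FLAG === 0` would parse as `mode & (FLAG === 0)`.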
A short suffix appended to silent-failure warning logs emitted when
clientOptions.strictMode is off, so the reader knows the same path will
throw in a future major version. Co-locating on Logger keeps the import
surface tight; callers do `Logger.silentFailureLogSuffix()` next to
`Logger.logActionNoStrip(...)`.

Log-only by design — the suffix is not put into ErrorInfo.hint, because
the hint is also shown when the throw fires (strictMode on), where the
suffix would be misleading.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a channel was attached without the presence_subscribe mode, the
server never delivers presence members, so presence.get() resolves to
[] regardless of who is actually present. Today there is no signal
distinguishing "no one is present" from "this client cannot see anyone".

This commit detects the case after ensureAttached() populates the
server-granted mode set, then:

- emits an always-on warning log carrying the hint + a suffix telling
  the reader that strictMode will throw in a future major version.
- throws ErrorInfo with err.hint when clientOptions.strictMode === true.

Code 93002 sits next to 93001 (annotation_subscribe missing) in the
SDK-internal precondition class. It would also be defensible to use
40160 (server-side capability denied), but the hint-coverage rubric
already pins 40160 to the "no auth options" hint shape, so a second
40160 throw site would either weaken that pin or need a rubric refactor.

Suspended-state and {waitForSync: false} paths return earlier and are
unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
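
A rough sketch of the "always log, throw only under strictMode" pattern described in the two commits above. `HintedError`, `checkPresenceSubscribe`, the flag value, and the suffix wording are all hypothetical stand-ins rather than the SDK's own code; the real paths are async, so the failure surfaces as a rejected promise instead of a synchronous throw.

```typescript
// Hypothetical stand-in for ErrorInfo carrying a hint.
class HintedError extends Error {
  code: number;
  hint: string;
  constructor(message: string, code: number, hint: string) {
    super(message);
    this.code = code;
    this.hint = hint;
  }
}

const PRESENCE_SUBSCRIBE = 1 << 19; // illustrative flag bit (assumption)

// Illustrative wording; appended to the log only, never stored in the hint.
const FUTURE_THROW_SUFFIX = ' This will throw in a future major version; enable strictMode to opt in now.';

function checkPresenceSubscribe(
  grantedFlags: number,
  strictMode: boolean,
  log: (line: string) => void,
): void {
  if ((grantedFlags & PRESENCE_SUBSCRIBE) !== 0) return; // mode granted, nothing to report

  const err = new HintedError(
    'The channel was attached without the presence_subscribe mode.',
    91008, // the code this PR settles on after the later renumbering
    'Include "presence_subscribe" in the channel modes.',
  );

  // Always-on warning: hint plus, when strictMode is off, the future-throw suffix.
  log(`${err.message} ${err.hint}${strictMode ? '' : FUTURE_THROW_SUFFIX}`);

  if (strictMode) throw err;
}
```

Keeping the suffix out of `err.hint` means a caller who catches the error under strictMode never sees a "this will throw in future" message about a throw that has already happened.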
When a channel was attached without the subscribe mode, the server
never delivers messages to the listener, so channel.subscribe()
appears to succeed but no callback ever fires. This commit closes the
gap symmetrically to presence.get() in the previous commit:

- After the implicit attach completes (RTL7g), if subscribe mode was
  not granted by the server, emit an always-on warning log carrying
  the hint + the future-throw suffix.
- One-shot per channel: the warning fires once per attach cycle to
  keep noise down on long-lived listeners. Reset _silentSubscribeWarned
  on the ATTACHED message so a channels.release() + re-attach with
  corrected modes restores signalling.
- Throw ErrorInfo (code 93003) when clientOptions.strictMode === true.

attachOnSubscribe: false is out of scope — the check requires an
attach to have populated _mode. Document this caveat separately if
needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
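
The one-shot-per-attach-cycle behaviour described in the commit above might look roughly like this. `ChannelSketch`, its method names, and the flag value are assumptions for illustration; the strictMode throw is left out to keep the focus on the warning flag and its reset.

```typescript
const SUBSCRIBE = 1 << 18; // illustrative flag bit (assumption)

class ChannelSketch {
  private grantedFlags = 0;
  private silentSubscribeWarned = false;
  readonly warnings: string[] = [];

  // Called when an ATTACHED message arrives: record what the server granted
  // and re-arm the warning for the new attach cycle.
  onAttached(grantedFlags: number): void {
    this.grantedFlags = grantedFlags;
    this.silentSubscribeWarned = false;
  }

  // Called after the implicit attach in subscribe() completes.
  checkSubscribeMode(): void {
    if ((this.grantedFlags & SUBSCRIBE) !== 0) return; // messages will be delivered
    if (this.silentSubscribeWarned) return; // already warned in this attach cycle
    this.silentSubscribeWarned = true;
    this.warnings.push('subscribe mode was not granted; this listener will never fire');
  }
}
```

With this shape, several `subscribe()` calls on the same attached channel produce a single warning, and a fresh ATTACHED message restores signalling.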
- Terser, technical error messages for 93002/93003
- Hints now offer setOptions() or omitting modes + capability check;
  drop confusing channel.modes note
- Keep listener registered on strict-mode throw (matches existing
  subscribe semantics)
- Remove ticket IDs from test names

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…grade

The 93001 hint lumped two distinct failure modes under "if the subsequent
attach is rejected": a missing namespace rule (Mutable Messages) does reject
the attach, but a missing annotation-subscribe capability does NOT - the
server silently drops the mode and this same error re-fires. Confirmed by
live sandbox traces and the canonical error registry.

Reword to key off the symptom the caller observes: attach rejected => enable
the namespace rule; attach succeeds but annotations still undelivered =>
grant the capability. Addresses review feedback that the hint was
presumptuous and over-enumerated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… off colliding 93002

Two fixes to the presence.get()/channel.subscribe() missing-mode errors:

- Hint correctness (review feedback): the presence hint named a
  "presence-subscribe" capability that does not exist. Presence delivery is
  governed by the "subscribe" capability, matching the subscribe-mode hint.

- Error-code collision: 93002 is the canonical server code for "namespace
  needs Mutable Messages" (ably-common errors.json, faqs.ably.com/error-code-93002).
  presence.get() reused it client-side. Move presence to 91008 (presence
  block, next to 91005) and channel.subscribe() to 90009 (channel block);
  93xxx is the annotations/mutable-messages block. Both codes are new on this
  unreleased branch, so the renumber is non-breaking. Reserved in
  ably/ably-common#345. Update the two tests asserting them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…align the silent-failure log suffix

From the 2026-07-02 docstrings-pr-review run:

- ClientOptions.strictMode: both strictMode-gated paths are async, so the
  caller-observable failure is a rejected promise, not a throw. Also drop the
  vague "documented silent-failure paths" (D10), the version label (C9), the
  presence.get() worked example (B6), the parenthesised default (A5), and the
  em-dash opt-out clause (B5/A2). Default stated once, late, as its own
  sentence.
- Logger.silentFailureLogSuffix: fold the JSDoc into two one-line paragraphs,
  fix ClientOptions casing, drop the "(human or LLM)" aside, and align the
  suffix string with the strictMode docstring framing (default will change in
  a future major version; enabling it now makes the call reject).

Also re-verified by live sandbox trace: a namespace without the annotations
rule rejects the attach with 93002, while a capability shortfall resolves the
attach and silently drops the mode. The 93001 hint's reject-vs-downgrade split
is therefore correct as shipped and is deliberately left unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e-get mode hints

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…capability branch

Drops the attach-rejection narration, which surfaces as its own error,
and promotes the parentheticals to sentences. Keeps the guard that
channels.get(name, { modes }) on an existing channel throws, steering
the caller between the two suggested remedies.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>


3 participants