
MSC4354: Sticky Events#4354

Open
kegsay wants to merge 50 commits into main from kegan/persist-edu

Conversation

@kegsay
Member

@kegsay kegsay commented Sep 16, 2025

@kegsay kegsay changed the title Sticky Events MSC4354: Sticky Events Sep 16, 2025
It wasn't particularly useful for clients, and doesn't help equivocation much.
@turt2live turt2live added proposal A matrix spec change proposal client-server Client-Server API kind:core MSC which is critical to the protocol's success needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. labels Sep 16, 2025
@github-project-automation github-project-automation bot moved this to Tracking for review in Spec Core Team Workflow Sep 16, 2025
@turt2live turt2live added the matrix-2.0 Required for Matrix 2.0 label Sep 16, 2025
These messages may be combined with [MSC4140: Delayed Events](https://github.com/matrix-org/matrix-spec-proposals/pull/4140)
to provide heartbeat semantics (e.g required for MatrixRTC). Note that the sticky duration in this proposal
is distinct from that of delayed events. The purpose of the sticky duration in this proposal is to ensure sticky events are cleaned up,
whereas the purpose of delayed events is to affect the send time (and thus start time for stickiness) of an event.
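To make the distinction concrete, the two durations compose roughly as follows (a minimal sketch; the function and field names are illustrative, not taken from either MSC):

```js
// Illustrative sketch only. An MSC4140 delay shifts when the event is
// actually sent (and therefore when stickiness starts), while the MSC4354
// sticky duration bounds how long the event then remains sticky.
function stickyWindow(scheduledTs, delayMs, stickyDurationMs) {
  const sentTs = scheduledTs + delayMs; // delayed events affect the send time
  return { from: sentTs, until: sentTs + stickyDurationMs }; // sticky duration bounds cleanup
}
```

For heartbeat semantics, a client would repeatedly re-schedule the delayed event; the sticky window only begins once the delay elapses and the event is actually sent.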
Contributor


Sticky events is an awesome and fun name; I would even say elegant, if it weren't for the following issue. It appears to be named after an implication of what it does, but I think the metaphor isn't at all obvious. This is particularly true with regard to the self-destruct semantics (i.e. an inverted heartbeat) of delayed events, as the apparent need for this clarifying section emphasizes.

In conversation with peers, people who have not read this MSC but have only somehow heard about it usually assume some kind of self-destruction related to the timer. Perhaps the event "sticks around" that long. It's quite possible this is also burdened by mix-up with that actual requirement, as delayed events and MatrixRTC as a whole remain a developing topic, and so both are often mentioned in one sentence.

My interpretation after reading is that they are sticky in the sense of sticking to the top: eager, high-priority sharing, similar to state events, eventually unsticking to fall back to regular priority.

I realize I'm a bit late to complain about the name, which has already been used proudly in public a bunch. We could still consider finding a name that more intuitively fits the purpose. Priority Events? Important Events?

Member

@BillCarsonFr BillCarsonFr left a comment


I have a question regarding mixed content sticky chains (clear sticky replacing encrypted sticky)

As such, this proposal relies somewhat on NTP to ensure clocks over federation are roughly in sync.
As a consequence of this, the sticky duration SHOULD NOT be set to below 5 minutes.[^ttl]
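A server-side expiry check under this rule might look like the following sketch (names are hypothetical; clamping to the 5-minute floor is one possible way to handle too-small durations):

```js
const MIN_STICKY_DURATION_MS = 5 * 60 * 1000; // floor suggested by the proposal

// Hypothetical sketch: expiry is computed from origin_server_ts, which is
// why roughly-synchronised clocks (NTP) are assumed across federation.
function isStickyExpired(originServerTs, stickyDurationMs, nowMs) {
  const effectiveMs = Math.max(stickyDurationMs, MIN_STICKY_DURATION_MS);
  return nowMs > originServerTs + effectiveMs;
}
```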

### Encryption
Member


What is the strategy regarding mixed-content sticky chains, e.g. a clear-text event replacing an encrypted one?
Can we disallow mixed content, so that only an encrypted event can replace an encrypted sticky event?

Or at least have a way to discard such a sticky event?
If not, it would be like allowing clear-text edits of encrypted messages without showing a big red warning.

Member


FWIW there is prior art: similar validity rules already exist for replacement events (edits).

Maybe we could add similar rules, to be consistent? Things like:

  • The replacement and original events must have the same type.
  • If the original event was encrypted, the replacement should be too.
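The two suggested rules could be sketched as a validity check like this (illustrative only; note that when event types are compared directly, the encryption rule follows from the type rule, since encrypted events use the type m.room.encrypted):

```js
// Sketch of the suggested rules, mirroring the validity rules for edits.
function isValidStickyReplacement(original, replacement) {
  // Rule 1: the replacement and original events must have the same type.
  if (original.type !== replacement.type) return false;
  // Rule 2: if the original was encrypted, the replacement must be too.
  // (Implied by rule 1 here, since encrypted events use m.room.encrypted.)
  if (original.type === "m.room.encrypted" &&
      replacement.type !== "m.room.encrypted") return false;
  return true;
}
```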

Contributor


I guess this is referring to the 'Addendum: Implementing an ephemeral map'? I wonder if the formalisation of that addendum would be better-suited to another MSC?
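For context, the core of such an ephemeral map is small: the latest unexpired event per sticky_key wins. A rough sketch (field locations are illustrative, not the addendum's exact formalisation):

```js
// Rough sketch of an ephemeral map: latest unexpired event per sticky_key.
function buildEphemeralMap(events, nowMs) {
  const map = new Map();
  for (const ev of events) {
    if (nowMs > ev.origin_server_ts + ev.sticky_duration_ms) continue; // expired
    const prev = map.get(ev.content.sticky_key);
    if (!prev || ev.origin_server_ts > prev.origin_server_ts) {
      map.set(ev.content.sticky_key, ev); // newer event replaces older
    }
  }
  return map;
}
```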

alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Feb 24, 2026

Renovate update of element-hq/synapse from `v1.147.1` to `v1.148.0`. Relevant release note: Synapse 1.148.0 supports sending and receiving MSC4354 Sticky Event metadata (element-hq/synapse#19365).
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Feb 26, 2026
Tested on NetBSD 9 amd64 by reporting pyproject.toml buglets upstream!

Updates Synapse to 1.148.0 (2026-02-24), which adds support for sending and receiving MSC4354 Sticky Event metadata (element-hq/synapse#19365).
Comment on lines +95 to +96
* History visibility **checks** MUST NOT be applied to sticky events. Any joined user is authorised to see sticky events
for the duration they remain sticky.[^hisvis]
Contributor


I think sticky events in this situation will never come down the timeline section of sliding sync, but I need to confirm that.
In the Synapse implementation it looks like Sliding Sync only shows events since you joined the room.


### Sync API changes

The new `/sync` section looks like:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no mention of a maximum number of sticky events in the response, or of how the client can configure such a limit.

alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Mar 24, 2026

Renovate update of element-hq/synapse from `v1.149.1` to `v1.150.0`. Relevant release note: Synapse 1.150.0 exposes MSC4354 Sticky Events over the legacy (v3) /sync API (element-hq/synapse#19487).
added to the following endpoints:

* [`PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}`](https://spec.matrix.org/v1.16/client-server-api/#put_matrixclientv3roomsroomidsendeventtypetxnid)
* [`PUT /_matrix/client/v3/rooms/{roomId}/state/{eventType}/{stateKey}`](https://spec.matrix.org/v1.16/client-server-api/#put_matrixclientv3roomsroomidstateeventtypestatekey)
Contributor

@benkuly benkuly Mar 31, 2026


What is the use case for sticky state events? Isn't there already a delivery guarantee for state events? Additionally, the section above uses the wording "Message events", which does not include state events, does it?

"room_id": "!foo",
"origin_server_ts": 1757920344000,
"content": {
    "sticky_key": "LAPTOPXX123",
Contributor


A more general thought: I understand that putting the sticky_key and original type into the encrypted content keeps this metadata away from the server. On the other hand, the server isn't able to filter out old sticky events whose sticky_key has been superseded by a still-valid newer event, nor filter by type via sync filters. This problem also exists for "normal" encrypted message events, but this MSC adds additional cases, because the delivery guarantee forces the client to receive and decrypt events that it might not be interested in.

E.g. MSC4362 keeps type:state_key unencrypted at the top level of the encrypted event, which would allow at least client-side filters, and theoretically server-side filters too.

```js
{
"rooms": {
"join": {
Contributor


If a room is newly-joined, is the server meant to send down all the pre-existing sticky events?
(If so, that is going to get fiddly because we need to track that backlog in the sync token so we can send pages of 50 sticky events down at a time)

This question also applies to the equivalent sliding sync extension.

Member Author


All unexpired sticky events for that room, yes. It should be functionally the same as how we send all pre-existing room state to the client when they join a room, so the transition to join should:

  • inspect the sync token to see its event stream position
  • load all unexpired sticky events in that room < that stream position
  • include them in the response

Am I missing something here?
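The join transition described here amounts to a filter over stored sticky events. Assuming a hypothetical store that records each event's stream position, it could be sketched as:

```js
// Hypothetical sketch of the join transition: return all unexpired sticky
// events in the room at or before the sync token's stream position.
function stickyEventsOnJoin(roomStickyEvents, streamPos, nowMs) {
  return roomStickyEvents.filter(ev =>
    ev.stream_position < streamPos &&
    nowMs <= ev.origin_server_ts + ev.sticky_duration_ms // still sticky
  );
}
```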

Contributor


The trouble is that it's difficult to paginate to include only (e.g.) 50 sticky events in one response, if we do it directly like that.

What happens if there are 1000 active sticky events?

We can decide to send them all down to the client at once, no matter what, but I don't know if that's reasonable or not.

Member Author


I would personally use "It should be functionally the same as how we send all pre-existing room state to the client when they join a room" as a guide here. When you join a room with 1000 current state events, do we paginate it? No. So why should we do this with sticky events?

Contributor


I guess I'm happy to go along with that, though I suppose a difference is that for state events, only users with PL ≥ 50 can send new ones by default (other than their membership, where only one can be active at a time), whereas sticky events can be sent by anyone.

We possibly want to consider the abuse considerations a bit more (...should we have an 'Abuse Considerations' section alongside the Security Considerations one?).

If some user decides to spam my room with 1k, 10k, 100k, 1M or more sticky events and we have to push this down the pipe to clients, should I have any tooling to deal with that?

For example,

  • Make sticky events become un-sticky if the sender leaves|gets banned|gets kicked (?)?
  • Make sticky events become un-sticky when redacted?

Member Author


The point of treating it semantically the same as room state is to piggyback off any existing protections, rather than having a patchwork of protections. A malicious actor can join 100k users to the same room today. We protect against this by rate limiting, policy servers, and setting the join rules to invite. Sticky events would benefit from the same protection measures (they are rate limited using the same mechanism, checked by policy servers in the same way, and the act of stopping bad users from joining stops them being able to send sticky events in the first place).

Do we really need an extra safeguard here?

As an aside, we must also consider the whole protocol, not parts in isolation. For example, we do not rate limit messages. Considered in isolation, this is fine, as each /sync response is bounded by timeline_limit. In practice, though, the whole-app behaviour is not really any different, because most apps automatically and repeatedly ask for older /messages if the viewport is not full, allowing a similar amount of bandwidth to be consumed by just sending 100k junk events.

Contributor


Hm, I find it hard to resonate with that: the main protection against state event spam is that users can't actually bloat the state space at PL 0. (This doesn't apply to sticky events, right?)

I don't think policy servers currently have any protection mechanism against either state or sticky events; once a policy server rule has been written, it would be too late to enforce it for a room that was already polluted. (With a slight benefit for sticky events, given that they expire, so the storm would 'blow over', so to speak.)

We don't send the full state set down to syncing clients; see for example lazy-loading room memberships where we only include 'relevant' state. This is another aspect that's new with sticky events: clients don't get any control and the server is actually being required to send the entire set down.

Applying rate limits to sticky event (or indeed, any kind of event) spam over federation is difficult.

However the protection for timeline events is that clients don't have to see them all; typical clients only request enough history to fill the visible client window and the federation protocol doesn't require you to have all of the timeline. This is again a situation that changes with sticky events (in fact, that's the motivating reason for them to exist).

I am finding it hard to know what the right course is here fwiw (and trusting our existing safeguards might indeed be the right call), but it does feel to me that sticky events can be a potential unprivileged abuse vector with novel repercussions in several of these axes.

Member Author


users can't actually bloat the state space with PL0.

..but they can, by joining new users they control.

We don't send the full state set down to syncing clients; see for example lazy-loading room memberships where we only include 'relevant' state.

..but we do. We explicitly send all membership deltas between two sync tokens even with lazy loading enabled.

once a policy server rule had been written it would be too late to enforce it for the room that was already polluted

..which is true in basically all "reactive" scenarios, e.g. changing the join rule to invite-only, banning users, etc.


I don't dispute that it allows more bandwidth to be consumed over the CS link. I don't think adding a patchwork of bandwidth protections is the right call: it just further reinforces the whole "whoops, the protocol grew organically, so there's one way to do this, another way to do that, and this thing isn't possible because it was done before we did things this way". If we are serious about having bandwidth control on the CS link then we should.. have an MSC that talks about it. The fact we don't is perhaps the biggest indication that this isn't a problem in practice. Having a holistic view allows us to address other elephants in the room: to-device messages, device lists, invites, etc., all of which are unbounded and attacker-controlled. It also accounts for the whole-protocol view that apps typically make follow-up requests which can also consume lots of bandwidth (e.g. threaded operations which defer to thread-specific endpoints, /messages, and backfill).

I'm also fairly bullish on this because of the 1 hour limit on sticky events. This means you need a continuous active attack for it to consume lots of bandwidth, which is a high bar compared to.. basically every other payload in the CS link :S so I do find it quite bizarre to focus on bandwidth concerns quite so much for this MSC, yet neglect it in all the others (see: organic growth).

Contributor


(cross-linking another conversation about shotgun bandwidth optimizations vs holistic approach, #4186 (comment))

@benkuly
Contributor

benkuly commented Apr 8, 2026

Just a note: Trixnity does implement this MSC, with (as usual) high test coverage, here: https://gitlab.com/connect2x/trixnity/trixnity/-/merge_requests/687

This does not mean that I like how the MSC is right now (see my review).

@Saiv46

Saiv46 commented Apr 16, 2026

This proposal should supersede #4357


Labels

client-server Client-Server API kind:core MSC which is critical to the protocol's success matrix-2.0 Required for Matrix 2.0 proposal A matrix spec change proposal unresolved-concerns This proposal has at least one outstanding concern

Projects

Status: Tracking for review
