MSC4354: Sticky Events #4354
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,331 @@ | ||
| # MSC4354: Sticky Events | ||
|
|
||
| MatrixRTC currently depends on [MSC3757](https://github.com/matrix-org/matrix-spec-proposals/pull/3757) | ||
|
|
||
| for sending per-user per-device state. MatrixRTC wants to be able to share temporary state with all | ||
| users in a room to indicate whether the given client is in the call or not. | ||
|
|
||
| The concerns with MSC3757 and using it for MatrixRTC are mainly: | ||
|
|
||
| 1. In order to ensure other users are unable to modify each other’s state, it proposes using | ||
| string packing for authorization which feels wrong, given the structured nature of events. | ||
| 2. Allowing unprivileged users to send arbitrary amounts of state into the room is a potential | ||
| abuse vector, as these states can pile up and can never be cleaned up as the DAG is append-only. | ||
| 3. State resolution can cause rollbacks. These rollbacks may inadvertently affect per-user per-device state. | ||
|
|
||
| Other proposals have similar problems such as live location sharing which uses state events when it | ||
| really just wants per-user last-write-wins behaviour. | ||
|
|
||
|
|
||
| There currently exists no good communication primitive in Matrix to send this kind of data. EDUs are | ||
| almost the right primitive, but: | ||
|
|
||
|
|
||
| * They can’t be sent via clients (there is no concept of EDUs in the Client-Server API! | ||
| [MSC2477](https://github.com/matrix-org/matrix-spec-proposals/pull/2477) tries to change that) | ||
| * They aren’t extensible. | ||
|
|
||
| * They do not guarantee delivery. Each EDU type has slightly different persistence/delivery guarantees, | ||
| all of which currently fall short of guaranteeing delivery. | ||
|
|
||
|
|
||
| This proposal adds such a primitive, called Sticky Events, which provides the following guarantees: | ||
|
|
||
| * Eventual delivery (with timeouts) and convergence. | ||
| * Access control tied to the joined members in the room. | ||
| * Extensible, able to be sent by clients. | ||
|
|
||
| This new primitive can be used to implement MatrixRTC participation and live location sharing, among other functionality. | ||
|
|
||
|
|
||
| ## Proposal | ||
|
|
||
| Message events can be annotated with a new top-level `sticky` key, which MUST have a `duration_ms`, | ||
| which is the number of milliseconds for the event to be sticky. The presence of `sticky.duration_ms` | ||
| with a valid value makes the event “sticky”[^stickyobj]. Valid values are the integer range 0-3600000 (1 hour). | ||
|
|
||
|
|
||
| ```json | ||
| { | ||
| "type": "m.rtc.member", | ||
| "sticky": { | ||
| "duration_ms": 600000 | ||
| }, | ||
|
|
||
| "sender": "@alice:example.com", | ||
| "room_id": "!foo", | ||
| "origin_server_ts": 1757920344000, | ||
| "content": { ... } | ||
| } | ||
| ``` | ||
|
|
||
| This key can be set by clients in the CS API by a new query parameter `stick_duration_ms`, which is | ||
|
|
||
| added to the following endpoints: | ||
|
|
||
| * `PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}` | ||
| * `PUT /_matrix/client/v3/rooms/{roomId}/state/{eventType}/{stateKey}` | ||
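As a rough illustration of how a client might attach the query parameter, here is a minimal sketch that builds the request URL for the `/send` endpoint. The helper name `build_sticky_send_url`, the homeserver URL, and the `param_name` argument are hypothetical and not part of the proposal; `param_name` is there only so the unstable name can be swapped in.

```python
from urllib.parse import quote, urlencode


def build_sticky_send_url(base_url, room_id, event_type, txn_id,
                          duration_ms, param_name="stick_duration_ms"):
    """Build the PUT URL for sending a sticky message event.

    Hypothetical helper: only the endpoint path and the query
    parameter name come from the proposal text.
    """
    # Percent-encode each path segment so characters such as '!' and ':'
    # in room IDs cannot be misread as path or scheme syntax.
    path = "/_matrix/client/v3/rooms/{}/send/{}/{}".format(
        quote(room_id, safe=""),
        quote(event_type, safe=""),
        quote(txn_id, safe=""),
    )
    query = urlencode({param_name: duration_ms})
    return base_url.rstrip("/") + path + "?" + query
```

While the proposal is unstable, a caller would presumably pass `param_name="msc4354_stick_duration_ms"` instead of the default.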
|
|
||
|
|
||
| To calculate if any sticky event is still sticky: | ||
|
|
||
| * Calculate the start time: | ||
| * The start time is `min(now, origin_server_ts)`. This ensures that malicious origin timestamps cannot | ||
| specify start times in the future. | ||
|
|
||
| * If the event is pushed via `/send`, servers MAY use the current time as the start time. This minimises | ||
| the risk of clock skew causing the start time to be too far in the past. See “Potential issues > Time”. | ||
|
|
||
| * Calculate the end time as `start_time + min(stick_duration_ms, 3600000)`. | ||
| * If the end time is in the future, the event remains sticky. | ||
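The calculation above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, all times are assumed to be milliseconds since the Unix epoch, and treating a missing, non-integer, or negative duration as "not sticky" is an assumption about how invalid values are handled. Durations above one hour are clamped, following the `min(...)` in the end-time formula.

```python
MAX_STICKY_DURATION_MS = 3_600_000  # 1 hour, the hard limit in the proposal


def sticky_end_time_ms(origin_server_ts, duration_ms, now_ms):
    """Return the time at which stickiness ends, or None if never sticky."""
    # Assumption: anything that is not a non-negative integer is invalid.
    # bool is excluded explicitly because it is a subclass of int in Python.
    if isinstance(duration_ms, bool) or not isinstance(duration_ms, int):
        return None
    if duration_ms < 0:
        return None
    # A timestamp in the future is bounded by the current time.
    start_ms = min(now_ms, origin_server_ts)
    return start_ms + min(duration_ms, MAX_STICKY_DURATION_MS)


def is_sticky(origin_server_ts, duration_ms, now_ms):
    """True while the end time is still in the future."""
    end_ms = sticky_end_time_ms(origin_server_ts, duration_ms, now_ms)
    return end_ms is not None and end_ms > now_ms
```

The sketch always uses `min(now, origin_server_ts)` as the start time; a server that takes the MAY option of using the current time for events pushed via `/send` would substitute its receive time instead.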
|
|
||
| Sticky events are like normal message events and are authorised using normal PDU checks. They have the | ||
| following _additional_ properties: | ||
|
|
||
| * They are eagerly synchronised with all other servers.[^partial] | ||
| * They must appear in the `/sync` response.[^sync] | ||
| * The soft-failure checks MUST be re-evaluated when the membership state changes for a user with unexpired sticky events.[^softfail] | ||
|
|
||
| To implement these properties, servers MUST: | ||
|
|
||
| * Attempt to send all sticky events to all joined servers, whilst respecting per-server backoff times. | ||
|
|
||
| Large volumes of events to send MUST NOT cause the sticky event to be dropped from the send queue on the server. | ||
| * Ensure all sticky events are delivered to clients via `/sync` in a new section of the sync response, | ||
| regardless of whether the sticky event falls within the timeline limit of the request. | ||
| * When a new server joins the room, the server MUST attempt delivery of all sticky events immediately. | ||
| * Remember sticky events per-user, per-room such that the soft-failure checks can be re-evaluated. | ||
|
|
||
| When an event loses its stickiness, these properties disappear with the stickiness. Servers SHOULD NOT | ||
| eagerly synchronise such events anymore, nor send them down `/sync`, nor re-evaluate their soft-failure status. | ||
| Note: policy servers and other similar antispam techniques still apply to these events. | ||
|
|
||
|
|
||
| The new sync section looks like: | ||
|
|
||
| ```json | ||
| { | ||
| "rooms": { | ||
| "join": { | ||
|
Contributor
If a room is newly-joined, is the server meant to send down all the pre-existing sticky events? This question also applies to the equivalent sliding sync extension.
Member
Author
All unexpired sticky events for that room, yes. It should be functionally the same as how we send all pre-existing room state to the client when they join a room, so the transition to join should:
Am I missing something here?
Contributor
The trouble is that it's difficult to paginate to include only (e.g.) 50 sticky events in one response, if we do it directly like that. What happens if there are 1000 active sticky events? We can decide to send them all down to the client at once, no matter what, but I don't know if that's reasonable or not.
Member
Author
I would personally use "It should be functionally the same as how we send all pre-existing room state to the client when they join a room" as a guide here. When you join a room with 1000 current state events, do we paginate it? No. So why should we do this with sticky events?
Contributor
I guess I'm happy to go along with that, though I suppose a difference is that for state events, only users with PL ≥ 50 can send new ones by default (other than their membership, where only one can be active at a time), whereas sticky events can be sent by anyone. We possibly want to consider the abuse considerations a bit more (...should we have an 'Abuse Considerations' section alongside the Security Considerations one?). If some user decides to spam my room with 1k, 10k, 100k, 1M or more sticky events and we have to push this down the pipe to clients, should I have any tooling to deal with that? For example,
Member
Author
The point of treating it semantically the same as room state is to piggyback off any existing protections, rather than having a patchwork of protections. A malicious actor can join 100k users to the same room today. We protect against this by rate limiting, policy servers, and setting the join rules to invite. Sticky events would benefit from the same protection measures (they are rate limited using the same mechanism, checked by policy servers in the same way, and the act of stopping bad users from joining stops them being able to send sticky events in the first place). Do we really need an extra safeguard here? As an aside, we must also consider the whole protocol not parts in isolation. For example, we do not rate limit messages. When considered in part, this is fine as each
Contributor
Hm, I find it hard to resonate with that: the main protection against state event spam is that users can't actually bloat the state space with PL0. (This doesn't apply to sticky events, right?) I don't think policy servers currently have any protection mechanism against either state nor sticky events; once a policy server rule had been written it would be too late to enforce it for the room that was already polluted. (With a slight benefit for sticky events given that they expire, so the storm would 'blow over' so to speak.) We don't send the full state set down to syncing clients; see for example lazy-loading room memberships where we only include 'relevant' state. This is another aspect that's new with sticky events: clients don't get any control and the server is actually being required to send the entire set down. Applying rate limits to sticky event (or indeed, any kind of event) spam over federation is difficult. However the protection for timeline events is that clients don't have to see them all; typical clients only request enough history to fill the visible client window and the federation protocol doesn't require you to have all of the timeline. This is again a situation that changes with sticky events (in fact, that's the motivating reason for them to exist). I am finding it hard to know what the right course is here fwiw (and trusting our existing safeguards might indeed be the right call), but it does feel to me that sticky events can be a potential unprivileged abuse vector with novel repercussions in several of these axes.
Member
Author
..but they can, by joining new users they control.
..but we do. We explicitly send all membership deltas between two sync tokens even with lazy loading enabled.
..which is true in basically all "reactive" scenarios, e.g. changing the join rule to invite-only, banning users, etc. I don't dispute that it allows more bandwidth to be consumed over the CS link. I don't think adding a patchwork of bandwidth protections is the right call: it just further reinforces the whole "whoops, the protocol grew organically, so there's one way to do this, another way to do that, and this thing isn't possible because it was done before we did things this way". If we are serious about having bandwidth control on the CS link then we should.. have an MSC that talks about it. The fact we don't is perhaps the biggest indication that this isn't a problem in practice. Having a holistic view allows us to address other elephants in the room (to-device messages, device lists, invites, etc all of which are unbounded and attacker controlled, in addition to the whole-protocol-view that apps typically do follow-up requests which can also consume lots of bandwidth (e.g. threaded operations which defer to thread specific endpoints, I'm also fairly bullish on this because of the 1 hour limit on sticky events. This means you need a continuous active attack for it to consume lots of bandwidth, which is a high bar compared to.. basically every other payload in the CS link :S so I do find it quite bizarre to focus on bandwidth concerns quite so much for this MSC, yet neglect it in all the others (see: organic growth).
Contributor
(cross-linking another conversation about shotgun bandwidth optimizations vs holistic approach, #4186 (comment)) |
||
| "!726s6s6q:example.com": { | ||
| "account_data": { ... }, | ||
| "ephemeral": { ... }, | ||
| "state": { ... }, | ||
| "timeline": { ... }, | ||
| "sticky": { | ||
| "events": [ | ||
| { | ||
| "sender": "@bob:example.com", | ||
| "type": "m.foo", | ||
| "sticky": { | ||
| "duration_ms": 300000 | ||
| }, | ||
| "origin_server_ts": 1757920344000, | ||
| "content": { ... } | ||
| }, | ||
| { | ||
| "sender": "@alice:example.com", | ||
| "type": "m.foo", | ||
| "sticky": { | ||
| "duration_ms": 300000 | ||
| }, | ||
| "origin_server_ts": 1757920311020, | ||
| "content": { ... } | ||
| } | ||
| ] | ||
| } | ||
|
|
||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| Over Simplified Sliding Sync, Sticky Events have their own extension `sticky_events`, which has the following response shape: | ||
|
|
||
|
|
||
| ```json | ||
| { | ||
| "rooms": { | ||
| "!726s6s6q:example.com": { | ||
| "events": [{ | ||
| "sender": "@bob:example.com", | ||
| "type": "m.foo", | ||
| "sticky": { | ||
| "duration_ms": 300000 | ||
| }, | ||
| "origin_server_ts": 1757920344000, | ||
| "content": { ... } | ||
| }] | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
|
|
||
| Sticky messages MAY be sent in the timeline section of the `/sync` response, regardless of whether | ||
| or not they exceed the timeline limit[^ordering]. If a sticky event is in the timeline, it MAY be | ||
| omitted from the `sticky.events` section. This ensures we minimise duplication in the `/sync` response JSON. | ||
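Because a sticky event found in the timeline MAY be omitted from `sticky.events`, a client presumably has to look in both sections of a joined room to see every sticky event. The sketch below shows one way to do that. The function name is hypothetical, and it assumes each event carries an `event_id` (as events in `/sync` normally do, although the abbreviated examples above leave it out).

```python
def collect_sticky_events(joined_room):
    """Gather sticky events from one joined-room object of a /sync response.

    Looks at both `timeline.events` and `sticky.events`, keeping the first
    copy of each event ID so duplicates across sections are ignored.
    """
    timeline_events = joined_room.get("timeline", {}).get("events", [])
    sticky_events = joined_room.get("sticky", {}).get("events", [])

    by_event_id = {}
    for event in list(timeline_events) + list(sticky_events):
        sticky = event.get("sticky")
        # Only events with a `sticky.duration_ms` value count as sticky;
        # the presence of the `sticky` object alone is insufficient.
        if not isinstance(sticky, dict) or "duration_ms" not in sticky:
            continue
        by_event_id.setdefault(event["event_id"], event)
    return list(by_event_id.values())
```

The sketch does not check whether each event is still within its sticky duration; a client would apply the expiry calculation separately.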
|
|
||
| Servers SHOULD rate limit sticky events over federation. If the rate limit kicks in, servers MUST | ||
| return a non-2xx status code from `/send` such that the sending server *retries the request* in order | ||
| to guarantee that the sticky event is eventually delivered. Servers MUST NOT silently drop sticky events | ||
| and return 200 OK from `/send`, as this breaks the eventual delivery guarantee. | ||
|
|
||
| These messages may be combined with [MSC4140: Delayed Events](https://github.com/matrix-org/matrix-spec-proposals/pull/4140) | ||
| to provide heartbeat semantics (e.g. required for MatrixRTC). Note that the sticky duration in this proposal | ||
| is distinct from that of delayed events. The purpose of the sticky duration in this proposal is to ensure sticky events are cleaned up. | ||
|
|
||
| ### Implementing a map | ||
|
|
||
| MatrixRTC relies on a per-user, per-device map of RTC member events. To implement this, this MSC proposes | ||
| a standardised mechanism for determining keys on sticky events, the `content.sticky_key` property: | ||
|
|
||
| ```json | ||
| { | ||
| "type": "m.rtc.member", | ||
| "sticky": { | ||
| "duration_ms": 300000 | ||
| }, | ||
| "sender": "@alice:example.com", | ||
| "room_id": "!foo", | ||
| "origin_server_ts": 1757920344000, | ||
| "content": { | ||
| "sticky_key": "LAPTOPXX123", | ||
| ... | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| `content.sticky_key` is ignored server-side[^encryption] and is purely informational. Clients which | ||
| receive a sticky event with a sticky key SHOULD keep a map with keys determined via the 4-tuple | ||
| `(room_id, sender, type, content.sticky_key)` to track the current values in the map. Nothing stops | ||
|
|
||
| users sending multiple events with the same `sticky_key`. To deterministically tie-break, clients which | ||
| implement this behaviour MUST: | ||
|
|
||
| - pick the one with the highest `origin_server_ts`, | ||
| - tie break on the one with the lexicographically highest event ID (A < Z). | ||
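A client-side map with this tie-break can be sketched as below. The class name `StickyMap` and its methods are hypothetical. The sketch assumes events have already been decrypted where necessary (so `type` and `content.sticky_key` are readable), that each event carries an `event_id`, and that comparing event IDs as plain Python strings is an acceptable reading of "lexicographical". Expiry of entries is deliberately left out.

```python
from typing import Any, Dict, Optional, Tuple

Event = Dict[str, Any]
MapKey = Tuple[str, str, str, str]  # (room_id, sender, type, sticky_key)


class StickyMap:
    """Last-write-wins map over sticky events that carry a sticky key."""

    def __init__(self) -> None:
        self._entries: Dict[MapKey, Event] = {}

    @staticmethod
    def _rank(event: Event) -> Tuple[int, str]:
        # Highest origin_server_ts wins; ties fall back to the event ID.
        return (event["origin_server_ts"], event["event_id"])

    def apply(self, event: Event) -> bool:
        """Offer an event to the map; return True if it became the value."""
        content = event.get("content")
        sticky_key = content.get("sticky_key") if isinstance(content, dict) else None
        if not isinstance(sticky_key, str):
            return False  # no sticky key, so not part of the map
        key = (event["room_id"], event["sender"], event["type"], sticky_key)
        current = self._entries.get(key)
        if current is None or self._rank(event) > self._rank(current):
            self._entries[key] = event
            return True
        return False

    def get(self, room_id: str, sender: str, event_type: str,
            sticky_key: str) -> Optional[Event]:
        return self._entries.get((room_id, sender, event_type, sticky_key))
```

Because the winner depends only on the pair `(origin_server_ts, event_id)`, clients that receive the same set of events in different orders end up with the same value for each key.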
|
|
||
| When overwriting keys, clients SHOULD use the same sticky duration as the previous sticky event to avoid clients diverging. | ||
|
|
||
| This can happen when a client sends a sticky event K with a long timeout, then overwrites it with a sticky event K’ | ||
| that has the same key but a short timeout. If K’ fails to be sent to all servers before the short timeout is hit, | ||
| some clients will believe the state is K and others will have no state. This will only resolve once the long timeout is hit. | ||
|
|
||
| Note that encrypted sticky events will encrypt some parts of the 4-tuple. An encrypted sticky event only exposes the room ID and sender to the server: | ||
|
|
||
| ```json | ||
| { | ||
| "content": { | ||
| "algorithm": "m.megolm.v1.aes-sha2", | ||
| "ciphertext": "AwgCEqABubgx7p8AThCNreFNHqo2XJCG8cMUxwVepsuXAfrIKpdo8UjxyAsA50IOYK6T5cDL4s/OaiUQdyrSGoK5uFnn52vrjMI/+rr8isPzl7+NK3hk1Tm5QEKgqbDJROI7/8rX7I/dK2SfqN08ZUEhatAVxznUeDUH3kJkn+8Onx5E0PmQLSzPokFEi0Z0Zp1RgASX27kGVDl1D4E0vb9EzVMRW1PrbdVkFlGIFM8FE8j3yhNWaWE342eaj24NqnnWJ5VG9l2kT/hlNwUenoGJFMzozjaUlyjRIMpQXqbodjgyQkGacTEdhBuwAQ", | ||
| "device_id": "AAvTvsyf5F", | ||
| "sender_key": "KVMNIv/HyP0QMT11EQW0X8qB7U817CUbqrZZCsDgeFE", | ||
| "session_id": "c4+O+eXPf0qze1bUlH4Etf6ifzpbG3YeDEreTVm+JZU" | ||
| }, | ||
| "origin_server_ts": 1757948616527, | ||
| "sender": "@alice:example.com", | ||
| "type": "m.room.encrypted", | ||
| "sticky": { | ||
| "duration_ms": 600000 | ||
| }, | ||
| "event_id": "$lsFIWE9JcIMWUrY3ZTOKAxT_lIddFWLdK6mqwLxBchk", | ||
| "room_id": "!ffCSThQTiVQJiqvZjY:matrix.org" | ||
| } | ||
| ``` | ||
|
|
||
| The decrypted event would contain the `type` and `content.sticky_key`. | ||
|
|
||
| ## Potential issues | ||
|
|
||
|
|
||
| ### Time | ||
|
|
||
| Servers that can’t maintain correct clock frequency may expire sticky events at a slightly slower/faster rate | ||
| than other servers. As the maximum timeout is relatively low, the total deviation is also reasonably low, | ||
| making this less problematic. The alternative of explicitly sending an expiration event would likely cause | ||
| more deviation due to retries than deviations due to clocks. | ||
|
|
||
| Servers with significant clock skew may set `origin_server_ts` too far in the past or future. If the value | ||
| is too far in the past this will cause sticky events to expire quicker than they should, or to always be | ||
| treated as expired. If the value is too far in the future, this has no effect as it is bounded by the current time. | ||
| As such, this proposal relies somewhat on NTP to ensure clocks over federation are roughly in sync. | ||
| As a consequence of this, the sticky duration SHOULD NOT be set to below 5 minutes.[^ttl] | ||
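To make the effect of clock skew concrete, here is a small worked sketch. The function name is hypothetical and times are assumed to be in milliseconds; it simply applies the start-time and end-time rules from the Proposal section as seen by a receiving server with an accurate clock.

```python
MAX_STICKY_DURATION_MS = 3_600_000  # 1 hour


def remaining_sticky_ms(origin_server_ts, duration_ms, now_ms):
    """Milliseconds of stickiness left from the receiver's point of view."""
    start_ms = min(now_ms, origin_server_ts)
    end_ms = start_ms + min(duration_ms, MAX_STICKY_DURATION_MS)
    return max(0, end_ms - now_ms)


now = 1757920344000
five_minutes = 300_000

# Sender's clock is 4 minutes slow: only 1 minute of stickiness remains.
print(remaining_sticky_ms(now - 240_000, five_minutes, now))

# Sender's clock is 10 minutes slow: the event is treated as already expired.
print(remaining_sticky_ms(now - 600_000, five_minutes, now))

# Sender's clock is 10 minutes fast: bounded by the current time, no effect.
print(remaining_sticky_ms(now + 600_000, five_minutes, now))
```

With a 5 minute duration, a sender whose clock runs a few minutes slow already loses most of the sticky window, which illustrates why shorter durations are discouraged.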
|
|
||
| ### Encryption | ||
|
Member
What is the strategy regarding Or at least having a way to discard such a sticky event?
Member
FWIW there is prior art / similar rules for edits validity of replacement events Maybe we could add similar rules? to be consistent.
Contributor
I guess this is referring to the 'Addendum: Implementing an ephemeral map'? I wonder if the formalisation of that addendum would be better-suited to another MSC? |
||
|
|
||
| Encrypted sticky events reduce reliability as in order for a sticky event to be visible to the end user it | ||
| requires *both* the sending client to think the receiver is joined (so we encrypt for their devices) and the | ||
| receiving server to think the sender is joined (so it passes auth checks). Unencrypted events only strictly | ||
| require the receiving server to think the sender is joined. | ||
|
|
||
| The lack of historical room key sharing may make some encrypted sticky events undecryptable when new users join the room. | ||
|
|
||
| ### Spam | ||
|
|
||
| Servers may send every event as a sticky event, causing a higher number of events to be sent eagerly over federation | ||
| and to be sent down `/sync` to clients. The former is already an issue as servers can simply `/send` many events. | ||
| The latter is a new abuse vector, as up until this point the `timeline_limit` would restrict the number of events | ||
|
|
||
| that arrive on client devices (only state events are unbounded and setting state is a privileged operation). | ||
| This proposal has the following protections in place: | ||
|
|
||
| * All sticky events expire, with a hard limit of 1 hour. The hard limit ensures that servers cannot set years-long expiry times. | ||
| This ensures that the data in the `/sync` response can go down and not grow unbounded. | ||
| * All sticky events are subject to normal PDU checks, meaning that the sender must be authorised to send events into the room. | ||
| * Servers sending lots of sticky events may be asked to try again later as a form of rate-limiting. | ||
| Due to data expiring, subsequent requests will gradually have less data. | ||
|
|
||
| ## Alternatives | ||
|
|
||
| ### Use state events | ||
|
|
||
| We could do [MSC3757](https://github.com/matrix-org/matrix-spec-proposals/pull/3757), but for the | ||
| reasons mentioned at the start we don’t really want to do so. | ||
|
|
||
| ### Make stickiness persistent not ephemeral | ||
|
|
||
| There are arguments that, at least for some use cases, we don’t want these sticky events to time out. | ||
| However, that opens the possibility of bloating the `/sync` response with sticky events. | ||
|
|
||
| Suggestions for minimizing that have been to have a hard limit on the number of sticky events a user can have per room, | ||
| instead of a timeout. However, this has two drawbacks: a) you still may end up with substantial bloat as stale data doesn’t | ||
| automatically get reaped (even if the amount of bloat is limited), and b) what do clients do if there are already too many | ||
| sticky events? The latter is tricky, as deleting the oldest may not be what the user wants if it happens to be not-stale data, | ||
| and asking the user what data they want to delete vs keep is unergonomic. | ||
|
|
||
| Non-expiring sticky events could be added later if the above issues are resolved. | ||
|
|
||
| ### Have a dedicated ‘ephemeral user state’ section | ||
|
|
||
| Early prototypes of this proposal devised a key-value map with timeouts maintained over EDUs rather than PDUs. | ||
| This early proposal had much the same feature set as this proposal but with one major difference: equivocation. | ||
|
|
||
| Servers could broadcast different values for the same key to different servers, causing the map to not converge: | ||
| the Byzantine Broadcast problem. Matrix already has a data structure to agree on shared state: the room DAG. | ||
| As such, the prototype evolved into the current proposal. By putting the data into the DAG, other servers | ||
| can talk to each other via it to see if they have been told different values. When combined with a simple | ||
| conflict resolution algorithm (which works because there is [no need for coordination](https://arxiv.org/abs/1901.01930)), | ||
| this provides a way for clients to agree on the same values. Note that in practice this needs servers to *eagerly* | ||
| share forward extremities so servers aren’t reliant on unrelated events being sent in order to check for equivocation. | ||
| Currently, there is no mechanism for servers to express “these are my latest events, what are yours?” without actually sending another event. | ||
|
|
||
| ## Security Considerations | ||
|
|
||
| Servers may equivocate over federation and send different events to different servers in an attempt to cause | ||
| the key-value map maintained by clients to not converge. Alternatively, servers may fail to send sticky events | ||
| to their own clients to produce the same outcome. Federation equivocation is mitigated by the events being | ||
| persisted in the DAG, as servers can talk to each other to fetch all events. There is no way to protect against | ||
| dropped updates for the latter scenario. | ||
|
|
||
| ## Unstable Prefix | ||
|
|
||
| - The `stick_duration_ms` query param is `msc4354_stick_duration_ms`. | ||
| - The `sticky` key in the PDU is `msc4354_sticky`. | ||
| - The `/sync` response section is `msc4354_sticky_events`. | ||
| - The sticky key in the `content` of the PDU is `msc4354_sticky_key`. | ||
|
|
||
|
|
||
| [^stickyobj]: The presence of the `sticky` object alone is insufficient. | ||
| [^partial]: Over federation, servers are not required to send all timeline events to every other server. | ||
| Servers mostly lazy load timeline events, and will rely on clients hitting `/messages` which in turn | ||
| hits `/backfill` to request events from federated servers. | ||
| [^sync]: Normal timeline events do not always appear in the sync response if the event is more than `timeline_limit` events away. | ||
| [^softfail]: Not all servers will agree on soft-failure status due to the check considering the “current state” of the room. | ||
| To ensure all servers agree on which events are sticky, we need to re-evaluate this rule when the current room state changes. | ||
| This becomes particularly important when room state is rolled back. For example, if Charlie sends some sticky event E and | ||
| then Bob kicks Charlie, but concurrently Alice kicks Bob then whether or not a receiving server would accept E would depend | ||
| on whether they saw “Alice kicks Bob” or “Bob kicks Charlie”. If they saw “Alice kicks Bob” then E would be accepted. If they | ||
| saw “Bob kicks Charlie” then E would be rejected, and would need to be rolled back when they see “Alice kicks Bob”. | ||
| [^ordering]: Sticky events expose gaps in the timeline which cannot be expressed using the current sync API. If sync used | ||
| something like [stitched ordering](https://codeberg.org/andybalaam/stitched-order) | ||
| or [MSC3871](https://github.com/matrix-org/matrix-spec-proposals/pull/3871) then sticky events could be inserted straight | ||
| into the timeline without any additional section, hence “MAY” would enable this behaviour in the future. | ||
|
|
||
| [^encryption]: Previous versions of this proposal had the key be at the top-level of the event JSON so servers could | ||
| implement map-like semantics on the client’s behalf. However, this would force the key to remain visible to the server and | ||
| thus leak metadata. As a result, the key now falls within the encrypted `content` payload, and clients are expected to | ||
| implement the map-like semantics should they wish to. | ||
|
|
||
| [^ttl]: Earlier designs had servers inject a new `unsigned.ttl_ms` field into the PDU to say how many milliseconds were left. | ||
| This was problematic because it would have to be modified every time the server attempted delivery of the event to another server. | ||
|
Member
Doesn't the spec require that today with the
Member
Author
Yeah but not over federation. I mostly added this because Erik seemed to think this was a downside in his earlier proposal:
Do you want me to add anything to this? |
||
| Furthermore, it didn’t really add any more protection because it assumed servers honestly set the value. | ||
| Malicious servers could set the TTL to any value between 0 and `sticky.duration_ms`, ensuring maximum divergence | ||
| on whether or not an event was sticky. In contrast, using `origin_server_ts` is a consistent reference point | ||
| that all servers are guaranteed to see, limiting the ability for malicious servers to cause divergence as all | ||
| servers approximately track NTP. | ||