MSC4242: State DAGs (Federation) #19425

Open
kegsay wants to merge 27 commits into develop from kegan/4242

kegsay commented Feb 3, 2026

Builds off #19424

Adds federation compatibility for MSC4242 state DAG rooms. Overview:

  • Adds extra HTTP API fields as per the MSC.
  • Adds methods for walking and extracting the state DAG for a room (for `/get_missing_events` and `/send_join` respectively).
  • Adds an implementation for processing the federation requests, as well as `/send`.
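
As a rough illustration of what "walking" a state DAG involves, here is a minimal sketch, not the code in this PR. It assumes the DAG is available as a plain mapping from event ID to that event's `prev_state_events`; the function name `walk_state_dag` and the example event IDs are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def walk_state_dag(
    prev_state_edges: Dict[str, List[str]], start_ids: Iterable[str]
) -> Set[str]:
    """Collect every event ID reachable from start_ids via prev_state_events.

    Hypothetical helper: prev_state_edges maps an event ID to the IDs listed
    in that event's prev_state_events.
    """
    seen: Set[str] = set()
    queue = deque(start_ids)
    while queue:
        event_id = queue.popleft()
        if event_id in seen:
            continue  # already visited via another branch of the DAG
        seen.add(event_id)
        queue.extend(prev_state_edges.get(event_id, ()))
    return seen


# Toy DAG: "$fork" is a concurrent branch that "$power" does not depend on.
edges = {
    "$create": [],
    "$member": ["$create"],
    "$power": ["$member"],
    "$fork": ["$member"],
}
reachable = walk_state_dag(edges, {"$power"})
```

Starting the walk from only some of the forward extremities leaves out branches such as `$fork` that are not ancestors of the starting events.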

Complement tests: matrix-org/complement#841

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

This implements state DAGs, without support for federation.

A general overview:
 - It adds a new room version and new event type.
 - It adds a new field `calculated_auth_event_ids` to internal metadata.
 - It stores the state DAG via new state DAG edges / forward extremities tables.
 - It adds new auth rules as per the MSC.
 - It uses the new `prev_state_events` field instead of `prev_event_ids()` when doing state resolution.
kegsay requested a review from a team as a code owner February 3, 2026 14:49
logger.debug("get_room_state_ids dest=%s, room=%s", destination, room_id)

path = _create_v1_path("/state_ids/%s", room_id)
qps = {"event_id": event_id}

`qps` is too ambiguous. Also mentioned this on matrix-org/complement#806 (comment)

create_event = e
break

if room_version.msc4242_state_dags and response.state_dag:

Feels like this should just be `if room_version.msc4242_state_dags` and error if `response.state_dag` doesn't exist.
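
A minimal sketch of that suggestion, using hypothetical stand-ins (`RoomVersion`, `SendJoinResponse`, `InvalidResponseError`, `extract_state_dag`) rather than Synapse's real classes. It mirrors the truthiness check from the diff, so an empty `state_dag` is treated the same as a missing one; that choice is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RoomVersion:
    """Hypothetical stand-in for the room version object."""

    msc4242_state_dags: bool


@dataclass
class SendJoinResponse:
    """Hypothetical stand-in for the parsed /send_join response."""

    state_dag: Optional[List[dict]] = None


class InvalidResponseError(Exception):
    """Hypothetical error for a malformed federation response."""


def extract_state_dag(
    room_version: RoomVersion, response: SendJoinResponse
) -> Optional[List[dict]]:
    # Branch on the room version only, then fail loudly if the remote
    # server omitted the state DAG instead of silently skipping the branch.
    if room_version.msc4242_state_dags:
        if not response.state_dag:
            raise InvalidResponseError("state DAG room but no state_dag in response")
        return response.state_dag
    return None
```

With this shape, a remote server that omits `state_dag` for a state DAG room produces an error instead of falling through to the non-DAG code path.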

Comment on lines +1215 to 1220
auth_events = response.auth_events
create_event = None
for e in state:
    if (e.type, e.state_key) == (EventTypes.Create, ""):
        create_event = e
        break

Should we put this as an `else` below?

Comment on lines +780 to +782
if event.room_version.msc4242_state_dags and caller_supports_partial_state:
    # TODO(kegan): for now, MSC4242 won't support partial state for ease of prototyping.
    caller_supports_partial_state = False

Comment about the nuance in the docstring

Comment on lines +807 to +814
# NOTE: we don't return the state dag for forward extremities that aren't part of this
# join event to make it easier for the receiving server to set their own forward
# extremities (they are equal to the join event's prev_state_events). This means we may
# fail to sync concurrent forks not on the path to the join event, but this is an
# outstanding problem in general.
state_dag = await self.store.get_state_dag(
    room_id, set(event.prev_state_events)
)

Is it really that bad if we just pass all forward extremities?

e_id
for e_id in (
    ev.auth_event_ids()
    if not isinstance(ev, FrozenEventVMSC4242)

assert not isinstance(ev, FrozenEventVMSC4242)

]
for ev in event_map.values()
}
# XXX: confusing name: this isn't _only_ auth events!

What does it hold?

Comment on lines +1931 to +1934
# TODO(kegan): check before merging if we can be stricter here.
# Otherwise, we are somewhat lenient and just persist the event
# as rejected, for moderate compatibility with older Synapse
# versions.

TODO

else:
    calculated_auth_events = await self._store.get_events(
        calculated_auth_event_ids,
        allow_rejected=True,

Why `allow_rejected=True`?

Comment on lines +1956 to +1960
# The above function is typically not async, and so won't yield to
# the reactor. For large rooms let's yield to the reactor
# occasionally to ensure we don't block other work.
if (i + 1) % 1000 == 0:
    await self._clock.sleep(Duration(seconds=0))

Is this true?

Both process and prep look like they have yieldable points that are always hit. Perhaps good to have this explicit anyway as the other code can evolve.
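
For reference, the explicit-yield pattern under discussion looks roughly like the sketch below. It uses `asyncio` as a stand-in for Synapse's Twisted-based clock, and `process_all`, `handle`, and `yield_every` are hypothetical names, so treat it as an illustration of the idea only.

```python
import asyncio
from typing import Callable, Iterable


async def process_all(
    items: Iterable[int], handle: Callable[[int], None], yield_every: int = 1000
) -> int:
    """Run a synchronous handler over items, yielding to the event loop periodically.

    Returns how many times control was handed back to the loop.
    """
    yields = 0
    for i, item in enumerate(items):
        handle(item)  # synchronous work that never yields on its own
        if (i + 1) % yield_every == 0:
            # A zero-length sleep lets other scheduled tasks run.
            await asyncio.sleep(0)
            yields += 1
    return yields


processed = []
yield_count = asyncio.run(process_all(range(2500), processed.append))
```

Keeping the yield explicit means the loop stays cooperative even if the per-item work later stops containing its own await points.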

MadLittleMods mentioned this pull request Mar 9, 2026
3 tasks
kegsay and others added 8 commits March 10, 2026 14:49
Co-authored-by: Eric Eastwood <erice@element.io>
Base automatically changed from kegan/4242-csapi to develop April 16, 2026 15:46