Merged
1 change: 1 addition & 0 deletions changelog.d/19453.misc
@@ -0,0 +1 @@
Introduce `spam_checker_spammy` internal event metadata.
Comment thread
reivilibre marked this conversation as resolved.
24 changes: 24 additions & 0 deletions rust/src/events/internal_metadata.rs
@@ -55,6 +55,7 @@ enum EventInternalMetadataData {
SoftFailed(bool),
ProactivelySend(bool),
PolicyServerSpammy(bool),
SpamCheckerSpammy(bool),
Comment on lines 57 to +58
Contributor

Should this be an enum, which would allow for future adaptations like this? It makes sense that both could be true, so that may not apply in any case. I'm guessing we have some early returns if either thing marks the event as spammy, though?

Contributor Author

An enum isn't a bad idea actually, but I can't think of a pleasant way of doing that migration, which leads me to think we might be better off keeping the 2 bools but also addressing the 'early-returns' point you make.

Contributor Author

Early-return removed at be8c05b so we evaluate both attributes
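
To illustrate the trade-off discussed in this thread, here is a minimal sketch of why a single-valued enum would lose information when both sources flag the same event. `SpamSource`, `TwoBools` and `as_enum` are hypothetical names made up for this example; they are not part of Synapse.

```python
from dataclasses import dataclass
from enum import Enum


class SpamSource(Enum):
    """Hypothetical single-valued alternative: only one source fits."""

    NONE = "none"
    POLICY_SERVER = "policy_server"
    SPAM_CHECKER = "spam_checker"


@dataclass
class TwoBools:
    """Hypothetical mirror of the two independent flags."""

    policy_server_spammy: bool = False
    spam_checker_spammy: bool = False


def as_enum(flags: TwoBools) -> SpamSource:
    """Collapse both flags into one value; lossy when both are set."""
    if flags.policy_server_spammy:
        return SpamSource.POLICY_SERVER
    if flags.spam_checker_spammy:
        return SpamSource.SPAM_CHECKER
    return SpamSource.NONE


# Both sources flagged the event: the bools keep both facts,
# while the enum can only remember the first one it checks.
both = TwoBools(policy_server_spammy=True, spam_checker_spammy=True)
collapsed = as_enum(both)
```

With two booleans, removing the early return is enough for both facts to be recorded; an enum would need a combined variant or a flag set to do the same.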

Comment on lines 57 to +58
Contributor

Is "spam checker" only for the module API?

Contributor Author

Yes. Strictly, there is also the pre-module-API 'spam checker API', but it was superseded by the module API; I think we have a compatibility shim.

Redacted(bool),
TxnId(Box<str>),
TokenId(i64),
@@ -104,6 +105,13 @@ impl EventInternalMetadataData {
.to_owned()
.into_any(),
),
EventInternalMetadataData::SpamCheckerSpammy(o) => (
pyo3::intern!(py, "spam_checker_spammy"),
o.into_pyobject(py)
.unwrap_infallible()
.to_owned()
.into_any(),
),
EventInternalMetadataData::Redacted(o) => (
pyo3::intern!(py, "redacted"),
o.into_pyobject(py)
@@ -168,6 +176,11 @@ impl EventInternalMetadataData {
.extract()
.with_context(|| format!("'{key_str}' has invalid type"))?,
),
"spam_checker_spammy" => EventInternalMetadataData::SpamCheckerSpammy(
value
.extract()
.with_context(|| format!("'{key_str}' has invalid type"))?,
),
"redacted" => EventInternalMetadataData::Redacted(
value
.extract()
@@ -451,6 +464,17 @@ impl EventInternalMetadata {
set_property!(self, PolicyServerSpammy, obj);
}

#[getter]
fn get_spam_checker_spammy(&self) -> PyResult<bool> {
Ok(get_property_opt!(self, SpamCheckerSpammy)
.copied()
.unwrap_or(false))
}
#[setter]
fn set_spam_checker_spammy(&mut self, obj: bool) {
set_property!(self, SpamCheckerSpammy, obj);
}

#[getter]
fn get_redacted(&self) -> PyResult<bool> {
let bool = get_property!(self, Redacted)?;
6 changes: 5 additions & 1 deletion synapse/federation/federation_base.py
@@ -177,7 +177,9 @@ async def _check_sigs_and_hash(
# Note: we don't redact the event so admins can inspect the event after the
# fact. Other processes may redact the event, but that won't be applied to
# the database copy of the event until the server's config requires it.
- return pdu
+ #
+ # We also *don't* return early here as we would still like to evaluate
+ # `spam_checker_spammy`, for completeness.

spam_check = await self._spam_checker_module_callbacks.check_event_for_spam(pdu)

@@ -194,6 +196,8 @@ async def _check_sigs_and_hash(
# using the event in prev_events).
redacted_event = prune_event(pdu)
redacted_event.internal_metadata.soft_failed = True
# Mark this as spam so we don't re-evaluate soft-failure status.
redacted_event.internal_metadata.spam_checker_spammy = True
Comment on lines +199 to +200
Contributor @MadLittleMods, Feb 23, 2026

Based on how this was used in #18968, perhaps insert_sticky_events_txn should be updated to instead skip any soft-failed events 🤔

Especially as the reasoning there is "Skipping the insertion of these types of 'invalid' events is useful for performance reasons because they would fill up the table yet we wouldn't show them to clients anyway." and soft_failed covers the behavior of showing events to clients.

Contributor

Ahh, this comment below clarifies why we distinguish this:

Note: Soft-failed sticky events ARE inserted, as their soft-failed status could be re-evaluated later.

return redacted_event

return pdu
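
The control flow after this change can be sketched roughly as follows. This is a simplified, synchronous illustration rather than the real `_check_sigs_and_hash`: `Flags`, `Event`, `prune` and `check_event` are hypothetical stand-ins, and both the policy-server branch setting `policy_server_spammy` and the flags surviving pruning are assumptions of the sketch, since the hunk does not show them.

```python
from dataclasses import dataclass, field, replace
from typing import Callable


@dataclass
class Flags:
    """Hypothetical stand-in for the relevant internal metadata fields."""

    soft_failed: bool = False
    policy_server_spammy: bool = False
    spam_checker_spammy: bool = False


@dataclass
class Event:
    body: str
    flags: Flags = field(default_factory=Flags)


def prune(event: Event) -> Event:
    """Stand-in for `prune_event`: drop the content, copy the flags (assumed)."""
    return Event(body="", flags=replace(event.flags))


def check_event(
    event: Event,
    policy_server_flags: Callable[[Event], bool],
    spam_checker_flags: Callable[[Event], bool],
) -> Event:
    if policy_server_flags(event):
        # Flag only: the event is not redacted, and there is no early return,
        # so the spam checker below still gets to run.
        event.flags.policy_server_spammy = True

    if spam_checker_flags(event):
        redacted = prune(event)
        redacted.flags.soft_failed = True
        # Recorded so that this soft-failure is not re-evaluated later.
        redacted.flags.spam_checker_spammy = True
        return redacted

    return event


def is_spam(event: Event) -> bool:
    return "spam" in event.body


def never(event: Event) -> bool:
    return False


flagged_by_both = check_event(Event("this is spam"), is_spam, is_spam)
untouched = check_event(Event("delicious ham"), never, is_spam)
```

In the sketch, an event flagged by both sources ends up with all three flags set, which is the outcome the removed early return used to prevent.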
13 changes: 11 additions & 2 deletions synapse/storage/databases/main/sticky_events.py
@@ -211,10 +211,14 @@ def insert_sticky_events_txn(
Skips inserting events:
- if they are considered spammy by the policy server;
(unsure if correct, track: https://github.com/matrix-org/matrix-spec-proposals/pull/4354#discussion_r2727593350)
- if they are considered spammy by a Synapse spam checker module;
- if they are rejected;
- if they are outliers (they should be reconsidered for insertion when de-outliered); or
- if they are not sticky (e.g. if the stickiness expired).

Note: Soft-failed sticky events ARE inserted, as their soft-failed status
could be re-evaluated later.

Skipping the insertion of these types of 'invalid' events is useful for performance reasons because
they would fill up the table yet we wouldn't show them to clients anyway.

@@ -230,7 +234,12 @@ def insert_sticky_events_txn(
sticky_events: list[tuple[EventBase, int]] = []
for ev in events:
# MSC: Note: policy servers and other similar antispam techniques still apply to these events.
- if ev.internal_metadata.policy_server_spammy:
+ # We don't filter out soft-failed events altogether (in case they get re-evaluated later),
+ # so filter out `spam_checker_spammy` events specifically as we don't want to re-evaluate _those_ later.
+ if (
+ ev.internal_metadata.policy_server_spammy
+ or ev.internal_metadata.spam_checker_spammy
+ ):
Comment thread
MadLittleMods marked this conversation as resolved.
continue
# We shouldn't be passed rejected events, but if we do, we filter them out too.
if ev.rejected_reason is not None:
@@ -241,7 +250,7 @@ def insert_sticky_events_txn(
sticky_duration = ev.sticky_duration()
if sticky_duration is None:
continue
- # Calculate the end time as start_time + effecitve sticky duration
+ # Calculate the end time as start_time + effective sticky duration
expires_at = min(ev.origin_server_ts, now_ms) + sticky_duration.as_millis()
# Filter out already expired sticky events
if expires_at <= now_ms:
13 changes: 13 additions & 0 deletions synapse/synapse_rust/events.pyi
@@ -36,6 +36,19 @@ class EventInternalMetadata:
policy_server_spammy: bool
"""whether the policy server indicated that this event is spammy"""

spam_checker_spammy: bool
"""Whether a spam checker module indicated that this event is spammy

Note that spam checkers also cause the event to be marked as soft-failed.

This flag exists for two reasons:
1. as debugging information
2. to prevent the soft-failed re-evaluation of spammy events
(the re-evaluation behaviour originates from MSC4354 Sticky Events)

Note that historical spammy events won't have this flag.
"""

txn_id: str
"""The transaction ID, if it was set when the event was created."""
token_id: int
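
The note that historical spammy events won't have this flag follows from the getter falling back to `false` when the key is absent (the `unwrap_or(false)` in the Rust getter). A minimal Python sketch of that behaviour, with `InternalMetadataSketch` as a hypothetical stand-in for the Rust-backed class:

```python
class InternalMetadataSketch:
    """Hypothetical dict-backed stand-in for `EventInternalMetadata`."""

    def __init__(self, stored: dict) -> None:
        self._data = dict(stored)

    @property
    def spam_checker_spammy(self) -> bool:
        # A missing key reads as False, so events persisted before the flag
        # existed look non-spammy here even if they were soft-failed as spam.
        return bool(self._data.get("spam_checker_spammy", False))

    @spam_checker_spammy.setter
    def spam_checker_spammy(self, value: bool) -> None:
        self._data["spam_checker_spammy"] = value


# Metadata stored before the flag was introduced: only `soft_failed` is set.
historical = InternalMetadataSketch({"soft_failed": True})
before = historical.spam_checker_spammy

historical.spam_checker_spammy = True
after = historical.spam_checker_spammy
```

So the flag can only be trusted as a positive signal: `True` means a spam checker flagged the event, while `False` may simply mean the event predates the flag.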
188 changes: 187 additions & 1 deletion tests/module_api/test_spamchecker.py
@@ -12,22 +12,33 @@
# <https://www.gnu.org/licenses/agpl-3.0.html>.
#
#
from http import HTTPStatus
from typing import Literal

from twisted.internet.testing import MemoryReactor

- from synapse.api.constants import EventContentFields, EventTypes
+ from synapse.api.constants import (
+ EventContentFields,
+ EventTypes,
+ Membership,
+ )
from synapse.api.room_versions import RoomVersions
from synapse.config.server import DEFAULT_ROOM_VERSION
from synapse.events import make_event_from_dict
from synapse.module_api import EventBase
from synapse.rest import admin, login, room, room_upgrade_rest_servlet
from synapse.server import HomeServer
from synapse.types import Codes, JsonDict
from synapse.util.clock import Clock

from tests import unittest
from tests.server import FakeChannel
from tests.unittest import HomeserverTestCase


class SpamCheckerTestCase(HomeserverTestCase):
"""Tests for the spam checker module API."""

servlets = [
room.register_servlets,
admin.register_servlets,
@@ -284,3 +295,178 @@ async def user_may_send_state_event(

self.assertEqual(channel.code, 403)
self.assertEqual(channel.json_body["errcode"], Codes.FORBIDDEN)


class FederatedEventSpamCheckMetadataTestCase(unittest.FederatingHomeserverTestCase):
servlets = [
admin.register_servlets,
login.register_servlets,
room.register_servlets,
]

def prepare(self, reactor: MemoryReactor, clock: Clock, hs: HomeServer) -> None:
super().prepare(reactor, clock, hs)
self._module_api = hs.get_module_api()
self._store = hs.get_datastores().main
self._storage_controllers = hs.get_storage_controllers()
self._federation_event_handler = hs.get_federation_event_handler()
self._federation_server = hs.get_federation_server()
self._state_handler = hs.get_state_handler()
self._persistence_controller = hs.get_storage_controllers().persistence

# Create a room
user1_id = self.register_user("user1", "pass")
user1_tok = self.login(user1_id, "pass")
self.room_id = self.helper.create_room_as(
user1_id,
tok=user1_tok,
is_public=True,
room_version=RoomVersions.V10.identifier,
)

# Prepare a join for the 'remote' user
state_map = self.get_success(
self._storage_controllers.state.get_current_state(self.room_id)
)
forward_extremity_event_ids = self.get_success(
self.hs.get_datastores().main.get_latest_event_ids_in_room(self.room_id)
)
self.remote_user_id = f"@remoteuser:{self.OTHER_SERVER_NAME}"
self.remote_user_join_event = make_event_from_dict(
self.add_hashes_and_signatures_from_other_server(
{
"room_id": self.room_id,
"sender": self.remote_user_id,
"state_key": self.remote_user_id,
"depth": 1000,
"origin_server_ts": 1,
"type": EventTypes.Member,
"content": {"membership": Membership.JOIN},
"auth_events": [
state_map[(EventTypes.Create, "")].event_id,
state_map[(EventTypes.JoinRules, "")].event_id,
],
"prev_events": list(forward_extremity_event_ids),
}
),
room_version=RoomVersions.V10,
)

# Send the join
self.get_success(
self._federation_event_handler.on_receive_pdu(
self.OTHER_SERVER_NAME, self.remote_user_join_event
)
)

# Check the join made it to the 'local' view of the room
self.helper.get_event(
room_id=self.room_id,
event_id=self.remote_user_join_event.event_id,
tok=user1_tok,
expect_code=HTTPStatus.OK,
)
Comment thread
reivilibre marked this conversation as resolved.

def test_federated_events_with_spam_checker_metadata(self) -> None:
Contributor

It would be nice to add an in-repo Complement test for this sort of thing. The only hiccup would be setting up and configuring the spam checker module, but we could always configure one and have a specific constant that triggers this behavior, e.g. SPAM_CHECKER_SPAM.

Contributor Author

I'm not sure how much it fits; I feel the trial test is adequate and is nicer to develop on.
The spam_checker_spammy field is also internal so wouldn't be visible externally.
Am I missing something really beneficial that would be covered?

Contributor

To make it more clear, I think the trial test is sufficient 👍

It was more an ideal. One benefit would be avoiding the room version details (like this problem). We could also avoid the internal details of manually puppeting federation here.

In terms of checking, we would probably check via /sync (send a sentinel event after the spammy one) and/or the admin APIs, if the spam_checker_spammy value was actually important.

"""
Simulates receiving spammy and non-spammy events over federation,
then checks their `spam_checker_spammy` flag is set properly.
"""

async def check_event_for_spam(event: EventBase) -> Literal["NOT_SPAM"] | Codes:
if event.type == EventTypes.Message:
if "ham" not in event.content["body"]:
return Codes.FORBIDDEN
return "NOT_SPAM"

# Register a spam checker callback that only allows messages with 'ham'
self._module_api.register_spam_checker_callbacks(
check_event_for_spam=check_event_for_spam
)

# Prepare a spammy and a non-spammy event.
forward_extremity_event_ids = self.get_success(
self._store.get_latest_event_ids_in_room(self.room_id)
)
state_map = self.get_success(
self._storage_controllers.state.get_current_state(self.room_id)
)
spammy_event = make_event_from_dict(
self.add_hashes_and_signatures_from_other_server(
{
"room_id": self.room_id,
"sender": self.remote_user_id,
"depth": 2000,
"origin_server_ts": 2,
"type": EventTypes.Message,
"content": {"body": "this is spam", "msgtype": "m.text"},
"auth_events": [
state_map[(EventTypes.Create, "")].event_id,
state_map[(EventTypes.JoinRules, "")].event_id,
state_map[(EventTypes.Member, self.remote_user_id)].event_id,
],
"prev_events": list(forward_extremity_event_ids),
}
),
room_version=RoomVersions.V10,
)
non_spammy_event = make_event_from_dict(
self.add_hashes_and_signatures_from_other_server(
{
"room_id": self.room_id,
"sender": self.remote_user_id,
"depth": 2000,
"origin_server_ts": 2,
"type": EventTypes.Message,
"content": {"body": "delicious ham", "msgtype": "m.text"},
"auth_events": [
state_map[(EventTypes.Create, "")].event_id,
state_map[(EventTypes.JoinRules, "")].event_id,
state_map[(EventTypes.Member, self.remote_user_id)].event_id,
],
"prev_events": list(forward_extremity_event_ids),
}
),
room_version=RoomVersions.V10,
)

# Receive these events over federation
# We need to let the federation server have them because it will
# invoke `_check_sigs_and_hash` which invokes the spam checker.
self.get_success(
self._federation_server._handle_received_pdu(
self.OTHER_SERVER_NAME, spammy_event
)
)
self.get_success(
self._federation_server._handle_received_pdu(
self.OTHER_SERVER_NAME, non_spammy_event
)
)

# Retrieve the events from the database
retrieved_spammy_event = self.get_success(
self._store.get_event(spammy_event.event_id, allow_rejected=True)
)
retrieved_non_spammy_event = self.get_success(
self._store.get_event(non_spammy_event.event_id, allow_rejected=True)
)

# Assert the spammy flags (and soft-failed flags, for good measure) are set properly
self.assertTrue(
retrieved_spammy_event.internal_metadata.spam_checker_spammy,
"Spammy inbound event should be marked as spam_checker_spammy!",
)
self.assertTrue(
retrieved_spammy_event.internal_metadata.is_soft_failed(),
"Spammy inbound event should be soft-failed.",
)

self.assertFalse(
retrieved_non_spammy_event.internal_metadata.spam_checker_spammy,
"Non-spammy inbound event should not be marked as spam_checker_spammy!",
)
self.assertFalse(
retrieved_non_spammy_event.internal_metadata.is_soft_failed(),
"Non-spammy inbound event should not be soft-failed.",
)
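
The constant-triggered spam checker floated in the review thread above could look something like this. It is a standalone sketch of the decision logic only, with no Synapse imports: `SPAM_MARKER` and `classify` are hypothetical, and the plain strings stand in for the `"NOT_SPAM"` literal and `Codes.FORBIDDEN` used by the trial test.

```python
SPAM_MARKER = "SPAM_CHECKER_SPAM"  # hypothetical sentinel a test would send
NOT_SPAM = "NOT_SPAM"
FORBIDDEN = "M_FORBIDDEN"  # stand-in for `Codes.FORBIDDEN`


def classify(event_type: str, content: dict) -> str:
    """Flag only message events whose body contains the sentinel."""
    if event_type != "m.room.message":
        return NOT_SPAM
    # `.get` tolerates messages without a body, unlike `content["body"]`.
    body = content.get("body", "")
    if isinstance(body, str) and SPAM_MARKER in body:
        return FORBIDDEN
    return NOT_SPAM


spam_verdict = classify("m.room.message", {"body": f"hello {SPAM_MARKER}"})
ham_verdict = classify("m.room.message", {"body": "delicious ham"})
```

Compared with the "allow only messages containing 'ham'" rule in the trial test, an opt-in sentinel leaves all other traffic in a shared test deployment unaffected.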
3 changes: 2 additions & 1 deletion tests/rest/client/sliding_sync/test_rooms_meta.py
@@ -12,6 +12,7 @@
# <https://www.gnu.org/licenses/agpl-3.0.html>.
#
import logging
from typing import Any

from parameterized import parameterized, parameterized_class

@@ -966,7 +967,7 @@ def test_rooms_bump_stamp_backfill(self) -> None:
creator = "@user:other"
room_id = "!foo:other"
room_version = RoomVersions.V10
- shared_kwargs = {
+ shared_kwargs: dict[str, Any] = {
"room_id": room_id,
"room_version": room_version.identifier,
}
4 changes: 2 additions & 2 deletions tests/storage/test_roommember.py
@@ -20,7 +20,7 @@
#
#
import logging
- from typing import cast
+ from typing import Any, cast

from twisted.internet.testing import MemoryReactor

@@ -238,7 +238,7 @@ def test_join_locally_forgotten_room(self) -> None:
creator = "@user:other"
room_id = "!foo:other"
room_version = RoomVersions.V10
- shared_kwargs = {
+ shared_kwargs: dict[str, Any] = {
"room_id": room_id,
"room_version": room_version.identifier,
}
4 changes: 2 additions & 2 deletions tests/storage/test_sliding_sync_tables.py
@@ -18,7 +18,7 @@
#
#
import logging
- from typing import cast
+ from typing import Any, cast

import attr
from parameterized import parameterized
@@ -873,7 +873,7 @@ def test_joined_room_bump_stamp_backfill(self) -> None:
creator = "@user:other"
room_id = "!foo:other"
room_version = RoomVersions.V10
- shared_kwargs = {
+ shared_kwargs: dict[str, Any] = {
"room_id": room_id,
"room_version": room_version.identifier,
}