osome-iu/communitynotes-manipulation
Vulnerabilities of X's Community Notes to rater bias and manipulation

This repository contains code to reproduce the results in the paper Community Notes are Vulnerable to Rater Bias and Manipulation.

We're interested in testing the core version of the Community Notes algorithm.

Some notes about the simulated Community Notes data:

  • It does not contain note summaries, thus topic modeling is not applicable.
  • It does not contain detailed tags (e.g., SpamHarassmentOrAbuse), thus Harassment-Abuse Tag-Consensus Matrix Factorization and Note Status Explanation rule are not applicable.
  • It assumes that every rating is posted 1 millisecond after the note, so all ratings are valid.

Changes were made to the scoring model to accommodate these details:

  • In data processing, Post Selection Similarity module is disabled.
  • In model training, only the CORE model is retained.
  • Inside the CORE model training,
    • When deciding rater helpfulness, only the raterAgreeRatio (threshold 0.66) is used.
    • notHelpfulSpamHarassmentOrAbuse model is disabled.
    • Low diligence model is disabled.
  • In note status scoring rule,
    • Only InitialNMR, CRH, CRNH are retained.
  • In post training phase,
    • PFLIP model is disabled.
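
The simplified rater-helpfulness rule above boils down to a single threshold check. A minimal sketch (the function and argument names are illustrative, not identifiers from the actual scoring code):

```python
# Simplified rater-helpfulness criterion: a rater counts as helpful when
# the fraction of their ratings that agree with the final note status
# (raterAgreeRatio) meets the 0.66 threshold.
RATER_AGREE_THRESHOLD = 0.66

def is_helpful_rater(agreements: int, total_ratings: int) -> bool:
    """Return True when raterAgreeRatio >= 0.66."""
    if total_ratings == 0:
        # A rater with no ratings has no agree ratio; treat as not helpful.
        return False
    return agreements / total_ratings >= RATER_AGREE_THRESHOLD
```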

Overview of the repo

  1. data: C++ code to generate synthetic input data for each experiment
  2. experiments: code to reproduce the results from scratch, including experiment metadata (definitions.json), main driver calling the data-generation code and scoring algorithm (run_all_experiments.sh) and post-processing scripts (count_FP_FN.py, count_filtered.py, corr_stats.py) to calculate summary metrics
  3. libs: the Community Notes scoring algorithm (git submodule)
  4. results: pre-computed summary CSVs
  5. reports: Jupyter notebooks to reproduce all figures in the paper

1. Set up the environment

1a. Initialize the algorithm submodule

git submodule update --init --recursive

1b. Create and activate the Python environment

cp pyproject.toml libs/communitynotes
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install poetry
poetry install --directory libs/communitynotes --no-root

2. Reproduce the results

The results reported in the paper are already in results. You can jump directly to Stage 3 and run the Jupyter notebooks to re-create the plots from these results. If you want to replicate the workflow from scratch, follow the steps below.

Stage 1 — Generate data and run the scoring algorithm

experiments/run_all_experiments.sh is the main driver. For each experiment it automatically:

  1. Compiles the C++ data-generation code in data/{exp}/
  2. Runs the compiled binary to generate synthetic input data
  3. Runs the Community Notes scoring algorithm on each generated notes file

NOTE: Running this script always performs all three steps. Since creating the synthetic data can take some time, you may want to perform only steps 1 and 2 when the data does not yet exist.

To run:

bash experiments/run_all_experiments.sh <EXP_NAME> <FILTER>

This script in turn calls experiments/run_multiple_times.sh, which just runs multiple replicates of the same argument set.

Arguments:

  • EXP_NAME: experiment name, {"bad_actor_no_bias_grid", "bad_actor_with_bias_grid", "multi_bad_actor_no_bias", "multi_bad_actor_with_bias", "multi_bad_actor_with_bias_bhvr_1", "homophily", "iu_var", "note_pol", "user_pol", "Homophily_HIGH_UPol_HIGH_NPol_HIGH", "Homophily_HIGH_UPol_HIGH_NPol_LOW", "Homophily_HIGH_UPol_LOW_NPol_HIGH", "Homophily_HIGH_UPol_LOW_NPol_LOW", "Homophily_LOW_UPol_HIGH_NPol_LOW", "Homophily_LOW_UPol_LOW_NPol_HIGH", "Homophily_LOW_UPol_LOW_NPol_LOW", "Homophily_NONE_UPol_HIGH_NPol_LOW", "Homophily_NONE_UPol_LOW_NPol_HIGH"}, or --all

    Each experiment is described in more detail in experiments/README.md

  • FILTER: helpfulness model flag, {True | False}

For example, to run all 18 experiments:

bash experiments/run_all_experiments.sh --all True
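
To run only a subset of experiments, a small helper like the following can generate the driver invocations (this wrapper is not part of the repo; it is a dry run that prints commands, which you can pipe to bash or pass to subprocess.run to execute):

```python
# Build one run_all_experiments.sh invocation per experiment name.
import shlex

def driver_commands(experiments, helpfulness_filter="True"):
    """Return the driver command line for each experiment name."""
    return [
        f"bash experiments/run_all_experiments.sh {shlex.quote(exp)} {helpfulness_filter}"
        for exp in experiments
    ]

# Dry run: print the commands instead of executing them.
for cmd in driver_commands(["homophily", "iu_var", "note_pol"]):
    print(cmd)
```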

Stage 2 — Compute summary statistics

The following scripts use the generated data in experiments/ to produce summary CSVs. Each script takes 2 arguments, EXP_NAME and FILTER, with the same meaning as above.

# FP/FN counts (pollution, suppression, infiltration, waste rates)
python experiments/count_FP_FN.py --exp <EXP_NAME> --helpfulness <FILTER>

# Filtered-user counts (bad-actor vs. good users removed by the algorithm)
python experiments/count_filtered.py --exp <EXP_NAME> --helpfulness <FILTER>

# Correlation statistics (inferred vs. ground-truth intercept and factor)
python experiments/corr_stats.py --exp <EXP_NAME> --helpfulness <FILTER>

These scripts write three summary CSV files per experiment; see the Summary output section below.

To process all 18 experiments at once, replace --exp EXP_NAME with --all:

python experiments/count_FP_FN.py  --all --helpfulness True

The data fields produced by these computations are documented in results/README.md
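
To run all three Stage 2 scripts back to back for one EXP_NAME/FILTER pair, a small wrapper (illustrative, not part of the repo) can assemble the invocations:

```python
# Assemble the three Stage 2 post-processing invocations.
import sys

SUMMARY_SCRIPTS = [
    "experiments/count_FP_FN.py",     # pollution/suppression/infiltration/waste rates
    "experiments/count_filtered.py",  # bad-actor vs. good users removed
    "experiments/corr_stats.py",      # inferred vs. ground-truth correlations
]

def summary_commands(exp_name, helpfulness):
    """One command per script: python <script> --exp <name> --helpfulness <flag>."""
    return [
        [sys.executable, script, "--exp", exp_name, "--helpfulness", str(helpfulness)]
        for script in SUMMARY_SCRIPTS
    ]

# Execute with: for cmd in summary_commands(...): subprocess.run(cmd, check=True)
```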

Stage 3 — Plot results from the paper

The Jupyter notebooks in reports/ use the CSVs in results/ to plot the main figures.

To run these notebooks, first install the additional packages needed for notebooks:

pip install jupyter ipykernel tqdm

Expected output

After running Stages 2 and 3, you should see the following outputs:

Synthetic input data — Each experiment sweeps over a grid of parameter combinations; for each combination, the C++ binary produces five TSV files:

data/{exp}/data/
├── {params}-notes.tsv            # synthetic notes (input to the scoring algorithm)
├── {params}-ratings.tsv          # per-rater ratings
├── {params}-TrueNoteParams.tsv   # ground-truth note polarity / intercept parameters
├── {params}-TrueUserParams.tsv   # ground-truth user polarity parameters
└── {params}-userEnrollment.tsv   # user enrollment records

Algorithm output — For each parameter set the scoring algorithm runs once (run_0) and produces:

results/{exp}/output_helpfulness_{True|False}/
└── {params}-notes/
    ├── run_0/
    │   ├── scored_notes.tsv        # notes with inferred intercept and factor scores
    │   ├── helpfulness_scores.tsv  # per-user helpfulness / reputation scores
    │   ├── aux_note_info.tsv       # auxiliary note metadata
    │   ├── note_status_history.tsv # note status over time
    │   └── main_tiny.log          # algorithm run log
    └── timing_results_0.1.csv     # wall-clock timing for this run

Summary output

experiments/{exp}/
├── FP_count/
│   └── helpfulness_{True|False}.csv    # FP/FN counts with reputation scoring
├── corr_stats/
│   └── helpfulness_{True|False}.csv    # Pearson correlation statistics
└── filtered_count/
    └── helpfulness_{True|False}.csv    # filter counts with reputation scoring
