osome-iu/communitynotes-manipulation
Vulnerabilities of X's Community Notes to rater bias and manipulation

This repository contains code to reproduce the results in the paper Community Notes are Vulnerable to Rater Bias and Manipulation.

We're interested in testing the core version of the Community Notes algorithm.

Some notes about the simulated Community Notes data:

  • It does not contain note summaries, thus topic modeling is not applicable.
  • It does not contain detailed tags (e.g., SpamHarassmentOrAbuse), thus Harassment-Abuse Tag-Consensus Matrix Factorization and Note Status Explanation rule are not applicable.
  • It assumes that every rating is posted 1 millisecond after the note, so all ratings are valid.

Changes were made to the scoring model to accommodate these details:

  • In data processing, Post Selection Similarity module is disabled.
  • In model training, only the CORE model is retained.
  • Inside the CORE model training,
    • When deciding rater helpfulness, only the raterAgreeRatio (threshold 0.66) is used.
    • notHelpfulSpamHarassmentOrAbuse model is disabled.
    • Low diligence model is disabled.
  • In note status scoring rule,
    • Only InitialNMR, CRH, CRNH are retained.
  • In post training phase,
    • PFLIP model is disabled.
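
The simplified rater-helpfulness rule above boils down to a single threshold check. A minimal sketch (the function and argument names are illustrative, not identifiers from the actual scoring code):

```python
# Simplified rater-helpfulness criterion: a rater counts as helpful when
# the fraction of their ratings that agree with the final note status
# (raterAgreeRatio) meets the 0.66 threshold.
RATER_AGREE_THRESHOLD = 0.66

def is_helpful_rater(agreements: int, total_ratings: int) -> bool:
    """Return True when raterAgreeRatio >= 0.66."""
    if total_ratings == 0:
        # A rater with no ratings has no agree ratio; treat as not helpful.
        return False
    return agreements / total_ratings >= RATER_AGREE_THRESHOLD
```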

Overview of the repo

  1. data: C++ code to generate synthetic input data for each experiment
  2. experiments: code to reproduce the results from scratch, including experiment metadata (definitions.json), main driver calling the data-generation code and scoring algorithm (run_all_experiments.sh) and post-processing scripts (count_FP_FN.py, count_filtered.py, corr_stats.py) to calculate summary metrics
  3. libs: the Community Notes scoring algorithm (git submodule)
  4. results: pre-computed summary CSVs
  5. reports: Jupyter notebooks to reproduce all figures in the paper

1. Set up the environment

1a. Initialize the algorithm submodule

git submodule update --init --recursive

1b. Create and activate the Python environment

cp pyproject.toml libs/communitynotes
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install poetry
poetry install --directory libs/communitynotes --no-root

2. Reproduce the results

The results reported in the paper are already in results. You can jump directly to Stage 3 and run the Jupyter notebooks to re-create the plots from these results. If you want to replicate the workflow from scratch, follow the steps below.

Stage 1 — Generate data and run the scoring algorithm

experiments/run_all_experiments.sh is the main driver. For each experiment it automatically:

  1. Compiles the C++ data-generation code in data/{exp}/
  2. Runs the compiled binary to generate synthetic input data
  3. Runs the Community Notes scoring algorithm on each generated notes file

NOTE: Running this script always performs all three steps. Since creating the synthetic data can take some time, you may want to perform only steps 1 and 2 when the data does not yet exist.

To run:

bash experiments/run_all_experiments.sh <EXP_NAME> <FILTER>

This script in turn calls experiments/run_multiple_times.sh, which just runs multiple replicates of the same argument set.

Arguments:

  • EXP_NAME: experiment name, {"bad_actor_no_bias_grid", "bad_actor_with_bias_grid", "multi_bad_actor_no_bias", "multi_bad_actor_with_bias", "multi_bad_actor_with_bias_bhvr_1", "homophily", "iu_var", "note_pol", "user_pol", "Homophily_HIGH_UPol_HIGH_NPol_HIGH", "Homophily_HIGH_UPol_HIGH_NPol_LOW", "Homophily_HIGH_UPol_LOW_NPol_HIGH", "Homophily_HIGH_UPol_LOW_NPol_LOW", "Homophily_LOW_UPol_HIGH_NPol_LOW", "Homophily_LOW_UPol_LOW_NPol_HIGH", "Homophily_LOW_UPol_LOW_NPol_LOW", "Homophily_NONE_UPol_HIGH_NPol_LOW", "Homophily_NONE_UPol_LOW_NPol_HIGH"}, or --all

    Each experiment is described in more detail in experiments/README.md

  • FILTER: helpfulness model flag, {True | False}

For example, to run all 18 experiments:

bash experiments/run_all_experiments.sh --all True
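
To run only a subset of experiments, a small helper like the following can generate the driver invocations (this wrapper is not part of the repo; it is a dry run that prints commands, which you can pipe to bash or pass to subprocess.run to execute):

```python
# Build one run_all_experiments.sh invocation per experiment name.
import shlex

def driver_commands(experiments, helpfulness_filter="True"):
    """Return the driver command line for each experiment name."""
    return [
        f"bash experiments/run_all_experiments.sh {shlex.quote(exp)} {helpfulness_filter}"
        for exp in experiments
    ]

# Dry run: print the commands instead of executing them.
for cmd in driver_commands(["homophily", "iu_var", "note_pol"]):
    print(cmd)
```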

Stage 2 — Compute summary statistics

The following scripts use the generated data in experiments/ to produce summary CSVs. Each script takes 2 arguments, EXP_NAME and FILTER, with the same meaning as above.

# FP/FN counts (pollution, suppression, infiltration, waste rates)
python experiments/count_FP_FN.py --exp <EXP_NAME> --helpfulness <FILTER>

# Filtered-user counts (bad-actor vs. good users removed by the algorithm)
python experiments/count_filtered.py --exp <EXP_NAME> --helpfulness <FILTER>

# Correlation statistics (inferred vs. ground-truth intercept and factor)
python experiments/corr_stats.py --exp <EXP_NAME> --helpfulness <FILTER>

These scripts write three summary CSV files per experiment; see the Summary output section below.

To process all 18 experiments at once, replace --exp EXP_NAME with --all:

python experiments/count_FP_FN.py  --all --helpfulness True

The data fields produced by these computations are documented in results/README.md
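
To run all three Stage 2 scripts back to back for one EXP_NAME/FILTER pair, a small wrapper (illustrative, not part of the repo) can assemble the invocations:

```python
# Assemble the three Stage 2 post-processing invocations.
import sys

SUMMARY_SCRIPTS = [
    "experiments/count_FP_FN.py",     # pollution/suppression/infiltration/waste rates
    "experiments/count_filtered.py",  # bad-actor vs. good users removed
    "experiments/corr_stats.py",      # inferred vs. ground-truth correlations
]

def summary_commands(exp_name, helpfulness):
    """One command per script: python <script> --exp <name> --helpfulness <flag>."""
    return [
        [sys.executable, script, "--exp", exp_name, "--helpfulness", str(helpfulness)]
        for script in SUMMARY_SCRIPTS
    ]

# Execute with: for cmd in summary_commands(...): subprocess.run(cmd, check=True)
```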

Stage 3 — Plot results from the paper

The Jupyter notebooks in reports/ use the CSVs in results/ to plot the main figures.

To run these notebooks, first install the additional packages needed for notebooks:

pip install jupyter ipykernel tqdm

Expected output

After running Stages 2 and 3, you should see the following outputs:

Synthetic input data — Each experiment sweeps over a grid of parameter combinations; for each combination, the C++ binary produces five TSV files:

data/{exp}/data/
├── {params}-notes.tsv            # synthetic notes (input to the scoring algorithm)
├── {params}-ratings.tsv          # per-rater ratings
├── {params}-TrueNoteParams.tsv   # ground-truth note polarity / intercept parameters
├── {params}-TrueUserParams.tsv   # ground-truth user polarity parameters
└── {params}-userEnrollment.tsv   # user enrollment records

Algorithm output — For each parameter set the scoring algorithm runs once (run_0) and produces:

results/{exp}/output_helpfulness_{True|False}/
└── {params}-notes/
    ├── run_0/
    │   ├── scored_notes.tsv        # notes with inferred intercept and factor scores
    │   ├── helpfulness_scores.tsv  # per-user helpfulness / reputation scores
    │   ├── aux_note_info.tsv       # auxiliary note metadata
    │   ├── note_status_history.tsv # note status over time
    │   └── main_tiny.log          # algorithm run log
    └── timing_results_0.1.csv     # wall-clock timing for this run

Summary output

experiments/{exp}/
├── FP_count/
│   └── helpfulness_{True|False}.csv    # FP/FN counts with reputation scoring
├── corr_stats/
│   └── helpfulness_{True|False}.csv    # Pearson correlation statistics
└── filtered_count/
    └── helpfulness_{True|False}.csv    # filter counts with reputation scoring
