AML Alert Prioritization and Risk Scoring System

Overview

This project implements a two-stage, risk-based Anti-Money Laundering (AML) monitoring framework that mirrors how financial institutions design and operate transaction monitoring programs. The system combines scenario-based detection with model-driven alert prioritization to improve investigator efficiency while maintaining interpretability and regulatory alignment.

The objective is not to automate suspicious activity reporting, but to prioritize alerts for human review in accordance with common AML regulatory expectations.

System Architecture

The framework is structured into two primary stages.

Stage 1: Scenario-Based Transaction Monitoring

Stage 1 focuses on coverage and detection. Transaction-level rules are applied to identify behaviors associated with known money laundering typologies.

Key characteristics:

Rolling time-window feature engineering
Scenario rules aligned to CAMS typologies
Alerts generated at the account-day level
Emphasis on interpretability and explainability

Example scenarios include:

High transaction velocity
Cross-border activity bursts
Cash-intensive behavior
Rapid transaction patterns

An alert is generated when one or more scenario rules are triggered for an account on a given day.
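
A rolling-window scenario rule of this kind can be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's implementation: the 24-hour window, the threshold of three transactions, and the function name `velocity_alerts` are all assumptions chosen for the example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical window and threshold; real scenario thresholds need tuning.
WINDOW = timedelta(hours=24)
MAX_TXNS_IN_WINDOW = 3


def velocity_alerts(transactions):
    """Return the set of (account, date) pairs that trip the velocity rule.

    `transactions` is an iterable of (account_id, timestamp) tuples.
    """
    recent = defaultdict(deque)  # account -> timestamps inside the window
    alerts = set()
    for account, ts in sorted(transactions, key=lambda t: t[1]):
        window = recent[account]
        window.append(ts)
        # Drop timestamps that have fallen out of the rolling window.
        while ts - window[0] > WINDOW:
            window.popleft()
        # The rule fires when the rolling count exceeds the threshold.
        if len(window) > MAX_TXNS_IN_WINDOW:
            alerts.add((account, ts.date()))
    return alerts


# Toy data: account A sends four transactions within a few hours.
txns = [
    ("A", datetime(2023, 1, 1, 9)),
    ("A", datetime(2023, 1, 1, 10)),
    ("A", datetime(2023, 1, 1, 11)),
    ("A", datetime(2023, 1, 1, 12)),
    ("B", datetime(2023, 1, 1, 9)),
    ("B", datetime(2023, 1, 3, 9)),
]
alerts = velocity_alerts(txns)
```

Because alerts are stored as a set of (account, date) pairs, several rule hits for the same account on the same day collapse into a single account-day alert.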

Stage 2: Alert Risk Scoring and Prioritization

Stage 2 focuses on efficiency and prioritization. Alerts generated in Stage 1 are enriched and ranked to determine which alerts should be reviewed first by investigators.

Stage 2 is divided into three substeps.

Stage 2A: Alert Aggregation and Labeling

Transactions are grouped into alerts by account and day
An alert is labeled positive if any underlying transaction is labeled as laundering
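
The "any positive transaction" labeling rule can be sketched as below. The dictionary keys `account`, `day`, and `is_laundering` are hypothetical names used for illustration and may not match the dataset's actual column names.

```python
from collections import defaultdict


def label_alerts(transactions):
    """Group transactions by (account, day) and label each alert.

    `transactions` is an iterable of dicts with the hypothetical keys
    'account', 'day', and 'is_laundering' (0 or 1).
    """
    labels = defaultdict(int)
    for txn in transactions:
        key = (txn["account"], txn["day"])
        # An alert is positive if ANY underlying transaction is positive.
        labels[key] = max(labels[key], txn["is_laundering"])
    return dict(labels)


# Toy data: one laundering transaction makes the whole alert positive.
alert_labels = label_alerts([
    {"account": "A", "day": "2023-01-01", "is_laundering": 0},
    {"account": "A", "day": "2023-01-01", "is_laundering": 1},
    {"account": "B", "day": "2023-01-01", "is_laundering": 0},
])
```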

Stage 2B: Alert Enrichment and Typology Attribution

Alerts are enriched with a dominant laundering typology for analysis
Typology labels are used only for validation and monitoring
Typology information is intentionally excluded from model features to avoid leakage
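
One way to picture typology attribution and the leakage guard is the sketch below. It assumes "dominant" means the most frequent typology among an alert's transactions, which is an interpretation rather than something the project states; the column names in `LEAKY_COLUMNS` and the typology strings are likewise hypothetical.

```python
from collections import Counter

# Hypothetical names for columns that must never reach the model.
LEAKY_COLUMNS = {"typology", "is_laundering"}


def dominant_typology(typologies):
    """Return the most frequent typology label, or None if there is none."""
    counts = Counter(t for t in typologies if t is not None)
    return counts.most_common(1)[0][0] if counts else None


def model_features(alert_row):
    """Drop label-derived columns so they cannot leak into training."""
    return {k: v for k, v in alert_row.items() if k not in LEAKY_COLUMNS}


# Toy alert: typology is kept for validation but stripped from features.
alert_row = {
    "txn_count_24h": 7,
    "cross_border_share": 0.4,
    "typology": dominant_typology(["structuring", "layering", "structuring"]),
    "is_laundering": 1,
}
features = model_features(alert_row)
```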

Stage 2C: Alert Scoring Model

Supervised models are trained to estimate alert risk
Logistic regression is used as an interpretable baseline
Gradient-boosted trees are used for improved ranking performance
Alerts are ranked by risk score for investigation
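
To make the interpretable baseline concrete, the sketch below fits a tiny logistic regression by batch gradient descent and ranks alerts by predicted risk. It is a from-scratch stand-in written for illustration only; the project's notebook presumably relies on a modeling library, and the two features, the toy data, the learning rate, and the epoch count are all assumptions. The gradient-boosted model is not sketched here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this alert under the current weights.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def score(w, b, x):
    """Estimated probability that an alert is a true positive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy alerts: [scaled 24h transaction count, cross-border share].
X = [[0.1, 0.0], [0.2, 0.1], [0.3, 0.0], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)

# Alert indices ordered from highest to lowest estimated risk.
ranked = sorted(range(len(X)), key=lambda i: score(w, b, X[i]), reverse=True)
```

The sign and size of each fitted weight show how a feature moves the score, which is the property that makes this kind of model a useful baseline to compare against a less transparent ranker.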

End-to-End Monitoring Flow

Raw Transactions
    |
    v
Transaction Feature Engineering
    - Rolling counts and amounts
    - Velocity and frequency metrics
    - Cross-border and cash indicators
    |
    v
Stage 1: Scenario-Based Monitoring
    - High transaction velocity
    - Cross-border activity bursts
    - Cash-intensive behavior
    - Rapid transaction patterns
    |
    v
Alert Generation
    - Alerts created at account-day level
    |
    v
Stage 2: Alert Enrichment
    - Transaction aggregation
    - Alert labeling using laundering outcomes
    - Typology attribution for validation
    |
    v
Stage 2: Alert Risk Scoring
    - Interpretable baseline model
    - Gradient-boosted ranking model
    |
    v
Ranked Alerts
    - Risk scores
    - Investigator reason codes
    |
    v
Investigator Review and Decisioning

This flow illustrates how scenario-based detection and model-driven prioritization work together to support risk-based AML investigations.

Dataset Access

The dataset is not included in this repository due to GitHub file size limits. It can be downloaded from Kaggle and placed in the data/ directory as saml_d.csv.

Key fields include:

Sender and receiver account identifiers
Transaction amount, currency, and payment type
Sender and receiver bank locations
Ground truth laundering indicator
Laundering typology labels

Evaluation Approach

Model performance is evaluated using AML-relevant metrics rather than accuracy alone.

Primary evaluation criteria:

Precision at the top 1 percent, 5 percent, and 10 percent of alerts
Concentration of true positives within investigator capacity
Stability across time-based train/test splits

This evaluation reflects how alert scoring models are assessed in production AML environments.
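
The precision-at-top-k metric and a time-based split can be sketched as follows. The function names, the integer-percent argument, and the `day` key are illustrative assumptions; the rounding rule (floor, with at least one alert reviewed) is one reasonable convention among several.

```python
def precision_at_top_percent(scores, labels, percent):
    """Precision among the top `percent` of alerts ranked by score.

    `percent` is an integer (e.g. 1, 5, 10) to avoid float rounding issues.
    """
    k = max(1, len(scores) * percent // 100)
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def time_based_split(alerts, cutoff_day):
    """Train on alerts before `cutoff_day`, test on alerts on or after it.

    Days are ISO date strings, which sort chronologically as text.
    """
    train = [a for a in alerts if a["day"] < cutoff_day]
    test = [a for a in alerts if a["day"] >= cutoff_day]
    return train, test


# Toy scores and labels for ten alerts.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
p_at_10 = precision_at_top_percent(scores, labels, 10)
p_at_50 = precision_at_top_percent(scores, labels, 50)

train, test = time_based_split(
    [{"day": "2023-01-01"}, {"day": "2023-02-01"}, {"day": "2023-03-01"}],
    cutoff_day="2023-02-01",
)
```

Splitting by time rather than at random keeps future alerts out of the training set, so the measured precision is closer to what a model would achieve on alerts it has not yet seen.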

Explainability and Reason Codes

To support investigator decision-making, each alert is accompanied by reason codes derived from its most extreme contributing risk factors. These reason codes highlight behaviors such as elevated transaction volume, cross-border activity, or cash intensity.

This approach supports transparency and aligns with expectations for explainable AML systems.
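
One simple way to derive reason codes is to report the features on which an alert deviates most from the rest of the population. The sketch below measures extremeness with a z-score, which is an illustrative choice rather than the project's documented method; the feature names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def reason_codes(alert, population, top_n=2):
    """Return up to `top_n` features on which `alert` is most extreme.

    Extremeness is the z-score of the alert's value against `population`,
    a list of feature dicts with the same keys as `alert`.
    """
    z_scores = {}
    for name, value in alert.items():
        values = [row[name] for row in population]
        spread = pstdev(values)
        z_scores[name] = (value - mean(values)) / spread if spread else 0.0
    ranked = sorted(z_scores, key=z_scores.get, reverse=True)
    # Only features above the population mean count as risk reasons.
    return [name for name in ranked[:top_n] if z_scores[name] > 0]


# Toy population of alert features and one unusual alert.
population = [
    {"txn_count_24h": 2, "cross_border_share": 0.0, "cash_share": 0.1},
    {"txn_count_24h": 3, "cross_border_share": 0.1, "cash_share": 0.2},
    {"txn_count_24h": 4, "cross_border_share": 0.0, "cash_share": 0.3},
    {"txn_count_24h": 3, "cross_border_share": 0.1, "cash_share": 0.2},
]
alert = {"txn_count_24h": 12, "cross_border_share": 0.9, "cash_share": 0.2}
codes = reason_codes(alert, population)
```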

Governance Considerations and Limitations

The system is designed for alert prioritization, not automated decisioning
Scenario thresholds require periodic tuning as behavior evolves
Model performance should be monitored for drift and stability
All alerts require human review and investigator judgment

Project Structure

risk_based_aml_monitoring/
├── aml_alert_scoring.ipynb
├── README.md
└── data/
    └── saml_d.csv

Results Summary

The two-stage framework demonstrates strong alert prioritization performance in a highly imbalanced AML setting.

Key outcomes:

Alert positive rate after scenario tuning: approximately 1 percent
Gradient-boosted model outperforms the baseline logistic regression in ranking risk
Meaningful concentration of true positives within the top reviewed alerts
Investigator capacity simulation shows improved yield per day compared to random review

These results illustrate how combining scenario-based monitoring with model-driven prioritization can significantly improve investigative efficiency.
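
The comparison behind an investigator capacity simulation can be sketched as below: the true positives found by reviewing the highest-scored alerts, against the expected count under random review. The numbers are toy values chosen for illustration and are not the project's results; the function names and the daily capacity of three alerts are assumptions.

```python
def yield_at_capacity(scores, labels, capacity):
    """True positives found by reviewing the `capacity` highest-scored alerts."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return sum(label for _, label in ranked[:capacity])


def expected_random_yield(labels, capacity):
    """Expected true positives when `capacity` alerts are reviewed at random."""
    return capacity * sum(labels) / len(labels)


# Toy day: ten alerts, two true positives, capacity for three reviews.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
model_yield = yield_at_capacity(scores, labels, capacity=3)
random_yield = expected_random_yield(labels, capacity=3)
```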

Key Takeaway

This project demonstrates how rule-based monitoring and machine learning can be combined to create a practical, regulator-friendly AML alert prioritization system that improves efficiency without sacrificing interpretability.
