This project implements a two-stage, risk-based Anti-Money Laundering (AML) monitoring framework that mirrors how financial institutions design and operate transaction monitoring programs. The system combines scenario-based detection with model-driven alert prioritization to improve investigator efficiency while maintaining interpretability and regulatory alignment.
The objective is not to automate suspicious activity reporting, but to prioritize alerts for human review in accordance with common AML regulatory expectations.
The framework is structured into two primary stages.
Stage 1 focuses on coverage and detection. Transaction-level rules are applied to identify behaviors associated with known money laundering typologies.
Key characteristics:
- Rolling time-window feature engineering
- Scenario rules aligned to CAMS typologies
- Alerts generated at the account-day level
- Emphasis on interpretability and explainability
Example scenarios include:
- High transaction velocity
- Cross-border activity bursts
- Cash-intensive behavior
- Rapid transaction patterns
An alert is generated when one or more scenario rules are triggered for an account on a given day.
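The alerting logic above can be sketched as follows. This is a minimal illustration, not the notebook's implementation: the field names (`account`, `timestamp`, `amount`, `cross_border`, `cash`), the scenario names, and the threshold values are all assumptions, and calendar-day aggregates stand in for the rolling time-window features. The rapid transaction pattern scenario is left out for brevity.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical thresholds; real values would come from scenario tuning.
THRESHOLDS = {"velocity": 3, "cross_border": 2, "cash_share": 0.8}


def generate_alerts(transactions):
    """Return {(account, date): [triggered scenarios]} for account-days with at least one hit."""
    daily = defaultdict(
        lambda: {"count": 0, "cross_border": 0, "cash_amount": 0.0, "amount": 0.0}
    )
    # Aggregate each account's transactions by calendar day.
    for tx in transactions:
        agg = daily[(tx["account"], tx["timestamp"].date())]
        agg["count"] += 1
        agg["cross_border"] += int(tx["cross_border"])
        agg["amount"] += tx["amount"]
        if tx["cash"]:
            agg["cash_amount"] += tx["amount"]

    alerts = {}
    for key, agg in daily.items():
        triggered = []
        if agg["count"] >= THRESHOLDS["velocity"]:
            triggered.append("high_velocity")
        if agg["cross_border"] >= THRESHOLDS["cross_border"]:
            triggered.append("cross_border_burst")
        if agg["amount"] > 0 and agg["cash_amount"] / agg["amount"] >= THRESHOLDS["cash_share"]:
            triggered.append("cash_intensive")
        # One alert per account-day when one or more scenarios fire.
        if triggered:
            alerts[key] = triggered
    return alerts


# Illustrative transactions only; not taken from the dataset.
sample_transactions = [
    {"account": "A1", "timestamp": datetime(2023, 1, 1, 9, 0), "amount": 100.0, "cross_border": False, "cash": False},
    {"account": "A1", "timestamp": datetime(2023, 1, 1, 9, 5), "amount": 200.0, "cross_border": True, "cash": False},
    {"account": "A1", "timestamp": datetime(2023, 1, 1, 9, 10), "amount": 150.0, "cross_border": True, "cash": False},
    {"account": "A2", "timestamp": datetime(2023, 1, 1, 12, 0), "amount": 500.0, "cross_border": False, "cash": True},
    {"account": "A3", "timestamp": datetime(2023, 1, 1, 15, 0), "amount": 50.0, "cross_border": False, "cash": False},
]
alerts = generate_alerts(sample_transactions)
```

Keeping the list of triggered scenarios on each alert preserves the interpretability that Stage 1 emphasizes: an investigator can see which rule fired without consulting a model.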
Stage 2 focuses on efficiency and prioritization. Alerts generated in Stage 1 are enriched and ranked to determine which alerts should be reviewed first by investigators.
Stage 2 is divided into three sub-steps.
- Alert construction and labeling
  - Transactions are grouped into alerts
  - An alert is labeled positive if any underlying transaction is labeled as laundering
- Typology enrichment
  - Alerts are enriched with a dominant laundering typology for analysis
  - Typology labels are used only for validation and monitoring
  - Typology information is intentionally excluded from model features to avoid leakage
- Alert risk scoring
  - Supervised models are trained to estimate alert risk
  - Logistic regression is used as an interpretable baseline
  - Gradient-boosted trees are used for improved ranking performance
  - Alerts are ranked by risk score for investigation
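The grouping, labeling, and typology steps can be sketched as below. This is a hedged illustration: the field names are assumptions, and "dominant typology" is taken here to mean the most frequent typology among an alert's laundering-labeled transactions, which the project may define differently. The hypothetical `MODEL_FEATURES` list shows one way to keep typology out of the model inputs.

```python
from collections import Counter, defaultdict


def build_alerts(transactions):
    """Group transactions into account-day alerts with a label and a dominant typology."""
    grouped = defaultdict(list)
    for tx in transactions:
        grouped[(tx["account"], tx["date"])].append(tx)

    alerts = []
    for (account, date), txs in sorted(grouped.items()):
        laundering = [tx for tx in txs if tx["is_laundering"] == 1]
        typologies = Counter(tx["typology"] for tx in laundering)
        alerts.append({
            "account": account,
            "date": date,
            "n_transactions": len(txs),
            "total_amount": sum(tx["amount"] for tx in txs),
            # Positive if any underlying transaction is labeled as laundering.
            "label": int(bool(laundering)),
            # Kept for validation and monitoring only; never a model feature.
            "dominant_typology": typologies.most_common(1)[0][0] if typologies else None,
        })
    return alerts


# Hypothetical feature list: typology is deliberately absent to avoid leakage.
MODEL_FEATURES = ["n_transactions", "total_amount"]

# Illustrative rows only; not taken from the dataset.
sample = [
    {"account": "A1", "date": "2023-01-01", "amount": 100.0, "is_laundering": 0, "typology": "Normal"},
    {"account": "A1", "date": "2023-01-01", "amount": 900.0, "is_laundering": 1, "typology": "Structuring"},
    {"account": "A1", "date": "2023-01-01", "amount": 950.0, "is_laundering": 1, "typology": "Structuring"},
    {"account": "A2", "date": "2023-01-01", "amount": 40.0, "is_laundering": 0, "typology": "Normal"},
]
labeled_alerts = build_alerts(sample)
```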
Raw Transactions
|
v
Transaction Feature Engineering
- Rolling counts and amounts
- Velocity and frequency metrics
- Cross-border and cash indicators
|
v
Stage 1: Scenario-Based Monitoring
- High transaction velocity
- Cross-border activity bursts
- Cash-intensive behavior
- Rapid transaction patterns
|
v
Alert Generation
- Alerts created at account-day level
|
v
Stage 2: Alert Enrichment
- Transaction aggregation
- Alert labeling using laundering outcomes
- Typology attribution for validation
|
v
Stage 2: Alert Risk Scoring
- Interpretable baseline model
- Gradient-boosted ranking model
|
v
Ranked Alerts
- Risk scores
- Investigator reason codes
|
v
Investigator Review and Decisioning
This flow illustrates how scenario-based detection and model-driven prioritization work together to support risk-based AML investigations.
The dataset is not included in this repository due to GitHub file size limits.
It can be downloaded from Kaggle and placed in the data/ directory as saml_d.csv.
Key fields include:
- Sender and receiver account identifiers
- Transaction amount, currency, and payment type
- Sender and receiver bank locations
- Ground-truth laundering indicator
- Laundering typology labels
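A minimal loading sketch for the fields listed above is shown below. The column names (`Sender_account`, `Payment_type`, `Is_laundering`, and so on) are assumptions about the CSV header and should be checked against the downloaded file. The derived `cross_border` and `cash` indicators are also illustrative definitions, and the two sample rows are made up for the example.

```python
import csv
import io


def load_transactions(handle):
    """Parse CSV rows into dicts and derive simple cross-border and cash indicators."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "sender": row["Sender_account"],
            "receiver": row["Receiver_account"],
            "date": row["Date"],
            "amount": float(row["Amount"]),
            "payment_type": row["Payment_type"],
            # Assumed definition: sender and receiver banks are in different locations.
            "cross_border": row["Sender_bank_location"] != row["Receiver_bank_location"],
            # Assumed definition: the payment type mentions cash.
            "cash": "cash" in row["Payment_type"].lower(),
            "is_laundering": int(row["Is_laundering"]),
            "typology": row["Laundering_type"],
        })
    return rows


# In-memory stand-in for the real file, with made-up rows.
sample_csv = io.StringIO(
    "Date,Sender_account,Receiver_account,Amount,Payment_type,"
    "Sender_bank_location,Receiver_bank_location,Is_laundering,Laundering_type\n"
    "2022-10-07,1001,2002,1459.15,Cash Deposit,UK,UK,0,Normal_Cash_Deposits\n"
    "2022-10-07,1003,2004,6515.00,Cross-border,UK,UAE,1,Structuring\n"
)
transactions = load_transactions(sample_csv)
```

For the real dataset, the same function can be given a file handle opened on data/saml_d.csv with `newline=""`, as the `csv` module recommends.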
Model performance is evaluated using AML-relevant metrics rather than accuracy alone.
Primary evaluation criteria:
- Precision at the top 1 percent, 5 percent, and 10 percent of alerts
- Concentration of true positives within investigator capacity
- Stability across time-based train/test splits
This evaluation reflects how alert scoring models are assessed in production AML environments.
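The two core evaluation ideas, precision within the top share of ranked alerts and a time-based split, can be sketched in a few lines. The function names and the synthetic scores and labels are assumptions made for illustration and are not project results.

```python
def precision_at_percent(scores, labels, percent):
    """Precision among the top `percent` of alerts when ranked by descending score."""
    # Integer arithmetic avoids floating-point surprises when computing the cutoff.
    k = max(1, len(scores) * percent // 100)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def time_based_split(alerts, cutoff_date):
    """Train on alerts before the cutoff and test on alerts on or after it (ISO date strings)."""
    train = [a for a in alerts if a["date"] < cutoff_date]
    test = [a for a in alerts if a["date"] >= cutoff_date]
    return train, test


# Synthetic example: 100 alerts with distinct scores and 5 true positives.
scores = [i / 100 for i in range(100)]
positives = {99, 97, 96, 90, 50}
labels = [1 if i in positives else 0 for i in range(100)]
precision = {p: precision_at_percent(scores, labels, p) for p in (1, 5, 10)}

dated_alerts = [{"date": "2023-01-10"}, {"date": "2023-02-05"}, {"date": "2023-03-01"}]
train, test = time_based_split(dated_alerts, "2023-02-01")
```

Splitting by time rather than at random keeps future alerts out of the training set, which is closer to how a scoring model would be used after deployment.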
To support investigator decision making, each alert is accompanied by reason codes derived from the most extreme contributing risk factors. These reason codes highlight behaviors such as elevated transaction volume, cross border activity, or cash intensity.
This approach supports transparency and aligns with expectations for explainable AML systems.
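One way to derive such reason codes is sketched below, assuming that "most extreme" means the largest positive z-score relative to a reference population of alerts. That interpretation, the feature names, and the sample values are assumptions; the project may rank contributing factors differently.

```python
from statistics import mean, pstdev


def reason_codes(alert, population, top_n=2):
    """Return up to top_n feature names that are most elevated versus the population."""
    z_scores = {}
    for name, value in alert.items():
        values = [row[name] for row in population]
        spread = pstdev(values)
        # A feature with no variation in the population cannot be called extreme.
        z_scores[name] = (value - mean(values)) / spread if spread > 0 else 0.0
    ranked = sorted(z_scores, key=z_scores.get, reverse=True)
    # Only features above the population mean count as reasons.
    return [name for name in ranked[:top_n] if z_scores[name] > 0]


# Illustrative reference population and alert; values are made up.
population = [
    {"txn_count": 2, "cross_border_count": 0, "cash_share": 0.1},
    {"txn_count": 3, "cross_border_count": 1, "cash_share": 0.2},
    {"txn_count": 4, "cross_border_count": 0, "cash_share": 0.0},
    {"txn_count": 3, "cross_border_count": 1, "cash_share": 0.1},
]
alert = {"txn_count": 12, "cross_border_count": 1, "cash_share": 0.9}
codes = reason_codes(alert, population)
```

In this example the alert's transaction count and cash share sit far above the population, so those two features are returned as its reason codes.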
- The system is designed for alert prioritization, not automated decisioning
- Scenario thresholds require periodic tuning as behavior evolves
- Model performance should be monitored for drift and stability
- All alerts require human review and investigator judgment
risk_based_aml_monitoring/
├── aml_alert_scoring.ipynb
├── README.md
└── data/
    └── saml_d.csv
The two-stage framework concentrates true positives near the top of the ranked alert queue in a highly imbalanced AML setting.
Key outcomes:
- Alert positive rate after scenario tuning: ~1 percent
- The gradient-boosted model outperforms the logistic regression baseline at ranking alerts by risk
- Meaningful concentration of true positives within the top reviewed alerts
- Investigator capacity simulation shows improved yield per day compared to random review
These results illustrate how combining scenario-based monitoring with model-driven prioritization can improve investigative efficiency.
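The capacity comparison mentioned in the outcomes can be sketched as follows, assuming a fixed number of alerts reviewed per day. The function name, the capacity value, and the alert queue are illustrative assumptions and do not reproduce the project's figures.

```python
def daily_yield(alerts, capacity):
    """Compare true positives found by score-ranked review with the expectation under random review.

    alerts: list of (score, label) pairs for one day's queue.
    Returns (model_hits, expected_random_hits).
    """
    ranked = sorted(alerts, key=lambda a: a[0], reverse=True)
    reviewed = ranked[:capacity]
    model_hits = sum(label for _, label in reviewed)
    # Under random review, expected hits equal reviewed count times the base rate.
    base_rate = sum(label for _, label in alerts) / len(alerts)
    expected_random = min(capacity, len(alerts)) * base_rate
    return model_hits, expected_random


# Made-up queue of 10 alerts containing 2 true positives.
queue = [
    (0.95, 1), (0.90, 0), (0.85, 1), (0.60, 0), (0.55, 0),
    (0.40, 0), (0.35, 0), (0.20, 0), (0.15, 0), (0.05, 0),
]
model_hits, expected_random = daily_yield(queue, capacity=5)
```

With this toy queue, reviewing the five highest-scored alerts finds both true positives, while random review of five alerts would be expected to find one.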
This project demonstrates how rule-based monitoring and machine learning can be combined to create a practical, regulator-friendly AML alert prioritization system that improves efficiency without sacrificing interpretability.