Public artifact for the paper:
When Point Metrics Mislead: Structure-Aware Evaluation Reveals Conditional Ranking Shifts in Time Series Anomaly Detection
Author: Youngmin Ko
TL;DR: Point-wise metrics can change TSAD model rankings when benchmark anomalies are sustained segments rather than isolated spikes.
Project page: https://tsad-eval-site.onrender.com/
Paper figures: static PNGs in docs/assets/ (same as the project page).
Headline claims:
- AUC-ROC vs Affiliation-F1 ranking flips: 14/60 (deep-model set), 44/126 (with classical baselines).
- Four audited industrial benchmarks contain no short anomaly segments under processed labels.
- SAEScore is a reporting composite, not a universal leaderboard replacement.
- TSB-AD-M audit scale: 25 models, 180 multivariate series, 4,498 recomputed model-series rows.
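As a rough illustration of what a ranking flip between two metrics can mean, the sketch below counts model pairs whose order differs under two score tables. The pairwise definition, the function name `count_ranking_flips`, and the scores are assumptions made for this example; the exact flip definition used in the paper and in scripts/validate_tab_rfr_counts.py may differ.

```python
from itertools import combinations


def count_ranking_flips(metric_a, metric_b):
    """Count model pairs that two metrics order differently.

    metric_a, metric_b: dicts mapping model name -> score (higher is better).
    Returns (flips, comparable_pairs). Pairs tied under either metric are
    skipped, which is an assumption of this sketch.
    """
    models = sorted(set(metric_a) & set(metric_b))
    flips = 0
    pairs = 0
    for m1, m2 in combinations(models, 2):
        diff_a = metric_a[m1] - metric_a[m2]
        diff_b = metric_b[m1] - metric_b[m2]
        if diff_a == 0 or diff_b == 0:
            continue  # tie: no defined order to flip
        pairs += 1
        if (diff_a > 0) != (diff_b > 0):
            flips += 1
    return flips, pairs


# Hypothetical scores for illustration only (not results from the paper).
auc_roc = {"model_a": 0.91, "model_b": 0.88, "model_c": 0.80, "model_d": 0.75}
affiliation_f1 = {"model_a": 0.70, "model_b": 0.74, "model_c": 0.60, "model_d": 0.65}

flips, pairs = count_ranking_flips(auc_roc, affiliation_f1)
print(f"{flips}/{pairs} pairs flip")  # prints "2/6 pairs flip"
```

Here model_a/model_b and model_c/model_d swap order between the two metrics, so 2 of the 6 pairs flip.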
.
├── src/
│ ├── evaluation/ # Metric utilities used by validation
│ └── analysis/ # Taxonomy helper
├── scripts/
│ ├── validate_tab_rfr_counts.py
│ ├── compute_tsbad_alpha_stratified_rfr.py
│ └── compute_rfr_bootstrap_ci.py
├── experiments/results/ # Derived summaries only (no raw datasets)
├── docs/
│ ├── index.html # Project page (static site source)
│ ├── reproduction.md
│ ├── dataset_access.md
│ ├── artifact_manifest.md
│ └── assets/
├── tests/ # Lightweight validation tests
├── requirements.txt
├── Dockerfile
├── CITATION.cff
└── LICENSE
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Build the image:
docker build -t tsad-eval .
Run headline validation:
docker run --rm tsad-eval python scripts/validate_tab_rfr_counts.py
Optional validation scripts:
docker run --rm tsad-eval python scripts/compute_tsbad_alpha_stratified_rfr.py
docker run --rm tsad-eval python scripts/compute_rfr_bootstrap_ci.py --n-boot 100
Run headline validation:
python scripts/validate_tab_rfr_counts.py
python scripts/compute_tsbad_alpha_stratified_rfr.py
python scripts/compute_rfr_bootstrap_ci.py --n-boot 100
- This public repository is optimized for derived-output verification of headline claims.
- Full raw-data reruns require obtaining datasets from their original providers.
- See docs/reproduction.md for details and expected outputs.
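For readers unfamiliar with bootstrap confidence intervals, here is a minimal percentile-bootstrap sketch for a flip rate. It is not the code in scripts/compute_rfr_bootstrap_ci.py: the function name, the `n_boot` parameter (named after the script's --n-boot flag), the floor-index percentile rule, and the choice to resample the 60 deep-model comparisons as independent units are all assumptions of this example.

```python
import random


def bootstrap_flip_rate_ci(flip_flags, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of 0/1 flip indicators.

    flip_flags: list of 0/1 values, one per comparison (1 = ranking flip).
    Returns (point_estimate, lower, upper).
    """
    if not flip_flags:
        raise ValueError("flip_flags must be non-empty")
    rng = random.Random(seed)  # fixed seed so reruns give the same interval
    n = len(flip_flags)
    rates = []
    for _ in range(n_boot):
        # Resample comparisons with replacement and recompute the rate.
        sample = rng.choices(flip_flags, k=n)
        rates.append(sum(sample) / n)
    rates.sort()
    # Simple floor-index percentiles; other interpolation rules exist.
    lower = rates[int((alpha / 2) * (n_boot - 1))]
    upper = rates[int((1 - alpha / 2) * (n_boot - 1))]
    return sum(flip_flags) / n, lower, upper


# 14 flips out of 60, the deep-model count reported in this README.
flags = [1] * 14 + [0] * 46
point, lower, upper = bootstrap_flip_rate_ci(flags, n_boot=100)
print(f"flip rate {point:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

With only 100 resamples the interval endpoints are coarse; a larger `n_boot` gives a more stable interval at higher cost.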
Raw SWaT/WADI and other access-controlled raw datasets are not redistributed here.
Users are responsible for complying with upstream licenses and access terms.
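Once labels have been obtained from the original providers, the segment-length structure of a benchmark can be inspected directly. The sketch below extracts anomaly segment lengths from a binary label sequence and counts the short ones. The helper names and the `short_max_len` threshold are hypothetical; this README does not state the cutoff the paper uses for "short", and the labels shown are made up.

```python
def anomaly_segment_lengths(labels):
    """Return the lengths of maximal runs of anomalous points (truthy labels)."""
    lengths = []
    run = 0
    for value in labels:
        if value:
            run += 1
        elif run:
            lengths.append(run)  # a run just ended
            run = 0
    if run:
        lengths.append(run)  # sequence ended inside a run
    return lengths


def count_short_segments(labels, short_max_len=1):
    """Count segments no longer than short_max_len (hypothetical threshold)."""
    return sum(1 for n in anomaly_segment_lengths(labels) if n <= short_max_len)


# Made-up label sequence: one isolated spike, one run of 4, one run of 2.
labels = [0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1]
print(anomaly_segment_lengths(labels))  # prints [1, 4, 2]
print(count_short_segments(labels))     # prints 1
```

A benchmark with no short segments under a given threshold would return 0 from `count_short_segments` for every series.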
The public site is built from docs/ and deployed on Render as a static site (live URL: https://tsad-eval-site.onrender.com/). The Blueprint config lives in render.yaml at the repository root. It sets runtime: static, staticPublishPath: docs, and SKIP_INSTALL_DEPS=true, so the artifact's root requirements.txt is not installed during deploy.
You can still host the same files with GitHub Pages if you prefer: Settings -> Pages -> Deploy from a branch -> main -> /docs (mirror of the Render site).
If you use this artifact, please cite the paper. GitHub citation metadata is in CITATION.cff.
This repository is licensed under the Apache License 2.0. See LICENSE for details.

