SPARC

Smart Potential with Atomistic Rare Events and Continuous Learning

For more information, please visit the SPARC documentation.

Try the SPARC tutorial: Open in Colab

Overview

SPARC is a Python package built around the ASE wrapper that implements an automated workflow for developing machine learning potentials for reactive chemical systems. It automates the identification of new structures in configurational space without requiring initial ab initio MD simulations. SPARC is designed to work seamlessly within the Python framework to improve ML models efficiently.

Key Features

  • Automated active learning workflow
  • Ab initio molecular dynamics (AIMD) using VASP and CP2K
  • Machine learning potential training with DeepMD-kit
  • ML/MD simulations and iterative model refinement
  • Atomic force deviation monitoring with query-by-committee to identify new configurations
  • Reactive trajectory generation with PLUMED integration

Requirements

Core Dependencies

  • Python 3.xx
  • DeepMD-kit (version: 2.2.10)
  • ASE (Atomic Simulation Environment)
  • VASP (First-Principle Calculations)
  • PLUMED (PES Exploration)

Python Package Dependencies

  • numpy
  • pandas
  • dpdata
  • cython
  • pandoc
  • nbsphinx

Installation

  1. Create and activate a conda environment:
conda create -n sparc python=3.10
conda activate sparc
  1. Use any of the following methods to install DeePMD-kit:
  • pip

pip install deepmd-kit[gpu,cu12]==2.2.10
  • conda-forge

conda install deepmd-kit=2.2.10=*gpu libdeepmd=2.2.10=*gpu lammps horovod -c https://conda.deepmodeling.com -c defaults
  1. Clone the repository and install the package:
git clone --depth 1 https://github.com/rahulumrao/sparc.git
cd sparc
pip install .

Note

Some Collective Variables (CVs), such as generic CVs (e.g., SPRINT), belong to additional modules and are not included in a standard PLUMED installation. To enable them, install PLUMED manually and link it to the Python environment:

  1. Install PLUMED:

Download the PLUMED package from the website and install it with the following flags (make sure the conda environment is active):

./configure --enable-mpi=no --enable-modules=all PYTHON_BIN=$(which python) --prefix=$CONDA_PREFIX

make -j$(nproc) && make install

Refer to the official PLUMED installation page for more details.

If you don’t need additional modules, you can skip the manual installation and install PLUMED directly from conda-forge.

conda install -c conda-forge py-plumed

Quick Start

  1. Set Environment Variables:
export VASP_PP_PATH=/path/to/vasp/potcar_files    # POTCAR files path

If you installed PLUMED manually (skip this if you used conda-forge), you also need to set the PLUMED environment variables before running the code:

export PLUMED_KERNEL="$CONDA_PREFIX/lib/libplumedKernel.so"
export PYTHONPATH="$CONDA_PREFIX/lib/plumed/python:$PYTHONPATH"
  1. Prepare input file (see example below)
  • See the scripts folder for the full input template
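
Before launching a run, it can help to confirm that the environment variables from step 1 are actually set. The following is a minimal sketch, not part of SPARC: the function name `check_sparc_environment` and its `plumed_manual` flag are hypothetical, and it only checks that `VASP_PP_PATH` and (optionally) `PLUMED_KERNEL` point at something that exists.

```python
import os
from pathlib import Path


def check_sparc_environment(env=None, plumed_manual=False):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper (not a SPARC API). `env` defaults to os.environ.
    Set `plumed_manual=True` only if PLUMED was built manually.
    """
    env = os.environ if env is None else env
    problems = []

    # VASP_PP_PATH should be a directory containing the POTCAR files.
    potcar_dir = env.get("VASP_PP_PATH")
    if not potcar_dir:
        problems.append("VASP_PP_PATH is not set")
    elif not Path(potcar_dir).is_dir():
        problems.append(f"VASP_PP_PATH is not a directory: {potcar_dir}")

    # PLUMED_KERNEL is only needed for a manual PLUMED installation.
    if plumed_manual:
        kernel = env.get("PLUMED_KERNEL")
        if not kernel:
            problems.append("PLUMED_KERNEL is not set")
        elif not Path(kernel).is_file():
            problems.append(f"PLUMED_KERNEL does not point to a file: {kernel}")

    return problems


if __name__ == "__main__":
    for problem in check_sparc_environment(plumed_manual=False):
        print("WARNING:", problem)
```

Returning a list of messages rather than raising keeps the check usable both interactively and inside a job script that decides for itself whether to abort.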

Example Input File

general:
  structure_file: "POSCAR"   # Input structure

md_simulation:
  ensemble: "NVT"            # Ensemble for MD simulation
  thermostat: "Nose"         # Thermostat type (Nose-Hoover)
  timestep_fs: 1.0           # TimeStep for MD simulation
  md_steps: 10               # Number of MD Steps
  temperature: 300           # Temperature in Kelvin
  log_frequency: 4           # Interval for MD log and save trajectories
  use_dft_plumed: False      # Use PLUMED for MD simulation

dft_calculator:
  name: "VASP"               # DFT package name
  prec: "Normal"             # Precision level
  kgamma: True               # Gamma point calculation
  incar_file: "INCAR"        # Path/Name of VASP input file

# Active Learning
active_learning: True        # Active Learning protocol
iteration: 10                # Number of Active Learning iteration
model_dev:
  f_min_dev: 0.1             # Force uncertainty cutoffs (lower, upper)
  f_max_dev: 0.8

Once the installation is complete and the required dependencies are set up, follow these steps to run SPARC.

Ensure you have the necessary input files (e.g., input.yaml, input.json, INCAR, POSCAR). A template is available at scripts/input.yaml, relative to the repository root.

  1. Run SPARC
sparc -i input.yaml

Monitor the logs and outputs stored in the iter_xxxxxx directories.

Directory Structure

>> Project Root
├── INCAR
├── POSCAR
├── input.json
├── input.yaml
├── Dataset
│   ├── training_data
│   └── validation_data
├── iter_000000
│   ├── 00.dft
│   ├── 01.train
│   └── 02.dpmd
└── iter_000001
    ├── 00.dft
    ├── 01.train
    └── 02.dpmd
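
To see how far a run has progressed, the layout above can be scanned with a few lines of Python. This is a rough sketch that assumes only the directory names shown in the tree (`iter_*`, `00.dft`, `01.train`, `02.dpmd`); the function name `summarize_iterations` is hypothetical and is not part of SPARC.

```python
from pathlib import Path

# Stage folders as they appear in the directory tree above.
STAGES = ("00.dft", "01.train", "02.dpmd")


def summarize_iterations(project_root):
    """Map each iter_* directory name to the stage folders present in it."""
    root = Path(project_root)
    summary = {}
    # Zero-padded names (iter_000000, iter_000001, ...) sort chronologically.
    for iter_dir in sorted(root.glob("iter_*")):
        if iter_dir.is_dir():
            summary[iter_dir.name] = [s for s in STAGES if (iter_dir / s).is_dir()]
    return summary


if __name__ == "__main__":
    for name, stages in summarize_iterations(".").items():
        print(f"{name}: {', '.join(stages) or 'no stage folders yet'}")
```

Run from the project root, this prints one line per iteration listing the stage folders that exist so far.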

Core Components

1. MD Simulation

  • Supports both ab initio and ML molecular dynamics within the ASE MD engine
  • NVT ensemble with Nose-Hoover and Langevin thermostats
  • Checkpoint/restart capabilities
  • PLUMED integration for accelerated configuration space sampling

2. DeepMD Training

  • Automated model training
  • Ensemble model generation
  • Configurable network architecture and training parameters

3. Active Learning

  • Query by Committee approach for configuration selection
  • Atomic force based error metrics
  • Automated structure labeling and retraining
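
The selection step can be illustrated with a small NumPy sketch. It assumes the force-deviation metric commonly used in query-by-committee workflows (for each frame, the maximum over atoms of the standard deviation of the predicted force vector across the model ensemble) and treats `f_min_dev` and `f_max_dev` from the example input as a half-open selection window. SPARC's exact metric and boundary handling may differ; the function names here are hypothetical.

```python
import numpy as np


def max_force_deviation(forces):
    """Per-frame maximum force deviation across an ensemble of models.

    forces: array-like of shape (n_models, n_frames, n_atoms, 3).
    Returns an array of shape (n_frames,).
    """
    forces = np.asarray(forces, dtype=float)
    mean = forces.mean(axis=0)                      # (n_frames, n_atoms, 3)
    sq_dev = ((forces - mean) ** 2).sum(axis=-1)    # (n_models, n_frames, n_atoms)
    per_atom = np.sqrt(sq_dev.mean(axis=0))         # (n_frames, n_atoms)
    return per_atom.max(axis=-1)                    # (n_frames,)


def select_candidates(deviation, f_min_dev=0.1, f_max_dev=0.8):
    """Indices of frames whose deviation lies in [f_min_dev, f_max_dev).

    Assumption: frames below the window are treated as already well described,
    and frames above it as too unreliable to label.
    """
    deviation = np.asarray(deviation, dtype=float)
    mask = (deviation >= f_min_dev) & (deviation < f_max_dev)
    return np.flatnonzero(mask)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy ensemble: 4 models, 5 frames, 3 atoms, with small random disagreement.
    toy_forces = rng.normal(scale=0.2, size=(4, 5, 3, 3))
    dev = max_force_deviation(toy_forces)
    print("deviation per frame:", dev)
    print("candidate frames:", select_candidates(dev))
```

Frames returned by `select_candidates` are the ones that would be sent for DFT labeling and added to the training set in the next iteration.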

Current Status

  • Fixed the model update when restarting active learning iterations, using the added keys:
    • learning_restart: True
    • latest_model: 'path/to/frozen_model.pb'
  • Structured log formatting for better readability
  • Implemented Umbrella Sampling for reaction study on-the-fly
  • Utility tools for analysing model accuracy, active learning status, and structural properties.

Planned Updates

  • Support for ORCA, Psi4 and xTB calculators
  • Documentation under development

Important

SPARC has version dependencies: the latest version of DeePMD-kit is currently not supported. Check the DeePMD-kit documentation for instructions on installing an older version.

Limitations

  • Currently only supports DeepMD-kit 2.2.10 (newer versions not yet supported)
  • Documentation is still being developed

Known Issue

Important

  • When DeePMD-kit is installed with pip, the tensorflow[and-cuda] installation sometimes does not detect the GPU.
  • To verify if TensorFlow detects your GPU, run the following command:
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Check the TensorFlow pip installation page to fix this.

Some hardware has also shown issues with conda channels:

LibMambaUnsatisfiableError: Encountered problems while solving:
 - nothing provides __cuda needed by libdeepmd-2.2.10-0_cuda10.2_gpu
 - nothing provides __cuda needed by tensorflow-2.9.0-cuda102py310h7cc18f4_0
- Could not solve for environment specs
- The following packages are incompatible
- ├─ deepmd-kit 2.2.10 *gpu is not installable because it requires
- │  └─ tensorflow 2.9.* cuda*, which requires
- │     └─ __cuda, which is missing on the system;
- └─ libdeepmd 2.2.10 *gpu is not installable because it requires
-  └─ __cuda, which is missing on the system.

Build Document Locally

pip install sphinx sphinx-autodoc-typehints sphinx_rtd_theme

cd docs/
make html

This creates the HTML files in a build folder; open index.html in any browser.

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Code Style and Linting

We use ruff and pre-commit for code styling and linting to keep the codebase consistent. Configurations are defined in the pyproject.toml and pre-commit-config.yaml files.

pip install ruff
pip install pre-commit

After installation, run all hooks:

pre-commit run --all-files

Warning

This package is under active development. Features and APIs may change.
Also, this code is designed to work in a Linux environment. It may not be fully compatible with macOS systems.

Citation

If you use this software or the dataset in your research, please cite:

@article{joss,
  author  = {Verma, Rahul and Joshi, Nisarg and Pfaendtner, Jim},
  title   = {{SPARC}: An Automated Workflow Toolkit for Accelerated Active Learning of Reactive Machine Learning Interatomic Potentials},
  journal = {Journal of Open Source Software},
  volume  = {11},
  number  = {120},
  pages   = {9468},
  year    = {2026},
  month   = {apr},
  doi     = {10.21105/joss.09468},
  url     = {https://doi.org/10.21105/joss.09468}
}

@software{sparc,
  author = {Verma, Rahul and Joshi, Nisarg and Pfaendtner, Jim},
  doi    = {10.5281/zenodo.19389278},
  license = {MIT},
  month  = {Apr},
  title  = {{SPARC}: An Automated Workflow Toolkit for Accelerated Active Learning of Reactive Machine Learning Interatomic Potentials},
  url    = {https://github.com/rahulumrao/sparc},
  year   = {2026}
}

@dataset{sparc_dataset,
  author = {Verma, Rahul and Joshi, Nisarg and Pfaendtner, Jim},
  doi    = {10.5281/zenodo.18261342},
  license = {MIT},
  month  = {jan},
  title  = {{SPARC}: An Automated Workflow Toolkit for Accelerated Active Learning of Reactive Machine Learning Interatomic Potentials},
  url    = {https://zenodo.org/records/18261342},
  year   = {2026}
}
