AiiDA-TrainsPot: Towards automated training of neural-network interatomic potentials
Davide Bidoggia, Nataliia Manko, Maria Peressi, Antimo Marrazzo
TL;DR
AiiDA-TrainsPot is a fully automated, code-agnostic workflow for training neural-network interatomic potentials (NNIPs) that integrates DFT labeling, dataset augmentation, and MD-based exploration within the AiiDA provenance framework. The method relies on a calibrated committee-disagreement criterion to decide which configurations to label, enabling data-efficient active learning across diverse materials such as carbon allotropes and $\mathrm{W_xMo_{1-x}Te_2}$ monolayers, and it also supports fine-tuning of foundation models. Validation demonstrates strong accuracy and transferability: two carbon-validation campaigns show RMSEs in the meV–Å range and capture vibrational properties and defect energetics, while alloy benchmarks highlight robust phase-stability predictions. The work emphasizes reproducibility, modularity, and extensibility, offering a practical path toward democratizing access to high-accuracy NNIPs and enabling integration with future data-driven materials-design pipelines.
Abstract
Crafting neural-network interatomic potentials (NNIPs) remains a complex task, demanding specialized expertise in both machine learning and electronic-structure calculations. Here, we introduce AiiDA-TrainsPot, an automated, open-source, and user-friendly workflow that streamlines the creation of accurate NNIPs by orchestrating density-functional-theory calculations, data augmentation strategies, and classical molecular dynamics. Our active-learning strategy leverages on-the-fly calibration of committee disagreement against ab initio reference errors to ensure reliable uncertainty estimates. We use electronic-structure descriptors and dimensionality reduction to analyze the efficiency of this calibrated criterion, and show that it minimizes both false positives and false negatives when deciding what to compute from first principles. AiiDA-TrainsPot has a modular design that supports multiple NNIP backends, enabling both the training of NNIPs from scratch and the fine-tuning of foundation models. We demonstrate its capabilities through automated training campaigns targeting pristine and defective carbon allotropes, including amorphous carbon, as well as structural phase transitions in monolayer $\mathrm{W_xMo_{1-x}Te_2}$ alloys.
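As an illustration of the calibrated committee-disagreement idea, the following is a minimal sketch and not the AiiDA-TrainsPot implementation. It assumes that disagreement is measured as the standard deviation of committee predictions, that calibration is a single scale factor fitted by least squares through the origin against absolute reference errors, and that configurations are selected by a fixed threshold. The function names, the calibration form, and the threshold are all hypothetical.

```python
import numpy as np


def committee_disagreement(predictions):
    """Sample standard deviation across committee members.

    predictions: array-like of shape (n_models, n_configs).
    """
    return np.std(np.asarray(predictions, dtype=float), axis=0, ddof=1)


def fit_calibration_scale(disagreement, reference_error):
    """Assumed calibration: least-squares slope through the origin that maps
    committee disagreement onto the absolute error against the reference."""
    d = np.asarray(disagreement, dtype=float)
    e = np.abs(np.asarray(reference_error, dtype=float))
    return float(np.dot(d, e) / np.dot(d, d))


def select_for_labeling(disagreement, scale, threshold):
    """Indices of configurations whose calibrated uncertainty exceeds the
    (hypothetical) threshold and would be sent to first-principles labeling."""
    calibrated = scale * np.asarray(disagreement, dtype=float)
    return np.flatnonzero(calibrated > threshold)


if __name__ == "__main__":
    # Two committee members, three configurations (toy numbers).
    preds = [[1.0, 2.0, 3.0], [1.0, 2.2, 3.8]]
    sigma = committee_disagreement(preds)
    # Toy labeled set used only to fit the scale factor.
    scale = fit_calibration_scale([1.0, 2.0], [2.0, -4.0])
    print(select_for_labeling(sigma, scale, threshold=0.5))
```

In this sketch, refitting the scale factor whenever new labeled data arrive would play the role of the on-the-fly calibration; the actual criterion used by the workflow may differ.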
