aims-PAX: Parallel Active eXploration for the automated construction of Machine Learning Force Fields
Tobias Henkes, Shubham Sharma, Alexandre Tkatchenko, Mariana Rossi, Igor Poltavskyi
TL;DR
aims-PAX addresses the data inefficiency of ML force-field development by pairing automated initial data generation with parallel, multi-trajectory active learning. It integrates general-purpose MLFFs (GP-MLFFs), the MACE framework, and the FHI-aims DFT pipeline under a scalable Parsl-based workload manager, enabling rapid construction of transferable models with minimal human intervention. Across benchmarks on a flexible peptide, MD17 small molecules, solvated paracetamol, and the CsPbI$_3$ perovskite, aims-PAX delivers accuracy comparable to that obtained with larger curated datasets while reducing the number of DFT reference calculations by up to three orders of magnitude and speeding up active learning tenfold. The framework thus provides a scalable, versatile platform for automated, data-efficient atomistic simulations in both academic and industrial settings.
Abstract
Recent advances in machine learning force fields (MLFFs) have significantly extended the reach of atomistic simulations. Continued progress in this field requires reliable reference datasets, accurate MLFF architectures, and efficient active learning strategies to enable robust modeling of complex molecular and material systems. Here we introduce aims-PAX, an expedited, multi-trajectory active learning framework that streamlines the development of stable and accurate MLFFs. Designed for a wide range of researchers, aims-PAX offers a modular, high-performance workflow that couples diversified sampling with scalable training across CPU and GPU architectures. Integrated with the widely used ab initio code FHI-aims, the framework supports state-of-the-art ML models and dataset generation using general-purpose (or "foundational") force fields for rapid deployment in diverse systems. We demonstrate the capabilities of aims-PAX on several challenging tasks: creating datasets and models for highly flexible peptides, for multiple organic molecules at once, and for explicitly solvated molecules, as well as efficiently handling computationally demanding systems such as the CsPbI$_3$ perovskite. We show that aims-PAX reduces the number of required reference calculations by up to three orders of magnitude, automatically selects challenging systems within a given chemical space, and facilitates the simulation of solvated molecules with more than a thousand atoms, while enabling a ten-fold speedup in active-learning time through optimized resource utilization. This positions aims-PAX as a powerful and versatile platform for next-generation atomistic simulations in both academic and industrial settings.
