Scalable Reactive Atomistic Dynamics with GAIA
Suhwan Song, Heejae Kim, Jaehee Jang, Hyuntae Cho, Gunhee Kim, Geonu Kim
TL;DR
GAIA is an automated, end-to-end framework for constructing diverse training datasets for general-purpose reactive machine learning interatomic potentials (MLIPs). It couples a data generator, driven by the metadynamics-based Nanoreactor+, with a data-improver that targets underrepresented regions. The result is Titan25, a dataset of 1.8 million structures spanning 11 elements. An MLIP trained on Titan25 (SNet-T25) closely matches density functional theory (DFT) and experimental results across detonation, carbon nanotube (CNT) coalescence, interfacial adsorption, and catalytic processes. The model also transfers broadly: it outperforms models trained on public datasets on GAIA-Bench and reproduces experimentally observed phenomena with near-ab initio fidelity. These results establish GAIA as a practical, scalable path toward universal, generalizable MLIPs that can describe diverse materials and chemical processes under realistic conditions.
Abstract
Groundbreaking advances in materials and chemical research have been driven by the development of atomistic simulations. However, the broader applicability of atomistic simulations remains limited, as they inherently depend on energy models that are either approximate or computationally prohibitive for large-scale simulations. Machine learning interatomic potentials (MLIPs) have recently emerged as a promising class of energy models, but their deployment also remains challenging due to the scarcity of systematic protocols for generating training data spanning diverse structural regimes. Here we introduce GAIA, an end-to-end automated framework that streamlines dataset construction for the development of general-purpose reactive MLIPs. GAIA combines a metadynamics-based exploration scheme with closed-loop data expansion for the efficient sampling of a broad spectrum of atomic arrangements, thereby addressing the reliance on heuristics in conventional dataset generation. Using GAIA, we constructed Titan25, a benchmark-scale dataset, and trained an MLIP that closely matches both static and dynamic density functional theory results. The resulting model reproduces key experimental observations across distinct modes of reactivity, including detonation, coalescence, and catalytic processes. GAIA thus helps bridge the gap between simulation and experiment, paving the way toward scalable and general MLIPs capable of describing a wide range of materials and chemical processes.
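To make the idea of metadynamics-based exploration concrete, the following is a minimal toy sketch, not GAIA's implementation. It assumes a one-dimensional double-well potential, overdamped Langevin dynamics, and the particle position as the only collective variable; the function names and all parameters (`height`, `width`, `deposit_every`, and so on) are hypothetical choices for illustration. Gaussian bias hills are deposited at visited positions, which gradually pushes the walker out of its starting well and over the barrier, so that configurations rarely reached by unbiased dynamics get sampled.

```python
import math
import random


def double_well_force(x):
    # Toy potential V(x) = (x^2 - 1)^2 with minima at x = -1 and x = +1
    # and a barrier of height 1 at x = 0. Force is -dV/dx.
    return -4.0 * x * (x * x - 1.0)


def bias_force(x, centers, height, width):
    # Bias potential: sum of Gaussians height * exp(-(x - c)^2 / (2 width^2)).
    # Its force, -d(bias)/dx, pushes the walker away from each hill center.
    total = 0.0
    for c in centers:
        d = x - c
        total += height * d / (width * width) * math.exp(-d * d / (2.0 * width * width))
    return total


def run_metadynamics(n_steps=10000, dt=0.005, temperature=0.1,
                     height=0.05, width=0.2, deposit_every=50, seed=0):
    # Overdamped Langevin dynamics with periodically deposited Gaussian hills.
    # All parameter values are illustrative assumptions, not taken from the paper.
    rng = random.Random(seed)
    x = -1.0                      # start in the left well
    centers = []                  # positions where hills have been deposited
    trajectory = []
    noise = math.sqrt(2.0 * temperature * dt)
    for step in range(n_steps):
        if step % deposit_every == 0:
            centers.append(x)     # deposit a hill at the current position
        force = double_well_force(x) + bias_force(x, centers, height, width)
        x += force * dt + noise * rng.gauss(0.0, 1.0)
        trajectory.append(x)
    return trajectory, centers


if __name__ == "__main__":
    trajectory, centers = run_metadynamics()
    print("hills deposited:", len(centers))
    print("range explored:", min(trajectory), max(trajectory))
```

In a reactive setting the collective variables would be chosen to drive bond breaking and formation, and the visited structures would be labeled with DFT and added to the training set; those steps are outside the scope of this toy example.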
