Metatensor and metatomic: foundational libraries for interoperable atomistic machine learning
Filippo Bigi, Joseph W. Abbott, Philip Loche, Arslan Mazitov, Davide Tisi, Marcel F. Langer, Alexander Goscinski, Paolo Pegolo, Sanggyu Chong, Rohit Goswami, Pol Febrer, Sofiia Chorna, Matthias Kellner, Michele Ceriotti, Guillaume Fraux
TL;DR
The paper tackles interoperability barriers in atomistic ML by introducing metatensor, a gradient-aware, block-sparse data format, and metatomic, a portable interface for ML models. Together they define a data container and a model-exchange protocol that allow data and models to be shared across different simulation engines. A modular ecosystem (metatrain, featomic, torch-spex, torch-pme, vesin, sphericart) and example models (PET-MAD, ShiftML, FlashMD) demonstrate end-to-end workflows from training to deployment in LAMMPS, ASE, i-PI, PLUMED, and other packages. The results show minimal runtime overhead for metatomic in production-like runs, and applicability to short- and long-range interactions, collective-variable workflows, and quantum-sampled simulations.
Abstract
Incorporating machine learning (ML) techniques into atomic-scale modeling has proven to be an extremely effective strategy to improve the accuracy and reduce the computational cost of simulations. It also entails conceptual and practical challenges, as it involves combining very different mathematical foundations, as well as software ecosystems that are well developed in their own right but have little in common. To address these issues and facilitate the adoption of ML in atomistic simulations, we introduce two dedicated software libraries. The first one, metatensor, provides multi-platform and multi-language storage and manipulation of arrays with many potentially sparse indices, designed from the ground up for atomistic ML applications. By combining the actual values with metadata that describes their nature and that facilitates the handling of geometric information and gradients with respect to the atomic positions, metatensor provides a common framework to enable data sharing between ML software (typically written in Python) and established atomistic modeling tools (typically written in Fortran, C or C++). The second library, metatomic, provides an interface to store an atomistic ML model and metadata about this model in a portable way, facilitating the implementation, training and distribution of models, and their use across different simulation packages. We showcase a growing ecosystem of tools, including low-level libraries, training utilities, and interfaces with existing software packages, that demonstrates the effectiveness of metatensor and metatomic in bridging the gap between traditional simulation software and modern ML frameworks.
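To make the idea of values stored together with descriptive metadata more concrete, the following is a minimal sketch of a block-sparse, gradient-aware container in plain Python. It is not the metatensor API: the class names `Block` and `BlockSparseMap`, the label names (`center_type`, `structure`, `atom`, `n`), and all numerical values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Dense values plus labels for rows and columns (hypothetical, not the metatensor API)."""

    sample_names: tuple    # meaning of each entry in a row label
    samples: list          # one label tuple per row
    property_names: tuple  # meaning of each entry in a column label
    properties: list       # one label tuple per column
    values: list           # nested lists, len(samples) x len(properties)
    gradients: dict = field(default_factory=dict)  # parameter name -> Block

    def __post_init__(self):
        # The metadata must describe every row and every column of the data.
        assert len(self.values) == len(self.samples)
        for row in self.values:
            assert len(row) == len(self.properties)


@dataclass
class BlockSparseMap:
    """Maps key tuples to blocks; keys that are absent simply store no data."""

    key_names: tuple
    blocks: dict  # key tuple -> Block

    def block(self, **selection):
        key = tuple(selection[name] for name in self.key_names)
        return self.blocks[key]


# Placeholder per-atom features for one water molecule (atom 0 = O, atoms 1-2 = H).
radial = [(0,), (1,)]

# Gradient rows are labelled by (row of the parent block, displaced atom, direction).
h_gradient = Block(
    sample_names=("sample", "atom", "xyz"),
    samples=[(0, 1, 0)],  # d(row 0) / d(x coordinate of atom 1)
    property_names=("n",),
    properties=radial,
    values=[[0.5, -0.5]],
)

features = BlockSparseMap(
    key_names=("center_type",),
    blocks={
        (1,): Block(("structure", "atom"), [(0, 1), (0, 2)], ("n",), radial,
                    [[0.1, 0.2], [0.1, 0.2]], {"positions": h_gradient}),
        (8,): Block(("structure", "atom"), [(0, 0)], ("n",), radial,
                    [[0.3, 0.4]]),
    },
)

hydrogen = features.block(center_type=1)
print(hydrogen.samples)
print(sorted(hydrogen.gradients))
```

In this sketch, sparsity comes from storing only the keys that actually occur (here the atomic types 1 and 8), and gradients travel with the values they differentiate, so a consumer can tell from the labels alone which atom and Cartesian direction each gradient row refers to.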
