Taming Multi-Domain, -Fidelity Data: Towards Foundation Models for Atomistic Scale Simulations

Tomoya Shiota; Kenji Ishihara; Tuan Minh Do; Toshio Mori; Wataru Mizukami

TL;DR

The paper tackles the challenge of building a universal atomistic model by introducing Total Energy Alignment (TEA), a two-step energy-harmonization protocol that unifies datasets computed at different quantum-chemical fidelities via Inner Core Energy Alignment and Atomization Energy Correction. TEA makes it possible to train a single open-source MLIP, MACE-Osaka24, on a combined organic-inorganic dataset with competitive accuracy across molecular and crystalline systems. Evaluations on QM9, molecular torsions, lattice constants, liquid water, and heterogeneous nanoparticle catalysts demonstrate effective cross-domain alignment and illustrate the potential of foundation models in chemistry. Because TEA reuses existing data without costly recalculations, it democratizes multi-domain model development and paves the way for more interoperable and scalable chemistry and materials modeling tools.
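The two TEA steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Inner Core Energy Alignment amounts to re-referencing total energies from one computational condition's isolated-atom energies to another's, and it models Atomization Energy Correction as a per-element linear shift fitted by least squares on structures computed at both fidelities. The function names, reference energies, and toy data are all hypothetical.

```python
from collections import Counter

import numpy as np

# Placeholder isolated-atom energies (eV) for a hypothetical "source" and
# "target" computational condition. Illustrative numbers, NOT from the paper.
ATOM_REF_SOURCE = {"H": -13.6, "C": -1027.0, "O": -2040.0}
ATOM_REF_TARGET = {"H": -13.9, "C": -1029.5, "O": -2043.2}


def inner_core_align(e_total, symbols, ref_source, ref_target):
    """Step 1 (assumed form of Inner Core Energy Alignment).

    Subtract the source condition's isolated-atom energies to obtain an
    atomization energy, then add back the target condition's atomic
    energies, so all total energies share one reference.
    """
    e_atomization = e_total - sum(ref_source[s] for s in symbols)
    return e_atomization + sum(ref_target[s] for s in symbols)


def composition_matrix(structures, elements):
    """Rows: structures; columns: number of atoms of each element."""
    counts = [Counter(symbols) for symbols in structures]
    return np.array([[c[e] for e in elements] for c in counts], dtype=float)


def fit_atomization_correction(structures, e_at_low, e_at_high, elements):
    """Step 2 (assumed form of Atomization Energy Correction).

    Fit one additive shift w_Z per element by least squares so that
    e_at_low + sum_Z n_Z * w_Z approximates e_at_high on structures
    computed at both fidelities.
    """
    X = composition_matrix(structures, elements)
    y = np.asarray(e_at_high, dtype=float) - np.asarray(e_at_low, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(elements, w))


def apply_correction(e_at_low, symbols, weights):
    """Map a low-fidelity atomization energy onto the high-fidelity scale."""
    return e_at_low + sum(weights[s] for s in symbols)


# Toy demo with synthetic data: the "high-fidelity" energies are built from a
# known per-element shift, so the least-squares fit should recover it.
elements = ["C", "H", "O"]
structures = [
    ["H", "H", "O"],
    ["C", "O"],
    ["C", "H", "H", "H", "H"],
    ["O", "O"],
    ["C", "O", "O"],
]
true_shift = {"C": -0.3, "H": 0.1, "O": 0.2}
e_at_low = np.array([-9.5, -11.2, -18.0, -5.1, -16.9])
e_at_high = np.array(
    [e + sum(true_shift[s] for s in syms) for e, syms in zip(e_at_low, structures)]
)

weights = fit_atomization_correction(structures, e_at_low, e_at_high, elements)
print({k: round(float(v), 3) for k, v in weights.items()})
print(inner_core_align(-2080.0, ["H", "H", "O"], ATOM_REF_SOURCE, ATOM_REF_TARGET))
```

A per-element linear correction needs only element counts, so once fitted it can be applied to every structure in the low-fidelity set without new quantum-chemical calculations, which is the data-reuse property highlighted above; the paper's actual correction may use a different functional form.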

Abstract

Machine learning interatomic potentials (MLIPs) are changing atomistic simulations in chemistry and materials science. However, constructing a single universal MLIP that accurately models both molecular and crystalline systems remains challenging. A central obstacle is integrating diverse datasets generated under different computational conditions. We present Total Energy Alignment (TEA), an approach that enables the seamless integration of heterogeneous quantum chemical datasets without redundant calculations. Using TEA, we trained MACE-Osaka24, the first open-source MLIP trained on a unified dataset covering both molecular and crystalline systems. This universal model performs strongly across diverse chemical systems: it matches or exceeds the accuracy of specialized models in predicting organic reaction barriers while maintaining state-of-the-art accuracy for inorganic systems. These advances pave the way for accelerated discovery in chemistry and materials science via genuine foundation models for chemistry.