Cross Learning between Electronic Structure Theories for Unifying Molecular, Surface, and Inorganic Crystal Foundation Force Fields

Ilyes Batatia, Chen Lin, Joseph Hart, Elliott Kasoar, Alin M. Elena, Sam Walton Norwood, Thomas Wolf, Gábor Csányi

TL;DR

This work tackles the challenge of unifying interatomic potentials across molecular, surface, and inorganic crystal chemistry by developing a foundation MLIP with cross-domain learning. It introduces a strengthened MACE backbone with non-linear tensor decomposition and a multi-head replay fine-tuning protocol that transfers knowledge across domains while mitigating forgetting. Through extensive benchmarks spanning materials, molecular crystals, surfaces, and molecules, the cross-domain model (mace-mh-1-omat) achieves state-of-the-art or competitive performance across domains, often surpassing specialised baselines and demonstrating robust cross-learning. The results suggest a viable path toward a single, transferable MLIP capable of accurately simulating multi-scale chemical phenomena with broad practical impact in catalysis, materials design, and biomolecular modelling.

Abstract

Creating a single unified interatomic potential capable of attaining ab initio accuracy across all chemistry remains a long-standing challenge in computational chemistry and materials science. This work introduces a training protocol for foundation machine-learning interatomic potentials (MLIPs) that bridge molecular, surface, and materials chemistry through cross-domain learning. First, we introduce enhancements to the MACE architecture that improve its performance on chemically diverse databases by increasing weight sharing across chemical elements and introducing non-linear factors into the tensor decomposition of the product basis. Second, we develop a multi-head replay post-training methodology that enables efficient knowledge transfer across diverse chemical domains. By fine-tuning on datasets at different levels of electronic structure theory, including inorganic crystals, molecular systems, surface chemistry, and reactive organic chemistry, we demonstrate that a single unified model achieves state-of-the-art performance across several chemical domains. Comprehensive benchmarking reveals superior cross-domain transferability compared with existing specialised and multi-task models, with notable improvements in molecular and surface properties while maintaining state-of-the-art performance in materials-property prediction.
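The abstract does not spell out how the multi-head replay post-training works. The sketch below rests on two assumptions. First, "replay" is taken to mean mixing a fixed fraction of pre-training structures into every fine-tuning batch. Second, each structure's loss is assumed to be evaluated only on the readout head for its own level of theory. The names `make_replay_batches`, `per_head_mse`, the `"pretrain"` head label, and the uniform choice among fine-tuning heads are all hypothetical illustrations, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Sample:
    """A structure index tagged with the head (level of theory) whose labels it carries."""
    head: str
    index: int


def make_replay_batches(
    finetune_sets: Dict[str, Sequence[int]],
    replay_set: Sequence[int],
    batch_size: int,
    replay_fraction: float,
    num_batches: int,
    seed: int = 0,
) -> List[List[Sample]]:
    """Mix replayed pre-training structures into every fine-tuning batch (hypothetical scheme).

    A fixed share of each batch is drawn from `replay_set` and routed to the
    "pretrain" head; the remainder comes from the fine-tuning datasets, choosing
    a head uniformly at random (a simplifying assumption).
    """
    if not 0.0 <= replay_fraction <= 1.0:
        raise ValueError("replay_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    n_replay = round(batch_size * replay_fraction)
    heads = sorted(finetune_sets)
    batches: List[List[Sample]] = []
    for _ in range(num_batches):
        batch = [Sample("pretrain", rng.choice(replay_set)) for _ in range(n_replay)]
        for _ in range(batch_size - n_replay):
            head = rng.choice(heads)
            batch.append(Sample(head, rng.choice(finetune_sets[head])))
        rng.shuffle(batch)  # interleave replayed and new structures
        batches.append(batch)
    return batches


def per_head_mse(
    batch: Sequence[Sample], predictions: Sequence[float], targets: Sequence[float]
) -> Dict[str, float]:
    """Mean squared error per head; each sample is scored only against its own head's output."""
    errors: Dict[str, List[float]] = {}
    for sample, pred, target in zip(batch, predictions, targets):
        errors.setdefault(sample.head, []).append((pred - target) ** 2)
    return {head: sum(errs) / len(errs) for head, errs in errors.items()}


batches = make_replay_batches(
    {"molecules": list(range(100)), "surfaces": list(range(50))},  # hypothetical heads
    replay_set=list(range(1000)),
    batch_size=8,
    replay_fraction=0.25,
    num_batches=3,
)
```

With `batch_size=8` and `replay_fraction=0.25`, every batch holds two replayed pre-training structures and six fine-tuning structures. Keeping the replayed share fixed is one simple way to keep the original task in view during fine-tuning. The paper's actual replay selection may differ.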

Paper Structure

This paper contains 38 sections, 10 equations, 3 figures, 20 tables.

Figures (3)

  • Figure 1: Workflow for cross-domain machine learning interatomic potential development. (a) Stage 1 establishes foundation chemical knowledge through pre-training on large-scale inorganic materials data (OMAT-24). (b) Stage 2 implements multi-head fine-tuning with strategic replay across diverse chemical domains, enabling knowledge transfer while preventing catastrophic forgetting. (c) The resulting unified model is benchmarked across molecular, materials, and surface chemistry tests and achieves state-of-the-art performance.
  • Figure 2: Cross-domain performance summary of foundation interatomic potentials. (a) Polar chart of domain scores for five evaluation groups (Materials, Molecular Crystals, Surfaces, Molecules, and Physicality), where values are normalised to $[0,1]$ (1 = best; higher is better) and computed as per-metric means within each group. (b) Overall ranking by a global score, defined as a weighted sum of domain scores (Materials 0.25, Molecules 0.25, Surfaces 0.20, Molecular Crystals 0.20, Physicality 0.10). (c) Ablation within the MACE family comparing the baseline linear block (mace-omat-0), the non-linear block (mace-omat-1), and the multi-head model (mace-mh-1-omat), shown across Materials, Molecules, Surfaces, and the resulting global score. (d) Per-category model ranks (smaller is better; axis inverted so the top is rank 1) illustrating consistency across domains; the shaded region marks the global rank. The scoring procedure is described in Sec. \ref{sec:global_overview}, and normalisation bounds and metric definitions are given in Table \ref{tab:normalization_bounds}.
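The Figure 2 scoring can be sketched as follows. The domain weights are taken from the caption. The actual normalisation bounds live in Table \ref{tab:normalization_bounds} and are not reproduced here. The sketch assumes a linear min-max map clipped to $[0,1]$, with explicit `best`/`worst` bounds so that error metrics (lower is better) and accuracy-type metrics (higher is better) are handled alike. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple

# Domain weights as quoted in the Figure 2 caption (they sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "Materials": 0.25,
    "Molecules": 0.25,
    "Surfaces": 0.20,
    "Molecular Crystals": 0.20,
    "Physicality": 0.10,
}


def normalise(value: float, best: float, worst: float) -> float:
    """Linearly map a metric onto [0, 1] with 1 = best, clipping outside the bounds (assumed form)."""
    if best == worst:
        raise ValueError("bounds must differ")
    return min(1.0, max(0.0, (value - worst) / (best - worst)))


def domain_score(metrics: Dict[str, Tuple[float, float, float]]) -> float:
    """Mean of the normalised metrics in one group; each entry is (value, best, worst)."""
    return mean(normalise(v, b, w) for v, b, w in metrics.values())


def global_score(domain_scores: Dict[str, float]) -> float:
    """Weighted sum of the five domain scores."""
    missing = WEIGHTS.keys() - domain_scores.keys()
    if missing:
        raise KeyError(f"missing domain scores: {sorted(missing)}")
    return sum(WEIGHTS[d] * domain_scores[d] for d in WEIGHTS)


example = global_score(
    {"Materials": 0.9, "Molecules": 0.8, "Surfaces": 0.7,
     "Molecular Crystals": 0.6, "Physicality": 1.0}
)
```

Every domain score lies in $[0,1]$ and the weights sum to 1, so the global score is also bounded by $[0,1]$. For the illustrative scores above it evaluates to about 0.785.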
  • Figure 3: Phonon dynamical stability classification confusion matrices. Materials are classified as unstable if $|\omega_{\mathrm{imag}}| > 0.05\,\text{THz}$ ($\approx 2.4$ K).
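The $\approx 2.4$ K quoted in Figure 3 follows from $T = h\nu/k_B$ with $\nu = 0.05$ THz: $(6.626\times10^{-34}\,\mathrm{J\,s} \times 5\times10^{10}\,\mathrm{s^{-1}})/(1.381\times10^{-23}\,\mathrm{J/K}) \approx 2.40$ K. This indicates that the THz values are ordinary rather than angular frequencies. The sketch below applies the threshold and tallies a confusion matrix. It makes two assumptions: imaginary modes are stored as negative frequencies (a common convention in phonon codes), and "unstable" is the positive class. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

PLANCK_H = 6.62607015e-34   # J s (exact SI value)
BOLTZMANN_K = 1.380649e-23  # J/K (exact SI value)
THRESHOLD_THZ = 0.05        # instability threshold from the Figure 3 caption


def thz_to_kelvin(freq_thz: float) -> float:
    """Temperature equivalent T = h*nu/k_B of an ordinary frequency given in THz."""
    return PLANCK_H * freq_thz * 1e12 / BOLTZMANN_K


def is_unstable(frequencies_thz: Sequence[float], threshold: float = THRESHOLD_THZ) -> bool:
    """True if any mode is imaginary with magnitude above the threshold.

    Assumes imaginary modes are encoded as negative frequencies.
    """
    return any(f < -threshold for f in frequencies_thz)


def confusion_matrix(reference: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, int]:
    """2x2 counts with 'unstable' (True) as the positive class."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    counts = Counter(zip(reference, predicted))
    return {
        "TP": counts[(True, True)],    # unstable in reference and prediction
        "FN": counts[(True, False)],   # unstable in reference, predicted stable
        "FP": counts[(False, True)],   # stable in reference, predicted unstable
        "TN": counts[(False, False)],  # stable in both
    }
```

Soft modes just below the threshold, such as a $-0.01$ THz mode from numerical noise near the $\Gamma$ point, are counted as stable under this rule.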