Ensemble Learning of Machine Learning Force Fields

Bangchen Yin, Yue Yin, Yuda W. Tang, Hai Xiao

TL;DR

This work introduces EL-MLFFs, a stacking-based ensemble framework that fuses diverse pre-trained ML force fields through a graph neural network meta-learner to deliver accurate and stable force predictions for molecular and materials simulations. It offers two meta-model variants, a direct-fitting GNN and a conservative (energy-conserving) model, and demonstrates substantial reductions in force errors and improved long-term stability across the methane, methanol/Cu(100), MD17, and MatPES datasets. The approach scales to large, out-of-domain datasets and enables a practical efficiency-fidelity trade-off: direct ensembles favor speed, while conservative ensembles guarantee energy conservation. Collectively, EL-MLFFs provides a principled framework to mitigate the paradox of choice in MLFFs, enhancing reliability and generalization for both molecular dynamics and materials science applications.

Abstract

Machine learning force fields (MLFFs) are a promising approach to balance the accuracy of quantum mechanics with the efficiency of classical potentials, yet selecting, from an increasingly diverse set of architectures, a model that delivers reliable force predictions and stable simulations remains a core practical challenge. Here we introduce EL-MLFFs, an ensemble learning framework that uses a stacking methodology to integrate predictions from diverse base MLFFs. Our approach constructs a graph representation on which a graph neural network (GNN) acts as a meta-model to refine the initial force predictions. We present two meta-model architectures: a computationally efficient direct-fitting model and a physically principled conservative model that ensures energy conservation. The framework is evaluated on a diverse range of systems, including single molecules (methane), surface chemistry (methanol/Cu(100)), molecular dynamics benchmarks (MD17), and the MatPES materials dataset. Results show that EL-MLFFs improves predictive accuracy across these domains. For molecular systems, it reduces force errors and improves simulation stability compared to the base models. For materials, the method yields lower formation energy errors on the WBM test set. The EL-MLFFs framework offers a systematic approach to address the challenges of model selection and the accuracy-stability trade-off in molecular and materials simulations.
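
The stacking idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's method: the GNN meta-model is swapped for a single least-squares weight per base model, and the base-model predictions are synthetic (reference forces plus Gaussian noise). All function names, array shapes, and noise levels are assumptions made for illustration. It corresponds loosely to the direct-fitting variant; a conservative variant would instead predict a scalar energy and obtain forces as its negative gradient with respect to atomic positions.

```python
import numpy as np


def fit_stacking_weights(base_forces, ref_forces):
    """Fit one scalar weight per base model by least squares.

    base_forces: array of shape (K, N, 3), forces from K base models.
    ref_forces:  array of shape (N, 3), reference forces.
    Scalar weights keep the combined force rotationally equivariant.
    """
    n_models = base_forces.shape[0]
    design = base_forces.reshape(n_models, -1).T  # (3N, K)
    target = ref_forces.reshape(-1)               # (3N,)
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights


def ensemble_forces(base_forces, weights):
    """Weighted sum of base-model forces, shape (N, 3)."""
    return np.tensordot(weights, base_forces, axes=1)


def rmse(pred, ref):
    """Root-mean-square error over all force components."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


# Synthetic stand-ins for reference data and base-model predictions.
rng = np.random.default_rng(0)
ref = rng.normal(size=(200, 3))
noise_scales = [0.05, 0.1, 0.2, 0.4]  # hypothetical base-model error levels
base = np.stack([ref + s * rng.normal(size=ref.shape) for s in noise_scales])

w = fit_stacking_weights(base, ref)
base_rmse = [rmse(f, ref) for f in base]
ens_rmse = rmse(ensemble_forces(base, w), ref)
```

On the data used for fitting, the combined RMSE cannot exceed that of the best single base model, because putting all weight on one model is one of the candidate solutions. Whether the gain carries over to held-out configurations is an empirical question that this toy example does not address.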

Paper Structure

This paper contains 16 sections, 12 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: An overview of our ensemble learning architecture.
  • Figure 2: Performance comparison of individual (base) models and ensemble models on the methane and methanol datasets. (a) For the methane dataset, a linear scale clearly illustrates the order-of-magnitude reduction in RMSE achieved by the ensemble method. (b) For the more complex methanol dataset, a logarithmic scale is used, highlighting that ensemble errors are thousands of times lower than those of the individual models.
  • Figure 3: (a) Raincloud plot of RMSE values for all possible ensemble combinations on the methanol dataset, grouped by ensemble size $k$ (out of 8 base models). (b) Parity plot comparing predicted forces from the 8-model conservative ensemble to reference forces for the methane test set.
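
To give a sense of how many ensembles the Figure 3(a) caption implies, the sketch below enumerates every subset of 8 base models grouped by size $k$. The model names are placeholders, and including $k = 1$ is an assumption, since the caption does not say which sizes are plotted.

```python
from itertools import combinations
from math import comb

# Placeholder names; the actual base models are not listed in this excerpt.
base_models = [f"model_{i}" for i in range(8)]

# Every subset of the base models, grouped by ensemble size k.
subsets_by_size = {
    k: list(combinations(base_models, k))
    for k in range(1, len(base_models) + 1)
}

# Number of candidate ensembles per size, and the overall total.
counts = {k: len(subsets) for k, subsets in subsets_by_size.items()}
total = sum(counts.values())
```

Under these assumptions the group sizes follow the binomial coefficients (8, 28, 56, 70, 56, 28, 8, 1), for 255 non-empty subsets in total, so the middle sizes contribute the most points to each distribution.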