Beyond Force Metrics: Pre-Training MLFFs for Stable MD Simulations
Shagun Maheshwari, Zhengxian Tang, Janghoon Ock, Adeesh Kolluru, Amir Barati Farimani, John R. Kitchin
TL;DR
Problem: ML force fields can achieve low force prediction errors yet still produce unstable MD trajectories, undermining long-time simulations.
Approach: Compare GemNet-T trained directly on MD17 aspirin with GemNet-T pre-trained on OC20 and then fine-tuned on MD17, evaluating simulation stability, latent-space structure, and local force behavior.
Key findings: Pre-training yields dramatically longer stable trajectories, more structured latent representations, smoother local force responses, and more consistent local force differences, with only modest gains in force MAE.
Significance: Demonstrates that large-scale pre-training improves the robustness and generalization of MLFFs for MD, and motivates evaluation metrics that go beyond force error alone.
Abstract
Machine-learning force fields (MLFFs) have emerged as a promising way to accelerate ab initio molecular dynamics (MD) simulations, in which accurate forces are critical but computationally expensive to obtain. In this work, we use GemNet-T, a graph neural network, as an MLFF and investigate two training strategies: (1) direct training on MD17 (10K samples) without pre-training, and (2) pre-training on the large-scale OC20 dataset followed by fine-tuning on MD17 (10K samples). Although both approaches achieve low force mean absolute errors (MAEs), reaching 5 meV/Å per atom, we find that low force errors do not necessarily guarantee stable MD simulations. Notably, the pre-trained GemNet-T model markedly improves simulation stability, sustaining trajectories up to three times longer than the model trained from scratch. By analyzing local properties of the learned force fields, we find that pre-training produces more structured latent representations, smoother force responses to local geometric changes, and more consistent force differences between nearby configurations, all of which contribute to more stable and reliable MD simulations. These findings underscore the value of pre-training on large, diverse datasets to capture complex molecular interactions, and show that force MAE alone is not always a sufficient indicator of MD simulation stability.
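The abstract contrasts a per-frame accuracy metric (force MAE) with a trajectory-level stability measure; the two are computed from different data, which is why one can look good while the other fails. The sketch below is a minimal illustration, not the authors' evaluation code. It assumes forces in eV/Å stored as arrays of shape (frames, atoms, 3), and it uses a bond-length-deviation criterion (a bond counts as broken once it deviates from its equilibrium length by more than 0.5 Å) as a hypothetical stand-in for whatever stability criterion the paper actually applies. The helper names `force_mae_mev_per_angstrom`, `bond_lengths`, and `stable_duration` are illustrative only.

```python
import numpy as np


def force_mae_mev_per_angstrom(f_pred, f_ref):
    """Force MAE over all frames, atoms, and Cartesian components.

    Assumes f_pred and f_ref have shape (n_frames, n_atoms, 3) in eV/Å.
    Returns the MAE converted to meV/Å.
    """
    f_pred = np.asarray(f_pred, dtype=float)
    f_ref = np.asarray(f_ref, dtype=float)
    return 1000.0 * np.mean(np.abs(f_pred - f_ref))


def bond_lengths(positions, bonds):
    """Bond lengths per frame.

    positions: (n_frames, n_atoms, 3) in Å; bonds: (n_bonds, 2) atom-index pairs.
    Returns an array of shape (n_frames, n_bonds).
    """
    i, j = bonds[:, 0], bonds[:, 1]
    return np.linalg.norm(positions[:, i] - positions[:, j], axis=-1)


def stable_duration(positions, bonds, r_eq, dt_ps, threshold=0.5):
    """Time (ps) at which any bond first deviates from r_eq by more than threshold Å.

    Hypothetical stability criterion for illustration. Frames are assumed to be
    sampled every dt_ps, starting at t = 0. If no bond ever breaks, the full
    simulated time (n_frames - 1) * dt_ps is returned.
    """
    deviation = np.abs(bond_lengths(positions, bonds) - r_eq)  # (n_frames, n_bonds)
    broken = np.any(deviation > threshold, axis=1)              # (n_frames,)
    if not broken.any():
        return (len(positions) - 1) * dt_ps
    first_broken = int(np.argmax(broken))  # index of the first True entry
    return first_broken * dt_ps


if __name__ == "__main__":
    # Toy diatomic "trajectory" whose bond drifts away from 1.0 Å.
    lengths = np.array([1.0, 1.1, 1.3, 1.7, 2.4])
    pos = np.zeros((len(lengths), 2, 3))
    pos[:, 1, 0] = lengths
    bonds = np.array([[0, 1]])
    r_eq = np.array([1.0])
    print("stable for", stable_duration(pos, bonds, r_eq, dt_ps=0.5), "ps")
```

The design choice worth noting is that `force_mae_mev_per_angstrom` only ever sees single configurations drawn from the reference data, whereas `stable_duration` sees configurations the model itself generated during a rollout, so small per-frame errors that compound over time are visible only to the second metric.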
