Are Foundational Atomistic Models Reliable for Finite-Temperature Molecular Dynamics?
Denan Li, Jiyuan Yang, Xiangkai Chen, Lintao Yu, Shi Liu
TL;DR
This Perspective critically evaluates foundational atomistic models (universal ML force fields) for finite-temperature MD, using PbTiO3 (PTO) as a focused test case, and asks whether static accuracy translates into reliable dynamic performance. It combines ground-state (static) assessments with finite-temperature MD to reveal a potential disconnect: several models reproduce ground-state structure and phonons yet fail to capture temperature-driven phase transitions or become unstable during MD. A key insight is that training data quality and the choice of exchange–correlation functional used to generate the training data strongly influence dynamic reliability, and that simple fine-tuning can substantially improve agreement with known physics, albeit at additional data and cost. The work highlights the practical challenges of adopting foundational atomistic models (data diversity, scalability, and software integration) and proposes hybrid training strategies and targeted benchmarking as pragmatic paths toward robust, scalable MD for materials discovery.
Abstract
Machine learning force fields have emerged as promising tools for molecular dynamics (MD) simulations, potentially offering quantum-mechanical accuracy with the efficiency of classical MD. Inspired by foundational large language models, recent years have seen considerable progress in developing foundational atomistic models, sometimes referred to as universal force fields, designed to cover most elements in the periodic table. This Perspective adopts a practitioner's viewpoint to ask a critical question: Are these foundational atomistic models reliable for one of their most compelling applications, namely simulating finite-temperature dynamics? Instead of a broad benchmark, we use the canonical ferroelectric-paraelectric phase transition in PbTiO$_3$ as a focused case study to evaluate prominent foundational atomistic models. Our findings suggest a potential disconnect between static accuracy and dynamic reliability. While 0 K properties are often well reproduced, we observed that the models can struggle to consistently capture the correct phase transition, sometimes exhibiting simulation instabilities. We believe these challenges may stem from inherent biases in training data and a limited description of anharmonicity. These observed shortcomings, though demonstrated on a single system, appear to point to broader, systemic challenges that can be addressed with targeted fine-tuning. This Perspective serves not to rank models, but to initiate a crucial discussion on the practical readiness of foundational atomistic models and to explore future directions for their improvement.
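As a rough illustration of what "capturing the phase transition" can mean in practice, the sketch below scans time-averaged tetragonality (c/a) values from a series of MD runs and reports the lowest temperature at which the cell is effectively cubic. The function name, the tolerance `tol`, and the numbers are hypothetical and chosen only for illustration; they are not the authors' protocol or results.

```python
def estimate_transition_temperature(temperatures, c_over_a, tol=0.005):
    """Return the lowest temperature at which the cell is effectively cubic.

    temperatures: MD temperatures in K.
    c_over_a: time-averaged tetragonality c/a at each temperature.
    tol: hypothetical threshold on |c/a - 1| below which the cell counts as cubic.
    Returns None if the cell stays tetragonal at every sampled temperature.
    """
    # Sort by temperature so the scan runs from cold to hot.
    pairs = sorted(zip(temperatures, c_over_a))
    for temperature, ratio in pairs:
        if abs(ratio - 1.0) < tol:
            return temperature
    return None


# Illustrative numbers only, not results from the paper.
temps = [300, 400, 500, 600, 700, 800]
ratios = [1.060, 1.052, 1.041, 1.025, 1.001, 1.000]
print(estimate_transition_temperature(temps, ratios))
```

A return value of `None` would correspond to a model that never leaves the tetragonal phase over the sampled range, which is one way a statically accurate model could still miss the transition.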
