Migration as a Probe: A Generalizable Benchmark Framework for Specialist vs. Generalist Machine-Learned Force Fields

Yi Cao; Paulette Clancy

TL;DR

This work introduces migration-based probes as a general, physics-informed benchmark to compare specialist MLFFs trained from scratch against foundation-model fine-tuning for Cr-doped Sb$_2$Te$_3$. It demonstrates that fine-tuning sharply improves kinetic predictions but can degrade long-range physics, while foundation models offer robust extrapolation yet may require system-specific sharpening. Latent-space analyses reveal fundamentally different encodings across training strategies, explaining why models diverge on non-equilibrium pathways. The framework guides data-efficient active learning and stresses the importance of evaluating dynamic properties alongside equilibrium metrics for reliable MLFF deployment.

Abstract

Machine-learned force fields (MLFFs), especially pre-trained foundation models, are transforming computational materials science by enabling ab initio-level accuracy at molecular dynamics scales. Yet their rapid rise raises a key question: should researchers train specialist models from scratch, fine-tune generalist foundation models, or use hybrid approaches? The trade-offs in data efficiency, accuracy, cost, and robustness to out-of-distribution failure remain unclear. We introduce a benchmarking framework using defect migration pathways, evaluated through nudged elastic band trajectories, as diagnostic probes that test both interpolation and extrapolation. Using Cr-doped Sb$_2$Te$_3$ as a representative two-dimensional material, we benchmark multiple training paradigms within the MACE architecture across equilibrium, kinetic (atomic migration), and mechanical (interlayer sliding) tasks. Fine-tuned models substantially outperform from-scratch and zero-shot approaches for kinetic properties but show partial loss of long-range physics. Representational analysis reveals distinct, non-overlapping latent encodings, indicating that different training strategies learn different aspects of system physics. This framework provides practical guidelines for MLFF development and establishes migration-based probes as efficient diagnostics linking performance to learned representations, guiding future uncertainty-aware active learning.
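As a rough illustration of how a migration pathway can act as a probe, the sketch below scores a model's energy profile along a nudged elastic band path against a reference profile. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `migration_barrier` and `path_rmse` are hypothetical, the barrier is taken as the highest image energy relative to the first image, and the energy values are made-up placeholders rather than results from the paper.

```python
from math import sqrt


def migration_barrier(energies):
    """Barrier estimate: highest image energy minus the first-image energy (eV)."""
    return max(energies) - energies[0]


def path_rmse(reference, predicted):
    """RMSE between two energy profiles, each shifted so its first image is zero."""
    if len(reference) != len(predicted):
        raise ValueError("profiles must have the same number of images")
    ref = [e - reference[0] for e in reference]
    pred = [e - predicted[0] for e in predicted]
    return sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))


# Placeholder image energies in eV (illustrative only, NOT data from the paper).
reference_path = [0.00, 0.21, 0.58, 0.23, 0.02]
model_path = [0.00, 0.18, 0.49, 0.20, 0.01]

# Signed barrier error: negative means the model underestimates the barrier.
barrier_error = migration_barrier(model_path) - migration_barrier(reference_path)
profile_error = path_rmse(reference_path, model_path)

print(f"barrier error: {barrier_error:+.3f} eV")
print(f"profile RMSE:  {profile_error:.3f} eV")
```

Reporting the barrier error and the full-profile error separately is a deliberate choice in this sketch: a model can reproduce the endpoints (near-equilibrium configurations) well while still misjudging the transition region, which is the kind of divergence the benchmark is meant to expose.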
