Pre-training, fine-tuning, and distillation (PFD): Automatically generating machine learning force fields from universal models

Ruoyu Wang; Yuxiang Gao; Hongyu Wu; Zhicheng Zhong

TL;DR

The paper addresses how to obtain material-specific force fields with first-principles accuracy without prohibitive data requirements. It introduces PFD, a workflow, automated by the PFD-kit, that starts from a pre-trained universal force field, fine-tunes it iteratively on small DFT datasets, and then distills the result into a fast force field built on local descriptors. Across bulk and complex materials, PFD reduces the required DFT data by one to two orders of magnitude while reaching accuracy comparable to first-principles calculations, and the distilled models are fast enough for large-scale MD simulations. The approach is relevant to scalable, high-precision modeling of interfaces, amorphous phases, and high-entropy systems, and could substantially change how production-level force fields are generated in computational materials science.

Abstract

Universal force fields generalizable across the periodic table represent a new trend in computational materials science. However, their application in materials simulations is limited by slow inference and a lack of first-principles accuracy. Instead of building a single model that satisfies all of these requirements simultaneously, a strategy that quickly generates material-specific models from the universal model may be more feasible. Here, we propose a new workflow pattern, PFD (Pre-training, Fine-tuning, and Distillation), which automatically generates machine-learning force fields for specific materials from a pre-trained universal model through fine-tuning and distillation. By fine-tuning the pre-trained model, our PFD workflow produces force fields with first-principles accuracy while requiring one to two orders of magnitude less training data than traditional methods. Distillation further improves the inference speed of the generated force field, meeting the requirements of large-scale molecular simulations. Comprehensive tests on diverse materials, including complex systems such as amorphous carbon and interfaces, show marked improvements in training efficiency, suggesting that the PFD workflow is a practical and reliable approach to force-field generation in computational materials science.
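The three-stage pattern described above can be illustrated with a deliberately tiny toy. This is a hypothetical sketch, not the PFD-kit or the authors' models: it assumes a one-dimensional pair energy, uses a Lennard-Jones curve as a stand-in "DFT" reference, treats a two-weight linear model with slightly wrong weights as the "universal" model, fine-tunes it on three reference points with ridge regression pulled toward the pre-trained weights (an L2-SP-style regularizer, swapped in here for illustration), and distills the fine-tuned teacher into a tabulated potential with linear interpolation as a stand-in for the fast local-descriptor student. All names (`fine_tune`, `distill`, `student_energy`, the weights and radii) are illustrative assumptions.

```python
"""Toy 1D sketch of the pre-train -> fine-tune -> distill pattern.

Hypothetical stand-ins only: the "DFT" reference is a Lennard-Jones pair
energy, the "universal model" has the same form with slightly wrong weights,
and the "distilled" student is a tabulated potential (linear interpolation).
"""
import bisect


def features(r):
    # Fixed descriptors for a pair at distance r; the model is linear in them.
    return (r ** -12, r ** -6)


def predict(w, r):
    f12, f6 = features(r)
    return w[0] * f12 + w[1] * f6


def dft_energy(r):
    # Hypothetical "first-principles" reference: LJ with epsilon = sigma = 1.
    return predict((4.0, -4.0), r)


def fine_tune(w_pre, radii, energies, lam=1e-6):
    """Minimize sum_i (phi_i . w - E_i)^2 + lam * |w - w_pre|^2.

    The penalty keeps w near the pre-trained weights; the 2x2 normal
    equations are solved in closed form with Cramer's rule.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for r, e in zip(radii, energies):
        f12, f6 = features(r)
        a11 += f12 * f12
        a12 += f12 * f6
        a22 += f6 * f6
        b1 += f12 * e
        b2 += f6 * e
    a11 += lam
    a22 += lam
    b1 += lam * w_pre[0]
    b2 += lam * w_pre[1]
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def distill(teacher_w, r_min, r_max, n_nodes):
    # Label a dense grid with the teacher (no DFT calls) and tabulate it.
    step = (r_max - r_min) / (n_nodes - 1)
    grid = [r_min + i * step for i in range(n_nodes)]
    table = [predict(teacher_w, r) for r in grid]
    return grid, table


def student_energy(grid, table, r):
    # Cheap inference: locate the grid interval and interpolate linearly.
    i = bisect.bisect_right(grid, r) - 1
    i = min(max(i, 0), len(grid) - 2)
    t = (r - grid[i]) / (grid[i + 1] - grid[i])
    return (1.0 - t) * table[i] + t * table[i + 1]


w_universal = (3.6, -3.8)   # hypothetical "pre-trained" weights
train_r = [1.0, 1.2, 1.5]   # only three "DFT" calculations
w_ft = fine_tune(w_universal, train_r, [dft_energy(r) for r in train_r])
grid, table = distill(w_ft, 0.95, 2.5, 400)

test_r = [0.97, 1.05, 1.1, 1.3, 1.8, 2.2]


def max_err(model):
    return max(abs(model(r) - dft_energy(r)) for r in test_r)


err_universal = max_err(lambda r: predict(w_universal, r))
err_finetuned = max_err(lambda r: predict(w_ft, r))
err_distilled = max_err(lambda r: student_energy(grid, table, r))
print(f"universal {err_universal:.2e}  fine-tuned {err_finetuned:.2e}  "
      f"distilled {err_distilled:.2e}")
```

In this toy the fine-tuned weights land very close to (4, -4), and the table reproduces the teacher up to its interpolation error. The student never sees a reference label directly, so its accuracy is bounded by the teacher's error plus its own approximation error, which is the trade-off the distillation step accepts in exchange for faster inference.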

Figures (8)