Enhancing Machine Learning Potentials through Transfer Learning across Chemical Elements

Sebastien Röcken, Julija Zavadlav

TL;DR

This work leverages the trained MLP for silicon to initialize and expedite the training of an MLP for germanium, and demonstrates that transfer learning surpasses traditional training from scratch in force prediction, leading to more stable simulations and improved temperature transferability.

Abstract

Machine Learning Potentials (MLPs) can enable simulations of ab initio accuracy at orders of magnitude lower computational cost. However, their effectiveness hinges on the availability of considerable datasets to ensure robust generalization across chemical space and thermodynamic conditions. The generation of such datasets can be labor-intensive, highlighting the need for innovative methods to train MLPs in data-scarce scenarios. Here, we introduce transfer learning of potential energy surfaces between chemically similar elements. Specifically, we leverage the trained MLP for silicon to initialize and expedite the training of an MLP for germanium. Utilizing classical force field and ab initio datasets, we demonstrate that transfer learning surpasses traditional training from scratch in force prediction, leading to more stable simulations and improved temperature transferability. These advantages become even more pronounced as the training dataset size decreases. The out-of-target property analysis shows that transfer learning leads to beneficial but sometimes adversarial effects. Our findings demonstrate that transfer learning across chemical elements is a promising technique for developing accurate and numerically stable MLPs, particularly in a data-scarce regime.
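
The core idea, initializing the target-element model with parameters trained on a data-rich source element and then fine-tuning on a few target samples, can be illustrated with a deliberately simplified sketch. Everything below is an assumption made for illustration: a toy linear "force model" stands in for the neural-network MLP, the synthetic data and the 10% parameter shift between "source" and "target" are invented, and names such as `train` and `rmse` are hypothetical. It is not the authors' architecture, data, or training setup.

```python
import numpy as np


def train(X, y, theta_init, lr=0.05, steps=1000):
    """Plain gradient descent on the mean squared force error."""
    theta = theta_init.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad
    return theta


def rmse(X, y, theta):
    """Root-mean-square error of the predicted forces."""
    return float(np.sqrt(np.mean((X @ theta - y) ** 2)))


rng = np.random.default_rng(0)
n_features = 8

# Hypothetical ground truth: the target "element" is a slightly
# perturbed copy of the source "element" (an assumption of this toy).
theta_source_true = rng.normal(size=n_features)
theta_target_true = 1.1 * theta_source_true

# Large source dataset, very small target dataset, held-out target test set.
X_source = rng.normal(size=(500, n_features))
y_source = X_source @ theta_source_true
X_target = rng.normal(size=(3, n_features))
y_target = X_target @ theta_target_true
X_test = rng.normal(size=(200, n_features))
y_test = X_test @ theta_target_true

# Step 1: pretrain on the source element.
theta_pretrained = train(X_source, y_source, np.zeros(n_features))

# Step 2a: train on the target element from scratch (zero initialization).
theta_scratch = train(X_target, y_target, np.zeros(n_features))

# Step 2b: transfer learning, i.e. start from the pretrained parameters.
theta_transfer = train(X_target, y_target, theta_pretrained)

err_scratch = rmse(X_test, y_test, theta_scratch)
err_transfer = rmse(X_test, y_test, theta_transfer)
print(f"test RMSE from scratch:  {err_scratch:.4f}")
print(f"test RMSE with transfer: {err_transfer:.4f}")
```

In this toy setting the three target samples cannot determine all eight parameters, so the from-scratch model loses whatever the data do not constrain, while the transferred model keeps the pretrained values in those directions. This mirrors the data-scarce regime described in the abstract only qualitatively.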

Paper Structure

This paper contains 17 sections, 1 equation, 6 figures, 1 table.

Figures (6)

  • Figure 1: Transfer learning between chemical elements. An MLP is initially trained on a large dataset containing various configurational states $\mathbf{S}$ and the corresponding forces on atoms $\mathbf{F}$ for a certain chemical element, here silicon. The resulting parameters $\mathbf{\theta}$ are then transferred to initialize the parameters of an MLP, which is subsequently trained on a small dataset of a similar but different chemical element, here germanium.
  • Figure 2: Data efficiency of transfer learning for the Stillinger-Weber example. The green and blue lines denote the test-set error of force (a) and energy (b) predictions for the MLP trained with and without transfer learning, respectively. The values are averaged over five different models corresponding to different randomly selected train and validation data samples from the data pool. We perform a hyperparameter search for each model based on the validation dataset (170 samples, 5 per temperature) as reported in the Supplementary Information.
  • Figure 3: Phonon density of states (PDOS) for the Stillinger-Weber example. The results are shown for five models obtained with (b, green) and without (a, blue) transfer learning. These models are trained with a single random sample at 2000 K and compared to the reference germanium model (dashed black) trained on a large dataset containing 340 samples across the entire considered temperature range. The five models correspond to the five best hyperparameter models with different random selections of training and validation data (5 samples at 2000 K).
  • Figure 4: Temperature transferability for the Stillinger-Weber example. Transfer learning models (blue) are referenced against the models trained from scratch (green) by computing the force error for samples at different temperatures. For both cases, we train five models using a single sample at 2000 K and test each one on 50 samples at the respective temperature, resulting in 250 predictions in total for the 5 models. On the whisker plot, the orange line indicates the median, the box represents the interquartile range (IQR) between the first and third quartiles, and the whiskers extend to the furthest point within 1.5 times the IQR.
  • Figure 5: Data efficiency of transfer learning for the DFT example. The green and blue lines denote the test-set error of force (a) and energy (b) predictions for the MLP trained with and without transfer learning, respectively. The values are averaged over five different models corresponding to different randomly selected train and validation data samples. We perform a hyperparameter search for each model based on the validation dataset (22 samples) as reported in the Supplementary Information. The black dashed line denotes the chemical accuracy of 43 meV/atom.
  • ...and 1 more figure
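
The whisker-plot convention described in the Figure 4 caption (median line, box spanning the first to third quartile, whiskers reaching the furthest data point within 1.5 times the IQR) can be made concrete with a short sketch. The function name `box_plot_stats` and the sample values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
import numpy as np


def box_plot_stats(values, whisker_factor=1.5):
    """Median, quartiles, and whisker ends for a box-and-whisker plot."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    return {
        "median": float(median),
        "q1": float(q1),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
    }


# Hypothetical error values with one clear outlier.
errors = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])
stats = box_plot_stats(errors)
print(stats)
```

Points beyond the whiskers, such as the value 100.0 here, are treated as outliers and do not stretch the whisker.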