Bias in Universal Machine-Learned Interatomic Potentials and its Effects on Fine-Tuning

Nicolas Wong, Julia H. Yang

TL;DR

Naive fine-tuning generates constrained datasets that fail to represent MD simulations, so the resulting fine-tuned models fail during extrapolation. Periodic fine-tuning, in contrast, yields models that are more generalizable and accurate, producing low-error dynamics.

Abstract

Universal machine-learned interatomic potentials (uMLIPs) are a growing area of interest due to their transferability across the periodic table, displaying an error of about 0.6 kcal/mol against the Matbench Discovery test set. However, we show that achieving more accurate predictions on out-of-domain tasks requires fine-tuning. Additionally, we investigate the existence and influence of model biases in molecular dynamics (MD) by examining two approaches to data generation: sampling from multiple MD trajectories in parallel, which we call naive fine-tuning, and sampling from a single MD trajectory with fine-tuning after set intervals, which we call periodic fine-tuning. We find that naive fine-tuning generates constrained datasets that fail to represent MD simulations, and thus downstream fine-tuned models fail during extrapolation. In contrast, periodic fine-tuning yields models that are more generalizable and accurate, producing low-error dynamics. These findings indicate the role of uMLIP bias in fine-tuning and highlight the need for multiple fine-tuning steps. Lastly, we relate unphysical behavior to principal component space and quantify extrapolations through Q-residual analysis, which is useful as a proxy for epistemic uncertainty in larger simulations.
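The Q-residual mentioned in the abstract can be illustrated with a minimal sketch. In its standard form, the Q-residual of a sample is the squared reconstruction error left over after projecting it onto the retained principal components, so a large value flags a sample that lies outside the subspace spanned by the training data. Everything below is an assumption made for illustration: the random feature vectors stand in for whatever structural descriptors the paper uses, and the helper names `fit_pca` and `q_residual`, the two retained components, and the 99th-percentile threshold are hypothetical choices, not the authors' settings.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit PCA by SVD; return the feature mean and loadings (n_features x n_components)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T


def q_residual(X, mean, loadings):
    """Squared reconstruction error of each row of X outside the retained PC subspace."""
    Xc = X - mean
    X_hat = Xc @ loadings @ loadings.T  # projection onto the retained components
    return np.sum((Xc - X_hat) ** 2, axis=1)


# Toy data (hypothetical): training features lie close to a 2-D plane in 8-D space.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 8))
train = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 8))
in_domain = rng.normal(size=(20, 2)) @ basis + 0.01 * rng.normal(size=(20, 8))
out_of_domain = in_domain + 2.0 * rng.normal(size=(20, 8))  # pushed off the plane

mean, loadings = fit_pca(train, n_components=2)

# Empirical threshold from the training set (an illustrative choice).
threshold = np.percentile(q_residual(train, mean, loadings), 99)

q_in = q_residual(in_domain, mean, loadings)
q_out = q_residual(out_of_domain, mean, loadings)
print("mean Q, in-domain:    ", q_in.mean())
print("mean Q, out-of-domain:", q_out.mean())
print("flagged as extrapolation:", int(np.sum(q_out > threshold)), "of", len(q_out))
```

In this toy setting the off-plane samples have much larger Q-residuals than the in-domain ones, which is the sense in which the quantity can act as a proxy for extrapolation.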

Paper Structure (9 sections, 7 equations, 9 figures, 4 tables)


Figures (9)

  • Figure 1: Schematic of the fine-tuning dataset generation strategies (energy not to scale). Each circle denotes a starting configuration for the most recent fine-tuned potential, arrows represent MD trajectories, with equilibration and production runs separated by dashed lines. (a) The left panel describes the naive data generation workflow, where several independent trajectories are sampled in parallel to generate a dataset. (b) The right panel describes the periodic training workflow, where a single trajectory is sampled and the potential is fine-tuned at fixed-intervals.
  • Figure 2: Testing errors during 0.5 ns of dynamics, shown as residuals against DFT. The y-axis shows the residual of the MLIP energy relative to DFT ($E_{\mathrm{MLIP}} - E_{\mathrm{DFT}}$). For both workflows, the x-axis indicates the model used to generate the trajectory, with colors distinguishing the models: lighter colors are models trained on fewer images, and darker colors are models trained on more images. The naive workflow (a) plateaus at about 10 meV/atom, and the periodic workflow (b) plateaus at about -5 meV/atom, as indicated by the horizontal black lines on each plot.
  • Figure 3: Principal component analysis of the individual 50-point datasets (naive or periodic), with regions labeled by atom type. The first two principal components account for 41.82% of the explained variance. Panels (a) and (b) show the full datasets, while panels (c) and (d) display only the regions sampled uniquely by each method, visualized by overlaying white on top of the corresponding dataset to give, for example, Naive$\notin$(Periodic). The left panels (a, c) correspond to the naive workflow, and the right panels (b, d) correspond to the periodic workflow.
  • Figure 4: Principal component analysis of each dataset from the naive and periodic approaches, with regions labeled by atom type, and bond-length distributions of O-H bonds. Each color represents the unique coverage contributed by the most recent iteration (lighter colors are earlier models). Panel (a) corresponds to the datasets used to generate N-10pts, N-21pts, N-31pts, N-40pts, and N-50pts, respectively. Panel (b) likewise corresponds to the datasets used to generate FT1, FT2, FT3, FT4, and FT5. The arrows in panel (b) represent the shift from data generated by the universal potential to data generated by FT1 through FT5. Panels (c) and (d) show the distribution of O-H bond lengths for data generated by (c) only the universal model and (d) the universal and a fine-tuned model. Panel (d) relates the bond-length distribution to the PCA distribution shift with arrows.
  • Figure 5: A summary of 8 ns production trajectories generated by N-50pts and FT5. Panels (a) and (b) summarize the 8 ns production trajectories in terms of potential energy, with model-predicted energy (solid line), cross-evaluated energy (dashed line), and DFT reference (circles) sampled every 0.1 ns. Panels (c, d, e) highlight the time ranges t1=[2.86, 3.36] ns, t2=[5.0525, 5.5525] ns, and t3=[5.75, 6.25] ns during the sampling of N-50pts (a), where artifacts occur. For each range, representative snapshots illustrate the artifacts: panels (c-d) illustrate deprotonation reactions between chlorine (green), oxygen (red), and hydrogen (white), while panel (e) demonstrates a change in cobalt's (pink) coordination environment from CoCl3 to CoCl4.
  • ...and 4 more figures
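
The two data-generation strategies described in the Figure 1 caption can be contrasted in a short sketch. This is a schematic under stated assumptions, not the authors' implementation: `run_md`, `label`, and `fine_tune` are hypothetical callables standing in for an MD engine, a DFT single-point labeling step, and a fine-tuning routine, and the toy versions below only track which model generated each frame. The sketch also assumes that the periodic dataset is cumulative, that each fine-tuning step starts from the most recent model, and that each segment continues from the last frame of the previous one. The frame counts (10 per segment, 50 in total) are illustrative only.

```python
def naive_workflow(base_model, starts, run_md, label, fine_tune):
    """Sample independent trajectories with the base model, then fine-tune once."""
    dataset = []
    for start in starts:  # conceptually run in parallel
        trajectory = run_md(base_model, start)
        dataset.extend(label(config) for config in trajectory)
    return fine_tune(base_model, dataset), dataset


def periodic_workflow(base_model, start, n_rounds, run_md, label, fine_tune):
    """Sample one trajectory in segments, fine-tuning after every segment."""
    model, config, dataset = base_model, start, []
    for _ in range(n_rounds):
        trajectory = run_md(model, config)  # driven by the most recent model
        dataset.extend(label(c) for c in trajectory)
        model = fine_tune(model, dataset)   # assumption: cumulative dataset
        config = trajectory[-1]             # assumption: continue the same trajectory
    return model, dataset


# Toy stand-ins (hypothetical): a "model" is an integer version number and a
# "configuration" is a (model_version, frame_index) pair.
def toy_run_md(model, start, n_frames=10):
    return [(model, start[1] + i + 1) for i in range(n_frames)]


def toy_label(config):
    return config  # placeholder for a reference (e.g. DFT) calculation


def toy_fine_tune(model, dataset):
    return model + 1  # each fine-tuning step yields a new model version


naive_model, naive_data = naive_workflow(
    0, [(0, 0)] * 5, toy_run_md, toy_label, toy_fine_tune)
periodic_model, periodic_data = periodic_workflow(
    0, (0, 0), 5, toy_run_md, toy_label, toy_fine_tune)

print("naive: model versions that generated data:",
      sorted({v for v, _ in naive_data}))
print("periodic: model versions that generated data:",
      sorted({v for v, _ in periodic_data}))
```

The structural difference is visible in the toy output: every frame in the naive dataset comes from the base model, whereas the periodic dataset mixes frames generated by successively fine-tuned models.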