LTAU-FF: Loss Trajectory Analysis for Uncertainty in Atomistic Force Fields

Joshua A. Vita, Amit Samanta, Fei Zhou, Vincenzo Lordi

TL;DR

The paper addresses the high cost and poor calibration of ensemble-based uncertainty quantification (UQ) in deep learning atomistic force fields by introducing LTAU, which uses distributions of per-sample training errors and a nearest-neighbor search in the model's latent space to estimate the full error PDF at test points. Instantiated as LTAU-FF on NequIP, the method delivers well-calibrated confidence intervals and predicted errors that correlate strongly with the true errors near the training domain, while running 2–3 orders of magnitude faster than ensembles. It supports practical tasks such as out-of-domain (OOD) detection, training data re-weighting, and predicting failure during simulations (e.g., OC20 IS2RS). Performance is robust on in-domain (ID) data but degrades on OOD data, a limitation it shares with other distance-based UQ metrics. The approach offers a low-overhead UQ alternative for regression tasks in materials science and beyond, with room for further refinement in OOD handling and distance-based calibration.

Abstract

Model ensembles are effective tools for estimating prediction uncertainty in deep learning atomistic force fields. However, their widespread adoption is hindered by high computational costs and overconfident error estimates. In this work, we address these challenges by leveraging distributions of per-sample errors obtained during training and employing a distance-based similarity search in the model latent space. Our method, which we call LTAU, efficiently estimates the full probability distribution function (PDF) of errors for any test point using the logged training errors, achieving speeds that are 2–3 orders of magnitude faster than typical ensemble methods and allowing it to be used for tasks where training or evaluating multiple models would be infeasible. We apply LTAU towards estimating parametric uncertainty in atomistic force fields (LTAU-FF), demonstrating that its improved ensemble diversity produces well-calibrated confidence intervals and predicts errors that correlate strongly with the true errors for data near the training domain. Furthermore, we show that the errors predicted by LTAU-FF can be used in practical applications for detecting out-of-domain data, tuning model performance, and predicting failure during simulations. We believe that LTAU will be a valuable tool for uncertainty quantification (UQ) in atomistic force fields and is a promising method that should be further explored in other domains of machine learning.
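
The core idea in the abstract (log per-sample errors during training, then look up similar training points in the latent space at test time) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the brute-force Euclidean search, the choice of k, and the toy data are all assumptions made for illustration, and simply pooling the neighbors' logged errors is only one plausible way to form an empirical error distribution.

```python
import math

def nearest_neighbors(query, train_latents, k):
    """Indices of the k training latents closest to `query` (Euclidean, brute force)."""
    dists = sorted((math.dist(query, z), i) for i, z in enumerate(train_latents))
    return [i for _, i in dists[:k]]

def pooled_error_samples(query, train_latents, error_logs, k=2):
    """Pool the errors logged during training for the k nearest training points.

    `error_logs[i]` is assumed to hold the errors recorded for training
    sample i at successive checkpoints (hypothetical data layout).
    """
    pooled = []
    for i in nearest_neighbors(query, train_latents, k):
        pooled.extend(error_logs[i])
    return sorted(pooled)

def empirical_cdf(samples, threshold):
    """Fraction of pooled error samples at or below `threshold`."""
    return sum(1 for s in samples if s <= threshold) / len(samples)

# Toy data: 2-D latent vectors with errors logged at three checkpoints each.
train_latents = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
error_logs = [[0.30, 0.10, 0.05], [0.40, 0.20, 0.10], [2.00, 1.50, 1.00]]

# A test point near the first two training points inherits their error history.
samples = pooled_error_samples((0.05, 0.0), train_latents, error_logs, k=2)
mean_error = sum(samples) / len(samples)
```

Because the estimate only requires one latent-space lookup per test point and reuses errors already logged during training, no additional models need to be trained or evaluated, which is the source of the speedup over ensembles claimed in the abstract.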

Paper Structure (24 sections, 8 figures, 4 tables, 2 algorithms)

Figures (8)

  • Figure 1: Comparison of LTAU-FF and Ensemble on the 3BPA test sets. Panel a compares the uncertainty estimates for each method to the true errors observed on the test sets (note the log-scaled axes), and can be used to obtain $\mathcal{C}_P$ and $\mathcal{C}_S$ in Table \ref{tab:metrics}. The calibration curves in panel b, as described in Tran2020, show how well the predicted confidence intervals capture the true errors (with an ideal model falling on the $x=y$ line), and can be used to obtain $\mathcal{A}^+$, $\mathcal{A}^-$, $|\mathcal{A}|$, and $s$ in Table \ref{tab:metrics}.
  • Figure 2: Comparison of LTAU-FF and Ensemble on the Carbon_GAP_20 test set. Predicted uncertainty vs. true force errors shown in panel a and calibration curves in panel b are as described in Fig. \ref{fig:parity_cal_3bpa}. Quantitative analysis is provided in Table \ref{tab:metrics}. The "wall" of points around 1 eV/Å, where LTAU-FF makes poor predictions, consists of points that we identify as OOD based on their nearest-neighbor distance in the latent space (see Appendix \ref{sec:supp:ood}).
  • Figure 3: RMSD between atoms of DFT-relaxed and model-relaxed samples from the IS2RS task of OC20, binned by predicted uncertainty. Panels correspond to different choices of snapshots along the relaxation trajectory to use for predicting the final RMSD. The splits identified by Chanussot2021 as being in-domain (val_id) or out-of-domain (val_ood_both) are shown in blue or orange, respectively. Distances are averaged within each bin, and error bars correspond to the standard error for each bin. Only the adsorbate and surface atoms were considered in these plots, as is done in Chanussot2021.
  • Figure D1: Distributions of first nearest-neighbor distances in the train/test sets of 3BPA and Carbon_GAP_20. The data points are sorted by distance to improve visibility. Each train/test set is plotted in a different color; the 3BPA 300K test data almost exactly overlaps the 3BPA train data. Cutoffs (dashed black lines) can be chosen ad hoc to identify OOD data where LTAU-FF may begin to suffer from the drawbacks of distance-based UQ metrics discussed in Section \ref{sec:distance_uq}. Fig. \ref{fig:supp:3bpa_ood} and Fig. \ref{fig:supp:gap20_ood} show that data classified as OOD roughly corresponds to the points where LTAU-FF fails to accurately predict the true force errors.
  • Figure D2: A version of Fig. \ref{fig:parity_cal_3bpa}a where the 3BPA test data is split into ID and OOD data based on the manual cutoffs defined in Fig. \ref{fig:supp:distances}.
  • ...and 3 more figures
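
The distance-based OOD split described in the captions for Figures D1 and D2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function names, the Euclidean metric, the toy latent vectors, and the cutoff value of 0.5 are all hypothetical, and the captions note that real cutoffs are chosen ad hoc from the observed distance distributions.

```python
import math

def first_nn_distance(query, train_latents):
    """Distance from `query` to its closest training latent (Euclidean)."""
    return min(math.dist(query, z) for z in train_latents)

def flag_ood(test_latents, train_latents, cutoff):
    """Mark a test point as OOD when its first nearest-neighbor distance exceeds `cutoff`."""
    return [first_nn_distance(q, train_latents) > cutoff for q in test_latents]

# Toy latent vectors; the cutoff is a manually chosen, illustrative value.
train_latents = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
test_latents = [(0.1, 0.1), (4.0, 4.0)]

flags = flag_ood(test_latents, train_latents, cutoff=0.5)
# The first test point sits close to the training data; the second lies far outside it.
```

Points flagged this way are the ones where an uncertainty estimate built from nearby training errors is least trustworthy, since their nearest training neighbors are not actually similar to them.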