Black-Box Uncertainty Estimation for Deep Learning Models in Atomistic Simulations

Idan Fonea, Amir Peles, Sivan Niv, Goren Gordon, Amir Natan

TL;DR

The paper tackles uncertainty quantification for atomistic force prediction by using a black-box ensemble approach that does not modify the base neural network. Direct uncertainty $UQ_d$ correlates with force magnitude and struggles to separate in-distribution from out-of-distribution data; introducing a relative uncertainty $UQ_r$ scaled by the ensemble force magnitude and smoothed over time yields robust OOD detection across Na and Al, including surface configurations. Thresholding $UQ_r$ via a conformal-like non-conformity score enables practical OOD decisions and can guide data acquisition and retraining, while $UQ_r$ also provides insight into training convergence. Overall, the method offers a general, architecture-agnostic tool for validating predictions in atomistic simulations and related black-box models, with clear applicability to active learning and surface chemistry problems.

Abstract

We analyze an ensemble-based approach for uncertainty quantification (UQ) in atomistic neural networks. This method generates an epistemic uncertainty signal without requiring changes to the underlying multi-headed regression neural network architecture, making it suitable for sealed or black-box models. We apply this method to molecular systems, specifically sodium (Na) and aluminum (Al), under various temperature conditions. By scaling the uncertainty signal, we account for heteroscedasticity in the data. We demonstrate the robustness of the scaled UQ signal for detecting out-of-distribution (OOD) behavior in several scenarios. This UQ signal also correlates with model convergence during training, providing an additional tool for optimizing the training process.
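
The two signals described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the array layout `(models, samples, atoms, 3)`, the reduction of the per-component standard deviation to a per-atom scalar with a Euclidean norm, the trailing moving-average window, the small `eps` guard, and all function names are hypothetical choices made for this example. The window sizes default to 5, matching the $W_n$ and $W_u$ values quoted in the Figure 3 caption.

```python
import numpy as np


def moving_average(x, window):
    """Trailing moving average along axis 0 (time); early steps use a shorter window."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        start = max(0, t - window + 1)
        out[t] = x[start:t + 1].mean(axis=0)
    return out


def direct_uncertainty(forces):
    """Direct signal: spread of the ensemble predictions for each sample and atom.

    forces has shape (K, T, A, 3): K models, T samples, A atoms, 3 components
    (an assumed layout).
    """
    std = np.asarray(forces, dtype=float).std(axis=0)  # (T, A, 3), population std
    return np.linalg.norm(std, axis=-1)                # (T, A), assumed reduction


def relative_uncertainty(forces, w_n=5, w_u=5, eps=1e-12):
    """Relative signal: direct signal divided by a smoothed ensemble force magnitude."""
    forces = np.asarray(forces, dtype=float)
    uq_d = direct_uncertainty(forces)
    mean_force = forces.mean(axis=0)                      # ensemble mean, (T, A, 3)
    force_mag = np.linalg.norm(mean_force, axis=-1)       # (T, A)
    force_norm = moving_average(force_mag, w_n)           # smoothed magnitude
    return moving_average(uq_d / (force_norm + eps), w_u)  # smoothed ratio
```

Dividing by the force magnitude is what makes the signal insensitive to the overall force scale: multiplying every predicted force by a constant rescales the direct signal by that constant but leaves the relative signal essentially unchanged.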

Paper Structure

This paper contains 13 sections, 13 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Ensemble of $25$ neural networks for force prediction and uncertainty quantification. Each network, trained on a sampled dataset, predicts the Cartesian force components $\vec{F}_{i,a,k}$ for atom $a$ on data sample $i$. Ensemble averaging yields the mean forces $\vec{\overline{F}}_{i,a}$ and standard deviation across models $\vec{UQ_d}(i,a)$.
  • Figure 2: Panels (a, b) show scatter plots of the AE (Eq. \ref{eq:AE_defintion_mb2}) vs. the $UQ_d$ signal (Eq. \ref{eq:UQd_signal_mb4}) for the Na models trained at 300$\,\mathrm{K}$ (a) and 2000$\,\mathrm{K}$ (b), tested on 2000$\,\mathrm{K}$ (red) and 300$\,\mathrm{K}$ (blue) data. Panels (c, d) show scatter plots of the AE (Eq. \ref{eq:AE_defintion_mb2}) vs. the average absolute predicted force (Eq. \ref{eq:F_norm_mb8}) for the Na models trained at 300$\,\mathrm{K}$ (c) and 2000$\,\mathrm{K}$ (d). Red is the Na2000$\,\mathrm{K}$ test set and blue is the Na300$\,\mathrm{K}$ test set. In all panels, the data is grouped into 100 bins along the x-axis, and the AE is averaged within each bin.
  • Figure 3: Behavior of $UQ_d$ and $UQ_r$ for models trained and tested on two temperature datasets, Na300$\,\mathrm{K}$ (blue) and Na2000$\,\mathrm{K}$ (red). To improve visual separation, the $UQ_r$ signal was normalized as in Eq. \ref{eq:F_norm_mb8} with $W_n = 5$, and both the $UQ_r$ and $UQ_d$ signals were smoothed as in Eq. \ref{eq:U_smoothed_calbirated_mb9} with $W_u = 5$. The top pair of panels (a, b) shows the $UQ_d$ signal (Eq. \ref{eq:UQd_signal_mb4}) against the absolute error (AE, Eq. \ref{eq:AE_defintion_mb2}) on a logarithmic scale. The second pair of panels (c, d) shows the $UQ_d$ signal against the relative AE ($AE_r$, Eq. \ref{eq:rel_AE_mb5}).
  • Figure 4: Panels (a, b) show the distribution of $UQ_r$ (Eq. \ref{eq:U_smoothed_calbirated_mb9}) for sodium, for the models trained at 300$\,\mathrm{K}$ (a) and 2000$\,\mathrm{K}$ (b), tested on 2000$\,\mathrm{K}$ (red) and 300$\,\mathrm{K}$ (blue) data, in bins of 0.01. Panels (c, d) show the observed cumulative distribution function (CDF) obtained from Eq. \ref{eq:P_ood_mb12}, measuring the frequency of $UQ_r$ samples that lie above a threshold set at that point. The vertical line marks the 95th percentile of the relative uncertainty at the training temperature, for Na300$\,\mathrm{K}$ (a, c) and Na2000$\,\mathrm{K}$ (b, d).
  • Figure 5: Same as Figure \ref{fig:scatter_U_and_absF_vs_MAE_na}, but for aluminum instead of sodium. In all panels, the data is grouped into 100 bins along the x-axis, and the AE is averaged within each bin. Panels (a, b) show scatter plots of the AE (Eq. \ref{eq:AE_defintion_mb2}) vs. the direct UQ signal (Eq. \ref{eq:UQd_signal_mb4}) for the models trained on Al300$\,\mathrm{K}$ (a) and Al2000$\,\mathrm{K}$ (b), tested on Al2000$\,\mathrm{K}$ (red) and Al300$\,\mathrm{K}$ (blue) data. Panels (c, d) show scatter plots of the AE (Eq. \ref{eq:AE_defintion_mb2}) vs. the predicted force magnitude (Eq. \ref{eq:F_norm_mb8}) for the models trained on Al300$\,\mathrm{K}$ (c) and Al2000$\,\mathrm{K}$ (d). Red is the Al2000$\,\mathrm{K}$ test set and blue is the Al300$\,\mathrm{K}$ test set.
  • ...and 3 more figures
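
The thresholding described in the Figure 4 caption (a vertical line at the 95th percentile of the relative uncertainty at the training temperature, and the frequency of samples above a threshold) can be illustrated with a short sketch. The function names and the flattening of all samples and atoms into one pool are assumptions made for this example. The paper's own definition is in the equation the caption references, which is not reproduced here.

```python
import numpy as np


def ood_threshold(uq_r_train, percentile=95.0):
    """Threshold taken as a percentile of the in-distribution relative uncertainty."""
    return float(np.percentile(np.ravel(uq_r_train), percentile))


def exceedance_fraction(uq_r, threshold):
    """Fraction of relative-uncertainty values that lie strictly above the threshold."""
    return float(np.mean(np.ravel(uq_r) > threshold))


def exceedance_curve(uq_r, thresholds):
    """Exceedance fraction evaluated at each candidate threshold (one curve per test set)."""
    return np.array([exceedance_fraction(uq_r, t) for t in thresholds])
```

With a threshold fitted on data from the training temperature, roughly 5% of in-distribution values exceed it by construction, so a test set whose exceedance fraction is far higher is a candidate for out-of-distribution behavior.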