Fast and Accurate Uncertainty Estimation in Chemical Machine Learning

Felix Musil; Michael J. Willatt; Mikhail A. Langovoy; Michele Ceriotti

TL;DR

This work presents a scalable framework for uncertainty estimation in chemical machine learning. It combines sparse Gaussian Process Regression (in the projected-process, PP, approximation) with SOAP kernels, and uses sub-sampling of the training set to generate ensembles of predictions. The ensemble uncertainty is calibrated by maximizing the log-likelihood, which yields a maximum-likelihood scaling factor and improves reliability beyond the standard GP variance. The framework is validated on two benchmarks: ¹H NMR chemical shieldings in molecular crystals and QM9 formation energies. On these, sub-sampling-based uncertainty (especially with non-linear scaling) matches or outperforms the GPR-based uncertainty at reduced cost, and it enables robust uncertainty propagation to derived properties. The approach supports training-set optimization and active learning, and is readily adaptable to other ML schemes, which makes it practical for data-driven materials chemistry.

Abstract

We present a scheme to obtain an inexpensive and reliable estimate of the uncertainty associated with the predictions of a machine-learning model of atomic and molecular properties. The scheme is based on resampling, with multiple models being generated based on sub-sampling of the same training data. The accuracy of the uncertainty prediction can be benchmarked by maximum likelihood estimation, which can also be used to correct for correlations between resampled models, and to improve the performance of the uncertainty estimation by a cross-validation procedure. In the case of sparse Gaussian Process Regression models, this resampled estimator can be evaluated at negligible cost. We demonstrate the reliability of these estimates for the prediction of molecular energetics, and for the estimation of nuclear chemical shieldings in molecular crystals. Extension to estimate the uncertainty in energy differences, forces, or other correlated predictions is straightforward. This method can be easily applied to other machine learning schemes, and will be beneficial to make data-driven predictions more reliable, and to facilitate training-set optimization and active-learning strategies.
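The resampling and calibration steps described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: plain ridge regression on synthetic data stands in for the sparse GPR model with SOAP kernels, all function names (`fit_ridge`, `subsample_ensemble`, `ml_scaling_factor`, and so on) are hypothetical, and the scaling factor is the simple maximum-likelihood estimate for a Gaussian error model, without any correction for the finite number of resampled models.

```python
import numpy as np


def fit_ridge(X, y, reg=1e-3):
    """Closed-form ridge regression weights (stand-in for the real model)."""
    A = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def subsample_ensemble(X, y, n_models=16, frac=0.5, seed=0):
    """Train n_models models, each on a random subset of the training data."""
    rng = np.random.default_rng(seed)
    n_sub = int(frac * len(y))
    weights = []
    for _ in range(n_models):
        idx = rng.choice(len(y), size=n_sub, replace=False)
        weights.append(fit_ridge(X[idx], y[idx]))
    return np.array(weights)  # shape (n_models, n_features)


def predict_with_uncertainty(weights, X):
    """Ensemble mean and spread (sample standard deviation) per input."""
    preds = X @ weights.T  # shape (n_points, n_models)
    return preds.mean(axis=1), preds.std(axis=1, ddof=1)


def ml_scaling_factor(y_true, mean, sigma):
    """alpha that maximizes the Gaussian log-likelihood for sigma -> alpha * sigma."""
    return np.sqrt(np.mean((y_true - mean) ** 2 / sigma ** 2))


def gaussian_log_likelihood(y_true, mean, sigma):
    """Average per-point log-likelihood of y_true under N(mean, sigma^2)."""
    return np.mean(
        -0.5 * np.log(2.0 * np.pi * sigma ** 2)
        - 0.5 * (y_true - mean) ** 2 / sigma ** 2
    )


# Synthetic data: a noisy linear target, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

weights = subsample_ensemble(X_train, y_train)
mean, sigma = predict_with_uncertainty(weights, X_val)

# Calibrate the ensemble spread on the validation set.
alpha = ml_scaling_factor(y_val, mean, sigma)
ll_raw = gaussian_log_likelihood(y_val, mean, sigma)
ll_cal = gaussian_log_likelihood(y_val, mean, alpha * sigma)
print(f"alpha = {alpha:.3f}, LL raw = {ll_raw:.3f}, LL calibrated = {ll_cal:.3f}")
```

Because `alpha = 1` belongs to the family of rescalings being optimized, the calibrated log-likelihood on the calibration set cannot be lower than the uncalibrated one. Whether the improvement carries over to unseen data is what a cross-validation procedure would need to check.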
