P-DRUM: Post-hoc Descriptor-based Residual Uncertainty Modeling for Machine Learning Potentials

Shih-Peng Huang, Nontawat Charoenphakdee, Yuta Tsuboi, Yong-Bin Zhuang, Wenwen Li

TL;DR

The study tackles uncertainty quantification for machine learning interatomic potentials by introducing P-DRUM, a post-hoc framework that derives residual uncertainty from descriptors of a trained GNN potential. P-DRUM trains atom-wise residual models to estimate energy residuals \Delta E and force residuals \Delta F using either norm-based or deviation-based targets, aggregating over atoms to yield structure-level uncertainty proxies. Across multiple datasets (HME21, rMD17, Ni3Al), P-DRUM variants show strong alignment with actual prediction errors, sometimes surpassing baselines like kNN and GMM in in-domain uncertainty correlation, while OOD detection reveals tradeoffs between the norm and diff formulations. The work demonstrates a practical, efficient alternative to ensembles for UQ in MLIPs and points to future directions in active learning and data-efficient residual training to extend applicability and reliability.

Abstract

Ensemble methods are considered the gold standard for uncertainty quantification (UQ) in machine learning interatomic potentials (MLIPs). However, their high computational cost can limit their practicality. Alternative techniques, such as Monte Carlo dropout and deep kernel learning, have been proposed to improve computational efficiency; however, some of these methods cannot be applied to already trained models and may affect prediction accuracy. In this paper, we propose a simple and efficient post-hoc framework for UQ that leverages the descriptor of a trained graph neural network potential to estimate residual errors. We refer to this method as post-hoc descriptor-based residual uncertainty modeling (P-DRUM). P-DRUM models the discrepancy between MLIP predictions and ground truth values, allowing these residuals to act as proxies for prediction uncertainty. We explore multiple variants of P-DRUM and benchmark them against established UQ methods, evaluating both their effectiveness and limitations.
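The residual-modeling idea can be illustrated with a small sketch. This is not the authors' implementation: the ridge regressor, the per-atom force-residual norm used as the training target, the mean aggregation over atoms, and all function names are assumptions chosen for illustration, and the descriptors are synthetic stand-ins for those of a frozen GNN potential.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression with a bias column; returns the weights."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict_ridge(w, X):
    """Apply the fitted weights (including the bias term) to new descriptors."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def force_residual_norm(f_pred, f_true):
    """Assumed norm-style target: per-atom ||F_pred - F_true||."""
    return np.linalg.norm(f_pred - f_true, axis=-1)

def structure_uncertainty(w, descriptors):
    """Aggregate atom-wise predicted residuals into one score.
    Using the mean over atoms is an assumption made for this sketch."""
    return float(np.mean(predict_ridge(w, descriptors)))

rng = np.random.default_rng(0)
n_atoms, dim = 200, 8

# Synthetic stand-ins: descriptors of a frozen potential and reference forces.
desc = rng.uniform(0.0, 1.0, size=(n_atoms, dim))
f_true = rng.normal(size=(n_atoms, 3))

# Synthetic prediction error whose size grows with the first descriptor entry.
scale = 0.1 + 0.5 * desc[:, 0]
f_pred = f_true + scale[:, None] * rng.normal(size=(n_atoms, 3))

# Post-hoc step: the potential is untouched; only the residual model is fit.
target = force_residual_norm(f_pred, f_true)
w = fit_ridge(desc, target)

# Uncertainty proxy for a hypothetical 10-atom structure.
score = structure_uncertainty(w, desc[:10])
```

Because the residual model only consumes descriptors the trained potential already produces, the potential's own predictions are unchanged, which is what makes the approach post-hoc.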

Paper Structure

This paper contains 16 sections, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Overview of P-DRUM. The red arrow indicates "model training using supervised learning".
  • Figure 2: PCA visualization of the Ni3Al dataset. The left subplots show the prediction error of the train and test sets in PC space, while the right subplots show the uncertainty metrics of the test set.
  • Figure 3: PCA visualization of the oxygen atoms in the HME21 dataset. The left subplots show the prediction error of the train and test sets in PC space, while the right subplots show the uncertainty metrics of the test set.
  • Figure 4: PCA visualization of the calcium atoms in the HME21 dataset. The left subplots show the prediction error of the train and test sets in PC space, while the right subplots show the uncertainty metrics of the test set.