Epistemic Uncertainty Quantification For Pre-trained Neural Network

Hanjing Wang, Qiang Ji

TL;DR

The paper tackles the problem of estimating epistemic uncertainty for pre-trained, non-Bayesian models without access to training data or model retraining. It develops a gradient-based UQ framework that links perturbation-based and gradient-based perspectives, providing theoretical justification and practical methods. The core contributions are three enhancements—class-specific gradient weighting, layer-selective gradients, and gradient-perturbation integration—encapsulated in the REGrad approach. Empirical results across OOD detection, uncertainty calibration, and active learning demonstrate that REGrad outperforms existing baselines for pre-trained models, offering a scalable and architecture-agnostic tool for safer deployment of neural networks.

Abstract

Epistemic uncertainty quantification (UQ) identifies where models lack knowledge. Traditional UQ methods, often based on Bayesian neural networks, are not suitable for pre-trained non-Bayesian models. Our study addresses quantifying epistemic uncertainty for any pre-trained model without requiring the original training data or model modifications, which ensures broad applicability regardless of network architecture or training technique. Specifically, we propose a gradient-based approach to assess epistemic uncertainty, analyzing the gradients of outputs with respect to model parameters and thereby indicating the model adjustments needed to accurately represent the inputs. We first explore theoretical guarantees of gradient-based methods for epistemic UQ, questioning the view that this uncertainty can only be computed through differences between multiple models. We further improve gradient-driven UQ by using class-specific weights for integrating gradients and by emphasizing the distinct contributions of different neural network layers. Additionally, we enhance UQ accuracy by combining gradient and perturbation methods to refine the gradients. We evaluate our approach on out-of-distribution detection, uncertainty calibration, and active learning, demonstrating its superiority over current state-of-the-art UQ methods for pre-trained models.
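
To make the idea of a gradient-based score concrete, the following is a minimal sketch and not the REGrad method itself. It assumes a toy linear softmax classifier, for which the gradient of each class log-probability with respect to the parameters has a closed form, and it weights each class's squared gradient norm by the predicted probability. The function name `gradient_uncertainty` and this particular weighting are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_uncertainty(weights, bias, x):
    """Probability-weighted squared gradient norm (illustrative score only).

    For logits z = W x + b, the gradient of log p_c with respect to W is
    (e_c - p) x^T and with respect to b is (e_c - p), so the squared norm
    of the full parameter gradient is ||e_c - p||^2 * (||x||^2 + 1).
    """
    logits = [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    input_factor = sum(x_i * x_i for x_i in x) + 1.0

    score = 0.0
    for c, p_c in enumerate(probs):
        # Squared norm of (e_c - p), the logit-space residual for class c.
        residual_sq = sum(
            ((1.0 if k == c else 0.0) - p_k) ** 2 for k, p_k in enumerate(probs)
        )
        # Assumption: weight each class gradient by its predicted probability.
        score += p_c * residual_sq * input_factor
    return score


if __name__ == "__main__":
    W = [[4.0, 0.0], [0.0, 4.0]]  # hypothetical 2-class, 2-feature model
    b = [0.0, 0.0]
    print(gradient_uncertainty(W, b, [1.0, 0.0]))  # input favoring class 0
    print(gradient_uncertainty(W, b, [1.0, 1.0]))  # input with tied logits
```

In this toy setting the score is larger for the input with tied logits than for the input the model classifies confidently. A real pre-trained network would need automatic differentiation to obtain the parameter gradients, and this simple score makes no claim to separate epistemic from aleatoric uncertainty.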

Paper Structure (33 sections, 6 theorems, 32 equations, 1 figure, 10 tables)

Key Result

Proposition 3.1

Assume the model parameters $\theta^*$ are learned given sufficient in-distribution training data $\mathcal{D}$, i.e., $|\mathcal{D}|\rightarrow \infty$. Under mild regularity conditions (i.e., the likelihood function of $\theta$ is continuous, $\theta^*$ is not on the boundary of the parameter space [...] and where $\sigma \rightarrow 0$ is a small positive constant.

Figures (1)

  • Figure 1: ACC and NLL for the MNIST, C10, SVHN, and C100 datasets across 10 active learning acquisition cycles are presented. In each figure, the x-axis represents the number of data samples acquired for training the model, while the y-axis shows either the accuracy (ACC) or the negative log-likelihood (NLL) on the testing data. The results are averaged over three independent runs.

Theorems & Definitions (11)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition 3.4
  • Proposition 3.5
  • proof
  • proof
  • Lemma A.1
  • proof
  • proof
  • ...and 1 more