Be Bayesian by Attachments to Catch More Uncertainty

Shiyu Shen, Bin Pan, Tianyang Shi, Tao Li, Zhenwei Shi

TL;DR

This paper proposes ABNN, a Bayesian neural network with an attached structure that captures uncertainty from out-of-distribution (OOD) data while preserving predictive power on in-distribution (ID) data. The authors formalize ID, semi-OOD, and full-OOD data partitions, connect uncertainty to posterior variance, and provide a convergence analysis. Training alternates between KL minimization on ID data and KL maximization on OOD data, which inflates OOD variance through a lightweight attachment while maintaining backbone performance. Experiments on MNIST, SVHN, and CIFAR show strong OOD detection and misclassification detection, robustness across backbones, and low sensitivity to the exact choice of the balancing parameter.

Abstract

Bayesian Neural Networks (BNNs) have become a promising approach to uncertainty estimation owing to their solid theoretical foundations. However, the performance of BNNs depends on their ability to catch uncertainty. Instead of only seeking the distribution of neural network weights from in-distribution (ID) data, in this paper we propose a new Bayesian Neural Network with an Attached structure (ABNN) to catch more uncertainty from out-of-distribution (OOD) data. We first construct a mathematical description of the uncertainty of OOD data according to the prior distribution, and then develop an attached Bayesian structure to integrate the uncertainty of OOD data into the backbone network. ABNN is composed of an expectation module and several distribution modules. The expectation module is a backbone deep network that focuses on the original task, and the distribution modules are mini Bayesian structures that serve as attachments to the backbone. In particular, the distribution modules aim at extracting uncertainty from both ID and OOD data. We further provide a theoretical analysis of the convergence of ABNN, and experimentally validate its superiority by comparing it with several state-of-the-art uncertainty estimation methods. Code will be made available.
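
The split between an expectation module and attached distribution modules can be pictured with a small forward-pass sketch. This is only an illustration under stated assumptions, not the authors' implementation: the class names `Block` and `DistributionModule`, the layer sizes, the additive way the attachment perturbs the features, and the initial standard deviation are all hypothetical choices made here. It shows a deterministic backbone pass next to Monte Carlo passes through the attachments, from which a predictive mean and variance can be read off.

```python
import numpy as np


class DistributionModule:
    """Hypothetical attachment: a tiny Bayesian linear layer with Gaussian weights."""

    def __init__(self, dim, rng, init_log_std=-3.0):
        self.mu = np.zeros((dim, dim))                      # weight means
        self.log_std = np.full((dim, dim), init_log_std)    # weight log std-devs
        self.rng = rng

    def __call__(self, h):
        # Reparameterization: w = mu + std * eps with eps ~ N(0, I).
        eps = self.rng.standard_normal(self.mu.shape)
        w = self.mu + np.exp(self.log_std) * eps
        return h @ w


class Block:
    """Deterministic residual block (backbone) with one attached distribution module."""

    def __init__(self, dim, rng):
        self.w = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.attachment = DistributionModule(dim, rng)

    def __call__(self, h, sample=True):
        out = np.maximum(h @ self.w, 0.0) + h   # ReLU layer plus skip connection
        if sample:
            out = out + self.attachment(h)      # stochastic contribution (assumed additive)
        return out


def forward(blocks, head, x, sample):
    """Run the input through all blocks and a final linear head."""
    h = x
    for block in blocks:
        h = block(h, sample=sample)
    return h @ head


def predictive_stats(blocks, head, x, n_samples=100):
    """Monte Carlo estimate of the predictive mean and variance."""
    draws = np.stack([forward(blocks, head, x, sample=True) for _ in range(n_samples)])
    return draws.mean(axis=0), draws.var(axis=0)


rng = np.random.default_rng(0)
dim, n_classes = 8, 3
blocks = [Block(dim, rng) for _ in range(3)]
head = rng.standard_normal((dim, n_classes)) / np.sqrt(dim)
x = rng.standard_normal((4, dim))

point_estimate = forward(blocks, head, x, sample=False)   # backbone only
mean, var = predictive_stats(blocks, head, x)             # backbone plus attachments
```

In this sketch the backbone weights are never sampled, so the point estimate is unaffected by the attachments; only the sampled passes carry variance, which is the quantity an uncertainty score would be built on.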

Paper Structure

This paper contains 22 sections, 3 theorems, 26 equations, 5 figures, 7 tables, 1 algorithm.

Key Result

Theorem 6

$\mathrm{var}(y[i] \mid X,\mathcal{D}_{\rm{ID}}) < \mathrm{var}(y[i] \mid X,\mathcal{D}_{\rm{fullOOD}})$
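
The flavor of this inequality can be checked numerically in a much simpler setting. The sketch below is not the paper's model or proof: it uses ordinary Bayesian linear regression with a Gaussian prior, and the prior precision `alpha`, the noise variance, the training range, and the two query points are all assumptions chosen for illustration. It estimates the posterior predictive variance by Monte Carlo sampling at a query inside the training range and at one far outside it.

```python
import numpy as np


def posterior(X, y, alpha=1.0, noise_var=0.1):
    """Closed-form Gaussian posterior over weights for Bayesian linear regression."""
    d = X.shape[1]
    precision = alpha * np.eye(d) + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0                   # enforce exact symmetry
    mean = cov @ X.T @ y / noise_var
    return mean, cov


def mc_predictive_variance(x, mean, cov, noise_var=0.1, n_samples=20000, seed=0):
    """Sample weights from the posterior and return the variance of predictions at x."""
    rng = np.random.default_rng(seed)
    w = rng.multivariate_normal(mean, cov, size=n_samples)   # shape (n_samples, d)
    noise = rng.normal(0.0, np.sqrt(noise_var), size=n_samples)
    preds = w @ x + noise
    return preds.var()


# Toy training set: a bias term plus one feature drawn from [-1, 1].
rng = np.random.default_rng(42)
X_train = np.column_stack([np.ones(50), rng.uniform(-1.0, 1.0, 50)])
y_train = X_train @ np.array([0.5, 2.0]) + rng.normal(0.0, np.sqrt(0.1), 50)
mean, cov = posterior(X_train, y_train)

x_id = np.array([1.0, 0.2])    # inside the training range
x_ood = np.array([1.0, 8.0])   # far outside the training range

var_id = mc_predictive_variance(x_id, mean, cov)
var_ood = mc_predictive_variance(x_ood, mean, cov)
print(f"ID variance: {var_id:.3f}, OOD variance: {var_ood:.3f}")
```

For this toy model the predictive variance has the closed form $x^\top \Sigma x + \sigma^2$, where $\Sigma$ is the posterior covariance and $\sigma^2$ the noise variance, so it grows as the query moves away from the training inputs; the Monte Carlo estimates should agree with that expression up to sampling error.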

Figures (5)

  • Figure 1: Visualization of uncertainty for BBP [BBB], SDE-Net [kong2020sde] and ABNN. The horizontal axis is the index of predictions ordered by uncertainty on each dataset, and the vertical axis is the uncertainty. An ideal estimate should increase gradually as the input becomes more OOD, while remaining separable across datasets. Please refer to Section 4.2 for more details.
  • Figure 2: Segmentation of data space. Whether a data point belongs to a specific distribution determines the segmentation bound.
  • Figure 3: Components of ABNN. We take ResNet as the backbone and attach a distribution module to each Resblock. For out-of-distribution data, our network can catch its uncertainty and generate probabilistic results with large variances. For in-distribution data, our network can predict results with small variances.
  • Figure 4: Uncertainty distributions on MNIST, SVHN and CIFAR10. The first row shows the uncertainty distributions; the second row shows the ordered uncertainty on different datasets. More clearly separated distributions indicate better performance.
  • Figure 5: Uncertainty distributions on CIFAR10 and CIFAR100. The first row shows the uncertainty distributions; the second row shows the ordered uncertainty on different datasets. More clearly separated distributions indicate better performance.

Theorems & Definitions (11)

  • Definition 1: $\mathcal{D}_{\rm{ID}}$
  • Definition 2: $\mathcal{D}_{\rm{semiOOD}}$
  • Definition 3: $\mathcal{D}_{\rm{fullOOD}}$
  • Definition 4: $p(X|\theta,\mathcal{D}_{\rm{ID}})$
  • Definition 5: $p(X|\theta,\mathcal{D}_{\rm{fullOOD}})$
  • Theorem 6
  • proof
  • Theorem 7
  • proof
  • Theorem 8
  • ...and 1 more