A Rate-Distortion View of Uncertainty Quantification

Ifigeneia Apostolopoulou, Benjamin Eysenbach, Frank Nielsen, Artur Dubrawski

TL;DR

The paper addresses reliable uncertainty quantification for deep neural networks by making predictions distance-aware relative to the training data manifold. It introduces Distance Aware Bottleneck (DAB), a single-model, deterministic approach that learns a codebook of encoder distributions and uses the distance of an input's encoder from that codebook to quantify uncertainty, within a rate-distortion formulation of the information bottleneck (IB). By replacing the IB complexity term with a finite-codebook rate-distortion objective and employing alternating minimization, DAB achieves superior out-of-distribution detection and misclassification prediction, often surpassing ensembles at far lower computational cost. The method supports post-hoc deployment on pre-trained feature extractors and demonstrates strong results across synthetic tasks, CIFAR-10, and ImageNet-1K, highlighting practical impact for scalable, calibrated uncertainty estimation in real-world applications.

Abstract

In supervised learning, understanding an input's proximity to the training data can help a model decide whether it has sufficient evidence for reaching a reliable prediction. While powerful probabilistic models such as Gaussian Processes naturally have this property, deep neural networks often lack it. In this paper, we introduce Distance Aware Bottleneck (DAB), a new method for enriching deep neural networks with this property. Building on prior information bottleneck approaches, our method learns a codebook that stores a compressed representation of all inputs seen during training. The distance of a new example from this codebook can serve as an uncertainty estimate for the example. The resulting model is simple to train and provides deterministic uncertainty estimates in a single forward pass. Finally, our method achieves better out-of-distribution (OOD) detection and misclassification prediction than prior methods, including expensive ensemble methods, deep kernel Gaussian Processes, and approaches based on the standard information bottleneck.
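
To make the idea of "distance from the codebook as an uncertainty estimate" concrete, here is a minimal sketch. It is not the authors' implementation. It assumes diagonal-Gaussian encoders and codebook entries, the Kullback-Leibler divergence as the distance, and a soft assignment proportional to `cb_weights * exp(-alpha * KL)`. The names `kl_diag_gauss`, `codebook_uncertainty`, `cb_mu`, `cb_var`, `cb_weights`, and `alpha` are hypothetical.

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between diagonal Gaussians; broadcasts over leading axes."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0,
        axis=-1,
    )

def codebook_uncertainty(mu_x, var_x, cb_mu, cb_var, cb_weights, alpha=1.0):
    """Expected KL from one encoder distribution to the k codebook entries.

    Assumption: the assignment probabilities are proportional to
    cb_weights[j] * exp(-alpha * KL_j), as in Blahut-Arimoto-style updates.
    """
    d = kl_diag_gauss(mu_x, var_x, cb_mu, cb_var)   # shape (k,)
    logits = np.log(cb_weights) - alpha * d
    logits = logits - logits.max()                  # numerical stability
    pi = np.exp(logits)
    pi = pi / pi.sum()
    return float(np.sum(pi * d))

# Toy codebook with two centroids in a 2-D latent space (made-up values).
cb_mu = np.array([[0.0, 0.0], [5.0, 5.0]])
cb_var = np.ones((2, 2))
cb_weights = np.array([0.5, 0.5])

near = codebook_uncertainty(np.array([0.1, -0.1]), np.ones(2), cb_mu, cb_var, cb_weights)
far = codebook_uncertainty(np.array([20.0, -20.0]), np.ones(2), cb_mu, cb_var, cb_weights)
print(near, far)  # the encoder far from both centroids gets the larger score
```

In this sketch an encoder close to a centroid receives a score near zero, while an encoder far from every centroid receives a large score, which is the distance-aware behaviour the abstract describes.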

Paper Structure

This paper contains 32 sections, 1 theorem, 34 equations, 8 figures, 16 tables, 1 algorithm.

Key Result

Proposition 1

Let the variational marginal $q(\boldsymbol{z};\boldsymbol{\phi})$ of Eq. (eq:tau_z_centroid) be a mixture of $k$ distributions in $\mathbb{R}^d$ that belong to the scaled regular exponential family $\mathcal{F}^\alpha_{\psi}$ (Definition 1.3) with $\alpha>0$ and log-partition function $\psi$ … where $D_{\psi^*}$ is the Bregman divergence of $\mathcal{F}_{\psi}$, i.e., the Bregman divergence …

Figures (8)

  • Figure 1: Distance awareness for principled uncertainty quantification. A distance-aware model can measure the distance between input examples and the training examples. Our method learns distances where misclassified datapoints, semantic (near OOD), and domain (far OOD) deviations can be identified by larger distances. Our method learns and uses a codebook for representing the training dataset. Here, we report distances from a codebook trained on CIFAR-10.
  • Figure 2: Overview of DAB. Uncertainty quantification in DAB is based on compressing the training dataset $\mathcal{D}_{\text{train}}$ by learning a codebook and computing distances from the codebook. The datapoints in $\mathcal{D}_{\text{train}}$, originally lying in $\mathbb{R}^d$, are embedded into the distribution space $\mathcal{P}$ of a parametric family of distributions through their encoders. Compression of $\mathcal{D}_{\text{train}}$ amounts to finding the centroids of the encoders in terms of a statistical distance $D$. For complex datasets, multiple centroids are usually needed. The uncertainty for a previously unseen test datapoint is quantified by its expected distance from the codebook: $\mathrm{uncertainty}(\boldsymbol{x}_{\text{test}})=\mathbb{E}[D(p(\boldsymbol{z} \mid \boldsymbol{x}_{\text{test}};\boldsymbol{\theta}), q_\kappa(\boldsymbol{z}; \boldsymbol{\phi}))]$.
  • Figure 3: Uncertainty estimation on noisy regression tasks. We consider the Kullback-Leibler divergence as the distortion function in the uncertainty score (Eq. eq:distance_3). A larger distance from the training datapoints (blue dots) is consistently quantified by higher uncertainty (width of pink area). Moreover, the true values lie well within $\pm 2\times$ the proposed uncertainty score around the predictive mean.
  • Figure 4: Qualitative evaluation of the encoders' codebook. We visualize the number of CIFAR-10 test datapoints per class assigned to each centroid during training. We assign a datapoint to the centroid with the smallest statistical distance from its encoder. Each centroid progressively attracts datapoints of the same class. Moreover, every centroid is assigned a non-zero number of test datapoints. Therefore, the centroids are useful for explaining both training datapoints and previously unseen test datapoints.
  • Figure 5: DAB's AUROC vs. corruption intensity for common corruptions applied to the CIFAR test set. The shaded area corresponds to $\pm$ one standard deviation across 10 random seeds.
  • ...and 3 more figures
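
The captions of Figures 2 and 4 describe two operations: assigning each encoder to the centroid with the smallest statistical distance, and finding centroids of the encoders. The following is a rough sketch of alternating between the two, in the spirit of k-means in distribution space. It is a simplification under stated assumptions, not the paper's training procedure: it uses hard assignments, diagonal Gaussians, and KL(encoder || centroid), for which the Gaussian centroid is obtained by moment matching. The function names `assign` and `update_centroids` and all numeric values are hypothetical.

```python
import numpy as np

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between diagonal Gaussians; broadcasts over leading axes."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0,
        axis=-1,
    )

def assign(mu, var, cb_mu, cb_var):
    """Index of the nearest centroid (smallest KL) for each of n encoders."""
    d = kl_diag_gauss(mu[:, None, :], var[:, None, :], cb_mu[None], cb_var[None])
    return d.argmin(axis=1)                          # d has shape (n, k)

def update_centroids(mu, var, labels, cb_mu, cb_var):
    """Moment-match each centroid to the encoders assigned to it.

    For a Gaussian q, moment matching minimizes sum_i KL(p_i || q).
    A centroid with no assigned encoders keeps its previous parameters.
    """
    new_mu, new_var = cb_mu.copy(), cb_var.copy()
    for j in range(cb_mu.shape[0]):
        mask = labels == j
        if mask.any():
            new_mu[j] = mu[mask].mean(axis=0)
            new_var[j] = (var[mask] + mu[mask] ** 2).mean(axis=0) - new_mu[j] ** 2
    return new_mu, new_var

# Synthetic encoders: two well-separated groups in a 2-D latent space.
rng = np.random.default_rng(0)
mu = np.concatenate([rng.normal(-3.0, 0.1, (50, 2)), rng.normal(3.0, 0.1, (50, 2))])
var = np.full((100, 2), 0.5)

cb_mu = np.array([[-1.0, -1.0], [1.0, 1.0]])
cb_var = np.ones((2, 2))
for _ in range(5):                                   # alternate the two steps
    labels = assign(mu, var, cb_mu, cb_var)
    cb_mu, cb_var = update_centroids(mu, var, labels, cb_mu, cb_var)
```

Counting `labels` per class on held-out encoders would give a histogram of the kind Figure 4 reports. A soft-assignment variant would replace `argmin` with normalized weights.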

Theorems & Definitions (5)

  • Definition 1.1: Bregman Divergence
  • Definition 1.2: Dual Bregman Form of Exponential Family
  • Definition 1.3: Scaled Exponential Family
  • Proposition 1
  • proof
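
The definitions listed above are given by title only in this summary. As a reminder, the standard textbook Bregman divergence generated by a differentiable, strictly convex function $\psi$ is $D_\psi(\boldsymbol{x}, \boldsymbol{y}) = \psi(\boldsymbol{x}) - \psi(\boldsymbol{y}) - \langle \boldsymbol{x} - \boldsymbol{y}, \nabla\psi(\boldsymbol{y}) \rangle$. The paper's notation may differ. The short sketch below evaluates it for two common generators. The helper name `bregman` and the example vectors are illustrative assumptions.

```python
import numpy as np

def bregman(psi, grad_psi, x, y):
    """D_psi(x, y) = psi(x) - psi(y) - <x - y, grad psi(y)>."""
    return psi(x) - psi(y) - np.dot(x - y, grad_psi(y))

# Generator psi(v) = 0.5 * ||v||^2 gives half the squared Euclidean distance.
def sq(v):
    return 0.5 * np.dot(v, v)

def sq_grad(v):
    return v

# Generator psi(p) = sum_i p_i log p_i (negative entropy) gives the
# KL divergence when both arguments lie on the probability simplex.
def negent(p):
    return np.sum(p * np.log(p))

def negent_grad(p):
    return np.log(p) + 1.0

x = np.array([0.2, 0.8])
y = np.array([0.5, 0.5])
d_sq = bregman(sq, sq_grad, x, y)
d_kl = bregman(negent, negent_grad, x, y)
```

Here `d_sq` equals `0.5 * ||x - y||^2` and `d_kl` equals `sum(x * log(x / y))`, which shows how one definition covers both a geometric distance and a statistical divergence.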