Transitional Uncertainty with Layered Intermediate Predictions

Ryan Benkert; Mohit Prabhushankar; Ghassan AlRegib

TL;DR

This work tackles the challenge of reliable single-pass uncertainty estimation by examining why output-distance preservation can hinder the learning objective. It introduces Transitional Uncertainty with Layered Intermediate Predictions (TULIP), which employs transitional feature preservation across intermediate representations and a combination head to produce uncertainty scores in one forward pass. The authors provide theoretical insight into when distance preservation helps and demonstrate a practical, scalable method that matches or surpasses existing single-pass estimators on CIFAR, medical CT data, and ImageNet, especially under architectural complexity and class imbalance. The approach offers a pragmatic alternative to ensembles, improving uncertainty estimates without the computational burden of multiple forward passes, and shows promise for deployment in real-world scenarios with diverse data modalities.

Abstract

In this paper, we discuss feature engineering for single-pass uncertainty estimation. For accurate uncertainty estimates, neural networks must extract differences in the feature space that quantify uncertainty. This could be achieved by current single-pass approaches that maintain feature distances between data points as they traverse the network. While initial results are promising, maintaining feature distances within the network representations frequently inhibits information compression and opposes the learning objective. We study this effect theoretically and empirically to arrive at a simple conclusion: preserving feature distances in the output is beneficial when the preserved features contribute to learning the label distribution and act in opposition otherwise. We then propose Transitional Uncertainty with Layered Intermediate Predictions (TULIP) as a simple approach to address the shortcomings of current single-pass estimators. Specifically, we implement feature preservation by extracting features from intermediate representations before information is collapsed by subsequent layers. We refer to the underlying preservation mechanism as transitional feature preservation. We show that TULIP matches or outperforms current single-pass methods on standard benchmarks and in practical settings where these methods are less reliable (imbalances, complex architectures, medical modalities).
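As a rough illustration of how an uncertainty score could be read from intermediate predictions in a single forward pass, the sketch below scores one sample from the logits of several internal classifiers. It is not the authors' combination head: the function name `transitional_uncertainty`, the per-exit `weights`, and the disagreement measure (probability mass that each exit does not assign to the final prediction) are assumptions chosen for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def transitional_uncertainty(exit_logits, final_logits, weights=None):
    """Return (prediction, uncertainty score) for one sample.

    exit_logits  : list of logit vectors, one per internal classifier
    final_logits : logits of the main network output
    weights      : hypothetical per-exit weights, a stand-in for a
                   learned combination head (assumption, not from the paper)
    """
    final_probs = softmax(final_logits)
    # The prediction comes from the main network output only.
    prediction = max(range(len(final_probs)), key=final_probs.__getitem__)

    if weights is None:
        weights = [1.0 / len(exit_logits)] * len(exit_logits)

    # Each exit contributes the probability mass it does NOT place on the
    # final prediction; large values mean the representation changed a lot
    # between that exit and the output.
    score = sum(
        w * (1.0 - softmax(logits)[prediction])
        for w, logits in zip(weights, exit_logits)
    )
    return prediction, score


# Two toy samples with the same final prediction (class 0).
final = [6.0, 0.0, 0.0]
agreeing_exits = [[4.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
disagreeing_exits = [[0.0, 4.0, 0.0], [0.0, 0.0, 5.0]]

pred_a, u_a = transitional_uncertainty(agreeing_exits, final)
pred_d, u_d = transitional_uncertainty(disagreeing_exits, final)
```

With uniform weights the score stays in [0, 1], and the sample whose exits disagree with the final prediction receives the higher score. A learned combination head would replace the fixed `weights`.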

Paper Structure (49 sections, 1 theorem, 21 equations, 6 figures, 9 tables, 2 algorithms)

Introduction
Background
Neural Networks in the Information Plane
Distance-Based Feature Preservation in the Output
Theoretical Analysis
Pitfalls of Feature Preservation in the Output
Distance Preservation under Class Imbalance
Experimental Setup
Accuracy Curves
Our Method: TULIP
Transitional Feature Preservation
Algorithm
Shallow-Deep Network Exits
Training Procedure
Combination Head
...and 34 more sections

Key Result

Proposition 4.1

Consider the neural network mapping $h_w:\mathcal{X}\rightarrow \mathcal{H}$ with the layered architecture $h_w = h_{w_0}\circ h_{w_1}\circ \dots \circ h_{w_L}$, where the first layer $h_{w_0}$ is collapse resistant with respect to the input space, $d_{H_0}(h_{w_0}(\mathbf{x}_1), h_{w_0}(\mathbf{x}_2)) \neq \dots$ [...] where $C=1$ under an appropriate choice of $r_l$. In other words, there exists a linear combination [...]

Figures (6)

Figure 1: a) Overview of different feature preservation paradigms. We show 2D representations of neural networks where $h_1$ and $h_2$ denote the output dimensions in the feature space. Left: conventional neural network. Samples are collapsed to two tight clusters with little uncertainty information. Center: feature preservation in the output. Feature differences are maintained, resulting in more uncertainty-related content but also in a cluster overlap. Right: transitional feature preservation. Uncertainty is measured from differences between several representations of the same sample (denoted as $h_{w_{1,2,3}}$). b) Uncertainty comparison of our intuition to ensembles. Transitions between network layers provide an accurate signal for uncertainty estimation in comparison to ensembles.
Figure 2: Classification accuracy with and without distance preservation in the output: a) uniformly removed training and test data (left); b) class imbalance at different severity levels (right). In both graphs, the y-axis shows the classification accuracy. The x-axis of the left graph represents the percentage of uniformly removed data; the x-axis of the right graph represents the fraction of imbalanced classes. The zero point on the x-axis is equivalent for both scenarios and represents the standard CIFAR100 benchmark without imbalance or data removal.
Figure 3: Workflow of our method during inference. The architecture consists of a main network, internal classifiers (IC), as well as a combination head. During inference, the input traverses the main network, as well as the internal classifiers. The prediction is obtained from the main network output, while the uncertainty score is obtained from a combination of internal classifier outputs.
Figure 4: Toy example of class distributions at different imbalance severity levels. Each column represents a different severity level, and each row the training and test set distribution, respectively.
Figure 5: Accuracy and number of samples (sample concentration) with respect to dataset imbalance and average uncertainty score. The top row shows accuracy; the bottom row shows sample concentration. The left column represents a conventional DNN, while the right column shows feature distance preservation through spectral normalization.
...and 1 more figure
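The two dataset modifications described in the Figure 2 caption (uniform data removal and class imbalance) can be sketched as index-selection helpers. This is only a sketch under stated assumptions: the excerpt does not give the exact protocol, so the function names, the `keep_ratio` retained for each imbalanced class, and the seeding are hypothetical.

```python
import random
from collections import Counter


def remove_uniformly(labels, removed_fraction, seed=0):
    """Drop a fraction of all samples uniformly at random.

    Returns the sorted indices of the samples that are kept.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_keep = round(len(indices) * (1.0 - removed_fraction))
    return sorted(indices[:n_keep])


def imbalance_classes(labels, class_fraction, keep_ratio=0.1, seed=0):
    """Subsample a fraction of the classes to create class imbalance.

    class_fraction : fraction of classes that become minority classes
    keep_ratio     : share of samples kept per minority class
                     (hypothetical value, not taken from the paper)
    Returns (sorted kept indices, set of minority classes).
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    n_minor = round(len(classes) * class_fraction)
    minority = set(rng.sample(classes, n_minor))

    kept = []
    for cls in classes:
        members = [i for i, y in enumerate(labels) if y == cls]
        if cls in minority:
            rng.shuffle(members)
            members = members[: max(1, round(len(members) * keep_ratio))]
        kept.extend(members)
    return sorted(kept), minority


# Toy label set: 10 classes with 100 samples each.
labels = [c for c in range(10) for _ in range(100)]

uniform_idx = remove_uniformly(labels, removed_fraction=0.3)
imbalanced_idx, minority = imbalance_classes(labels, class_fraction=0.5)
counts = Counter(labels[i] for i in imbalanced_idx)
```

At a setting of zero, both helpers return the full index set, which mirrors the caption's note that the zero point on the x-axis is the same unmodified benchmark in both scenarios.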

Theorems & Definitions (5)

Proposition 4.1: Transitional Feature Preservation in Intermediate Representations
Definition 1.1: Unique Distance Set and Partition
proof
proof
proof
