Probabilistic Skip Connections for Deterministic Uncertainty Quantification in Deep Neural Networks

Felix Jimenez; Matthias Katzfuss

TL;DR

This work introduces Probabilistic Skip Connections (PSCs) to enable deterministic uncertainty quantification (UQ) in classification networks without retraining with spectral normalization (SN). Neural-collapse metrics are used to locate an intermediate layer that is both sensitive and smooth; PSCs then project that layer's representation and attach a probabilistic head, yielding reliable in-distribution (iD) and out-of-distribution (OOD) assessments in a single forward pass. The method combines a Tucker-based projection of intermediate representations with a linear probabilistic head (Laplace/KFAC) to provide accurate uncertainty estimates, matching or surpassing SN-trained baselines across architectures and datasets, including networks without residual connections. Empirical results also show that the intermediate representations have exploitable low-rank structure, allowing effective dimensionality reduction without eroding UQ quality. Overall, PSCs offer a drop-in, scalable path to high-quality UQ and OOD detection without retraining, extending deterministic UQ to a broader class of networks.

Abstract

Deterministic uncertainty quantification (UQ) in deep learning aims to estimate uncertainty with a single pass through a network by leveraging outputs from the network's feature extractor. Existing methods require that the feature extractor be both sensitive and smooth, ensuring meaningful input changes produce meaningful changes in feature vectors. Smoothness enables generalization, while sensitivity prevents feature collapse, where distinct inputs are mapped to identical feature vectors. To meet these requirements, current deterministic methods often retrain networks with spectral normalization. Instead of modifying training, we propose using measures of neural collapse to identify an existing intermediate layer that is both sensitive and smooth. We then fit a probabilistic model to the feature vector of this intermediate layer, which we call a probabilistic skip connection (PSC). Through empirical analysis, we explore the impact of spectral normalization on neural collapse and demonstrate that PSCs can effectively disentangle aleatoric and epistemic uncertainty. Additionally, we show that PSCs achieve uncertainty quantification and out-of-distribution (OOD) detection performance that matches or exceeds existing single-pass methods requiring training modifications. By retrofitting existing models, PSCs enable high-quality UQ and OOD capabilities without retraining.
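The layer-selection idea in the abstract can be illustrated with a minimal sketch. It assumes per-layer feature matrices have already been extracted and flattened to shape `(n_samples, n_features)`. The function names, the `tol` parameter, and the trace-based variability ratio are illustrative assumptions, not the paper's exact neural-collapse metrics or selection rule. The ratio shrinks toward zero as features collapse onto their class means, so among layers with near-best nearest-centroid accuracy the sketch prefers the one with the largest ratio.

```python
import numpy as np


def class_means(features, labels):
    """Return the sorted class ids and the mean feature vector of each class."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, means


def ncc_accuracy(features, labels):
    """Accuracy of a nearest-centroid classifier on the given features."""
    classes, means = class_means(features, labels)
    # Squared Euclidean distance from every sample to every class mean.
    d2 = ((features[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    pred = classes[d2.argmin(axis=1)]
    return float((pred == labels).mean())


def variability_ratio(features, labels):
    """Within-class over between-class scatter (trace form).

    Assumption: a simple proxy for collapse, not the paper's metric.
    Values near zero mean features sit almost exactly on their class means.
    """
    classes, means = class_means(features, labels)
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c, mu in zip(classes, means):
        x = features[labels == c]
        within += ((x - mu) ** 2).sum()
        between += len(x) * ((mu - global_mean) ** 2).sum()
    return float(within / between)


def pick_layer(layer_features, labels, tol=0.02):
    """Hypothetical rule: keep layers whose NCC accuracy is within `tol`
    of the best, then choose the least collapsed one among them."""
    scores = [(ncc_accuracy(f, labels), variability_ratio(f, labels))
              for f in layer_features]
    best_acc = max(acc for acc, _ in scores)
    candidates = [i for i, (acc, _) in enumerate(scores) if acc >= best_acc - tol]
    chosen = max(candidates, key=lambda i: scores[i][1])
    return chosen, scores
```

In practice `layer_features` would come from forward hooks on a pretrained network, and the threshold on accuracy would be tuned per architecture.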

Paper Structure (27 sections, 9 equations, 9 figures, 5 tables, 2 algorithms)

Introduction
Background
Uncertainty quantification
Feature geometry
Related work
Probabilistic skip connections (PSCs)
Choosing layer: collapse-accuracy trade-off
Processing intermediate representations
Linear layers as PSCs
Experiments
Trade-off between sensitivity and smoothness
DDU case study
Intermediate representations have exploitable low-rank structure
iD UQ performance for PSCs matches using SN
Conclusion
...and 12 more sections

Figures (9)

Figure 1: Neural-collapse metrics help identify intermediate layers that retain high nearest-centroid accuracy without feature collapse, allowing us to place probabilistic skip connections (PSCs) that enhance uncertainty quantification. The top left shows the trade-off between collapse and accuracy across network depth. In this example, dog, mouse, and cat images are processed through a network trained only on dog and cat images, with the mouse being out-of-distribution (OOD). Without a PSC, the network incorrectly labels the mouse as "Cat". However, adding a PSC at a layer with high accuracy and low collapse gives a model that honestly reflects the uncertainty for the mouse image and correctly labels the two in-distribution instances. The magnifying glass view illustrates why: in pre-collapse layers, the feature vector difference, $\delta$, between the cat and mouse remains large. By the penultimate layer, this difference collapses, leading to overconfidence. The crux is to identify an appropriate intermediate layer that maintains feature sensitivity while preserving predictive performance.
Figure 2: Adding PSCs to a pretrained model involves measuring collapse (left), projecting intermediate layers to a feature vector (center), and fitting a probabilistic model to the feature vector (right). The first step is to find a subset of the layers that trade off accuracy and collapse; higher is better for both metrics. Once we have those layers, we combine them and project to a single feature vector. Finally, we fit a probabilistic model to that feature vector, which can distinguish iD from OOD inputs and make class predictions.
Figure 3: SN delays neural collapse to the final layer, whereas without SN, intermediate layers behave similarly to the penultimate layer of a network trained with SN. (Left) SN pushes the point where $\mathcal{NC}_1$ drops below 0.2 to the final layer. Without SN, there is a region where $\mathcal{NC}_1$ remains above 0.2, but its values differ from those observed with SN. (Right) SN reduces the performance of the nearest-centroid classifier (NCC) across all but the last two layers, while without SN, more layers exhibit NCC accuracy comparable to the final prediction. (Both Panels) Boxes highlight layers that have not yet collapsed but maintain high NCC accuracy, indicating that even without SN, some layers are both sensitive and smooth.
Figure 4: (a) PSC feature density separates iD and OOD just as well as when using SN. The panels each show the probability of the feature vector under GDA trained on the embeddings of the training data, but each panel differs in how the feature vectors are computed. The base network's original feature density uses no SN (left) and can be improved equally well by using SN (middle) or an intermediate layer (right). (b) Intermediate feature density also works for networks without residual connections and no SN. The feature density of VGG-16 (left) conflates iD and OOD but using an intermediate layer (right) separates iD and OOD.
Figure 5: For a wide range of $c_{proj}$ and $d_{proj}$, neural collapse is avoided, and NCC accuracy remains high, demonstrating the robustness of the projection method. Neural collapse metrics for ResNet-18 on MNIST are shown, with the left panel displaying $\mathcal{NC}_1$ values and the right panel showing $\mathcal{NC}_4$ values. In both panels, the x-axis represents $d_{proj}$, and color indicates $c_{proj}$.
...and 4 more figures
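
The feature density described in the Figure 4 caption (the probability of a feature vector under GDA fit to training embeddings) can be sketched as follows. This is a minimal illustration, assuming one full-covariance Gaussian per class with empirical class priors. The function names and the `jitter` regularizer are assumptions for illustration, not the authors' implementation. A low log density marks an input as likely OOD.

```python
import numpy as np


def fit_gda(features, labels, jitter=1e-6):
    """Fit one Gaussian per class; returns a list of (mean, cov, log_prior)."""
    params = []
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        # Small diagonal jitter (an assumption) keeps the covariance invertible.
        cov = np.atleast_2d(np.cov(x, rowvar=False)) + jitter * np.eye(x.shape[1])
        log_prior = np.log(len(x) / len(features))
        params.append((mean, cov, log_prior))
    return params


def log_feature_density(params, z):
    """Log marginal density log sum_c pi_c N(z; mu_c, Sigma_c) for each row of z."""
    z = np.atleast_2d(np.asarray(z, dtype=float))
    d = z.shape[1]
    per_class = []
    for mean, cov, log_prior in params:
        diff = z - mean
        _, logdet = np.linalg.slogdet(cov)
        # Mahalanobis distance of every row, computed without an explicit inverse.
        maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
        per_class.append(log_prior - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
    stacked = np.stack(per_class, axis=1)
    # Numerically stable log-sum-exp over classes.
    m = stacked.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(stacked - m).sum(axis=1, keepdims=True))).ravel()
```

Scores from `log_feature_density` on held-out iD data can be used to choose a threshold below which inputs are flagged as OOD.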
