Function Space Diversity for Uncertainty Prediction via Repulsive Last-Layer Ensembles

Sophie Steger, Christian Knoll, Bernhard Klein, Holger Fröning, Franz Pernkopf

TL;DR

This work addresses uncertainty quantification in neural networks by performing inference in function space with repulsive last-layer ensembles. It introduces a parameter-efficient multi-headed architecture in which the heads share a base network and are kept diverse by a function-space repulsion term evaluated on unlabeled or augmented data (the repulsion samples). The method disentangles aleatoric and epistemic uncertainty, improves out-of-distribution detection, and provides calibrated uncertainty under distribution shifts. Because the heads attach to pretrained models, the approach enables uncertainty-aware fine-tuning of large models at modest additional training and memory cost.

Abstract

Bayesian inference in function space has gained attention due to its robustness against overparameterization in neural networks. However, approximating the infinite-dimensional function space introduces several challenges. In this work, we discuss function space inference via particle optimization and present practical modifications that improve uncertainty estimation and, most importantly, make it applicable to large and pretrained networks. First, we demonstrate that the choice of input samples on which particle predictions are enforced to be diverse is decisive for model performance. While enforcing diversity on the training data itself can lead to underfitting, the use of label-destroying data augmentation or unlabeled out-of-distribution data can improve prediction diversity and uncertainty estimates. Furthermore, we take advantage of the function space formulation, which imposes no restrictions on network parameterization other than sufficient flexibility. Instead of using full deep ensembles to represent particles, we propose a single multi-headed network that introduces a minimal increase in parameters and computation. This allows seamless integration into pretrained networks, where this repulsive last-layer ensemble can be used for uncertainty-aware fine-tuning at minimal additional cost. We achieve competitive results in disentangling aleatoric and epistemic uncertainty for active learning, detecting out-of-domain data, and providing calibrated uncertainty estimates under distribution shifts with minimal compute and memory.
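
The multi-headed architecture described in the abstract can be illustrated with a minimal NumPy sketch: a shared base network feeds N linear heads (the particles), and a diversity penalty is evaluated on the heads' predictions at the repulsion samples. Everything below is an assumption made for illustration and is not taken from the paper: the one-layer base network, the names `shared_features`, `head_logits`, and `rbf_repulsion`, the RBF kernel, and the fixed bandwidth. In particular, the penalty is a simple stand-in for a function-space repulsion term, not the authors' loss.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def shared_features(x, W_base):
    """Hypothetical one-layer stand-in for the shared base network: ReLU(x @ W_base)."""
    return np.maximum(x @ W_base, 0.0)

def head_logits(features, W_heads):
    """Apply N linear heads to the shared features.

    features: (B, D), W_heads: (N, D, C) -> logits of shape (N, B, C).
    """
    return np.einsum("bd,ndc->nbc", features, W_heads)

def rbf_repulsion(probs, bandwidth=1.0):
    """Mean pairwise RBF kernel between head predictions (illustrative penalty).

    probs: (N, B, C), the N heads evaluated on B repulsion samples.
    Returns 1.0 when all heads predict identically and decreases toward 0
    as the heads' predictions on the repulsion samples move apart.
    """
    n = probs.shape[0]
    flat = probs.reshape(n, -1)  # each head as one vector of function values
    sq_dist = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)  # (N, N)
    kernel = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    off_diagonal = kernel.sum() - np.trace(kernel)
    return off_diagonal / (n * (n - 1))

# Usage: 5 heads on a shared 8-dimensional feature space, 3 classes.
rng = np.random.default_rng(0)
W_base = rng.normal(size=(4, 8))
W_heads = rng.normal(size=(5, 8, 3))
x_rep = rng.normal(size=(16, 4))  # stand-in for unlabeled or augmented repulsion samples

probs = softmax(head_logits(shared_features(x_rep, W_base), W_heads))
penalty = rbf_repulsion(probs)

# Collapsed ensemble: every head is a copy of the first one.
W_same = np.repeat(W_heads[:1], 5, axis=0)
probs_same = softmax(head_logits(shared_features(x_rep, W_base), W_same))
penalty_same = rbf_repulsion(probs_same)
```

Only `W_heads` grows with the number of particles, which is why the parameter overhead relative to a single network stays small. In a training loop, a penalty of this kind would be added to the usual likelihood term on labeled data, so that the heads fit the training set while being pushed apart on the repulsion samples.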

Paper Structure

This paper contains 30 sections, 9 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: fLLPOVI with $N$ particles. Colored dots correspond to the prediction of a particle. Unlabeled data points from a different distribution are used as repulsion samples for the function space repulsion loss. Epistemic uncertainty (EU) is lowest when all particle predictions agree and increases with the spread of the particles. Aleatoric uncertainty (AU) increases for ambiguous samples, e.g., the digit on the lower right belonging to both classes, resulting in particle predictions centered in the probability region.
  • Figure 2: Example of repulsion samples for DirtyMNIST (top row) and CIFAR10/100 (bottom row).
  • Figure 3: For regression, we show the prediction of individual particles, the mean and the standard deviation. For classification, we show the standard deviation of $p({\mathbf{y}}|{\mathbf{x}},\theta)$. are highly confident in regions distant from training data, while fLLPOVI predictions are enforced to be diverse outside of the training data.
  • Figure 4: Histograms of aleatoric versus epistemic uncertainty on ID data (MNIST, Ambiguous MNIST) and OOD data (Fashion-MNIST). We compare (a) an unregularized DE-5, (b) an unregularized LLPOVI, and (c) fLLPOVI with augmented training data as repulsion samples (${\mathbf{x}}_{rep}=\textit{Patches-8}$).
  • Figure 5: Test accuracy of the model as a function of the number of data samples acquired using the different uncertainty estimates. Predictive entropy (PE) combines aleatoric and epistemic uncertainty. Using the mutual information (MI) of the LLPOVI and fLLPOVI predictions outperforms the softmax entropy of the single network and performs on par with the other uncertainty baselines. The results are averaged over 5 runs.
  • ...and 5 more figures
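
The captions of Figures 1 and 5 refer to aleatoric uncertainty, epistemic uncertainty, predictive entropy (PE), and mutual information (MI). A minimal sketch of how these quantities relate is given below, assuming the standard information-theoretic decomposition for ensembles: PE is the entropy of the averaged prediction, AU is the average entropy of the individual particle predictions, and EU is their difference, which equals the MI. Whether the paper uses exactly this estimator is an assumption here, and the function names are hypothetical.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats; eps guards against log(0)."""
    return -(p * np.log(p + eps)).sum(axis=axis)

def decompose_uncertainty(probs):
    """Split predictive uncertainty of an ensemble into AU and EU.

    probs: (N, B, C) class probabilities of N particles on B inputs.
    Returns (predictive_entropy, aleatoric, epistemic), each of shape (B,).
    """
    mean_probs = probs.mean(axis=0)                # ensemble prediction, (B, C)
    predictive_entropy = entropy(mean_probs)       # PE: total uncertainty
    aleatoric = entropy(probs).mean(axis=0)        # AU: expected per-particle entropy
    epistemic = predictive_entropy - aleatoric     # EU: mutual information
    return predictive_entropy, aleatoric, epistemic

# Two particles, two inputs, two classes.
probs = np.array([
    [[0.5, 0.5], [1.0, 0.0]],   # particle 1
    [[0.5, 0.5], [0.0, 1.0]],   # particle 2
])
pe, au, eu = decompose_uncertainty(probs)
```

The first input mirrors the ambiguous digit in Figure 1: both particles agree on a 50/50 prediction, so the uncertainty is entirely aleatoric. On the second input the particles are individually confident but disagree, so the uncertainty is entirely epistemic. Both inputs have the same predictive entropy, which is why PE alone cannot separate the two cases.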