ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference

Nikita Durasov, Nik Dorndorf, Hieu Le, Pascal Fua

TL;DR

ZigZag tackles the challenge of reliable uncertainty estimation without the overhead of sampling-based methods. By modifying the network to accept an extra input and training it to produce consistent outputs with and without prior information, it enables a fast two-pass inference where the uncertainty is the distance between the two predictions, $\hat{\mathbf{u}} = \|\mathbf{y}_0 - \mathbf{y}_1\|$. The approach delivers uncertainty estimates on par with ensembles across classification and regression tasks while requiring only two forward passes, making it broadly applicable with minimal architectural changes. Empirically, ZigZag matches or outperforms sampling-based baselines on multiple benchmarks (MNIST, CIFAR, ImageNet, age, aerodynamics, etc.) and OOD detection tasks, while achieving substantially lower compute and memory costs. This sampling-free method provides a practical, scalable alternative for uncertainty estimation in real-world deployments and supports robust decision-making under distributional shifts.

Abstract

Whereas the ability of deep networks to produce useful predictions has been amply demonstrated, estimating the reliability of these predictions remains challenging. Sampling approaches such as MC-Dropout and Deep Ensembles have emerged as the most popular ones for this purpose. Unfortunately, they require many forward passes at inference time, which slows them down. Sampling-free approaches can be faster but suffer from other drawbacks, such as lower reliability of uncertainty estimates, difficulty of use, and limited applicability to different types of tasks and data. In this work, we introduce a sampling-free approach that is generic and easy to deploy, while producing reliable uncertainty estimates on par with state-of-the-art methods at a significantly lower computational cost. It is predicated on training the network to produce the same output with and without additional information about it. At inference time, when no prior information is given, we use the network's own prediction as the additional information. We then take the distance between the predictions with and without prior information as our uncertainty measure. We demonstrate our approach on several classification and regression tasks. We show that it delivers results on par with those of Ensembles but at a much lower computational cost.
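
The two-step inference described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `zigzag_inference`, the two toy models, and the choice of the Euclidean norm as the distance are assumptions made here for the example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# A model takes the input x and a prior of the same size as its output.
Model = Callable[[Sequence[float], Sequence[float]], Vector]


def zigzag_inference(model: Model, x: Sequence[float], out_dim: int) -> Tuple[Vector, Vector, float]:
    """Run both passes and return (y0, y1, uncertainty)."""
    placeholder = [0.0] * out_dim   # stands for "no prior information"
    y0 = model(x, placeholder)      # first pass:  [x, 0]  -> y0
    y1 = model(x, y0)               # second pass: [x, y0] -> y1
    # Assumed distance: Euclidean norm of the difference between the passes.
    uncertainty = math.sqrt(sum((a - b) ** 2 for a, b in zip(y0, y1)))
    return y0, y1, uncertainty


# Hypothetical toy models, for illustration only.
def consistent_model(x: Sequence[float], prior: Sequence[float]) -> Vector:
    # Ignores the prior, so both passes agree.
    return [sum(x)]


def inconsistent_model(x: Sequence[float], prior: Sequence[float]) -> Vector:
    # Output shifts with the prior, so the two passes disagree.
    return [sum(x) + 0.5 * prior[0]]


_, _, u_low = zigzag_inference(consistent_model, [1.0, 2.0], out_dim=1)
_, _, u_high = zigzag_inference(inconsistent_model, [1.0, 2.0], out_dim=1)
```

A model whose output is stable when fed its own prediction gets a small score (`u_low`), while one whose output moves gets a larger score (`u_high`). The cost is two forward passes regardless of the model.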

Paper Structure (45 sections, 3 equations, 10 figures, 9 tables)

This paper contains 45 sections, 3 equations, 10 figures, 9 tables.

Figures (10)

  • Figure 1: ZigZaging. At inference time, we make two forward passes. First, we use $[\mathbf{x}, \textbf{0}]$ as input to produce a prediction $\mathbf{y}_{0}$. Second, we feed $[\mathbf{x}, \mathbf{y}_{0}]$ to the network and generate $\mathbf{y}_{1}$. We take $\| \mathbf{y}_{0} - \mathbf{y}_{1} \|$ to be our uncertainty estimate. In essence, the second pass performs a reconstruction in much the same way an auto-encoder does and a high reconstruction error correlates with uncertainty.
  • Figure 2: Autoencoder Reconstruction Error. An autoencoder trained exclusively on cat images yields accurate reconstructions on other cat images (left) and inaccurate ones on dog images (right). Thus, the distance between an image and its reconstruction can be used to estimate whether that image is likely to be a cat image or not.
  • Figure 3: Architecture Modification. Given a model with weights $W_1 \in \mathbb{R}^{d \times h}, W_2 \in \mathbb{R}^{h \times 1}$, we modify its first layer $W_1$ to accept two inputs instead of only one. The modified model consists of $\widetilde{W_1} \in \mathbb{R}^{(d+1) \times h}$ and $W_2 \in \mathbb{R}^{h \times 1}$ and can process the concatenation of the original input $\mathbf{x}$ and additional value $\mathbf{y}_{0}$.
  • Figure 4: True vs Estimated Error. We use MNIST (left) and CIFAR (right) validation data to plot the true prediction errors as measured by the loss being minimized against our uncertainty estimates $\|\hat{\mathcal{M}}(\mathbf{x},\mathbf{0})-\hat{\mathcal{M}}(\mathbf{x},\hat{\mathcal{M}}(\mathbf{x},\mathbf{0}))\|$ for individual samples. In both cases, the correlation is strong and Pearson's correlation coefficient is above 90%. The red line represents a linear fit to the data.
  • Figure 5: Uncertainty Estimation for Classification. The task is to classify data points drawn in the range $x \in [-2, 3]$, $y \in [-2, 2]$ as being red or blue given the red and blue training samples from two interleaving half circles with added Gaussian noise. The background color depicts the classification uncertainty assigned by different techniques to individual grid points. Violet is low and yellow is high. (a) Single model, (b) MC-Dropout, (c) Deep Ensembles, (d) ZigZag.
  • ...and 5 more figures
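
The architecture modification in the Figure 3 caption (widening $W_1 \in \mathbb{R}^{d \times h}$ to $\widetilde{W_1} \in \mathbb{R}^{(d+1) \times h}$ while leaving $W_2$ unchanged) might look like the sketch below. The helper names, the ReLU activation, the example weights, and the zero initialization of the new row are all assumptions made for illustration; the caption does not specify them.

```python
from typing import List

Matrix = List[List[float]]


def widen_first_layer(w1: Matrix, extra_rows: int = 1) -> Matrix:
    """Copy w1 (d x h) and append zero rows, giving (d + extra_rows) x h."""
    h = len(w1[0])
    return [row[:] for row in w1] + [[0.0] * h for _ in range(extra_rows)]


def forward(w1: Matrix, w2: Matrix, inputs: List[float]) -> float:
    """Two-layer network: inputs -> ReLU hidden layer (h units) -> scalar."""
    h = len(w1[0])
    hidden = [
        max(0.0, sum(inputs[i] * w1[i][j] for i in range(len(inputs))))
        for j in range(h)
    ]
    return sum(hidden[j] * w2[j][0] for j in range(h))


# Made-up weights with d = 2 inputs and h = 3 hidden units.
w1 = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]
w2 = [[1.0], [2.0], [-1.0]]
w1_wide = widen_first_layer(w1)          # now 3 x 3, accepts [x, y0]

x = [1.0, 2.0]
y_original = forward(w1, w2, x)          # unmodified model
y0 = forward(w1_wide, w2, x + [0.0])     # first pass:  [x, 0]
y1 = forward(w1_wide, w2, x + [y0])      # second pass: [x, y0]
```

With the assumed zero row, the widened model reproduces the original output and ignores the extra input, so `y0` and `y1` coincide. Training is what would make the new row nonzero and the second pass informative.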