ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference
Nikita Durasov, Nik Dorndorf, Hieu Le, Pascal Fua
TL;DR
ZigZag tackles the challenge of reliable uncertainty estimation without the overhead of sampling-based methods. By modifying the network to accept an extra input and training it to produce consistent outputs with and without prior information, it enables a fast two-pass inference where the uncertainty is the distance between the two predictions, $\hat{\mathbf{u}} = \|\mathbf{y}_0 - \mathbf{y}_1\|$. The approach delivers uncertainty estimates on par with ensembles across classification and regression tasks while requiring only two forward passes, making it broadly applicable with minimal architectural changes. Empirically, ZigZag matches or outperforms sampling-based baselines on multiple benchmarks (MNIST, CIFAR, ImageNet, age, aerodynamics, etc.) and OOD detection tasks, while achieving substantially lower compute and memory costs. This sampling-free method provides a practical, scalable alternative for uncertainty estimation in real-world deployments and supports robust decision-making under distributional shifts.
Abstract
Whereas the ability of deep networks to produce useful predictions has been amply demonstrated, estimating the reliability of these predictions remains challenging. Sampling approaches such as MC-Dropout and Deep Ensembles have emerged as the most popular ones for this purpose. Unfortunately, they require many forward passes at inference time, which slows them down. Sampling-free approaches can be faster but suffer from other drawbacks, such as lower reliability of uncertainty estimates, difficulty of use, and limited applicability to different types of tasks and data. In this work, we introduce a sampling-free approach that is generic and easy to deploy, while producing reliable uncertainty estimates on par with state-of-the-art methods at a significantly lower computational cost. It is predicated on training the network to produce the same output with and without additional information about it. At inference time, when no prior information is given, we use the network's own prediction as the additional information. We then take the distance between the predictions with and without prior information as our uncertainty measure. We demonstrate our approach on several classification and regression tasks. We show that it delivers results on par with those of Ensembles but at a much lower computational cost.
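The two-step inference described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `zigzag_inference`, the all-zeros placeholder standing in for "no prior information", and the two toy models are assumptions made purely for illustration. The only parts taken from the text are the model signature with an extra input, the two forward passes, and the distance $\|\mathbf{y}_0 - \mathbf{y}_1\|$ used as the uncertainty.

```python
import numpy as np


def zigzag_inference(model, x, placeholder):
    """Two-pass inference sketch.

    model(x, prior) is assumed to return an array of shape (batch, out_dim).
    placeholder is a hypothetical stand-in for "no prior information".
    """
    y0 = model(x, placeholder)  # first pass: no prior information
    y1 = model(x, y0)           # second pass: own prediction fed back as prior
    # Per-sample uncertainty: distance between the two predictions.
    uncertainty = np.linalg.norm(y0 - y1, axis=-1)
    return y0, uncertainty


# Toy stand-ins for a trained network (hypothetical, for illustration only).
def consistent_model(x, prior):
    # Output does not change when the prior is supplied.
    return x.sum(axis=-1, keepdims=True) + 0.0 * prior


def inconsistent_model(x, prior):
    # Output drifts when the prior is supplied.
    return x.sum(axis=-1, keepdims=True) + 0.5 * prior


if __name__ == "__main__":
    x = np.array([[1.0, 2.0], [3.0, 4.0]])
    placeholder = np.zeros((2, 1))
    for name, model in [("consistent", consistent_model),
                        ("inconsistent", inconsistent_model)]:
        y0, u = zigzag_inference(model, x, placeholder)
        print(name, "prediction:", y0.ravel(), "uncertainty:", u)
```

In this toy setting the model whose output is unaffected by the fed-back prediction receives zero uncertainty, while the model whose output shifts between the two passes receives a positive value. This mirrors the intuition in the abstract that disagreement between the two passes signals an unreliable prediction.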
