Enabling Uncertainty Estimation in Iterative Neural Networks

Nikita Durasov, Doruk Oner, Jonathan Donier, Hieu Le, Pascal Fua

TL;DR

The paper tackles uncertainty quantification for iterative neural networks by exploiting convergence behavior of successive refinements. It defines per-output uncertainty as $U^i = \mathrm{Var}(\{\mathbf{y}_1^i, \ldots, \mathbf{y}_N^i\})$ and trains using $\mathcal{L}_{total} = \sum_{i=1}^{N} \mathcal{D}(\mathbf{y}_{i}, \mathbf{y}^{\rm gt})$ while keeping the architecture fixed. The approach delivers competitive calibration and uncertainty quality, often on par with Deep Ensembles but at a fraction of the computational cost, demonstrated in road delineation and aerodynamic shape optimization using Bayesian optimization. Overall, convergence speed serves as a robust proxy for predictive certainty, enabling fast, practical uncertainty estimates across diverse tasks without modifying model design.
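The two formulas above can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: `toy_model` is a hypothetical stand-in for a trained iterative network, mean squared error is an assumed choice for the distance $\mathcal{D}$, and the variance is the population variance (NumPy's default), which the summary does not specify.

```python
import numpy as np


def iterate(f, x, y0, n_iters):
    """Run an iterative model: each step sees the input and the previous output."""
    outputs = []
    y = y0
    for _ in range(n_iters):
        y = f(x, y)
        outputs.append(y)
    return np.stack(outputs)  # shape: (n_iters, *output_shape)


def uncertainty(outputs):
    """Per-output variance across iterations: U^i = Var({y_1^i, ..., y_N^i})."""
    return outputs.var(axis=0)


def total_loss(outputs, y_gt):
    """Sum over iterations of D(y_i, y_gt); D is assumed to be MSE here."""
    return sum(np.mean((y - y_gt) ** 2) for y in outputs)


def toy_model(x, y_prev):
    """Hypothetical stand-in for a trained network: moves each output toward x.

    The first output converges quickly, the second slowly.
    """
    rate = np.array([0.9, 0.3])
    return y_prev + rate * (x - y_prev)


x = np.array([1.0, 1.0])
ys = iterate(toy_model, x, y0=np.zeros(2), n_iters=5)
u = uncertainty(ys)       # larger for the slowly converging output
loss = total_loss(ys, x)  # the quantity that would be minimized during training
```

In this toy setting the slowly converging output receives the larger variance, which is the behavior the method relies on: slow or erratic convergence is read as high uncertainty.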

Abstract

Turning pass-through network architectures into iterative ones, which use their own output as input, is a well-known approach for boosting performance. In this paper, we argue that such architectures offer an additional benefit: The convergence rate of their successive outputs is highly correlated with the accuracy of the value to which they converge. Thus, we can use the convergence rate as a useful proxy for uncertainty. This results in an approach to uncertainty estimation that provides state-of-the-art estimates at a much lower computational cost than techniques like Ensembles, and without requiring any modifications to the original iterative model. We demonstrate its practical value by embedding it in two application domains: road detection in aerial images and the estimation of aerodynamic properties of 2D and 3D shapes.

Paper Structure (32 sections, 5 equations, 13 figures, 9 tables)

Figures (13)

  • Figure 1: Uncertainty in recursive models. Such models use their initial predictions as inputs to produce subsequent predictions. We display the output of three consecutive iterations of a network trained to compute distance maps to road pixels. (Top): All roads are clearly visible. The three maps are similar and the per-pixel variance is low. (Bottom): The road in the red square is tree-covered. It is eventually detected properly but the variance is high.
  • Figure 2: Uncertainty vs Convergence. In this example, we generated training data from a sinusoidal function for $x \in [1, 6]$ and added Gaussian noise with a variance that increases from left to right. We take $f_{\Theta}$ to be a simple MLP with three hidden layers that takes two inputs, $x$ and the output of the previous iteration. We train it to predict the noisy data points at each step of the iterative process by minimizing the loss of Eq. \ref{eq:loss}. Once trained, we use $f_{\Theta}$ to produce predictions $\mathbf{Y}(x) = \{ \mathbf{y}_1(x), \ldots, \mathbf{y}_{N}(x)\}$ for $x \in [-3, 7]$. (Top): The red line denotes the final prediction $\mathbf{y}_{N}(x)$, and the standard deviation of $\mathbf{Y}(x)$ is shown in pink. It increases away from the data and where the data is noisy, as it should. (Bottom): The plots depict the values in the sequence $\mathbf{Y}(x)$ for four different values of $x$. For $x=2$, both aleatoric and systemic uncertainties are low and convergence is quick. For $x=5$, the aleatoric uncertainty is high because the data is noisy, and convergence is slow. For $x=-2.0$ and $x=7.0$, the systemic uncertainty is high because the points are out of distribution, and convergence is slow or erratic.
  • Figure 3: Error vs Uncertainty. These plots illustrate the error-uncertainty relationship for three methods on the RoadTracer (Top) and Massachusetts (Bottom) datasets. Our method surpasses the others on the Massachusetts dataset and performs comparably with Ensembles on RoadTracer. Correlation numbers are in Tab. \ref{tab:unc_results_delineation}. The red line indicates the optimal linear fit.
  • Figure 4: Bayesian optimization pipeline. (1) Run physical simulations. (2) Train the GNN. (3) Evaluate the acquisition function on samples without an associated physical simulation. (4) Select promising samples according to the acquisition function, optimize their shape, add them to the training set, and go back to step 1.
  • Figure 5: Left. Accuracy of the lift-to-drag estimate as a function of the number of exemplars used to train the emulators. Right. Lift-to-drag ratio of the shapes during optimization, as a function of the number of iterations.
  • ...and 8 more figures
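
The four pipeline steps in the Figure 4 caption can be sketched as a simple loop. Everything below is a hypothetical stand-in chosen for illustration: `simulate` replaces the physical simulator with a toy one-dimensional objective, `fit_surrogate` replaces GNN training with a quadratic fit, its uncertainty is a placeholder (distance to the nearest training sample) rather than the variance across iterates, and the upper-confidence-bound acquisition `ucb` is an assumption, since the caption does not say which acquisition function is used. The shape-optimization part of step 4 is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(shapes):
    """Step 1 (hypothetical): toy 'simulation' whose score peaks at shape = 0.7."""
    return -(shapes - 0.7) ** 2


def fit_surrogate(shapes, scores):
    """Step 2 (hypothetical): quadratic fit standing in for training the GNN."""
    coeffs = np.polyfit(shapes, scores, deg=2)

    def predict(query):
        mean = np.polyval(coeffs, query)
        # Placeholder uncertainty: distance to the nearest training sample.
        std = np.min(np.abs(query[:, None] - shapes[None, :]), axis=1)
        return mean, std

    return predict


def ucb(mean, std, kappa=1.0):
    """Assumed acquisition function: upper confidence bound."""
    return mean + kappa * std


n_initial, n_rounds, n_select = 4, 5, 3
shapes = rng.uniform(0.0, 1.0, size=n_initial)
scores = simulate(shapes)

for _ in range(n_rounds):
    predict = fit_surrogate(shapes, scores)           # step 2
    candidates = rng.uniform(0.0, 1.0, size=200)      # samples with no simulation yet
    mean, std = predict(candidates)                   # step 3
    picked = candidates[np.argsort(ucb(mean, std))[-n_select:]]  # step 4
    shapes = np.concatenate([shapes, picked])
    scores = np.concatenate([scores, simulate(picked)])  # back to step 1

best_shape = shapes[np.argmax(scores)]
```

The point of the sketch is where the uncertainty enters: the acquisition function combines the surrogate's prediction with its uncertainty, so a cheaper uncertainty estimate directly lowers the cost of each round.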