Quantifying Uncertainty in the Presence of Distribution Shifts

Yuli Slavutsky, David M. Blei

TL;DR

Neural networks often provide unreliable uncertainty estimates under covariate distribution shifts. The authors introduce VIDS, a Bayesian framework that combines a covariate-conditioned adaptive prior with amortized variational inference, so that posterior predictive uncertainty grows as test inputs move away from the training distribution; they also generate synthetic environments by bootstrap resampling to simulate potential shifts. The approach jointly learns an amortized posterior and trains across multiple environments, improving calibration and robustness on synthetic and real data for both classification and regression. This work advances trustworthy uncertainty quantification in settings where test-time covariate shifts are common and potentially harmful.

Abstract

Neural networks make accurate predictions but often fail to provide reliable uncertainty estimates, especially under covariate distribution shifts between training and testing. To address this problem, we propose a Bayesian framework for uncertainty estimation that explicitly accounts for covariate shifts. While conventional approaches rely on fixed priors, the key idea of our method is an adaptive prior, conditioned on both training and new covariates. This prior naturally increases uncertainty for inputs that lie far from the training distribution, in regions where predictive performance is likely to degrade. To efficiently approximate the resulting posterior predictive distribution, we employ amortized variational inference. Finally, we construct synthetic environments by drawing small bootstrap samples from the training data, simulating a range of plausible covariate shifts using only the original dataset. We evaluate our method on both synthetic and real-world data, where it yields substantially improved uncertainty estimates under distribution shifts.
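
The bootstrap construction of synthetic environments can be sketched as follows. This is a minimal illustration assuming each environment is a small sample drawn with replacement from the training set; the helper name `make_bootstrap_environments`, its parameters, and the sizes used are hypothetical, and the paper's actual scheme (environment size, number of environments, and how each environment enters training) is not specified in this summary. Because each resample is small, its empirical covariate distribution deviates from that of the full training set, which is what lets it stand in for a mild covariate shift.

```python
import random
import statistics
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_bootstrap_environments(
    data: Sequence[T], num_envs: int, env_size: int, seed: int = 0
) -> List[List[T]]:
    """Hypothetical helper: draw `num_envs` small bootstrap samples from `data`.

    Each environment is sampled with replacement; small `env_size` values make
    each environment's empirical covariate distribution differ from the full data.
    """
    if not data:
        raise ValueError("data must be non-empty")
    if num_envs <= 0 or env_size <= 0:
        raise ValueError("num_envs and env_size must be positive")
    rng = random.Random(seed)  # seeded for reproducible environments
    return [rng.choices(data, k=env_size) for _ in range(num_envs)]


# Usage: 1-D covariates 0..99 (full-data mean 49.5), five environments of size 10.
train_x = [float(i) for i in range(100)]
envs = make_bootstrap_environments(train_x, num_envs=5, env_size=10, seed=1)
# Each environment's covariate mean generally differs from 49.5,
# i.e. each small resample acts as a mildly shifted covariate distribution.
env_means = [statistics.mean(env) for env in envs]
```

Elements can be `(x, y)` tuples as well as bare covariates, so the same helper would resample labeled examples.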

Paper Structure

This paper contains 33 sections, 1 theorem, 19 equations, 9 figures, 5 tables, and 2 algorithms.

Key Result

Proposition B.1

Let $B_1,\dots,B_k$ be a partition of $\mathcal{X}$. Denote the binned empirical distribution of the training set $x_{1:N}$ as $p\in\Delta^k$, where $p(i) = \frac{1}{N} \sum_{j=1}^N \mathds{1}\{x_j \in B_i\}$ for all $1\leq i\leq k$. Similarly, define the binned test distribution induced by the partition …
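
The binned empirical distribution $p$ can be computed directly from its definition. The sketch below assumes one-dimensional covariates and a partition specified by strictly increasing bin edges (both illustrative choices, not from the paper; `binned_distribution` is a hypothetical name). Applying the same function to test covariates gives the corresponding binned test distribution.

```python
from bisect import bisect_right
from typing import List, Sequence


def binned_distribution(xs: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Return p in the simplex Delta^k with p[i] = (1/N) * #{j : x_j in B_i}.

    The k = len(edges) + 1 bins partition the real line:
    B_0 = (-inf, edges[0]), B_i = [edges[i-1], edges[i]), B_{k-1} = [edges[-1], inf).
    """
    if not xs:
        raise ValueError("need at least one sample")
    if any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing")
    counts = [0] * (len(edges) + 1)
    for x in xs:
        # bisect_right counts edges <= x, which is the index of the bin containing x
        counts[bisect_right(edges, x)] += 1
    n = len(xs)
    return [c / n for c in counts]


# Usage: three bins (-inf, 0.5), [0.5, 1.0), [1.0, inf).
p = binned_distribution([0.1, 0.4, 0.6, 1.2], edges=[0.5, 1.0])  # training covariates
q = binned_distribution([0.9, 1.5, 2.0, 3.0], edges=[0.5, 1.0])  # shifted test covariates
```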

Figures (9)

  • Figure 1: (a) During training, one covariate is fixed, so the data lie on a one-dimensional subspace and all predictors intersecting the fixed axis at the same point are equivalent. (b) At test time, variation along the second dimension reveals that some predictors may fit the new data better, prompting a prior shift. (c) A possible labeling in which only the solid line separates the test data.
  • Figure 2: Graphical model. Thick red arrows denote additional dependencies introduced by our model. Observed variables shown in gray.
  • Figure 3: Changes in the prior due to the introduction of test covariates drawn from a shifted distribution $x^* \sim \mathcal{N}(\frac{1}{2}, 1)$, where both features vary.
  • Figure 4: Optimization mechanism for a single test example in a single synthetic environment.
  • Figure 5: Simulation results. Red crosses represent training data and gray dots represent test data. Black lines depict predictions, and gray shaded areas span $\pm 1$ standard deviation. Top: heteroskedastic linear regression for $a=0.5$. Bottom: binary classification with missing data for $t=0.3$. VIDS is the only method that captures the correct variance structure, and it therefore achieves the best results.
  • ...and 4 more figures
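
The situation in Figure 1 can be reproduced in the simplest conjugate setting. The sketch below uses ordinary Bayesian linear regression with a fixed Gaussian prior $w \sim \mathcal{N}(0, \tau^2 I)$ and noise variance $\sigma^2$; this is a standard textbook model, not VIDS, and the function names and parameter values are hypothetical. When the second covariate is fixed at 0 during training, the data carry no information about $w_2$, so its posterior variance stays at the prior value $\tau^2$ and the predictive variance grows with $x_2^2$ at a rate set entirely by the fixed prior. This dependence on a fixed prior is what a prior conditioned on the new covariates is meant to address.

```python
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def posterior_cov(xs: Sequence[Vec2], tau2: float, sigma2: float) -> Matrix2:
    """Posterior covariance of w for y = w.x + eps, eps ~ N(0, sigma2), w ~ N(0, tau2 I)."""
    if tau2 <= 0 or sigma2 <= 0:
        raise ValueError("tau2 and sigma2 must be positive")
    # Posterior precision matrix [[a, b], [b, d]] = I / tau2 + X^T X / sigma2
    a = 1.0 / tau2 + sum(x1 * x1 for x1, _ in xs) / sigma2
    b = sum(x1 * x2 for x1, x2 in xs) / sigma2
    d = 1.0 / tau2 + sum(x2 * x2 for _, x2 in xs) / sigma2
    det = a * d - b * b
    # Closed-form inverse of the symmetric 2x2 precision matrix
    return ((d / det, -b / det), (-b / det, a / det))


def predictive_var(x: Vec2, cov: Matrix2, sigma2: float) -> float:
    """Posterior predictive variance sigma2 + x^T Sigma x at a new input x."""
    x1, x2 = x
    (s11, s12), (_, s22) = cov
    return sigma2 + s11 * x1 * x1 + 2 * s12 * x1 * x2 + s22 * x2 * x2


# Training inputs: first covariate varies, second covariate fixed at 0.
train = [(t / 10, 0.0) for t in range(-10, 11)]
cov = posterior_cov(train, tau2=1.0, sigma2=0.1)
# Test inputs that move off the training subspace along the second covariate.
variances = [predictive_var((0.5, x2), cov, sigma2=0.1) for x2 in (0.0, 1.0, 2.0)]
```

Here `cov[1][1]` equals the prior variance $\tau^2 = 1$, and the predictive variance increases by exactly $\tau^2 x_2^2$ as the test input leaves the training subspace.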

Theorems & Definitions (3)

  • Proposition B.1
  • Proof
  • Remark B.2