
Robust Predictive Uncertainty and Double Descent in Contaminated Bayesian Random Features

Michele Caprio, Katerina Papagiannouli, Siu Lun Chau, Sayan Mukherjee

TL;DR

This paper develops a robust Bayesian framework for Random Features regression by incorporating $\epsilon$-contaminated priors and $\eta$-contaminated likelihoods through credal sets and pessimistic generalized Bayes updating. It derives explicit lower/upper posterior predictive density bounds, introduces the Imprecise Highest Density Region (IHDR) for robust uncertainty quantification, and provides tractable variance bounds that retain the leading proportional-growth asymptotics of RFs. The authors show that predictive uncertainty forms affine envelopes around the classical Gaussian predictive and that the characteristic double-descent behavior of RF variance is preserved under bounded misspecification, with the envelope magnitude controlled by the contamination parameters. The IHDR admits an efficient outer approximation via adjusted Gaussian intervals, ensuring practical, computation-friendly robust predictive uncertainty in high-dimensional RF settings. Overall, the work establishes a robustness theory for Bayesian RF that delivers worst-case guarantees under misspecification while preserving essential predictive structure.
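To make the "affine envelope" and "adjusted Gaussian interval" ideas concrete, here is a minimal sketch under a simplifying assumption: the posterior predictive is treated as directly Huber-contaminated at a single level `delta`, so every admissible predictive is $(1-\delta)P + \delta Q$ with $P$ the classical Gaussian predictive and $Q$ arbitrary. The name `delta` and both function names are hypothetical. The paper's actual constants, which depend on $\epsilon$ and $\eta$, are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def contaminated_bounds(p, delta):
    """Lower/upper probability of an event with nominal probability p.

    Assumes a Huber class {(1 - delta) * P + delta * Q : Q arbitrary}; valid for
    events that are neither empty nor the whole space. Both bounds are affine in p.
    """
    return (1.0 - delta) * p, (1.0 - delta) * p + delta


def adjusted_gaussian_interval(mean, var, alpha, delta):
    """Central Gaussian interval whose *lower* probability is at least 1 - alpha.

    Under the direct-contamination assumption, the lower probability of an
    interval A is (1 - delta) * P(A), so the nominal Gaussian level is
    inflated to (1 - alpha) / (1 - delta).
    """
    level = (1.0 - alpha) / (1.0 - delta)
    if not 0.0 < level < 1.0:
        raise ValueError("requires 0 <= delta < alpha < 1")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(var)
    return mean - half_width, mean + half_width


# Usage: a 95% robust interval around a standard Gaussian predictive.
lo, hi = adjusted_gaussian_interval(mean=0.0, var=1.0, alpha=0.05, delta=0.02)
nominal = NormalDist().cdf(hi) - NormalDist().cdf(lo)
lower, upper = contaminated_bounds(nominal, delta=0.02)
print(f"interval=({lo:.3f}, {hi:.3f}), lower={lower:.3f}, upper={upper:.3f}")
```

The adjusted interval is wider than the plain Gaussian credible interval, and the widening is controlled entirely by the contamination level, which matches the qualitative behavior described above.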

Abstract

We propose a robust Bayesian formulation of random feature (RF) regression that accounts explicitly for prior and likelihood misspecification via Huber-style contamination sets. Starting from the classical equivalence between ridge-regularized RF training and Bayesian inference with Gaussian priors and likelihoods, we replace the single prior and likelihood with $\epsilon$- and $\eta$-contaminated credal sets, respectively, and perform inference using pessimistic generalized Bayesian updating. We derive explicit and tractable bounds for the resulting lower and upper posterior predictive densities. These bounds show that, when contamination is moderate, prior and likelihood ambiguity effectively acts as a direct contamination of the posterior predictive distribution, yielding uncertainty envelopes around the classical Gaussian predictive. We introduce an Imprecise Highest Density Region (IHDR) for robust predictive uncertainty quantification and show that it admits an efficient outer approximation via an adjusted Gaussian credible interval. We further obtain predictive variance bounds (under a mild truncation approximation for the upper bound) and prove that they preserve the leading-order proportional-growth asymptotics known for RF models. Together, these results establish a robustness theory for Bayesian random features: predictive uncertainty remains computationally tractable, inherits the classical double-descent phase structure, and is improved by explicit worst-case guarantees under bounded prior and likelihood misspecification.
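The classical equivalence that the abstract starts from can be checked numerically. The sketch below is an illustration under assumed conventions, not the paper's code: ReLU features scaled by $1/\sqrt{N}$, an isotropic Gaussian prior with variance `tau2` on the second-layer weights, and Gaussian noise with variance `sigma2`. All names are hypothetical. With these choices the posterior mean coincides with the ridge solution at penalty `sigma2 / tau2`.

```python
import numpy as np


def random_features(X, W):
    """ReLU random features phi(x) = max(W x, 0) / sqrt(N) (assumed scaling)."""
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])


def ridge_weights(Phi, y, lam):
    """Ridge solution: argmin_a ||y - Phi a||^2 + lam * ||a||^2."""
    N = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)


def gaussian_posterior(Phi, y, sigma2, tau2):
    """Posterior N(mean, cov) for a ~ N(0, tau2 I), y | a ~ N(Phi a, sigma2 I)."""
    N = Phi.shape[1]
    precision = Phi.T @ Phi / sigma2 + np.eye(N) / tau2
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / sigma2
    return mean, cov


def predictive(phi_star, mean, cov, sigma2):
    """Mean and variance of the Gaussian posterior predictive at one test point."""
    return phi_star @ mean, phi_star @ cov @ phi_star + sigma2


# Synthetic data: n samples in d dimensions, N random features.
rng = np.random.default_rng(0)
n, d, N = 50, 5, 20
X = rng.normal(size=(n, d))
W = rng.normal(size=(N, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)
sigma2, tau2 = 0.1, 1.0

Phi = random_features(X, W)
post_mean, post_cov = gaussian_posterior(Phi, y, sigma2, tau2)
ridge = ridge_weights(Phi, y, lam=sigma2 / tau2)

# Predictive distribution at a fresh test input.
phi_star = random_features(rng.normal(size=(1, d)), W)[0]
pred_mean, pred_var = predictive(phi_star, post_mean, post_cov, sigma2)
print(np.allclose(post_mean, ridge), pred_mean, pred_var)
```

This Gaussian predictive is the baseline that the contaminated analysis surrounds with lower and upper envelopes.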

Paper Structure (10 sections, 11 theorems, 48 equations, 4 figures)

Key Result

Lemma 1

The following are true: [...]. A similar characterization holds for $\overline{L}$ and $\underline{L}$. In addition, [...]

Figures (4)

  • Figure 2: Test MSE as a function of $N/d$ under increasing levels of label misspecification $\rho$. Larger contamination amplifies the interpolation peak without shifting its location, consistent with Lemma \ref{var-compl} and Corollary \ref{cor:robust-dd-variance}.
  • Figure : (a) Bias-variance decomposition
  • Figure : (a) Bias-variance decomposition
  • Figure : (b) Variance with $\eta$-contamination bands

Theorems & Definitions (25)

  • Remark 1: Role of Bayesian Sensitivity Analysis
  • Lemma 1: Characterizing the Contamination Sets
  • Lemma 2: Lower and Upper Densities
  • Remark 2: On Envelope Densities
  • Theorem 3: Bounding the Lower Posterior Predictive Density
  • Corollary 3.1: Bounding the Upper Posterior Predictive Density
  • Definition 4: Imprecise Highest Density Region (IHDR)
  • Proposition 5: Approximating the IHDR
  • Proposition 6: Bounding the Lower Predictive Variance
  • Proposition 7: Bounding the Upper Predictive Variance
  • ...and 15 more