Robust Regression over Averaged Uncertainty

Dimitris Bertsimas; Yu Ma

TL;DR

This work reframes regression under data uncertainty by averaging over all realizations of the uncertainty set, revealing an exact link to ridge regression when the uncertainty is symmetric. It provides closed-form expressions for the regularization strength across ellipsoidal, box, diamond, budget, and Schatten-norm uncertainty sets, and extends to non-symmetric polytopes where equivalence no longer holds. The authors prove that averaged-uncertainty robust regression (AUR) matches ridge regression under symmetric sets and demonstrate consistent out-of-sample improvements over the traditional worst-case robust formulation on both synthetic and real-world UCI data. The approach offers a principled, computationally tractable alternative to worst-case RO with practical benefits and broad applicability to other learning problems beyond linear regression.

Abstract

We propose a new formulation of robust regression that integrates over all realizations of the uncertainty set and takes an averaged approach to obtain the optimal solution of the ordinary least squares regression problem. We show that this formulation recovers ridge regression exactly and establishes the missing link between robust optimization and the mean squared error approaches for existing regression problems. We further demonstrate that this equivalence depends on the geometric properties of the chosen uncertainty set. We provide exact, closed-form, and in some cases analytical, solutions for the equivalent regularization strength under uncertainty sets induced by the $\ell_p$ norm, the Schatten $p$-norm, and general polytopes. On synthetic datasets with different levels of uncertainty, the averaged formulation consistently improves on the existing worst-case formulation in out-of-sample performance. On real-world regression problems from UCI datasets, similar out-of-sample improvements are observed.
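The claimed link to ridge regression can be illustrated numerically. The sketch below is not the authors' code. It assumes that "averaging" means a uniform distribution over the Frobenius-norm ball $\{\bm{\Delta}: \|\bm{\Delta}\|_F \leq \lambda\}$ in $\mathbb{R}^{n \times p}$. Under that assumption, $\mathbb{E}[\bm{\Delta}] = 0$ and $\mathbb{E}[\bm{\Delta}^\top \bm{\Delta}] = c\,\bm{I}$ with $c = n\lambda^2/(np+2)$, so the averaged squared loss equals $\|\bm{y} - \bm{X\beta}\|_2^2 + c\,\|\bm{\beta}\|_2^2$. The constant $c$ is derived from this sampling assumption and is not quoted from the paper. The function names, the symbols $n$ and $p$, and the synthetic data are all hypothetical.

```python
import numpy as np

def sample_frobenius_ball(rng, k, n, p, radius):
    """Draw k matrices uniformly from {Delta : ||Delta||_F <= radius} (assumed averaging measure)."""
    d = n * p
    g = rng.standard_normal((k, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)   # uniform direction on the sphere
    r = radius * rng.random((k, 1)) ** (1.0 / d)    # radius with density proportional to r^(d-1)
    return (g * r).reshape(k, n, p)

def averaged_solution(X, y, deltas):
    """Minimize the sample average of ||y - (X + Delta) beta||_2^2 over beta."""
    Xs = X[None, :, :] + deltas                          # perturbed designs, shape (k, n, p)
    A = np.einsum("kni,knj->ij", Xs, Xs) / len(deltas)   # mean of Xs^T Xs
    b = np.einsum("kni,n->i", Xs, y) / len(deltas)       # mean of Xs^T y
    return np.linalg.solve(A, b)

def ridge_solution(X, y, c):
    """Closed-form ridge estimator (X^T X + c I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + c * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n, p, lam = 20, 3, 2.0
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

deltas = sample_frobenius_ball(rng, 20000, n, p, lam)
beta_avg = averaged_solution(X, y, deltas)
beta_ridge = ridge_solution(X, y, n * lam**2 / (n * p + 2))
print("averaged:", beta_avg)
print("ridge:   ", beta_ridge)
```

The two estimates should agree up to Monte Carlo error, which shrinks as the number of sampled perturbations grows.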

Paper Structure (36 sections, 28 theorems, 62 equations, 3 figures, 2 tables)

Introduction
Related Literature
Statistical Properties of Ridge Regression
Equivalence of Robustness and Ridge Regression
Interpretations of Regularization Strength
Contributions
Structure of the Paper
Brief Overview of Robust Optimization
Norms
Dual Norms
Robust Optimization
Global-Robustness
Robust Optimization under Averaged Uncertainty
Characterization of Averaged Uncertainty
Connections to Other Robustness Methods
...and 21 more sections

Key Result

Theorem 1

If $r, q \in [1, \infty]$, and $\mathcal{U}_{(q, r)} = \{\bm{\Delta}: \| \bm{\Delta} \|_{(q, r)} \leq \lambda\}$ with $\| \bm{\Delta} \| _{(q, r)} = \max_{\bm{\beta} \in \mathbb{R}} \frac{\| \bm{\Delta\beta} \|_r}{\| \bm{\beta} \|_q}$ then
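The conclusion of Theorem 1 is cut off in this listing. Assuming it is the standard worst-case identity $\min_{\bm{\beta}} \max_{\bm{\Delta} \in \mathcal{U}_{(q, r)}} \| \bm{y} - (\bm{X} + \bm{\Delta})\bm{\beta} \|_r = \min_{\bm{\beta}} \| \bm{y} - \bm{X\beta} \|_r + \lambda \| \bm{\beta} \|_q$ (an assumption, not recovered from the text above), the inner maximization can be checked numerically for the special case $q = r = 2$, where the induced norm is the spectral norm. The sketch below fixes an arbitrary $\bm{\beta}$, builds a rank-one perturbation that attains the bound, and confirms that random feasible perturbations do not exceed it. All variable names and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 15, 4, 0.7
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
beta = rng.standard_normal(p)

residual = y - X @ beta
# Claimed worst-case value for fixed beta: ||y - X beta||_2 + lam * ||beta||_2
bound = np.linalg.norm(residual) + lam * np.linalg.norm(beta)

# Rank-one perturbation aligned with the residual and with beta;
# its spectral norm is exactly lam because u and v are unit vectors.
u = residual / np.linalg.norm(residual)
v = beta / np.linalg.norm(beta)
delta_star = -lam * np.outer(u, v)
worst = np.linalg.norm(y - (X + delta_star) @ beta)

# Random perturbations rescaled onto the boundary of the spectral-norm ball.
random_vals = []
for _ in range(200):
    d = rng.standard_normal((n, p))
    d *= lam / np.linalg.norm(d, 2)   # ord=2 on a matrix is the spectral norm
    random_vals.append(np.linalg.norm(y - (X + d) @ beta))

print("attained by delta_star:", worst)
print("claimed bound:         ", bound)
print("max over random draws: ", max(random_vals))
```

This only probes the inner maximum for one fixed $\bm{\beta}$ and one choice of norms; it is a sanity check, not a proof of the general statement.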

Figures (3)

Figure 1: Percentage of AUR over WUR across 10 UCI datasets. The orange line is the trend line for AUC improvements, using the regularization strength computed from Theorem \ref{lp-linear-regression-main}.
Figure 2: Percentage improvement of AUR over WUR across synthetic datasets with different sample sizes and different numbers of informative features. The orange line indicates the trend of the improvement, which decreases monotonically as the sample size increases and as the number of informative features increases.

Theorems & Definitions (48)

Definition 1: RO Average
Theorem 1: origina_dbBertsimas2011
Theorem 2
Lemma 1
Lemma 2: coxeter
Lemma 3
proof
Theorem 3: KABLUCHKO2020105457
Remark
Definition 2
...and 38 more
