Robust Regression over Averaged Uncertainty
Dimitris Bertsimas, Yu Ma
TL;DR
This work reframes regression under data uncertainty by averaging over all realizations of the uncertainty set, revealing an exact link to ridge regression when the uncertainty set is symmetric. It provides closed-form expressions for the regularization strength across ellipsoidal, box, diamond, budget, and Schatten-norm uncertainty sets, and extends the analysis to non-symmetric polytopes, where the equivalence no longer holds. The authors prove that averaged-uncertainty robust regression (AUR) matches ridge regression under symmetric sets and demonstrate consistent out-of-sample improvements over the traditional worst-case robust formulation on both synthetic and real-world UCI data. The approach offers a principled, computationally tractable alternative to worst-case robust optimization, with practical benefits and potential applicability to learning problems beyond linear regression.
Abstract
We propose a new formulation of robust regression that integrates over all realizations of the uncertainty set and takes an averaged approach to obtain the optimal solution of the ordinary least squares regression problem. We show that this formulation recovers ridge regression exactly and thereby establishes the missing link between robust optimization and mean squared error approaches for existing regression problems. We further demonstrate that this equivalence depends on the geometric properties of the chosen uncertainty set. We provide exact and, in some cases, closed-form analytical expressions for the equivalent regularization strength under uncertainty sets induced by the $\ell_p$ norm, the Schatten $p$-norm, and general polytopes. On synthetic datasets with different levels of uncertainty, the averaged formulation consistently improves out-of-sample performance over the existing worst-case formulation. On real-world regression problems from UCI datasets, we observe similar out-of-sample improvements.
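The ridge equivalence can be checked numerically in the simplest symmetric case. The sketch below is an illustration, not the authors' code, and rests on assumptions not taken from the text: each entry of the perturbation $\Delta$ is drawn independently and uniformly from $[-\rho, \rho]$ (a box uncertainty set), so that $\mathbb{E}[\Delta] = 0$ and $\mathbb{E}[\Delta^\top \Delta] = (n\rho^2/3) I$. Under those assumptions, the averaged loss $\mathbb{E}\,\|y - (X + \Delta)\beta\|_2^2$ equals $\|y - X\beta\|_2^2 + (n\rho^2/3)\|\beta\|_2^2$, which is a ridge objective with $\lambda = n\rho^2/3$. This $\lambda$ comes from that elementary calculation and may be normalized differently from the paper's closed-form expressions. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def averaged_box_solution(X, y, rho, num_samples=5000, seed=0):
    """Minimize the Monte Carlo average of ||y - (X + D) b||^2, where each
    perturbation D has i.i.d. entries uniform on [-rho, rho] (assumed box set)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    gram = np.zeros((d, d))   # accumulates (X + D)^T (X + D)
    moment = np.zeros(d)      # accumulates (X + D)^T y
    for _ in range(num_samples):
        X_perturbed = X + rng.uniform(-rho, rho, size=(n, d))
        gram += X_perturbed.T @ X_perturbed
        moment += X_perturbed.T @ y
    # Normal equations of the averaged least squares problem
    # (the common 1/num_samples factor cancels).
    return np.linalg.solve(gram, moment)


def ridge_solution(X, y, lam):
    """Closed-form minimizer of ||y - X b||^2 + lam * ||b||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


# Hypothetical synthetic data, used only for this illustration.
rng = np.random.default_rng(42)
n, d, rho = 50, 3, 0.5
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = n * rho**2 / 3  # E[D^T D] = lam * I for i.i.d. uniform entries
beta_avg = averaged_box_solution(X, y, rho)
beta_ridge = ridge_solution(X, y, lam)
beta_ols = ridge_solution(X, y, 0.0)

print("averaged :", beta_avg)
print("ridge    :", beta_ridge)
print("OLS      :", beta_ols)
```

The averaged and ridge coefficient vectors should agree up to Monte Carlo error, and both are shrunk toward zero relative to the ordinary least squares solution. Symmetry matters because it makes the cross term $\mathbb{E}[\Delta]$ vanish; for a non-symmetric set that term generally does not vanish, which is consistent with the abstract's statement that the equivalence depends on the geometry of the uncertainty set.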
