Wasserstein Distributionally Robust Estimation in High Dimensions: Performance Analysis and Optimal Hyperparameter Tuning
Liviu Aolaritei, Soroosh Shafiee, Florian Dörfler
TL;DR
This work analyzes Wasserstein distributionally robust estimators for high-dimensional linear regression and gives precise asymptotic characterizations of their estimation error via the Convex Gaussian Min-max Theorem (CGMT). By reformulating distributionally robust optimization (DRO) with type-1 and type-2 Wasserstein distances as convex-concave minimax problems involving at most four scalar variables, the authors derive deterministic limits that depend on the under/over-parametrization ratio $\rho$ and the ambiguity radius $\varepsilon$. For type-1, the limiting error is captured by a single four-variable minimax problem; for type-2, it is obtained from two coupled minimax problems, and a distributionally regularized variant reduces to a single scalar problem. These results enable efficient radius tuning that matches cross-validation in practice at lower computational cost. Numerical experiments support the theory, showing universality across feature distributions and robustness to model assumptions.
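For orientation, the generic form of a Wasserstein DRO estimator can be written as below. This is the standard textbook formulation, not necessarily the exact one used in the paper: the loss $\ell$, the norm $\|\cdot\|$, the sample size $n$, and the empirical distribution $\widehat{P}_n$ are notational assumptions introduced here, since the summary does not specify them.

$$
W_p(Q, P) = \left( \inf_{\pi \in \Pi(Q, P)} \int \|\xi - \xi'\|^p \, \mathrm{d}\pi(\xi, \xi') \right)^{1/p}, \qquad p \in \{1, 2\},
$$

$$
\widehat{\theta}_{\varepsilon} \in \arg\min_{\theta} \; \sup_{Q \,:\, W_p(Q, \widehat{P}_n) \le \varepsilon} \; \mathbb{E}_{(x, y) \sim Q}\left[ \ell\left(y - \theta^\top x\right) \right].
$$

Here $\Pi(Q, P)$ denotes the set of couplings of $Q$ and $P$, and $\varepsilon$ is the ambiguity radius. Under this reading, radius tuning amounts to choosing the $\varepsilon$ that minimizes the estimation error of $\widehat{\theta}_{\varepsilon}$, which the asymptotic characterization makes computable without repeatedly refitting the estimator.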
Abstract
Distributionally robust optimization (DRO) has become a powerful framework for estimation under uncertainty, offering strong out-of-sample performance and principled regularization. In this paper, we propose a DRO-based method for linear regression and address a central question: how to optimally choose the robustness radius, which controls the trade-off between robustness and accuracy. Focusing on high-dimensional settings where the dimension and the number of samples are both large and comparable in size, we employ tools from high-dimensional asymptotic statistics to precisely characterize the estimation error of the resulting estimator. Remarkably, this error can be recovered by solving a simple convex-concave optimization problem involving only four scalar variables. This characterization enables efficient selection of the radius that minimizes the estimation error. In doing so, it achieves the same effect as cross-validation, but at a fraction of the computational cost. Numerical experiments confirm that our theoretical predictions closely match empirical performance and that the optimal radius selected through our method aligns with that chosen by cross-validation, highlighting both the accuracy and the practical benefits of our approach.
