Wasserstein Distributionally Robust Estimation in High Dimensions: Performance Analysis and Optimal Hyperparameter Tuning

Liviu Aolaritei; Soroosh Shafiee; Florian Dörfler

TL;DR

This work analyzes Wasserstein distributionally robust estimators for high-dimensional linear regression and gives precise asymptotic characterizations of their estimation error via the Convex Gaussian Minimax Theorem (CGMT). By reformulating DRO with type-1 and type-2 Wasserstein distances as convex-concave minimax problems involving at most four scalar variables, the authors derive deterministic limits that depend on the ratio $\rho = d/n$ (which distinguishes the under- and over-parametrized regimes) and the ambiguity radius $\varepsilon$. For type-1, the error is captured by a four-variable minimax problem; for type-2, two coupled minimax problems yield the limit, and a distributionally regularized variant reduces to a single scalar problem. The results enable efficient radius tuning that matches cross-validation in practice at lower computational cost, and are supported by numerical experiments indicating universality across feature distributions and robustness to model assumptions.

Abstract

Distributionally robust optimization (DRO) has become a powerful framework for estimation under uncertainty, offering strong out-of-sample performance and principled regularization. In this paper, we propose a DRO-based method for linear regression and address a central question: how to optimally choose the robustness radius, which controls the trade-off between robustness and accuracy. Focusing on high-dimensional settings where the dimension and the number of samples are both large and comparable in size, we employ tools from high-dimensional asymptotic statistics to precisely characterize the estimation error of the resulting estimator. Remarkably, this error can be recovered by solving a simple convex-concave optimization problem involving only four scalar variables. This characterization enables efficient selection of the radius that minimizes the estimation error. In doing so, it achieves the same effect as cross-validation, but at a fraction of the computational cost. Numerical experiments confirm that our theoretical predictions closely match empirical performance and that the optimal radius selected through our method aligns with that chosen by cross-validation, highlighting both the accuracy and the practical benefits of our approach.
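To make the radius-tuning question concrete, the following is a minimal simulation sketch of the cross-validation baseline that the abstract compares against. It is not the paper's estimator: closed-form ridge regression is swapped in as a simple regularized stand-in, with its penalty `lam` playing the role of the radius $\varepsilon$. The helper names `ridge` and `cv_error`, the grid, the noise level, and the sizes `n` and `d` (chosen so that $\rho = d/n = 0.8$, as in Figure 2) are all assumptions made for illustration.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: argmin ||y - X t||^2 / n + lam * ||t||^2.

    Stand-in for a regularized estimator; NOT the paper's Wasserstein DRE.
    """
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

def cv_error(X, y, lam, k=5):
    """Mean squared prediction error of ridge under k-fold cross-validation."""
    idx = np.arange(X.shape[0])
    errs = []
    for hold in np.array_split(idx, k):
        train = np.setdiff1d(idx, hold)
        theta = ridge(X[train], y[train], lam)
        errs.append(np.mean((y[hold] - X[hold] @ theta) ** 2))
    return float(np.mean(errs))

# Synthetic linear model with Gaussian features (hypothetical setup).
rng = np.random.default_rng(0)
n, d = 250, 200                          # rho = d / n = 0.8
theta0 = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = X @ theta0 + 0.5 * rng.standard_normal(n)

grid = np.logspace(-3, 1, 25)            # candidate tuning values

# Estimation error ||theta_hat - theta0|| needs theta0, so it is only
# available in simulation; cross-validation uses the data alone.
est_err = [float(np.linalg.norm(ridge(X, y, lam) - theta0)) for lam in grid]
cv_err = [cv_error(X, y, lam) for lam in grid]

lam_oracle = grid[int(np.argmin(est_err))]
lam_cv = grid[int(np.argmin(cv_err))]
print("oracle choice:", lam_oracle, "cross-validation choice:", lam_cv)
```

Cross-validation refits the estimator once per fold and per grid point (125 fits here), which is the cost that a theory-based prediction of the error curve is meant to avoid.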

Paper Structure (40 sections, 14 theorems, 265 equations, 5 figures)

Introduction
Contributions.
Paper organization.
Notation.
Convex Gaussian Minimax Theorem
Wasserstein Distributionally Robust Estimation
Type-1 Wasserstein DRE
Type-2 Wasserstein DRE
Type-2 Wasserstein Distributional Regularization
High-Dimensional Error Analysis
Type-1 Wasserstein DRE
Type-2 Wasserstein DRE
Type-2 Wasserstein Distributional Regularization
Numerical Experiments
Radius Tuning in High Dimensions
...and 25 more sections

Key Result

Lemma 3.2

Let Assumptions \ref{assump:dro}, \ref{assump:ell2}-\ref{assump:pstar} be satisfied. Then the optimal value of \eqref{eq:dro} is finite.
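
For orientation, a Wasserstein distributionally robust estimation problem for linear regression is commonly written in the generic form below. This is a sketch of the standard formulation, not a transcription of the paper's `eq:dro`: the loss $\ell$, the empirical distribution $\widehat{\mathbb{P}}_n$, and the exact way the radius enters the constraint are assumptions here.

$$
\hat{\theta}_{\mathrm{DRE}} \in \arg\min_{\theta \in \mathbb{R}^d} \; \sup_{\mathbb{Q} \,:\, W_p(\mathbb{Q}, \widehat{\mathbb{P}}_n) \le \varepsilon} \; \mathbb{E}_{(x,y) \sim \mathbb{Q}} \big[ \ell(y - \theta^\top x) \big], \qquad p \in \{1, 2\}.
$$

With $\varepsilon = 0$ the ambiguity set contains only the empirical distribution, so the problem reduces to empirical risk minimization; a larger $\varepsilon$ hedges against a wider set of distributions.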

Figures (5)

Figure 1: The impact of the ambiguity radius $\varepsilon$ on the estimation error $\|\hat{\theta}_{\mathrm{DRE}} - \theta_0\|$. This illustrative behavior aligns with the numerical results presented in Figures \ref{figure:high:W1:W2} and \ref{figure:high:universality}. A central goal of this work is to determine, in a computationally efficient manner, the value $\varepsilon^\star$ that minimizes the estimation error in the high-dimensional regime.
Figure 2: Theory vs simulation for different choices of $d$ and $\rho = d/n = 0.8$.
Figure 3: (a) The effect of $\rho = d/n$ on the estimation error; (b) the validity of the results extends to broader classes of probability ensembles.
Figure 4: Type-$2$ Wasserstein distributional regularization.
Figure 5: Comparison between theory-based and cross-validation-based selection of $\varepsilon$ in Wasserstein-1 and Wasserstein-2 DRE.

Theorems & Definitions (39)

Lemma 3.2: Finite optimal value of DRE problem
Lemma 3.3: Ambiguity set description
Example 3.5: DRE $\to$ (PO)
Lemma 3.8: Concavity in $u$ of \eqref{eq:dro:dual:uni}
Lemma 3.9: Growth rate of $u_\star$ in \eqref{eq:dro:dual:uni}
Lemma 3.11: Concavity in $u$ of \eqref{eq:dro:dual:uni:3}
Lemma 3.12: Growth rate of $u_\star$ in \eqref{eq:dro:dual:uni:3}
Theorem 4.3: Performance of $\hat{\theta}_{\mathrm{DRE}}$ for $p=1$
Example 4.4: Moreau envelope for the LAD estimator
Theorem 4.5: Performance of $\hat{\theta}_{\mathrm{DRE}}$ for $p=2$
...and 29 more
