Wasserstein Distributionally Robust Estimation in High Dimensions: Performance Analysis and Optimal Hyperparameter Tuning

Liviu Aolaritei, Soroosh Shafiee, Florian Dörfler

TL;DR

This work analyzes Wasserstein distributionally robust estimators for high-dimensional linear regression and gives precise asymptotic characterizations of their estimation error via the Convex Gaussian Min-max Theorem (CGMT). By reformulating DRO with type-$1$ and type-$2$ Wasserstein distances as convex-concave minimax problems in at most four scalar variables, the authors derive deterministic limits that depend on the under/over-parametrization ratio $\rho = d/n$ and the ambiguity radius $\varepsilon$. For type-$1$, the error is captured by a four-variable minimax problem; for type-$2$, two coupled minimax problems yield the limit, and a distributionally regularized variant reduces to a single scalar problem. The results enable efficient radius tuning that matches cross-validation in practice at lower computational cost, and are supported by numerical experiments indicating universality across feature distributions and robustness to model assumptions.

Abstract

Distributionally robust optimization (DRO) has become a powerful framework for estimation under uncertainty, offering strong out-of-sample performance and principled regularization. In this paper, we propose a DRO-based method for linear regression and address a central question: how to optimally choose the robustness radius, which controls the trade-off between robustness and accuracy. Focusing on high-dimensional settings where the dimension and the number of samples are both large and comparable in size, we employ tools from high-dimensional asymptotic statistics to precisely characterize the estimation error of the resulting estimator. Remarkably, this error can be recovered by solving a simple convex-concave optimization problem involving only four scalar variables. This characterization enables efficient selection of the radius that minimizes the estimation error. In doing so, it achieves the same effect as cross-validation, but at a fraction of the computational cost. Numerical experiments confirm that our theoretical predictions closely match empirical performance and that the optimal radius selected through our method aligns with that chosen by cross-validation, highlighting both the accuracy and the practical benefits of our approach.

Paper Structure

This paper contains 40 sections, 14 theorems, 265 equations, and 5 figures.

Key Result

Lemma 3.2

Let Assumptions \ref{assump:dro}, \ref{assump:ell2}-\ref{assump:pstar} be satisfied. Then the optimal value of \eqref{eq:dro} is finite.

Figures (5)

  • Figure 1: The impact of the ambiguity radius $\varepsilon$ on the estimation error $\|\hat{\theta}_{\mathrm{DRE}} - \theta_0\|$. This illustrative behavior aligns with the numerical results presented in Figures \ref{figure:high:W1:W2} and \ref{figure:high:universality}. A central goal of this work is to determine, in a computationally efficient manner, the value $\varepsilon^\star$ that minimizes the estimation error in the high-dimensional regime.
  • Figure 2: Theory vs simulation for different choices of $d$ and $\rho = d/n = 0.8$.
  • Figure 3: (a) The effect of $\rho = d/n$ on the estimation error; (b) the validity of the results extends to broader classes of probability ensembles.
  • Figure 4: Type-$2$ Wasserstein distributional regularization.
  • Figure 5: Comparison between theory-based and cross-validation-based selection of $\varepsilon$ in Wasserstein-1 and Wasserstein-2 DRE.

Theorems & Definitions (39)

  • Lemma 3.2: Finite optimal value of DRE problem
  • Lemma 3.3: Ambiguity set description
  • Example 3.5: DRE $\to$ (PO)
  • Lemma 3.8: Concavity in $u$ of \eqref{eq:dro:dual:uni}
  • Lemma 3.9: Growth rate of $u_\star$ in \eqref{eq:dro:dual:uni}
  • Lemma 3.11: Concavity in $u$ of \eqref{eq:dro:dual:uni:3}
  • Lemma 3.12: Growth rate of $u_\star$ in \eqref{eq:dro:dual:uni:3}
  • Theorem 4.3: Performance of $\hat{\theta}_{\mathrm{DRE}}$ for $p=1$
  • Example 4.4: Moreau envelope for the LAD estimator
  • Theorem 4.5: Performance of $\hat{\theta}_{\mathrm{DRE}}$ for $p=2$
  • ...and 29 more
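
Example 4.4 in the list above concerns the Moreau envelope for the LAD (least absolute deviations) estimator, but the example itself is not reproduced on this page. As a rough illustration only, the sketch below assumes the standard definition of the Moreau envelope of the absolute loss, $e(x;\tau) = \min_v \{ |v| + (x-v)^2/(2\tau) \}$, whose minimizer is the soft-thresholding operator and whose value is the Huber function. The parameter name `tau` and all function names are hypothetical, and the paper's own parametrization may differ.

```python
import math


def prox_abs(x: float, tau: float) -> float:
    """Proximal operator of |.| with parameter tau (soft-thresholding)."""
    return math.copysign(max(abs(x) - tau, 0.0), x)


def moreau_abs(x: float, tau: float) -> float:
    """Moreau envelope of |.|: min over v of |v| + (x - v)^2 / (2 * tau).

    The minimum is attained at v = prox_abs(x, tau), so we evaluate there.
    """
    v = prox_abs(x, tau)
    return abs(v) + (x - v) ** 2 / (2.0 * tau)


def huber(x: float, tau: float) -> float:
    """Closed form of the same envelope: quadratic near 0, linear in the tails."""
    if abs(x) <= tau:
        return x * x / (2.0 * tau)
    return abs(x) - tau / 2.0


def moreau_abs_grid(x: float, tau: float,
                    half_width: float = 10.0, steps: int = 20001) -> float:
    """Brute-force check: minimize the same objective over a uniform grid of v."""
    best = math.inf
    for i in range(steps):
        v = -half_width + 2.0 * half_width * i / (steps - 1)
        best = min(best, abs(v) + (x - v) ** 2 / (2.0 * tau))
    return best


if __name__ == "__main__":
    # Compare the three evaluations at a few points (assumed tau = 1.0).
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(x, moreau_abs(x, 1.0), huber(x, 1.0), moreau_abs_grid(x, 1.0))
```

The three functions should agree up to the grid resolution. The envelope replaces the nonsmooth absolute loss with a smooth scalar function, which is the kind of object that typically appears in scalar minimax characterizations derived via CGMT.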