Comparative Analysis of Two-Stage Distributionally Robust Optimization over 1-Wasserstein and 2-Wasserstein Balls

Geunyeong Byeon

TL;DR

This work compares 1- and 2-Wasserstein ambiguity sets in two-stage distributionally robust optimization with RHS uncertainty, revealing a pathological behavior for $W_1$ absent in $W_2$. By deriving worst-case distributions for $\Xi = \mathbb{R}^k$ or $\mathbb{R}^k_+$ and providing closed-form newsvendor solutions, it shows that $W_2$-based decisions are more informed and robust across a broader range of radii. A penalty-based dual interpretation explains why $W_2$ yields superior out-of-sample performance even for general $\Xi$, aligning with empirical observations. The results guide the choice of Wasserstein ambiguity sets in practice and extend to general conic settings via a unified framework.

Abstract

This paper investigates advantages of using 2-Wasserstein ambiguity sets over 1-Wasserstein sets in two-stage distributionally robust optimization with right-hand side uncertainty. We examine the worst-case distributions within 1- and 2-Wasserstein balls under both unrestricted and nonnegative orthant supports, highlighting a pathological behavior arising in 1-Wasserstein balls. Closed-form solutions for a single-scenario newsvendor problem illustrate that 2-Wasserstein balls enable more informed decisions. Additionally, a penalty-based dual interpretation suggests that 2-Wasserstein balls may outperform 1-Wasserstein balls across a broader range of Wasserstein radii, even with general support sets.

Paper Structure (16 sections, 5 theorems, 22 equations, 4 figures)

Introduction
Problem statement
Motivating example
Related Literature
Contributions
Notation
Preliminaries and assumptions
$\Xi=\mathbb R^k$ or $\Xi=\mathbb R^k_+$
$r=1$
$r=2$
Example: single-scenario newsvendor problem
Discussion on general $\Xi$
Proof of Proposition \ref{prop:r1:worst-expectation}
Proof of Proposition \ref{prop:1-wass-worst-dist}
Proof of Proposition \ref{prop:2-wass-worst-dist}
...and 1 more section

Key Result

Proposition 1

Under Assumption \ref{assum}, for any fixed $x \in \mathbb{R}^{n_x}$, the dual of $\sup_{\mathbb P \in \mathcal{P}^{r}_{N,\epsilon}}\mathbb E_{\mathbb P}[Z(x,\tilde{\xi})]$ is a minimization problem; when ${\epsilon} > 0$, strong duality holds and the minimum is attained.
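
The displayed formula of the dual is not reproduced in this summary. For orientation only, the strong-duality result commonly stated for Wasserstein distributionally robust optimization has the shape below. The exponent on $\epsilon$, the choice of norm, and the use of $\zeta^i$ for the $N$ empirical samples are assumptions here, and the paper's exact statement may differ.

```latex
\sup_{\mathbb P \in \mathcal{P}^{r}_{N,\epsilon}} \mathbb E_{\mathbb P}\big[ Z(x,\tilde{\xi}) \big]
  \;=\; \min_{\lambda \ge 0} \Big\{ \lambda \epsilon^{r}
  + \frac{1}{N} \sum_{i=1}^{N} \sup_{\xi \in \Xi}
    \big[ Z(x,\xi) - \lambda \, \|\xi - \zeta^{i}\|^{r} \big] \Big\}
```

Read this way, $\lambda$ acts as a price on moving a sample $\zeta^i$ to a worse point $\xi$, which is the sense in which the dual admits a penalty-based interpretation.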

Figures (4)

Figure 1: (a)-(c): The median values (line) and the 10th to 90th percentiles (shaded region) of $\tilde{O}^{r}_5(\epsilon)$, where the expectation is estimated using 2,000 test samples drawn from $\mathbb P^{\texttt{true}}$. (d): $\mathbb P[\tilde{O}^{2}_5(\epsilon) \le \tilde{O}^{1}_5(\epsilon) \times 1.001]$. For each choice of $\epsilon$, $\tilde{O}^{r}_5(\epsilon)$ is simulated over 20 runs.
Figure 2: Worst-case probability distribution in $\mathcal{P}^{2}_{N,\epsilon}$, where $\pi^*_{[i]} = \arg\max_{\pi \in \Pi}\pi^\top (Tx+\zeta^i)+\frac{1}{4\lambda^*}\|\pi\|^2$.
Figure 3: $\tilde{O}^r_{1}(\epsilon)$ for $r=1,2$, and $\tilde{x}^2_1(\epsilon)$ when $\mathbb{P}^{\texttt{true}} = \mathcal{N}(-10, 1.5)$, $\texttt{p} = 5$, $\texttt{c} = 4.5$.
Figure 4: The graph of $g$ when $-\zeta - \frac{\epsilon}{2}\sqrt{\frac{\texttt{p}}{\texttt{p} - \texttt{c}}}>0$.
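
The expression in the Figure 2 caption, $\pi^\top (Tx+\zeta^i)+\frac{1}{4\lambda}\|\pi\|^2$, can be explored numerically on a toy instance. The sketch below is not the paper's code. It assumes a scalar uncertainty with $\Xi = \mathbb{R}$, a recourse value of the form $\max_{\pi \in \Pi} \pi\,(t + \xi)$ with a finite, hypothetical set `duals` standing in for $\Pi$, and a scalar `t` standing in for $Tx$. Under those assumptions the $r=1$ dual reduces to the sample average plus $\epsilon \max_\pi |\pi|$, while the $r=2$ dual is minimized over $\lambda$ by a simple golden-section search.

```python
import math


def saa_cost(t, samples, duals):
    """Sample-average recourse cost: mean over i of max_pi pi * (t + zeta_i)."""
    return sum(max(p * (t + z) for p in duals) for z in samples) / len(samples)


def worst_case_w1(t, samples, duals, eps):
    """r = 1 on Xi = R (assumed): the dual collapses to SAA + eps * max|pi|."""
    return saa_cost(t, samples, duals) + eps * max(abs(p) for p in duals)


def worst_case_w2(t, samples, duals, eps, iters=200):
    """r = 2 on Xi = R (assumed): minimize the dual objective over lambda > 0."""

    def dual_obj(log_lam):
        lam = math.exp(log_lam)
        # Inner supremum over xi in closed form: pi*(t + zeta) + pi^2 / (4*lam).
        inner = sum(
            max(p * (t + z) + p * p / (4.0 * lam) for p in duals)
            for z in samples
        ) / len(samples)
        return lam * eps ** 2 + inner

    # Golden-section search over log(lambda); the objective is convex in
    # lambda, hence unimodal in log(lambda).
    lo, hi = -20.0, 20.0
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if dual_obj(a) <= dual_obj(b):
            hi = b
        else:
            lo = a
    return dual_obj(0.5 * (lo + hi))


def penalty(worst_case, t, samples, duals, eps):
    """Gap between a worst-case value and the sample-average cost."""
    return worst_case(t, samples, duals, eps) - saa_cost(t, samples, duals)


if __name__ == "__main__":
    duals, samples, eps = [-1.0, 2.0], [0.0], 0.5
    for t in (5.0, -5.0):
        print(t,
              penalty(worst_case_w1, t, samples, duals, eps),
              penalty(worst_case_w2, t, samples, duals, eps))
```

In this toy instance the $r=1$ penalty is the same for both values of `t`, whereas the $r=2$ penalty is about 1.0 at `t = 5` and about 0.5 at `t = -5`, so only the latter varies with the first-stage quantity. Whether this is exactly the pathological behavior the abstract refers to cannot be confirmed from this summary alone.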

Theorems & Definitions (8)

Proposition 1
Proposition 2
Proposition 3
Remark 1
Proposition 4
Remark 2
Proposition 5
Remark 3
