Distributionally Robust Stochastic Optimization with Wasserstein Distance

Rui Gao, Anton J. Kleywegt

TL;DR

This work develops a constructive dual framework for distributionally robust stochastic optimization using Wasserstein distance-based ambiguity sets, enabling tractable analysis even with general (possibly infinite-dimensional) nominal distributions. It proves strong duality, characterizes the structure of worst-case distributions (often supported on at most N+1 points), and shows how data-driven DRSO problems can be well approximated by robust optimization, including practical two-stage and VaR applications. The approach yields actionable insights for choosing the ambiguity radius and demonstrates applicability to infinite-dimensional process control and intensity estimation problems. Overall, the results bridge DRSO and robust optimization, providing both theoretical guarantees and practical computational schemes for robust decision-making under distributional uncertainty.

Abstract

Distributionally robust stochastic optimization (DRSO) is an approach to optimization under uncertainty in which, instead of assuming that there is a known true underlying probability distribution, one hedges against a chosen set of distributions. In this paper we first point out that the set of distributions should be chosen to be appropriate for the application at hand, and that some of the choices that have been popular until recently are, for many applications, not good choices. We next consider sets of distributions that are within a chosen Wasserstein distance from a nominal distribution. Such a choice of sets has two advantages: (1) The resulting distributions hedged against are more reasonable than those resulting from other popular choices of sets. (2) The problem of determining the worst-case expectation over the resulting set of distributions has desirable tractability properties. We derive a strong duality reformulation of the corresponding DRSO problem and construct approximate worst-case distributions explicitly via the first-order optimality conditions of the dual problem. Our contributions are four-fold. (i) We identify necessary and sufficient conditions for the existence of a worst-case distribution, which are naturally related to the growth rate of the objective function. (ii) We show that the worst-case distributions resulting from an appropriate Wasserstein distance have a concise structure and a clear interpretation. (iii) Using this structure, we show that data-driven DRSO problems can be approximated to any accuracy by robust optimization problems, and thereby many DRSO problems become tractable by using tools from robust optimization. (iv) Our strong duality result holds in a very general setting. As examples, we show that it can be applied to infinite-dimensional process control and intensity estimation for point processes.
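
To make the strong duality reformulation concrete, the sketch below evaluates a dual of the form $\min_{\lambda \ge 0} \{ \lambda\theta^p + \frac{1}{N}\sum_i \sup_{\xi} [\Psi(\xi) - \lambda d^p(\xi,\widehat{\xi}^i)] \}$ for an empirical nominal distribution. This is the commonly stated form of Wasserstein duality; the notation, the choice $p = 1$, the grid over $\Xi = [0, 10]$, and the function name `worst_case_dual` are assumptions made for illustration and are not quoted from the paper.

```python
def worst_case_dual(samples, theta, psi, support, lambdas):
    """Grid-search the dual objective
    lam * theta + mean_i max_xi [psi(xi) - lam * |xi - zeta_i|]   (p = 1)
    over the candidate multipliers in `lambdas`."""
    best = float("inf")
    for lam in lambdas:
        inner = 0.0
        for zeta in samples:
            # Inner supremum, approximated by a maximum over a finite grid.
            inner += max(psi(xi) - lam * abs(xi - zeta) for xi in support)
        best = min(best, lam * theta + inner / len(samples))
    return best


# Hypothetical toy instance: Psi(xi) = xi on Xi = [0, 10].
samples = [1.0, 2.0, 3.0]                    # support of the nominal distribution
theta = 0.5                                  # Wasserstein radius
support = [k / 10 for k in range(101)]       # grid on [0, 10]
lambdas = [k / 20 for k in range(61)]        # candidate multipliers in [0, 3]

dual_value = worst_case_dual(samples, theta, lambda xi: xi, support, lambdas)

# Primal check: shifting every sample right by theta is feasible (W_1 = theta)
# and gives expectation mean(samples) + theta.
primal_value = sum(samples) / len(samples) + theta
print(dual_value, primal_value)
```

In this toy case the two values agree up to floating-point error, which is what strong duality predicts; a grid search is only a stand-in for a proper one-dimensional convex minimization over $\lambda$.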

Paper Structure

This paper contains 32 sections, 19 theorems, 257 equations, 7 figures, 2 tables.

Key Result

Lemma 1

For any finite Borel measures $\mu,\nu \in \mathcal{B}(\Xi)$ with $\mu(\Xi) \neq \nu(\Xi)$, it holds that $W_{p}(\mu,\nu) = \infty$.
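
The reasoning behind Lemma 1 can be sketched in a few lines, assuming the usual coupling-based definition of the Wasserstein distance (the symbols $\gamma$, $\Gamma$, and $d$ are notation introduced here, not taken from the listing above):

```latex
% Assumed definition, for p >= 1 and a metric d on \Xi:
W_p^p(\mu,\nu) = \inf_{\gamma \in \Gamma(\mu,\nu)}
    \int_{\Xi \times \Xi} d^p(\xi,\zeta)\, \gamma(d\xi, d\zeta),
\qquad
\Gamma(\mu,\nu) = \bigl\{ \gamma :
    \gamma(A \times \Xi) = \mu(A),\
    \gamma(\Xi \times B) = \nu(B) \ \text{for all Borel } A, B \bigr\}.

% Every coupling must have equal marginal masses:
\gamma \in \Gamma(\mu,\nu)
    \;\Longrightarrow\;
    \mu(\Xi) = \gamma(\Xi \times \Xi) = \nu(\Xi).

% Hence, if \mu(\Xi) \neq \nu(\Xi), no coupling exists and
\Gamma(\mu,\nu) = \emptyset
    \;\Longrightarrow\;
    W_p(\mu,\nu) = \inf \emptyset = \infty .
```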

Figures (7)

  • Figure 1: Three images and their gray-scale histograms. For KL divergence, it holds that $I_{\phi_{KL}}(\mu_{true},\nu) = 5.05 > I_{\phi_{KL}}(\mu_{pathol},\nu) = 2.33$, while in contrast, Wasserstein distance satisfies $W_{1}(\mu_{true},\nu) = 30.70 < W_{1}(\mu_{pathol},\nu) = 84.03$.
  • Figure 2: Examples for existence and non-existence of the worst-case distribution
  • Figure 3: When $\Psi = -\mathds{1}_C$, the worst-case distribution perturbs the nominal distribution in a greedy fashion. The solid and diamond dots are the support of the nominal distribution $\nu$. $\widehat{\xi}^{1},\widehat{\xi}^{2},\widehat{\xi}^{3}$ are the three interior points closest to $\partial C$ and are therefore transported to $\xi_{\ast}^{1},\xi_{\ast}^{2},\xi_{\ast}^{3}$, respectively. $\widehat{\xi}^{4}$ is the interior point fourth closest to $\partial C$, but its full mass cannot be transported to $\partial C$ because of the Wasserstein distance constraint, so it is split and the parts are moved to $\overline{\xi}_{\ast}^{4}$ and $\underline{\xi}_{\ast}^{4} = \widehat{\xi}^{4}$.
  • Figure 4: Optimal on/off system control for the true process and the DRSO.
  • Figure 5: Estimated intensity functions using Wasserstein DRSO and MLE
  • ...and 2 more figures
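
The greedy perturbation described in the Figure 3 caption can be sketched as a fractional-knapsack procedure: interior points are moved to the boundary in order of increasing distance until the transport budget $\theta^p$ is exhausted, and the last point touched is split. The function name `greedy_worst_case`, the equal weights $1/N$, and the toy distances are assumptions for illustration, not the paper's construction.

```python
def greedy_worst_case(distances, theta, p=1):
    """Return the mass moved out of C for each interior point.

    distances[i] is the distance from nominal point i to the boundary of C;
    each point carries weight 1/N; the total transport cost
    sum_i moved[i] * distances[i]**p may not exceed theta**p.
    """
    n = len(distances)
    weight = 1.0 / n
    budget = theta ** p
    moved = [0.0] * n
    # Visit points from closest to farthest from the boundary.
    for i in sorted(range(n), key=lambda k: distances[k]):
        cost_per_mass = distances[i] ** p
        if cost_per_mass == 0.0:
            moved[i] = weight          # already on the boundary: free to move
            continue
        mass = min(weight, budget / cost_per_mass)
        moved[i] = mass                # a partial value means the point is split
        budget -= mass * cost_per_mass
        if budget <= 0.0:
            break
    return moved


# Hypothetical example: five interior points, radius theta = 1.6, p = 1.
distances = [1.0, 2.0, 3.0, 4.0, 8.0]
moved = greedy_worst_case(distances, theta=1.6)
mass_left_in_C = 1.0 - sum(moved)
print(moved, mass_left_in_C)
```

In this example the three closest points are moved entirely, the fourth is split, and the fifth is left untouched, which mirrors the pattern in the caption.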

Theorems & Definitions (42)

  • Example 1
  • Definition 1: Push-forward Measure
  • Definition 2: Wasserstein distance
  • Example 2: Transportation problem
  • Example 3: Revisiting Example \ref{eg:bird}
  • Lemma 1
  • Proposition 1: Weak duality
  • Definition 3: Regularization Operator $\Phi$
  • Definition 4: Growth rate
  • Lemma 2: Properties of the growth rate $\kappa$
  • ...and 32 more
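
The contrast between KL divergence and Wasserstein distance reported for Figure 1 can be reproduced qualitatively on toy histograms. The sketch below is a minimal illustration: the three four-bin histograms are invented for this example (they are not the image data from the paper), and the bins are assumed to sit at unit spacing so that $W_1$ reduces to the sum of absolute differences between cumulative distributions.

```python
import math


def kl_divergence(p, q):
    """KL divergence of p with respect to q (q must be positive where p is)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def wasserstein_1(p, q):
    """W_1 between two histograms on the same unit-spaced bins,
    computed as the sum of absolute differences between their CDFs."""
    total, cumulative_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cumulative_gap += pi - qi
        total += abs(cumulative_gap)
    return total


# Hypothetical histograms over four adjacent bins.
nominal = [0.49, 0.01, 0.01, 0.49]
nearby = [0.01, 0.49, 0.49, 0.01]    # each peak shifted by one bin
lopsided = [0.90, 0.01, 0.01, 0.08]  # mass piled onto one peak

kl_nearby = kl_divergence(nearby, nominal)
kl_lopsided = kl_divergence(lopsided, nominal)
w1_nearby = wasserstein_1(nearby, nominal)
w1_lopsided = wasserstein_1(lopsided, nominal)
print(kl_nearby, kl_lopsided, w1_nearby, w1_lopsided)
```

Here KL divergence ranks the slightly shifted histogram as farther from the nominal one than the lopsided histogram, while $W_1$ ranks them the other way around, because $W_1$ accounts for how far mass has to travel between bins.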