
Distributionally Robust Optimization

Daniel Kuhn, Soroosh Shafiee, Wolfram Wiesemann

TL;DR

This survey presents a comprehensive framework for distributionally robust optimization (DRO), in which decisions x are optimized against the worst-case distribution within a chosen ambiguity set P. It develops a unified duality theory for moment, phi-divergence, and optimal-transport ambiguity sets, connecting worst-case expectations and risk measures to finite-dimensional convex programs. It then provides analytical tools for nature's subproblem, including Jensen and Edmundson–Madansky bounds and semidefinite relaxations, and outlines practical reformulations and algorithms for DRO and risk-robust optimization. By connecting DRO to regularization, adversarial training, and coherent risk measures, the paper covers both theoretical foundations and applications in statistics, finance, and machine learning. Overall, it offers a roadmap from ambiguity set construction to tractable solutions and statistical guarantees, highlighting how the choice of ambiguity set affects tractability, statistical guarantees, and interpretability.
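The duality theme in the TL;DR can be illustrated with the Kullback-Leibler (KL) divergence, a standard member of the phi-divergence family. The following is a minimal sketch under stated assumptions, not the survey's own algorithm: for a nominal distribution $\hat{P}$ on finitely many scenarios, it assumes the well-known dual identity $\sup\{\mathbb{E}_Q[\ell] : \mathrm{KL}(Q \| \hat{P}) \le r\} = \inf_{\lambda > 0} \lambda r + \lambda \log \mathbb{E}_{\hat{P}}[\exp(\ell/\lambda)]$ and minimizes the right-hand side over the single scalar $\lambda$ by golden-section search. The function names, the search bracket for $\lambda$, and the iteration count are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def _log_mgf(losses: Sequence[float], probs: Sequence[float], lam: float) -> float:
    """log sum_i probs[i] * exp(losses[i] / lam), computed stably via log-sum-exp."""
    terms = [(l / lam, p) for l, p in zip(losses, probs) if p > 0]
    m = max(z for z, _ in terms)
    return m + math.log(sum(p * math.exp(z - m) for z, p in terms))


def kl_worst_case(losses: Sequence[float], probs: Sequence[float], radius: float,
                  lam_lo: float = 1e-6, lam_hi: float = 1e6,
                  iters: int = 200) -> Tuple[float, float]:
    """Approximate sup{E_Q[loss] : KL(Q || P) <= radius} for a discrete nominal P.

    Minimizes the one-dimensional dual  lam * radius + lam * log E_P[exp(loss / lam)]
    over lam in [lam_lo, lam_hi] by golden-section search on t = log(lam). The dual
    objective is convex in lam, hence unimodal in t. Returns (value, minimizing lam).
    """
    def dual(t: float) -> float:
        lam = math.exp(t)
        return lam * radius + lam * _log_mgf(losses, probs, lam)

    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    a, b = math.log(lam_lo), math.log(lam_hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = dual(c), dual(d)
    for _ in range(iters):
        if fc <= fd:  # minimizer lies in [a, d]; reuse c as the new d
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = dual(c)
        else:  # minimizer lies in [c, b]; reuse d as the new c
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = dual(d)
    t_star = (a + b) / 2.0
    return dual(t_star), math.exp(t_star)


def tilted_distribution(losses: Sequence[float], probs: Sequence[float],
                        lam: float) -> List[float]:
    """Exponentially tilted Q with q_i proportional to p_i * exp(loss_i / lam)."""
    log_z = _log_mgf(losses, probs, lam)
    return [p * math.exp(l / lam - log_z) for l, p in zip(losses, probs)]


# Hypothetical example: four equally likely scenarios and a KL radius of 0.1.
losses = [1.0, 2.0, 3.0, 10.0]
probs = [0.25, 0.25, 0.25, 0.25]
value, lam = kl_worst_case(losses, probs, radius=0.1)
q = tilted_distribution(losses, probs, lam)  # candidate worst-case distribution
```

At the minimizing $\lambda$ the exponentially tilted distribution attains the supremum whenever the radius is small enough for the KL constraint to bind, which gives a quick primal check: its KL divergence from the nominal distribution should match the radius, and its expected loss should match the dual value.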

Abstract

Distributionally robust optimization (DRO) studies decision problems under uncertainty where the probability distribution governing the uncertain problem parameters is itself uncertain. A key component of any DRO model is its ambiguity set, that is, a family of probability distributions consistent with any available structural or statistical information. DRO seeks decisions that perform best under the worst distribution in the ambiguity set. This worst-case criterion is supported by findings in psychology and neuroscience, which indicate that many decision-makers have a low tolerance for distributional ambiguity. DRO is rooted in statistics, operations research and control theory, and recent research has uncovered its deep connections to regularization techniques and adversarial training in machine learning. This survey presents the key findings of the field in a unified and self-contained manner.
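As a minimal illustration of the worst-case criterion described in the abstract, the sketch below restricts the ambiguity set to finitely many candidate distributions on a common finite support. Nature's subproblem is then a maximum over the candidates, and the decision maker's problem is a minimum over a finite grid of decisions. Everything in the instance is a hypothetical assumption chosen for illustration, not taken from the survey: a newsvendor with unit cost 1 and unit price 3, demand scenarios {0, 10, 20}, and two candidate demand distributions.

```python
from typing import Callable, Sequence

Loss = Callable[[float, float], float]


def expected_loss(x: float, support: Sequence[float], probs: Sequence[float],
                  loss: Loss) -> float:
    """Expected loss of decision x under one discrete distribution."""
    return sum(p * loss(x, xi) for xi, p in zip(support, probs))


def worst_case_loss(x: float, support: Sequence[float],
                    ambiguity_set: Sequence[Sequence[float]], loss: Loss) -> float:
    """Nature's subproblem: the largest expected loss over the finite ambiguity set."""
    return max(expected_loss(x, support, probs, loss) for probs in ambiguity_set)


def solve_dro(candidates: Sequence[float], support: Sequence[float],
              ambiguity_set: Sequence[Sequence[float]], loss: Loss) -> float:
    """Decision maker's problem: minimize the worst-case expected loss over candidates."""
    return min(candidates,
               key=lambda x: worst_case_loss(x, support, ambiguity_set, loss))


# Hypothetical newsvendor: pay 1 per unit ordered, earn 3 per unit sold.
def newsvendor_loss(order: float, demand: float) -> float:
    return 1.0 * order - 3.0 * min(order, demand)


support = [0.0, 10.0, 20.0]                          # demand scenarios
ambiguity_set = [[0.2, 0.3, 0.5], [0.5, 0.3, 0.2]]   # two candidate distributions
candidates = [0.0, 5.0, 10.0, 15.0, 20.0]            # order quantities

x_dro = solve_dro(candidates, support, ambiguity_set, newsvendor_loss)  # 10.0
x_nominal = min(candidates, key=lambda x: expected_loss(
    x, support, ambiguity_set[0], newsvendor_loss))                     # 20.0
```

Trusting the first distribution alone suggests ordering 20 units, whereas hedging against both candidates favors the more cautious order of 10 units, whose worst-case expected cost is -5.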

Paper Structure

This paper contains 93 sections, 115 theorems, 603 equations, 2 tables.

Key Result

proposition 1

For any mean-covariance pairs $(\mu, \Sigma)$ and $(\hat{\mu}, \hat{\Sigma})$ in $\mathbb{R}^d \times \mathbb{S}_+^d$, we have

Theorems & Definitions (240)

  • definition 1: Gelbrich Distance
  • proposition 1: SDP Representation of the Gelbrich Distance
  • proof
  • proposition 2: Gelbrich Uncertainty Set
  • proof
  • definition 2: Entropy Functions
  • definition 3: $\phi$-Divergences (Csiszár 1964; Csiszár 1967; Ali and Silvey 1966)
  • proposition 3: Dual Representation of $\phi$-Divergences
  • proof
  • remark 1: Csiszár Duals
  • ...and 230 more
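The list above names the Gelbrich distance (Definition 1) without stating it. As a sketch, assuming the standard closed form from the literature, $\mathbb{G}((\mu,\Sigma),(\hat{\mu},\hat{\Sigma}))^2 = \|\mu - \hat{\mu}\|^2 + \mathrm{Tr}(\Sigma + \hat{\Sigma} - 2(\hat{\Sigma}^{1/2} \Sigma \hat{\Sigma}^{1/2})^{1/2})$, which coincides with the 2-Wasserstein distance between Gaussian distributions with these moments, the helper below evaluates it with NumPy. The function names are hypothetical, and the code uses this closed form rather than the SDP representation of Proposition 1.

```python
import numpy as np


def psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    sym = (mat + mat.T) / 2.0              # symmetrize to suppress round-off asymmetry
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, 0.0, None)        # clip tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T  # V diag(sqrt(vals)) V^T


def gelbrich_distance(mu, sigma, mu_hat, sigma_hat) -> float:
    """Gelbrich distance between mean-covariance pairs (mu, sigma) and (mu_hat, sigma_hat).

    Closed form (assumed, see lead-in):
        G^2 = ||mu - mu_hat||^2
              + Tr(sigma + sigma_hat - 2 (sigma_hat^{1/2} sigma sigma_hat^{1/2})^{1/2})
    """
    mu, mu_hat = np.asarray(mu, float), np.asarray(mu_hat, float)
    sigma, sigma_hat = np.asarray(sigma, float), np.asarray(sigma_hat, float)
    root_hat = psd_sqrt(sigma_hat)
    cross = psd_sqrt(root_hat @ sigma @ root_hat)
    sq = float(np.sum((mu - mu_hat) ** 2) + np.trace(sigma + sigma_hat - 2.0 * cross))
    return float(np.sqrt(max(sq, 0.0)))  # guard against tiny negative round-off


# Diagonal covariances commute, so the trace term reduces to
# (sqrt(4) - 1)^2 + (sqrt(9) - 1)^2 = 5 and G^2 = 25 + 5 = 30.
g = gelbrich_distance([0.0, 0.0], np.diag([4.0, 9.0]), [3.0, 4.0], np.eye(2))
```

For non-commuting covariances no such shortcut applies, but the trace term is still symmetric in the two covariance matrices, so the distance does not depend on which pair is treated as the reference.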