Sinkhorn Distributionally Robust Optimization

Jie Wang, Rui Gao, Yao Xie

TL;DR

This work introduces Sinkhorn distributionally robust optimization (DRO), a robust framework built on entropic-regularized transport (the Sinkhorn distance). It derives a strong dual reformulation that yields a smooth, tractable objective and characterizes the worst-case distribution as absolutely continuous with respect to a reference measure. The authors develop a biased stochastic mirror descent algorithm, augmented with RT-MLMC estimators and a bisection search over the dual multiplier, and provide convergence and complexity guarantees. Through applications to the Newsvendor problem, mean-risk portfolio optimization, and adversarial multi-class classification, the method demonstrates superior out-of-sample performance and competitive computational efficiency relative to SAA, Wasserstein DRO, and KL-divergence DRO. The work offers a flexible, scalable DRO approach with practical relevance for data-driven decision-making under distributional uncertainty.

Abstract

We study distributionally robust optimization with Sinkhorn distance -- a variant of Wasserstein distance based on entropic regularization. We derive a convex programming dual reformulation for general nominal distributions, transport costs, and loss functions. To solve the dual reformulation, we develop a stochastic mirror descent algorithm with biased subgradient estimators and derive its computational complexity guarantees. Finally, we provide numerical examples using synthetic and real data to demonstrate its superior performance.
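
To make the entropic regularization mentioned above concrete, the following is a minimal sketch of the classic Sinkhorn matrix-scaling iteration for two discrete distributions. It is not the paper's algorithm, and the paper's Definition 1 may use a different reference measure or normalization. The function names (`sinkhorn_plan`, `transport_cost`), the toy data, and the values of `eps` and `n_iters` are all assumptions made for illustration.

```python
import math

def sinkhorn_plan(a, b, cost, eps, n_iters=500):
    """Approximate an entropic-regularized transport plan between the
    discrete distributions a and b (illustrative helper, not from the paper)."""
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-cost_ij / eps); larger eps gives a smoother plan
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the plan's row sums match a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the plan's column sums match b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def transport_cost(plan, cost):
    """Expected transport cost under the given plan."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))

# Toy data (assumed): two distributions on the points 0, 1, 2 with squared cost
xs = [0.0, 1.0, 2.0]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[(x - y) ** 2 for y in xs] for x in xs]
plan = sinkhorn_plan(a, b, cost, eps=1.0)
print(transport_cost(plan, cost))
```

Every entry of the resulting plan is strictly positive, so mass is spread over all support points rather than concentrated on a few, which is the smoothing effect that entropic regularization is known for.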

Paper Structure

This paper contains 34 sections, 20 theorems, 150 equations, 15 figures, 5 tables, and 3 algorithms.

Key Result

Theorem 1

Let $\widehat{\mathbb{P}}\in\mathcal{P}(\mathcal{Z})$, and assume that Assumption (label: distance:measure:function) holds. Then the following holds:

Figures (15)

  • Figure 1: Visualization of worst-case distributions from Wasserstein DRO (left plot) and Sinkhorn DRO models (right three plots) with varying choices of $\epsilon$.
  • Figure 2: Experiment results of the newsvendor problem for different sample sizes and different data distributions in box plots.
  • Figure 3: Plots for the density of worst-case distributions generated by the $1$-SDRO or $2$-SDRO model for newsvendor problem with different data distributions.
  • Figure 4: Experiment results of the newsvendor problem for exponential data distribution. Subplots in different rows correspond to different training sample sizes $n\in\{10,30,100\}$. Subplots in the first and second columns show heatmaps of the coefficient of prescriptiveness for the $1$-SDRO and $2$-SDRO models with different radius and regularization parameters, and the subplots in the last column show histograms of the coefficient of prescriptiveness for the $2$-WDRO model with different radius parameters. Each reported value is the average of the simulation results over $50$ independent trials. For SDRO models, a green triangle marks each radius-regularization combination that outperforms the corresponding WDRO model with the same radius choice.
  • Figure 5: Experiment results of the portfolio optimization problem for different sample sizes and dimensions in box plots.
  • ...and 10 more figures

Theorems & Definitions (38)

  • Definition 1: Sinkhorn Distance
  • Remark 1: Variants of Sinkhorn Distance
  • Remark 2: Choice of Reference Measures
  • Theorem 1: Strong Duality
  • Remark 3: Comparison with Wasserstein DRO
  • Remark 4: Worst-case Distribution
  • Remark 5: Connection with KL-divergence DRO
  • Remark 6: Connection with Bayesian DRO
  • Example 1: Linear loss
  • Example 2: Quadratic loss
  • ...and 28 more