Gradient Flow Sampler-based Distributionally Robust Optimization

Zusen Xu, Jia-Jie Zhu

TL;DR

We propose a mathematically principled PDE gradient flow framework for distributionally robust optimization (DRO). Simple reductions of this framework exactly recover popular, previously proposed DRO methods and provide new insights into their theoretical limits and optimization dynamics.

Abstract

We propose a mathematically principled PDE gradient flow framework for distributionally robust optimization (DRO). Exploiting recent advances at the intersection of Markov chain Monte Carlo sampling and gradient flow theory, we show that our theoretical framework can be implemented as practical algorithms for sampling from worst-case distributions and, consequently, for DRO. While numerous previous works have proposed various reformulation techniques and iterative algorithms, we contribute a sound gradient flow view of the distributional optimization that can be used to construct new algorithms. As an example application, we solve a class of Wasserstein and Sinkhorn DRO problems using the recently discovered Wasserstein-Fisher-Rao and Stein variational gradient flows. Notably, we also show that some simple reductions of our framework exactly recover popular, previously proposed DRO methods and provide new insights into their theoretical limits and optimization dynamics. Numerical studies based on stochastic gradient descent provide empirical backing for our theoretical findings.
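
As a rough illustration of the sampling idea in the abstract, the sketch below runs an unadjusted Langevin iteration, a standard time discretization of the Wasserstein gradient flow of the KL divergence toward a density proportional to $\exp(-V)$. The potential used here, $V(z) = (\frac{\tau}{2}\|z-x\|^2 - \ell(z))/\epsilon$, the function name `langevin_worst_case`, and all parameter values are assumptions chosen for illustration; they are not taken from the paper's algorithms.

```python
import numpy as np

def langevin_worst_case(x, grad_loss, tau=2.0, eps=0.1,
                        n_particles=2000, step=0.01, n_steps=500, seed=0):
    """Unadjusted Langevin sampler for a density proportional to exp(-V(z)).

    Assumed (hypothetical) potential:
        V(z) = (tau / 2 * ||z - x||^2 - loss(z)) / eps
    so particles trade off a high loss against staying close to the data point x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Start every particle at the nominal data point x.
    z = np.tile(x, (n_particles, 1))
    for _ in range(n_steps):
        # Gradient of the assumed potential V at each particle.
        grad_v = (tau * (z - x) - grad_loss(z)) / eps
        # Drift step plus Gaussian noise (Euler-Maruyama discretization).
        noise = rng.standard_normal(z.shape)
        z = z - step * grad_v + np.sqrt(2.0 * step) * noise
    return z

# Toy check with a linear loss loss(z) = a . z, whose gradient is the constant a.
x0 = np.array([1.0, -1.0])
a = np.array([0.5, 0.0])
samples = langevin_worst_case(x0, lambda z: np.broadcast_to(a, z.shape))
print(samples.mean(axis=0))  # should be close to x0 + a / tau
```

With a linear loss the assumed target is Gaussian with mean $x + a/\tau$ and covariance $(\epsilon/\tau) I$, which makes the sampler easy to sanity-check: the empirical mean of the particles should shift away from $x$ in the direction that increases the loss.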

Paper Structure

This paper contains 37 sections, 10 theorems, 70 equations, 8 figures, 6 algorithms.

Key Result

Lemma 1

The variational problem eq:proximal-EOT is equivalent to the Schrödinger half bridge problem. Consequently, it is equivalent to the minimization of the expected KL divergence with respect to the conditional distribution. The optimal marginal distribution of $Y$ in eq:sb-klform-half-bridge is given by a mixture distribution, for some normalization constant $Z_x$:

Figures (8)

  • Figure 1: Robust Decision Boundaries with Biased Data. (a) Decision boundaries learned by different methods on the biased circle dataset. The training data are shown as orange (positive class) and blue (negative class) points. The final classification boundaries are shown for the WRM (green), WGF (purple), WFR (red), and Dual (brown) models. All models were trained for 40 epochs. We set the regularization parameter $\tau=2.5$ for all methods and the entropy regularization $\epsilon=0.15$ for methods based on the entropy-regularized Wasserstein DRO problem. (b) Samples from the worst-case distribution generated by the WFR sampler at the first epoch. (c) Worst-case samples generated by the WRM method at the first epoch. WRM can only generate discrete worst-case distributions, while entropy-regularized DRO allows potentially continuous worst-case distributions. The original data points are shown as black-edged circles and the worst-case samples as circles in a lighter color.
  • Figure 2: Decision boundary comparison for all methods on the two-moon classification task. For all DRO methods, we set $\tau=0.1$, and for SDRO methods, we set $\epsilon=0.01$. In each inner loop, WFR and WGF generate $m=5$ particles.
  • Figure 3: Evolution of $\mathbb{E}[\widetilde{V}_{x, \tau}(z)]$. We run all methods for 300 steps with a stepsize of 0.01. For RGO (blue), we run a rejection sampling procedure after solving the inner optimization problem. SVG_0.1 and SVG_0.2 denote the SVG method with initial distributions of standard deviation 0.1 and 0.2, respectively.
  • Figure 4: Visualization of perturbed samples (gray-edged smaller circles) generated from original data (black-edged bigger circles) against the SAA boundary at the final step. For all methods, we use a stepsize of 0.01 and run for 300 iterations. For WFR, the intensity of points visualizes the sample weights. We show perturbations by SVG-DRO with different initializations.
  • Figure 5: Evolution of particle positions in the SVG method. The driving force guides initial convergence, while the repulsive force prevents complete collapse. In this experiment, however, the repulsive force fails to push the particles apart efficiently, leading to mode collapse.
  • ...and 3 more figures
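
The driving and repulsive forces mentioned in the Figure 5 caption can be made concrete with a generic Stein variational gradient update. The following is a minimal sketch under stated assumptions: it uses the textbook update with an RBF kernel of fixed bandwidth and a standard normal target. The function name `svgd_step`, the kernel, the bandwidth, and the target are illustrative choices, not the paper's SVG-DRO algorithm.

```python
import numpy as np

def svgd_step(z, grad_log_p, step=0.01, bandwidth=1.0):
    """One Stein variational gradient update with an RBF kernel (illustrative).

    z has shape (n, d). Returns the updated particles together with the
    driving and repulsive terms so that the two forces can be inspected.
    """
    n = len(z)
    diff = z[:, None, :] - z[None, :, :]                # diff[i, j] = z_i - z_j
    k = np.exp(-np.sum(diff ** 2, axis=-1) / bandwidth)  # k[i, j] = k(z_i, z_j)
    # Driving force: kernel-weighted average of the score, pulls toward high density.
    driving = k @ grad_log_p(z) / n
    # Repulsive force: averaged kernel gradient, pushes particle i away from its neighbors.
    repulsive = (2.0 / bandwidth) * np.sum(k[:, :, None] * diff, axis=1) / n
    return z + step * (driving + repulsive), driving, repulsive

# Toy target (assumption): standard normal, whose score is grad log p(z) = -z.
score = lambda z: -z
particles = np.array([[-1.0], [1.0]])
_, driving, repulsive = svgd_step(particles, score)
print(driving.ravel(), repulsive.ravel())

# Iterate with the stepsize and step count quoted in the Figure 3 caption.
rng = np.random.default_rng(0)
z = 0.1 * rng.standard_normal((50, 1))
for _ in range(300):
    z, _, _ = svgd_step(z, score, step=0.01)
```

In the two-particle example the driving term points toward the mode at the origin while the repulsive term points away from the other particle. In this generic update, when the bandwidth is small relative to the particle spacing the off-diagonal kernel values are close to zero, so the repulsive term becomes negligible and each particle simply follows its own score; this is one common way particles can collapse onto a mode, though it is not necessarily the mechanism behind the behavior reported in the figure.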

Theorems & Definitions (17)

  • Definition 1: Gradient system
  • Lemma 1
  • Example 1: Wasserstein GF for SDRO
  • Lemma 2
  • Remark 1: WRM of Sinha et al. (sinha2020certifyingdistributionalrobustnessprincipled)
  • Definition 2
  • Proposition 1
  • Theorem 1: Outer Loop Convergence
  • Theorem 2: Complexity of Algorithm alg:SDRO-NGD
  • Remark 2
  • ...and 7 more