Flow-based Distributionally Robust Optimization

Chen Xu; Jonghyeok Lee; Xiuyuan Cheng; Yao Xie

TL;DR

FlowDRO is a flow-based framework for Wasserstein distributionally robust optimization (WDRO) that finds a continuous Least Favorable Distribution (LFD) within a Wasserstein ball around a reference distribution $P$. It reformulates WDRO as a Wasserstein proximal problem and, using Brenier's theorem, parametrizes the optimal transport map with neural networks or neural ODEs. The learned transport map $T$ pushes $P$ to the LFD $Q^*$, which makes it possible to sample from $Q^*$. A block-wise progressive training scheme (inspired by JKO-iFlow) keeps the optimization scalable in high dimensions, and a generative sampler for $Q^*$ is built by composing $T$ with a separate flow that maps latent noise to $P$. Theoretical results connect the proximal formulation to a Moreau envelope and establish the equivalence between the transport-map formulation and the original LFD problem. Numerical experiments on adversarial learning, robust hypothesis testing, and differential privacy show that FlowDRO improves LFD discovery and robustness, especially on high-dimensional data such as CIFAR-10 and MNIST.

Abstract

We present a computationally efficient framework, called $\texttt{FlowDRO}$, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets, with the aim of finding a continuous worst-case distribution (also called the Least Favorable Distribution, LFD) and sampling from it. The LFD is required to be continuous so that the algorithm scales to problems with larger sample sizes and the induced robust algorithms generalize better. To tackle the computationally challenging infinite-dimensional optimization problem, we leverage flow-based models and continuous-time invertible transport maps between the data distribution and the target distribution, and develop a Wasserstein proximal gradient flow type algorithm. In theory, we establish the equivalence of the solution by optimal transport map to the original formulation, as well as the dual form of the problem, through Wasserstein calculus and Brenier's theorem. In practice, we parameterize the transport maps by a sequence of neural networks progressively trained in blocks by gradient descent. We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy, where the proposed method gives strong empirical performance on high-dimensional real data.
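To make the proximal idea concrete, here is a minimal one-dimensional sketch. It assumes a standard form of a single Wasserstein proximal step, $\min_T \mathbb{E}_{x \sim P}\left[-r(T(x)) + \frac{1}{2\gamma}\|T(x) - x\|^2\right]$, which decouples across samples. The risk `r`, the step parameter `gamma`, and the helper `proximal_map` are illustrative names that do not come from the abstract, and plain per-sample gradient descent stands in for the neural-network transport maps.

```python
import random


def risk(y, a=1.0):
    """Toy risk r(y) = 0.5 * a * y^2 (an assumption made for this example)."""
    return 0.5 * a * y * y


def risk_grad(y, a=1.0):
    """Derivative of the toy risk."""
    return a * y


def proximal_map(x, gamma, step=0.25, n_iter=200):
    """Approximate argmin_y  -r(y) + (y - x)^2 / (2 * gamma) by gradient descent."""
    y = x  # start from the identity map T(x) = x
    for _ in range(n_iter):
        grad = -risk_grad(y) + (y - x) / gamma
        y -= step * grad
    return y


gamma = 0.5  # needs a * gamma < 1 so that the per-sample objective is convex

random.seed(0)
samples_p = [random.gauss(0.0, 1.0) for _ in range(1000)]     # draws from P
samples_q = [proximal_map(x, gamma) for x in samples_p]       # pushed samples

# For this quadratic risk the minimizer is known: T(x) = x / (1 - a * gamma).
closed_form = [x / (1.0 - 1.0 * gamma) for x in samples_p]
max_err = max(abs(u - v) for u, v in zip(samples_q, closed_form))

mean_risk_p = sum(risk(x) for x in samples_p) / len(samples_p)
mean_risk_q = sum(risk(y) for y in samples_q) / len(samples_q)
```

In this toy case the pushed samples move away from the origin, so the average risk under the transported samples exceeds the average risk under the samples from $P$, which is the qualitative behavior expected of a worst-case distribution.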

Paper Structure (46 sections, 10 theorems, 109 equations, 27 figures, 1 table, 2 algorithms)

Introduction
Proposed: Flow-DRO
Motivating example: Why continuous density for LFD?
Flow-based generative models
Applications
Framework
Dual formulation and Wasserstein proximal problem
Dual form and proximal problem.
Explicit form of dual function.
Solving the Wasserstein proximal problem by transport map
Connection to existing Wasserstein DRO
Reduction in the case of discrete reference measure
Connection to the dual formulation of WDRO
Theory
Preliminaries
...and 31 more sections

Key Result

Lemma 2.1

For any $\mu, \nu \in \mathcal{P}_2$, $\mathcal{W}_2( \mu, \nu) < \infty$.
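A short sketch of why this holds, assuming the usual convention that $\mathcal{P}_2$ denotes probability distributions on $\mathbb{R}^d$ with finite second moment (the listing above does not restate the definition, and $M_2$ is notation introduced here). The independent coupling $\mu \otimes \nu$ is admissible, and $\|x - y\|^2 \le 2\|x\|^2 + 2\|y\|^2$, so

$$
\mathcal{W}_2^2(\mu, \nu) \le \int\!\!\int \|x - y\|^2 \, d\mu(x)\, d\nu(y) \le 2 M_2(\mu) + 2 M_2(\nu) < \infty,
\qquad M_2(\mu) := \int \|x\|^2 \, d\mu(x).
$$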

Figures (27)

Figure 1: WDRO on training samples
Figure 2: Proposed FlowDRO on test samples
Figure 4: An illustration of the FlowDRO framework, which learns a sequence of invertible optimal transport maps that pushes the underlying population density $P$ to a target LFD $Q^*$; the maps are learned from finite training samples. The handwritten digits represent samples in each stage that show the gradual (continuous) transition of samples.
Figure 5: Construction of the proposed sampler from LFD. After training the proposed FlowDRO map $T_{\hat{\theta}}$, we train a separate generic flow model $T_{\rm gen}$ to map between the noise distribution $P_Z$ (a multivariate Gaussian $\mathcal{N}(0, I_d)$) and the data distribution $P$. The full sampler is $T_{\rm adv}=T_{\hat{\theta}} \circ T_{\rm gen}$.
Figure 6: Additive perturbation mechanism (APM)
...and 22 more figures
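The sampler construction in the Figure 5 caption, $T_{\rm adv}=T_{\hat{\theta}} \circ T_{\rm gen}$, amounts to function composition: draw noise, map it to a data-like sample, then push that sample toward the LFD. The sketch below illustrates only this composition. The two affine maps `t_gen` and `t_theta` are hypothetical one-dimensional stand-ins, not the trained flows from the paper.

```python
import random


def compose(*maps):
    """Return maps[0] o maps[1] o ... ; the rightmost map is applied first."""
    def composed(z):
        for m in reversed(maps):
            z = m(z)
        return z
    return composed


def t_gen(z):
    """Hypothetical generic flow: noise N(0, 1) -> 'data' N(3, 0.5^2)."""
    return 3.0 + 0.5 * z


def t_theta(x):
    """Hypothetical FlowDRO map: pushes a data sample toward the LFD."""
    return 2.0 * x


# Full sampler: T_adv = T_theta o T_gen
t_adv = compose(t_theta, t_gen)

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(5)]
lfd_samples = [t_adv(z) for z in noise]
```

Because `compose` applies the rightmost map first, `t_adv(z)` equals `t_theta(t_gen(z))`, matching the order in $T_{\hat{\theta}} \circ T_{\rm gen}$.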

Theorems & Definitions (23)

Lemma 2.1
Proposition 2.2: Equivalent solution by transport map
Proposition 2.3: Dual form for discrete $P$
Lemma 3.1
Lemma 3.3: Strong differential of $\varphi$
Lemma 3.4: Strong super-differential of $\psi$
Remark 3.6
Theorem 3.7: First-order condition of LFD problem
Theorem 3.8: First-order condition of proximal problem
Remark 3.9: Correspondence between LFD problem and proximal problem
...and 13 more
