Flow-based Distributionally Robust Optimization
Chen Xu, Jonghyeok Lee, Xiuyuan Cheng, Yao Xie
TL;DR
FlowDRO introduces a flow-based framework for solving Wasserstein distributionally robust optimization (WDRO) by finding a continuous Least Favorable Distribution (LFD) within a Wasserstein ball around a reference distribution $P$. By reformulating WDRO as a Wasserstein proximal problem and leveraging Brenier's theorem to parametrize the optimal transport map with neural networks or neural ODEs, FlowDRO learns a transport map $T$ that pushes $P$ forward to the LFD $Q^*$, which enables sampling from $Q^*$. A block-wise progressive training scheme (inspired by JKO-iFlow) makes the optimization scalable in high dimensions, and a generative sampler for $Q^*$ is constructed by composing $T$ with a separate flow that maps latent noise to $P$. Theoretical results connect the proximal formulation to a Moreau envelope and establish the equivalence between the transport-map formulation and the original LFD problem. Numerical experiments show that FlowDRO improves LFD discovery and robustness in adversarial learning, robust hypothesis testing, and differential privacy, especially on high-dimensional data such as CIFAR-10 and MNIST.
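The chain of reformulations summarized in the TL;DR can be written out as below. This is a sketch in notation chosen here, not copied from the paper: $\phi$ is the risk whose expectation the LFD maximizes, $\varepsilon$ is the radius of the Wasserstein ball, and $\gamma > 0$ is a penalty parameter assumed to play the role of a Lagrange multiplier for the ball constraint. The map step uses Brenier's theorem and assumes $P$ is absolutely continuous.

```latex
\begin{align*}
&\text{(LFD)}       && \max_{Q:\,W_2(P,Q)\le\varepsilon}\ \mathbb{E}_{x\sim Q}[\phi(x)] \\
&\text{(proximal)}  && \min_{Q}\ -\mathbb{E}_{x\sim Q}[\phi(x)] + \tfrac{1}{2\gamma}\,W_2^2(P,Q) \\
&\text{(map, } Q = T_{\#}P\text{)} && \min_{T}\ \mathbb{E}_{x\sim P}\Big[-\phi(T(x)) + \tfrac{1}{2\gamma}\,\|x - T(x)\|^2\Big] \\
&\text{(pointwise)} && u_\gamma(x) = \min_{y}\Big\{-\phi(y) + \tfrac{1}{2\gamma}\,\|x - y\|^2\Big\},
                       \qquad T(x) \in \operatorname{prox}_{-\gamma\phi}(x)
\end{align*}
```

Minimizing the map objective separately for each $x$ gives the last line, which is the Moreau envelope of $-\phi$ evaluated at $x$; under the stated assumptions, this is one way to read the Moreau-envelope connection mentioned above.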
Abstract
We present a computationally efficient framework, called $\texttt{FlowDRO}$, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets, with the aim of finding a continuous worst-case distribution (also called the Least Favorable Distribution, LFD) and sampling from it. We require the LFD to be continuous so that the algorithm scales to problems with larger sample sizes and the induced robust algorithms generalize better. To tackle this computationally challenging infinite-dimensional optimization problem, we leverage flow-based models and continuous-time invertible transport maps between the data distribution and the target distribution, and we develop a Wasserstein proximal gradient flow type algorithm. In theory, we establish that the solution given by the optimal transport map is equivalent to the original formulation, and we derive the dual form of the problem through Wasserstein calculus and Brenier's theorem. In practice, we parameterize the transport maps by a sequence of neural networks trained progressively in blocks by gradient descent. We demonstrate the method in adversarial learning, distributionally robust hypothesis testing, and a new differential privacy mechanism based on data-driven distribution perturbation, where it gives strong empirical performance on high-dimensional real data.
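The block-wise Wasserstein proximal scheme described above can be illustrated with a minimal particle-based sketch. It is not the authors' implementation: instead of training a neural network for each block, it moves samples from $P$ directly, and each block approximately solves the per-sample proximal problem $\min_y -\phi(y) + \|x - y\|^2/(2\gamma)$ by gradient descent. The toy risk `phi(y) = -0.5 * ||y - c||^2`, the center `c`, the penalty `gamma`, and the function names `prox_step` and `flow_blocks` are hypothetical choices made for illustration; the quadratic risk is used only because its proximal map has the closed form $(x + \gamma c)/(1+\gamma)$, which makes the output easy to check.

```python
import numpy as np


def phi(y, c):
    """Hypothetical toy risk: phi(y) = -0.5 * ||y - c||^2, largest at y = c."""
    return -0.5 * np.sum((y - c) ** 2, axis=1)


def grad_neg_phi(y, c):
    """Row-wise gradient of -phi for the toy risk, i.e. y - c."""
    return y - c


def prox_step(x, c, gamma, n_iter=500, lr=0.1):
    """One proximal block applied to particles x of shape (n, d).

    For every particle independently, approximately solves
        min_y  -phi(y) + ||x - y||^2 / (2 * gamma)
    by gradient descent warm-started at y = x. The step size must satisfy
    lr < 2 / L, where L bounds the curvature of this objective
    (L = 1 + 1 / gamma for the toy risk).
    """
    y = x.copy()
    for _ in range(n_iter):
        g = grad_neg_phi(y, c) + (y - x) / gamma
        y -= lr * g
    return y


def flow_blocks(x0, c, gamma, n_blocks):
    """Compose proximal blocks, x_{k+1} = T_k(x_k); returns all iterates."""
    xs = [x0]
    for _ in range(n_blocks):
        xs.append(prox_step(xs[-1], c, gamma))
    return xs


# Usage: push 200 samples from a standard Gaussian P through 3 blocks.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(200, 2))
c = np.array([2.0, -1.0])
xs = flow_blocks(x0, c, gamma=0.5, n_blocks=3)
print([round(float(phi(x, c).mean()), 3) for x in xs])  # mean risk per block
```

With real data, `grad_neg_phi` would be replaced by the gradient of the loss of interest (for example, via automatic differentiation). Because this sketch only transports the given particles, it yields no map for new samples; learning a parametric map per block, as the abstract describes, is what removes that limitation.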
