Worst-case generation via minimax optimization in Wasserstein space
Xiuyuan Cheng, Yao Xie, Linglingzhi Zhu, Yunqin Zhu
TL;DR
This work addresses worst-case sample generation under distribution shift by formulating a minimax problem over Wasserstein space and recasting the inner maximization over distributions as a maximization over transport maps that push forward a reference measure. It proposes a single-loop Gradient Descent Ascent (GDA) algorithm with convergence guarantees in three regimes: nonconvex-PL (NC-PL), nonconvex-strongly-concave (NC-SC), and nonconvex-nonconcave (NC-NC). The transport map is parameterized by a neural network trained through L^2 transport-map matching, which enables scalable, out-of-sample worst-case generation. The theory is complemented by practical particle-optimization and neural transport-map algorithms that operate on finite samples. Experiments on synthetic 2D data and image datasets (MNIST and CIFAR-10) produce meaningful worst-case distributions and show that the learned transport map generalizes to unseen samples, supporting the method’s use for stress-testing in high-dimensional settings.
Abstract
Worst-case generation plays a critical role in evaluating robustness and stress-testing systems under distribution shifts, in applications ranging from machine learning models to power grids and medical prediction systems. We develop a generative modeling framework for worst-case generation with respect to a pre-specified risk, based on min-max optimization over the space of continuous probability distributions, namely the Wasserstein space. Unlike traditional discrete distributionally robust optimization (DRO) approaches, which often suffer from scalability issues, limited generalization, and costly worst-case inference, our framework exploits Brenier's theorem to characterize the least favorable (worst-case) distribution as the pushforward of a continuous reference measure by a transport map, enabling a continuous and expressive notion of risk-induced generation beyond classical discrete DRO formulations. Based on the min-max formulation, we propose a Gradient Descent Ascent (GDA)-type scheme that updates the decision model and the transport map in a single loop, and we establish global convergence guarantees under mild regularity assumptions, possibly without convexity-concavity. We also propose to parameterize the transport map using a neural network that can be trained simultaneously with the GDA iterations by matching the transported training samples, thereby achieving a simulation-free approach. The efficiency of the proposed method as a risk-induced worst-case generator is validated by numerical experiments on synthetic and image data.
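To make the single-loop idea concrete, the following is a minimal sketch of a particle version of GDA on a toy problem. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the squared-error risk of a linear model, the quadratic transport penalty with weight `gamma`, the step sizes, the synthetic data, and the names `particle_gda` and `risk`. Each training sample `x[i]` carries a particle `v[i]` that stands in for its transported image `T(x[i])`. The neural transport map and its L^2 matching step are omitted.

```python
import numpy as np

def particle_gda(x, y, gamma=0.1, lr_theta=0.05, lr_v=0.05, n_steps=2000):
    """Simultaneous GDA on the (assumed) penalized objective
        min_theta max_v  mean_i[ 0.5*(v_i . theta - y_i)^2 - |v_i - x_i|^2 / (2*gamma) ].
    theta takes a descent step and every particle v_i takes an ascent step
    in the same iteration; there is no inner loop.
    """
    n, d = x.shape
    theta = np.zeros(d)   # decision model (linear regression weights)
    v = x.copy()          # particles start at the training samples
    for _ in range(n_steps):
        resid = v @ theta - y                           # residuals, shape (n,)
        grad_theta = (resid[:, None] * v).mean(axis=0)  # gradient of the mean risk in theta
        # Per-particle gradient: risk term minus the transport penalty.
        grad_v = resid[:, None] * theta[None, :] - (v - x) / gamma
        theta = theta - lr_theta * grad_theta           # descent on the model
        v = v + lr_v * grad_v                           # ascent on the particles
    return theta, v

def risk(theta, x, y):
    """Mean squared-error risk of the linear model on samples (x, y)."""
    return 0.5 * np.mean((x @ theta - y) ** 2)

# Synthetic data (hypothetical): 200 samples in 2D with noisy linear labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = x @ np.array([1.0, -0.5]) + 0.3 * rng.normal(size=200)

theta, v = particle_gda(x, y)
nominal_risk = risk(theta, x, y)   # risk on the original samples
worst_risk = risk(theta, v, y)     # risk on the transported particles
print(f"nominal risk: {nominal_risk:.4f}, worst-case risk: {worst_risk:.4f}")
```

In this toy setting the penalty makes the inner problem strongly concave in each particle as long as `1/gamma` exceeds the squared norm of `theta`, which is why a small `gamma` is used. The particles move only on the training samples; generating worst-case points for unseen inputs would require fitting a map to the pairs `(x[i], v[i])`, which is the role the abstract assigns to the neural transport map.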
