Efficient Generative Modeling via Penalized Optimal Transport Network
Wenhui Sophia Lu, Chenyang Zhong, Wing Hung Wong
TL;DR
The paper tackles the instability and mode collapse of traditional Wasserstein-based generative models on high-dimensional data by introducing the Marginally-Penalized Wasserstein (MPW) distance and the Penalized Optimal Transport Network (POTNet). By optimizing a primal MPW objective that combines joint transport with coordinate-wise marginal penalties, POTNet eliminates the need for a critic network, supports mixed data types, and exploits the fast convergence of low-dimensional marginals to mitigate mode dropping and tail shrinkage. The authors establish non-asymptotic generalization bounds, show theoretically that POTNet attenuates both Type I and Type II mode collapse, and report substantial empirical gains on synthetic benchmarks and real data, including accurate recovery of data structure and orders-of-magnitude speedups in sampling. The approach performs robustly on tabular data, scales to image generation, and is competitive on inference tasks, highlighting the practical value of marginal information in high-dimensional generative modeling.
Abstract
The generation of synthetic data whose distribution faithfully emulates the underlying data-generating mechanism is of paramount importance. Wasserstein Generative Adversarial Networks (WGANs) have emerged as a prominent tool for this task; however, due to the delicate equilibrium of the minimax formulation and the instability of the Wasserstein distance in high dimensions, WGANs often exhibit the pathological phenomenon of mode collapse. As a result, the generated samples concentrate on a restricted set of outputs and fail to adequately capture the tail behavior of the true distribution. Such limitations can lead to serious downstream consequences. To this end, we propose the Penalized Optimal Transport Network (POTNet), a versatile deep generative model based on the marginally-penalized Wasserstein (MPW) distance. Through the MPW distance, POTNet effectively leverages low-dimensional marginal information to guide the overall alignment of joint distributions. Furthermore, our primal-based framework enables direct evaluation of the MPW distance, thus eliminating the need for a critic network. This formulation circumvents the training instabilities inherent in adversarial approaches and avoids the need for extensive parameter tuning. We derive a non-asymptotic bound on the generalization error of the MPW loss and establish convergence rates for the generative distribution learned by POTNet. Our theoretical analysis, together with extensive empirical evaluations, demonstrates the superior performance of POTNet in accurately capturing underlying data structures, including their tail behavior and minor modes. Moreover, our model achieves orders-of-magnitude speedups during the sampling stage compared to state-of-the-art alternatives, enabling computationally efficient large-scale synthetic data generation.
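The abstract describes the MPW distance only in words. Below is a minimal pure-Python sketch to illustrate why a primal objective needs no critic: every term is computed directly from samples. It assumes (this form is not stated in the text above) that, for two empirical samples P and Q in d dimensions, the MPW loss takes the additive form W1(P, Q) + Σ_j λ_j · W1(P_j, Q_j), with Euclidean ground cost, coordinate marginals P_j and Q_j, and nonnegative penalty weights λ_j. The function names `joint_w1`, `marginal_w1`, and `mpw_loss` are hypothetical. The joint term is solved exactly by brute-force matching, which works only for a handful of points, while each marginal term is solved by sorting.

```python
import itertools
import math


def joint_w1(xs, ys):
    """Exact W1 between two equal-size empirical samples with uniform weights.

    With uniform weights and equal sample sizes, an optimal coupling can be
    taken to be a permutation (Birkhoff-von Neumann), so we brute-force all
    matchings. Cost is factorial in n: illustration only.
    """
    n = len(xs)
    assert n == len(ys), "this sketch assumes equal sample sizes"
    best = min(
        sum(math.dist(xs[i], ys[p[i]]) for i in range(n))
        for p in itertools.permutations(range(n))
    )
    return best / n


def marginal_w1(us, vs):
    """W1 between two equal-size 1-D samples: pair up the sorted values."""
    return sum(abs(a - b) for a, b in zip(sorted(us), sorted(vs))) / len(us)


def mpw_loss(xs, ys, lam):
    """Hypothetical MPW-style loss (assumed additive form, see lead-in):
    joint W1 plus weighted coordinate-wise marginal W1 penalties."""
    d = len(xs[0])
    penalty = sum(
        lam[j] * marginal_w1([x[j] for x in xs], [y[j] for y in ys])
        for j in range(d)
    )
    return joint_w1(xs, ys) + penalty


# Identical marginals, opposite dependence: only the joint term is nonzero.
print(mpw_loss([(0.0, 0.0), (1.0, 1.0)], [(0.0, 1.0), (1.0, 0.0)], [1.0, 1.0]))  # 1.0
# A "collapsed" sample (both points at the origin) also pays a marginal penalty.
print(mpw_loss([(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (2.0, 0.0)], [1.0, 1.0]))  # 2.0
```

In the first call the marginal terms vanish, so the joint term alone captures the mismatch in dependence; in the second, the collapsed sample is penalized by both the joint term and the first-coordinate marginal term. A practical version would replace the factorial-time matching with an assignment solver such as `scipy.optimize.linear_sum_assignment`, and this sketch makes no claim about how POTNet itself estimates or optimizes the MPW loss.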
