Max-Sliced Wasserstein Distance and its use for GANs
Ishan Deshpande, Yuan-Ting Hu, Ruoyu Sun, Ayis Pyrros, Nasir Siddiqui, Sanmi Koyejo, Zhizhen Zhao, David Forsyth, Alexander Schwing
TL;DR
The paper analyzes the generalization and sample-efficiency of distance metrics used in GANs, showing that the sliced Wasserstein distance $\tilde{W}_2$ generalizes well for Gaussian families while the full Wasserstein distance $W_2$ does not. To address projection-coverage limits, it introduces the max-sliced Wasserstein distance $\text{max-}\tilde{W}_2$ and proves it retains polynomial-sample generalization. It furthermore proposes a practical max-sliced GAN algorithm that alternates discriminator and projection-direction optimization, achieving high-resolution image generation with far fewer projections. Empirically, the method yields competitive unsupervised word translation results and high-quality 256×256 image samples (CelebA-HQ, LSUN) while significantly reducing computational demands. Overall, the approach offers improved stability and efficiency for training GANs on high-dimensional data with strong practical impact for scalable generative modeling.
Abstract
Generative adversarial nets (GANs) and variational auto-encoders have significantly improved our distribution modeling capabilities, showing promise for dataset augmentation, image-to-image translation and feature learning. However, to model high-dimensional distributions, sequential training and stacked architectures are common, increasing the number of tunable hyper-parameters as well as the training time. Nonetheless, the sample complexity of the distance metrics remains one of the factors affecting GAN training. We first show that the recently proposed sliced Wasserstein distance has compelling sample complexity properties when compared to the Wasserstein distance. To further improve the sliced Wasserstein distance we then analyze its 'projection complexity' and develop the max-sliced Wasserstein distance which enjoys compelling sample complexity while reducing projection complexity, albeit necessitating a max estimation. We finally illustrate that the proposed distance trains GANs on high-dimensional images up to a resolution of 256×256 easily.
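
To make the difference between the two distances concrete, the following is a minimal sketch, not the authors' implementation, of how the sliced and max-sliced quantities could be estimated from two equally sized sample sets. It relies on the standard fact that the one-dimensional Wasserstein-2 distance between two empirical distributions with the same number of points is obtained by matching sorted samples. All function names are hypothetical. As an assumption for illustration, the maximum is taken over a finite set of random unit directions, which only gives a lower bound on the true maximum; the paper instead optimizes the projection direction. The values returned are squared distances.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere via a normalized Gaussian."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def projected_w2_sq(xs, ys, direction):
    """Squared 1-D Wasserstein-2 distance between xs and ys projected onto direction.

    Assumes len(xs) == len(ys), so pairing sorted projections is the optimal coupling.
    """
    px = sorted(sum(a * w for a, w in zip(x, direction)) for x in xs)
    py = sorted(sum(a * w for a, w in zip(y, direction)) for y in ys)
    return sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)


def sliced_and_max_sliced_w2_sq(xs, ys, num_directions=50, seed=0):
    """Return (average, maximum) of the projected squared distance over random directions.

    The average estimates the squared sliced distance; the maximum is a
    random-search stand-in (an assumption here) for the max-sliced distance.
    """
    rng = random.Random(seed)
    dim = len(xs[0])
    values = [
        projected_w2_sq(xs, ys, random_unit_vector(dim, rng))
        for _ in range(num_directions)
    ]
    return sum(values) / len(values), max(values)


# Two 8-dimensional Gaussian samples that differ only by a mean shift along axis 0.
data_rng = random.Random(1)
xs = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
ys = [
    [data_rng.gauss(0.0, 1.0) + (3.0 if j == 0 else 0.0) for j in range(8)]
    for _ in range(200)
]
sliced, max_sliced = sliced_and_max_sliced_w2_sq(xs, ys)
```

In this toy setting the two distributions differ along a single axis, so most random directions see only a small part of the shift and the average is diluted, whereas a single well-chosen direction captures it fully. This is the intuition behind replacing many random projections with one maximizing projection.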
