Monotone Generative Modeling via a Gromov-Monge Embedding
Wonjun Lee, Yifei Yang, Dongmian Zou, Gilad Lerman
TL;DR
This work addresses generative modeling by embedding the data distribution into a low-dimensional latent space with a geometry-preserving encoder and then transporting a reference distribution to the embedded one using an optimal transport (OT) objective. The authors introduce the Gromov-Monge Embedding (GME) cost to regularize the encoder so that it preserves geometry, prove that the generator is $c$-cyclically monotone, and show that the discriminator’s modulus of continuity improves as geometry preservation improves. The resulting GMEGAN framework achieves high-quality image generation with strong robustness to mode collapse and training instability, outperforming several GAN-based and encoder-based baselines on synthetic and real data (e.g., CIFAR-10 and Tiny ImageNet). By combining Gromov-Wasserstein-inspired geometry with OT in the latent space, the work provides both theoretical guarantees and practical improvements for generative modeling.
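To make "geometry preservation" concrete, the sketch below measures how far an encoder is from being distance-preserving on a finite sample. It averages the squared mismatch between pairwise distances in data space and in latent space. This is a generic Gromov-Wasserstein-style distortion chosen for illustration, not the paper's GME cost, whose exact form is not stated here. The names `pairwise_distortion` and `euclidean`, and the use of Euclidean metrics on both sides, are assumptions.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def pairwise_distortion(
    xs: Sequence[Point],
    encode: Callable[[Point], Point],
    d_x: Callable[[Point, Point], float] = euclidean,
    d_z: Callable[[Point, Point], float] = euclidean,
) -> float:
    """Hypothetical helper: mean squared gap between d_x(x_i, x_j) and
    d_z(E(x_i), E(x_j)) over all unordered pairs i < j.

    A generic distortion for illustration, NOT the paper's GME cost.
    It is zero exactly when the encoder preserves every pairwise
    distance on the given sample.
    """
    zs = [encode(x) for x in xs]
    total, count = 0.0, 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            gap = d_x(xs[i], xs[j]) - d_z(zs[i], zs[j])
            total += gap * gap
            count += 1
    return total / count if count else 0.0


# Points on a line inside R^2.
xs = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(pairwise_distortion(xs, lambda x: (x[0],)))        # 0.0 (isometric)
print(pairwise_distortion(xs, lambda x: (2.0 * x[0],)))  # 14/3, about 4.667
```

Dropping the redundant coordinate preserves all distances, while doubling the coordinate stretches every distance and is penalized. A penalty of this kind is what a geometry-preserving regularizer on the encoder would push toward zero.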
Abstract
Generative adversarial networks (GANs) are popular for generative tasks; however, they often require careful architecture selection and extensive empirical tuning, and they are prone to mode collapse. To overcome these challenges, we propose a novel model that identifies the low-dimensional structure of the underlying data distribution, maps it into a low-dimensional latent space while preserving the underlying geometry, and then optimally transports a reference measure to the embedded distribution. We prove three key properties of our method: 1) the encoder preserves the geometry of the underlying data; 2) the generator is $c$-cyclically monotone, where $c$ is an intrinsic embedding cost employed by the encoder; and 3) the discriminator's modulus of continuity improves with the geometric preservation of the data. Numerical experiments demonstrate that our approach generates high-quality images and is robust to both mode collapse and training instability.
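Property 2 can be illustrated on a finite sample. A set of pairs $(x_i, y_i)$ is $c$-cyclically monotone when no cyclic reassignment of the $y_i$ lowers the total cost $\sum_i c(x_i, y_i)$. For finitely many pairs this is equivalent to the identity assignment being optimal among all permutations, since every permutation decomposes into disjoint cycles. The brute-force check below tests exactly that. As an assumption for illustration, $c$ is taken to be the squared Euclidean distance rather than the paper's intrinsic embedding cost, and the helper names are hypothetical; the factorial-time loop only suits a handful of points.

```python
import itertools
from typing import Callable, Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[Point, Point]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean cost; a stand-in for the paper's embedding cost c."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def is_c_cyclically_monotone(
    pairs: Sequence[Pair],
    cost: Callable[[Point, Point], float],
    tol: float = 1e-12,
) -> bool:
    """Hypothetical helper: True if no permutation of the targets y_i
    achieves a strictly lower total cost than the given pairing.

    For a finite set this is equivalent to c-cyclic monotonicity,
    because any permutation splits into disjoint cycles.
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    base = sum(cost(x, y) for x, y in pairs)
    n = len(pairs)
    for perm in itertools.permutations(range(n)):
        permuted = sum(cost(xs[i], ys[perm[i]]) for i in range(n))
        if permuted < base - tol:
            return False  # a cheaper reassignment exists
    return True


# Increasing map x -> 2x + 1 versus decreasing map x -> -x on the line.
inc = [((0.0,), (1.0,)), ((1.0,), (3.0,)), ((2.0,), (5.0,))]
dec = [((0.0,), (0.0,)), ((1.0,), (-1.0,)), ((2.0,), (-2.0,))]
print(is_c_cyclically_monotone(inc, sq_dist))  # True
print(is_c_cyclically_monotone(dec, sq_dist))  # False
```

The increasing map passes and the decreasing map fails, matching the classical fact that one-dimensional optimal transport maps for the quadratic cost are nondecreasing.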
