Monotone Generative Modeling via a Gromov-Monge Embedding

Wonjun Lee, Yifei Yang, Dongmian Zou, Gilad Lerman

TL;DR

This work addresses generative modeling by embedding the data distribution into a low-dimensional latent space via a geometry-preserving encoder, then transporting a reference distribution to the embedded one using an optimal transport (OT)-based objective. The authors introduce the Gromov-Monge Embedding (GME) cost to regularize the encoder so that it preserves geometry, prove that the generator is $c$-cyclically monotone, and show that the discriminator's modulus of continuity improves with geometry preservation. The resulting GMEGAN framework achieves high-quality image generation with strong robustness to mode collapse and training instability, outperforming several GAN-based and encoder-based baselines on synthetic and real data (e.g., CIFAR-10, Tiny ImageNet). The work combines Gromov-Wasserstein-inspired geometry with OT in the latent space to provide theoretical guarantees and practical improvements for generative modeling.
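To make "geometry preservation" concrete, here is a minimal sketch of a Gromov-Monge-type distortion: it averages $|\|x_i-x_j\| - \|T(x_i)-T(x_j)\||^p$ over sample pairs. It assumes Euclidean distance as the intrinsic cost in both the data space and the latent space; the paper's GME cost may use different costs, weights, or normalization. The function name `gromov_monge_distortion`, the toy data, and the two toy encoders are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def gromov_monge_distortion(X, TX, p=2):
    """Empirical mean of | ||x_i - x_j|| - ||T(x_i) - T(x_j)|| |^p over pairs i < j.

    Illustrative assumption: Euclidean distance is the intrinsic cost in both spaces.
    X is an (n, D) array of data points and TX the (n, d) array of their encodings.
    """
    i, j = np.triu_indices(len(X), k=1)            # all unordered pairs i < j
    d_x = np.linalg.norm(X[i] - X[j], axis=1)      # distances in data space
    d_y = np.linalg.norm(TX[i] - TX[j], axis=1)    # distances in latent space
    return float(np.mean(np.abs(d_x - d_y) ** p))


# Hypothetical toy data: points on a 2-D linear subspace of R^10.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(10, 2)))     # 10x2 matrix with orthonormal columns
Z = rng.normal(size=(100, 2))
X = Z @ Q.T

iso = gromov_monge_distortion(X, X @ Q)              # isometric encoder: recovers Z
scaled = gromov_monge_distortion(X, 2.0 * (X @ Q))   # encoder that doubles every distance
print(f"isometric: {iso:.2e}, scaled: {scaled:.2f}")
```

The isometric encoder yields a distortion at floating-point noise level, while the rescaled encoder is penalized in proportion to the squared pairwise distances.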

Abstract

Generative adversarial networks (GANs) are popular for generative tasks; however, they often require careful architecture selection and extensive empirical tuning, and they are prone to mode collapse. To overcome these challenges, we propose a novel model that identifies the low-dimensional structure of the underlying data distribution, maps it into a low-dimensional latent space while preserving the underlying geometry, and then optimally transports a reference measure to the embedded distribution. We prove three key properties of our method: 1) the encoder preserves the geometry of the underlying data; 2) the generator is $c$-cyclically monotone, where $c$ is an intrinsic embedding cost employed by the encoder; and 3) the discriminator's modulus of continuity improves with the geometric preservation of the data. Numerical experiments demonstrate that our approach generates high-quality images and is robust to both mode collapse and training instability.
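Property 3 concerns the discriminator's modulus of continuity, $\omega_f(\delta) = \sup\{|f(y)-f(y')| : \|y-y'\| \le \delta\}$. The following minimal sketch estimates $\omega_f$ from a finite sample, assuming a Euclidean latent space and a toy stand-in for the discriminator; the helper `empirical_modulus_of_continuity` and the function `critic` are hypothetical, not the paper's network. Because only sampled pairs are inspected, each estimate is a lower bound on the true modulus.

```python
import numpy as np


def empirical_modulus_of_continuity(f, Y, deltas):
    """Estimate omega_f(delta) = sup{|f(y) - f(y')| : ||y - y'|| <= delta} on the sample Y.

    f maps an (n, d) array to n values. Only pairs drawn from Y are inspected,
    so each returned value is a lower estimate of the true modulus.
    """
    fy = f(Y)
    i, j = np.triu_indices(len(Y), k=1)
    dist = np.linalg.norm(Y[i] - Y[j], axis=1)
    gap = np.abs(fy[i] - fy[j])
    # Largest output gap among pairs within distance delta (0.0 if no such pair).
    return np.array([np.max(gap[dist <= d], initial=0.0) for d in deltas])


def critic(Y):
    """Hypothetical 3-Lipschitz stand-in for a discriminator."""
    return np.sin(3.0 * Y[:, 0])


rng = np.random.default_rng(0)
Y = rng.uniform(-1.0, 1.0, size=(300, 2))
omegas = empirical_modulus_of_continuity(critic, Y, deltas=[0.05, 0.1, 0.5])
print(omegas)  # nondecreasing in delta and bounded by 3 * delta
```

A smaller $\omega_f(\delta)$ at a given $\delta$ means the critic's output changes less between nearby latent points.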

Paper Structure

This paper contains 29 sections, 6 theorems, 59 equations, 14 figures, 1 table, and 1 algorithm.

Key Result

Proposition 1

For $p \geq 1$ and $0 < \alpha \leq 1$, if $T: \mathcal{M} \rightarrow Y$ is $(\alpha^{-1})$-bi-Lipschitz, then

Figures (14)

  • Figure 1: Illustration of our method for generation of samples in $\mathcal{M}$ with a latent space $Y$ and a geometry-preserving map $T$.
  • Figure 2: Graphs of the functions $G_0$ (left) and $G_2$ (right). For any $k \in \{0\} \cup \mathbb{N}$, with $\mu$ and $\nu$ both the uniform distribution on $[0,1]$, $G_k\#\nu = \mu$.
  • Figure 3: Scatter plots of the ratio $\frac{\|T(x)-T(x')\|}{\|x-x'\|}$ versus $\|x-x'\|$ for encoders $T$ obtained by our GME-based method and by a VAE, applied to the MNIST and CIFAR10 datasets. We use $\mathbb{R}^{100}$, the latent space commonly used for these datasets. Clearly, our encoder is geometry-preserving, unlike the VAE encoder.
  • Figure 4: Illustration of a sample from the latent distribution (left), colored by distances from the origin, an input dataset in $\mathbb{R}^{100}$ with 9 Gaussians (middle) and another input dataset in $\mathbb{R}^{500}$ with 12 Gaussians (right), both colored by Gaussian membership.
  • Figure 6: Box and whisker plots of FID scores obtained from CIFAR10 and Tiny ImageNet using 11 different NN architectures.
  • ...and 9 more figures
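The quantity plotted in Figure 3 can be computed directly from a batch of data points and their encodings. The minimal sketch below also reports the largest $\alpha \in (0,1]$ with $\alpha \le \|T(x)-T(x')\|/\|x-x'\| \le \alpha^{-1}$ on the sample, an empirical version of the $\alpha^{-1}$-bi-Lipschitz hypothesis in Proposition 1 (reading that condition this way is an assumption about the convention). It assumes Euclidean norms and synthetic data; the helper names and the two toy encoders are hypothetical, not the paper's trained encoder or VAE.

```python
import numpy as np


def distance_ratios(X, TX):
    """Pairwise ||x - x'|| and ratios ||T(x) - T(x')|| / ||x - x'|| over pairs i < j."""
    i, j = np.triu_indices(len(X), k=1)
    d_in = np.linalg.norm(X[i] - X[j], axis=1)
    d_out = np.linalg.norm(TX[i] - TX[j], axis=1)
    return d_in, d_out / d_in      # assumes the points in X are distinct


def empirical_alpha(ratios):
    """Largest alpha in (0, 1] with alpha <= ratio <= 1/alpha for every sampled pair."""
    return float(min(ratios.min(), 1.0 / ratios.max(), 1.0))


rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(10, 2)))   # orthonormal basis of a 2-D subspace of R^10
X = rng.normal(size=(200, 2)) @ Q.T              # synthetic data on that subspace

_, r_iso = distance_ratios(X, X @ Q)                             # isometric encoder
_, r_warp = distance_ratios(X, (X @ Q) * np.array([1.0, 3.0]))   # stretches one axis by 3
alpha_iso, alpha_warp = empirical_alpha(r_iso), empirical_alpha(r_warp)
print(f"alpha (isometric) = {alpha_iso:.4f}, alpha (stretched) = {alpha_warp:.4f}")
```

Scattering the first return value of `distance_ratios` against the ratios (for example with matplotlib) gives a plot in the style of Figure 3: the isometric encoder concentrates at ratio 1, while the stretched encoder spreads between 1 and 3.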

Theorems & Definitions (13)

  • Proposition 1
  • Definition 1: $c$-cyclical monotonicity (Santambrogio, 2015)
  • Theorem 1
  • Proof
  • Remark 1
  • Lemma 1
  • Proof
  • Theorem 2
  • Proof
  • Theorem 3: $c_T$-Cyclical Monotonicity of $G$
  • ...and 3 more
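Definition 1 can be checked by brute force on a small finite set of pairs: $\{(x_i, y_i)\}$ is $c$-cyclically monotone when no permutation $\sigma$ makes $\sum_i c(x_i, y_{\sigma(i)})$ smaller than $\sum_i c(x_i, y_i)$. The sketch below assumes the squared distance $c(x,y) = |x-y|^2$ on $[0,1]$ in place of the paper's intrinsic cost $c_T$, and uses the hypothetical family $G_k(x) = 2^k x \bmod 1$, which pushes the uniform distribution on $[0,1]$ to itself. These maps are stand-ins in the spirit of Figure 2, not necessarily the paper's $G_k$. Every $G_k$ satisfies the pushforward condition, but on this sample only the monotone map $G_0$ passes the cyclical-monotonicity check.

```python
import itertools

import numpy as np


def is_c_cyclically_monotone(xs, ys, cost, tol=1e-12):
    """Brute-force test of c-cyclical monotonicity for the finite set {(xs[i], ys[i])}.

    Returns False if some permutation sigma gives
    sum_i cost(xs[i], ys[sigma(i)]) < sum_i cost(xs[i], ys[i]).
    Runs over all n! permutations, so use only small n.
    """
    n = len(xs)
    C = np.array([[cost(x, y) for y in ys] for x in xs])   # C[i, j] = c(x_i, y_j)
    rows = np.arange(n)
    base = C[rows, rows].sum()                             # cost of the given pairing
    for perm in itertools.permutations(range(n)):
        if C[rows, list(perm)].sum() < base - tol:
            return False
    return True


def G(k, x):
    """Hypothetical measure-preserving map x -> 2^k x mod 1 on [0, 1)."""
    return (2.0 ** k * x) % 1.0


def sq_cost(x, y):
    return (x - y) ** 2


xs = np.linspace(0.05, 0.95, 6)
print(is_c_cyclically_monotone(xs, G(0, xs), sq_cost))  # True: identity map
print(is_c_cyclically_monotone(xs, G(2, xs), sq_cost))  # False: sawtooth map

# Empirical pushforward check: quartiles of G_2(U) for U ~ Uniform[0, 1].
u = np.random.default_rng(0).uniform(size=100_000)
print(np.quantile(G(2, u), [0.25, 0.5, 0.75]))  # close to [0.25, 0.5, 0.75]
```

For the squared cost in one dimension, swapping a decreasing pair $(x_i < x_j,\ y_i > y_j)$ changes the total cost by $2(x_i - x_j)(y_i - y_j) < 0$, which is why any non-monotone pairing fails the check.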