Monotone Generative Modeling via a Gromov-Monge Embedding

Wonjun Lee; Yifei Yang; Dongmian Zou; Gilad Lerman

TL;DR

This work addresses generative modeling by embedding the data distribution into a low-dimensional latent space via a geometry-preserving encoder and then transporting a reference distribution to the embedded one using an optimal transport (OT)-based objective. The authors introduce the Gromov-Monge Embedding (GME) cost to regularize the encoder so that it preserves geometry, prove that the generator is $c$-cyclically monotone, and show that the discriminator's modulus of continuity improves with geometry preservation. The resulting GMEGAN framework achieves high-quality image generation with strong robustness to mode collapse and training instability, outperforming several GAN-based and encoder-based baselines on synthetic and real data (e.g., CIFAR-10, Tiny ImageNet). The work combines Gromov-Wasserstein-inspired geometry with OT in latent space to provide both theoretical guarantees and practical improvements for generative modeling.
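The exact GME cost is defined in the full paper and is not reproduced on this page. Purely as an illustration of what "geometry preservation" measures, the sketch below computes a Gromov-Monge-style distortion of an encoder $T$ on a finite sample: the mean of $|d_X(x_i,x_j) - d_Y(T(x_i),T(x_j))|^p$ over all pairs, with Euclidean distances on both sides. The function name `gm_distortion` and the choice of Euclidean distances are assumptions for this example, not the authors' definition.

```python
import numpy as np

def pairwise_dists(Z: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of Z (O(n^2 d) memory, fine for small n)."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def gm_distortion(X, TX, p: float = 2.0) -> float:
    """Hypothetical Gromov-Monge-style cost: mean |d_X - d_Y|^p over all ordered pairs.

    X  : (n, D) samples in the data space.
    TX : (n, d) their images T(x_i) in the latent space (d may differ from D).
    A value of 0 means T preserves every pairwise distance on the sample.
    """
    DX = pairwise_dists(np.asarray(X, dtype=float))
    DY = pairwise_dists(np.asarray(TX, dtype=float))
    return float(np.mean(np.abs(DX - DY) ** p))
```

For instance, an orthogonal map gives a distortion that is zero up to round-off, while the scaling $x \mapsto 2x$ is penalized because it doubles every distance.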

Abstract

Generative adversarial networks (GANs) are popular for generative tasks; however, they often require careful architecture selection and extensive empirical tuning, and they are prone to mode collapse. To overcome these challenges, we propose a novel model that identifies the low-dimensional structure of the underlying data distribution, maps it into a low-dimensional latent space while preserving the underlying geometry, and then optimally transports a reference measure to the embedded distribution. We prove three key properties of our method: 1) the encoder preserves the geometry of the underlying data; 2) the generator is $c$-cyclically monotone, where $c$ is an intrinsic embedding cost employed by the encoder; and 3) the discriminator's modulus of continuity improves with the geometric preservation of the data. Numerical experiments demonstrate that our approach generates high-quality images and is robust to both mode collapse and training instability.
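The final step described in the abstract, optimally transporting a reference measure to the embedded distribution, can be illustrated in a fully discrete toy setting: with $n$ reference samples $z_i$ and $n$ embedded samples $y_j = T(x_j)$, all uniformly weighted, an optimal transport map is a permutation minimizing the total cost. The sketch below brute-forces that permutation for tiny $n$ under the squared Euclidean cost. It is only a stand-in: the paper trains a generator network, and a practical implementation would use a proper OT solver instead of enumerating permutations. The name `discrete_ot_map` is hypothetical.

```python
import itertools
import numpy as np

def discrete_ot_map(Z, Y):
    """Return (sigma, cost) where sigma minimizes sum_i ||Z[i] - Y[sigma[i]]||^2.

    Z : (n, d) reference samples; Y : (n, d) embedded samples T(x_j).
    Brute force over all n! permutations, so only usable for very small n.
    """
    Z = np.asarray(Z, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = len(Z)
    # Squared Euclidean cost matrix C[i, j] = ||z_i - y_j||^2.
    C = np.sum((Z[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    rows = np.arange(n)
    best_cost, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        idx = np.array(perm)
        cost = C[rows, idx].sum()
        if cost < best_cost:
            best_cost, best_perm = cost, idx
    return best_perm, float(best_cost)
```

In one dimension the optimal permutation is the sorted (monotone) rearrangement, a simple instance of the cyclical monotonicity that the paper establishes for its generator.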

Paper Structure (29 sections, 6 theorems, 59 equations, 14 figures, 1 table, 1 algorithm)

Introduction
Related Works
Encoder-based GAN
Local isometry encoders
Generative models via OT
GW distance
Contribution
Structure of the Rest of the Paper
Motivating Our Method
Notation and Conventions
Review of the Mathematical Frameworks of GANs and Encoder-based GANs
GAN
Encoder-based GAN
Addressing Mode Collapse via a Specialized Encoder
Addressing Training Instability by $c$-Cyclical Monotonicity
...and 14 more sections

Key Result

Proposition 1

For $p \geq 1$ and $0 < \alpha \leq 1$, if $T: \mathcal{M} \rightarrow Y$ is $(\alpha^{-1})$-bi-Lipschitz, then

Figures (14)

Figure 1: Illustration of our method for generation of samples in $\mathcal{M}$ with a latent space $Y$ and a geometry-preserving map $T$.
Figure 2: Demonstration of the graphs of the functions $G_0$ (left) and $G_2$ (right). For any $k \in \{0\} \cup \mathbb{N}$ and for $\mu$ and $\nu$ uniform distributions on $[0,1]$, $G_k\#\nu = \mu$.
Figure 3: Scatter plots depicting the ratio $\frac{\|T(x)-T(x')\|}{\|x-x'\|}$ versus $\|x-x'\|$ for encoders $T$ obtained by our GME-based method and VAE. They are applied to both the MNIST and CIFAR10 datasets. We use the commonly implemented latent space for these datasets, $\mathbb{R}^{100}$. Clearly, our encoder is geometry-preserving, unlike the VAE encoder.
Figure 4: Left: a sample from the latent distribution, colored by distance from the origin. Middle: an input dataset in $\mathbb{R}^{100}$ consisting of 9 Gaussians. Right: an input dataset in $\mathbb{R}^{500}$ consisting of 12 Gaussians. The middle and right panels are colored by Gaussian membership.
Figure 6: Box and whisker plots of FID scores obtained from CIFAR10 and Tiny ImageNet using 11 different NN architectures.
...and 9 more figures
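The diagnostic in Figure 3 and the bi-Lipschitz hypothesis of Proposition 1 can both be checked empirically on a finite sample. The sketch below computes the ratios $\|T(x)-T(x')\|/\|x-x'\|$ over all distinct pairs (the quantity plotted in Figure 3) and the smallest $L$ for which the sample is consistent with $T$ being $L$-bi-Lipschitz, i.e. $L^{-1}\|x-x'\| \le \|T(x)-T(x')\| \le L\|x-x'\|$; in the notation of Proposition 1, $L = \alpha^{-1}$. A finite sample only yields a lower bound on the true constant. The function names are hypothetical.

```python
import numpy as np

def distance_ratios(X, TX):
    """Return (dx, ratios) over all pairs i < j with x_i != x_j.

    dx[k]     = ||x_i - x_j||               (x-axis of a Figure 3-style plot)
    ratios[k] = ||T(x_i) - T(x_j)|| / dx[k] (y-axis of a Figure 3-style plot)
    """
    X = np.asarray(X, dtype=float)
    TX = np.asarray(TX, dtype=float)
    i, j = np.triu_indices(len(X), k=1)  # each unordered pair once
    dx = np.linalg.norm(X[i] - X[j], axis=1)
    dy = np.linalg.norm(TX[i] - TX[j], axis=1)
    keep = dx > 0  # skip duplicated input points
    return dx[keep], dy[keep] / dx[keep]

def empirical_bilipschitz_constant(X, TX) -> float:
    """Smallest L >= 1 with 1/L <= ratio <= L on the sample (a lower bound on the true L)."""
    _, r = distance_ratios(X, TX)
    if r.size == 0:
        raise ValueError("need at least two distinct sample points")
    if np.any(r == 0):
        return np.inf  # T collapses two distinct points, so it is not bi-Lipschitz
    return float(max(r.max(), 1.0 / r.min()))
```

For a geometry-preserving encoder the ratios should concentrate near 1 across all scales of $\|x-x'\|$, which corresponds to a constant $L$ close to 1 (equivalently, $\alpha$ close to 1).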

Theorems & Definitions (13)

Proposition 1
Definition 1: $c$-cyclical monotonicity (Santambrogio, 2015)
Theorem 1
proof
Remark 1
Lemma 1
proof
Theorem 2
proof
Theorem 3: $c_T$-Cyclical Monotonicity of $G$
...and 3 more
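Definition 1 ($c$-cyclical monotonicity) can be tested exactly on a finite set of pairs $(x_i, y_i)$. The set is $c$-cyclically monotone iff for every cycle $i_1, \dots, i_k$ we have $\sum_j c(x_{i_j}, y_{i_j}) \le \sum_j c(x_{i_j}, y_{i_{j+1}})$ (indices mod $k$). Setting $w_{ij} = c(x_i, y_j) - c(x_i, y_i)$, this is exactly the statement that the complete directed graph with edge weights $w$ has no negative cycle, which Floyd-Warshall detects in $O(n^3)$. The sketch below takes a user-supplied cost `c`; the squared Euclidean cost used in the usage example is an illustrative choice, not the intrinsic cost $c_T$ of Theorem 3, and `is_c_cyclically_monotone` is a hypothetical name.

```python
import numpy as np

def is_c_cyclically_monotone(X, Y, c, tol: float = 1e-12) -> bool:
    """Exact test of c-cyclical monotonicity for the finite set {(X[i], Y[i])}.

    Uses w[i, j] = c(X[i], Y[j]) - c(X[i], Y[i]); the set is c-cyclically
    monotone iff the directed graph with edge weights w has no negative cycle.
    """
    n = len(X)
    C = np.array([[c(X[i], Y[j]) for j in range(n)] for i in range(n)], dtype=float)
    D = C - np.diag(C)[:, None]  # D[i, j] = w[i, j]; the diagonal is exactly 0
    # Floyd-Warshall: after step k, D[i, j] is the lightest walk from i to j
    # whose intermediate vertices all lie in {0, ..., k}.
    for k in range(n):
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    # A negative cycle exists iff some vertex reaches itself with negative weight.
    return bool(np.all(np.diag(D) >= -tol))
```

With the squared cost in one dimension and increasing $x_i$, the test passes exactly when the $y_i$ are nondecreasing: for example, pairing `[0.0, 1.0, 2.0]` with `[0.5, 1.5, 2.5]` passes, while pairing it with `[2.5, 1.5, 0.5]` fails because swapping the first and last targets lowers the total cost.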
