Semi-Supervised Generative Learning via Latent Space Distribution Matching

Kwong Yu Chong, Long Feng

TL;DR

By extending the scope of its two core steps, LSDM provides a coherent statistical perspective that connects to a broad class of latent-space approaches. The paper also establishes non-asymptotic error bounds and demonstrates a key benefit of unpaired data: enhanced geometric fidelity in generated outputs.

Abstract

We introduce Latent Space Distribution Matching (LSDM), a novel framework for semi-supervised generative modeling of conditional distributions. LSDM operates in two stages: (i) learning a low-dimensional latent space from both paired and unpaired data, and (ii) performing joint distribution matching in this space via the 1-Wasserstein distance, using only paired data. This two-step approach minimizes an upper bound on the 1-Wasserstein distance between joint distributions, reducing reliance on scarce paired samples while enabling fast one-step generation. Theoretically, we establish non-asymptotic error bounds and demonstrate a key benefit of unpaired data: enhanced geometric fidelity in generated outputs. Furthermore, by extending the scope of its two core steps, LSDM provides a coherent statistical perspective that connects to a broad class of latent-space approaches. Notably, Latent Diffusion Models (LDMs) can be viewed as a variant of LSDM, in which joint distribution matching is achieved indirectly via score matching. Consequently, our results also provide theoretical insights into the consistency of LDMs. Empirical evaluations on real-world image tasks, including class-conditional generation and image super-resolution, demonstrate the effectiveness of LSDM in leveraging unpaired data to enhance generation quality.
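The two stages can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. A linear autoencoder (PCA via `numpy.linalg.svd`) stands in for the learned latent space of stage (i). Stage (ii) is shown only as evaluating the empirical joint 1-Wasserstein objective in latent space for two hand-made candidate maps `H`, rather than training networks. The helper names (`empirical_w1`, `fit_pca_autoencoder`, `latent_matching_loss`, `H_good`, `H_bad`) and the toy data are hypothetical. The exact W1 between two equal-size uniform empirical measures reduces to an assignment problem, solved here by brute force, which is feasible only for very small paired samples.

```python
import itertools

import numpy as np


def empirical_w1(P, Q):
    """Exact W1 between two uniform empirical measures with the same number of
    atoms. By the Birkhoff-von Neumann theorem an optimal coupling is a
    permutation, so all assignments are brute-forced (tiny n only)."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    n = len(P)
    assert len(Q) == n, "this sketch assumes equal sample sizes"
    cost = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    rows = np.arange(n)
    best = min(cost[rows, list(perm)].sum() for perm in itertools.permutations(range(n)))
    return best / n


def fit_pca_autoencoder(Y_all, k):
    """Stage (i) stand-in: a k-dimensional linear autoencoder fitted on ALL
    responses (paired + unpaired). Hypothetical substitute for a learned (D, E)."""
    mu = Y_all.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y_all - mu, full_matrices=False)
    W = Vt[:k].T  # q x k orthonormal basis of the latent subspace

    def encode(Y):
        return (Y - mu) @ W

    def decode(Z):
        return Z @ W.T + mu

    return encode, decode


def latent_matching_loss(H, encode, X, Y, eta):
    """Stage (ii): empirical W1 between (X, H(X, eta)) and (X, E(Y)),
    computed on PAIRED data only."""
    return empirical_w1(np.hstack([X, H(X, eta)]), np.hstack([X, encode(Y)]))


# Hypothetical toy data: responses lie on a line in R^3, Y = t * v.
rng = np.random.default_rng(0)
v = np.array([1.0, 2.0, 2.0]) / 3.0           # unit direction
n = 7                                          # paired samples (tiny, for brute force)
X = np.linspace(-2.0, 2.0, n).reshape(-1, 1)   # covariates
Y_paired = (X + 0.1 * rng.standard_normal((n, 1))) * v
Y_unpaired = rng.standard_normal((200, 1)) * v  # responses without covariates

encode, decode = fit_pca_autoencoder(np.vstack([Y_paired, Y_unpaired]), k=1)


def H_good(X, eta):
    # Illustrative latent generator that uses the covariate.
    return encode((X + 0.1 * eta) * v)


def H_bad(X, eta):
    # Illustrative latent generator that ignores the covariate.
    return np.zeros((len(X), 1))


eta = rng.standard_normal((n, 1))
loss_good = latent_matching_loss(H_good, encode, X, Y_paired, eta)
loss_bad = latent_matching_loss(H_bad, encode, X, Y_paired, eta)

# One-step generation: G(x, eta) = D(H(x, eta)).
Y_gen = decode(H_good(X, eta))
```

Because `H_good` tracks the covariate, its latent joint W1 is small, while `H_bad` pays roughly the spread of the encoded responses. For realistic sample sizes the brute-force search would be replaced by an assignment solver such as `scipy.optimize.linear_sum_assignment` or by another W1 estimator.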

Paper Structure (22 sections, 10 theorems, 30 equations, 8 figures, 2 tables, 2 algorithms)

Key Result

Theorem 2.1

Let $E: \mathcal{Y} \to \mathcal{Z}$ be an encoder. Suppose the generator $G$ has the form $G = D\circ H$, where $D: \mathcal{Z} \to \mathbb{R}^q$ and $H: \mathcal{X} \times \mathbb{R}^d \to \mathcal{Z}$. Then, the 1-Wasserstein distance between the joint distributions of $\left(X, G(X,\eta)\right)$ … Consequently, if $(D,E,H)$ is a triplet that satisfies … then the generator achieves conditional dis…
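The statement of Theorem 2.1 is truncated here, so its exact bound is not reproduced. One general fact that risk decompositions for a composite generator $G = D\circ H$ commonly rely on is that an $L$-Lipschitz decoder can increase the 1-Wasserstein distance by at most a factor $L$: $W_1(D_\#\mu, D_\#\nu) \le L\, W_1(\mu, \nu)$. This is a standard property, not a restatement of the theorem. The sketch below checks it numerically in one dimension, where W1 between equal-size empirical samples equals the mean absolute difference of the sorted samples. The decoder `D(z) = 2 sin(z)` (Lipschitz constant 2) and the sampling distributions are hypothetical choices.

```python
import numpy as np


def w1_1d(a, b):
    """Exact W1 between two equal-size 1-D empirical samples:
    mean absolute difference of the sorted values."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    assert a.shape == b.shape
    return float(np.mean(np.abs(a - b)))


def D(z):
    """Hypothetical decoder; |d/dz 2 sin z| <= 2, so D is 2-Lipschitz."""
    return 2.0 * np.sin(z)


LIP_D = 2.0

rng = np.random.default_rng(1)
z_model = rng.normal(0.3, 1.0, size=500)  # stand-in for latent codes H(X, eta)
z_data = rng.normal(0.0, 1.2, size=500)   # stand-in for encoded responses E(Y)

latent_gap = w1_1d(z_model, z_data)
output_gap = w1_1d(D(z_model), D(z_data))

# Push-forward through a Lipschitz map: output gap <= LIP_D * latent gap.
print(f"latent W1 = {latent_gap:.3f}, output W1 = {output_gap:.3f}, "
      f"bound = {LIP_D * latent_gap:.3f}")
```

The inequality holds exactly for the empirical samples, because the sorted matching of the latent samples, pushed through `D`, is a valid coupling of the decoded samples.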

Figures (8)

  • Figure 1: An Illustrative Figure of LSDM.
  • Figure 2: Qualitative result on the MNIST dataset ($n=250$, $N=29{,}750$, $m=13$).
  • Figure 3: Ablation study of LSDM on the MNIST dataset. Left: Total sample size $n + N$ is fixed at $3{,}000$ while $n$ varies. Right: Number of paired samples $n$ is fixed at $250$ while the total sample size $n + N$ varies. Model architecture and training parameters are fixed within each study but differ between the two studies.
  • Figure 4: Left: Qualitative changes of cLSDM on the MNIST dataset for fixed $n=250$ and varying $N$. Right: UMAP-reduced latent space of an autoencoder on the MNIST dataset. Red contours represent the distribution of generated latent codes $H(X, \eta)$ in cLSDM.
  • Figure 5: Quantitative results on the CelebA dataset for varying $n$ and $N$ ($m_c=4$, cLSDM).
  • ...and 3 more figures

Theorems & Definitions (16)

  • Definition 2.1: 1-Wasserstein Distance
  • Theorem 2.1: Risk Decomposition for the Composite Generator
  • Proposition 2.2: Existence of $H$ for Arbitrary $\widehat{D}, \widehat{E}$
  • Theorem 2.3
  • Proposition 4.1: Bound on the 1‑Wasserstein Distance by Score Matching Objective
  • Proposition 4.2: Bound on the 1‑Wasserstein Distance by f‑divergences
  • Definition 5.1: ReLU Neural Networks
  • Definition 5.2: Hölder Class
  • Theorem 5.1: Reconstruction Error Bound
  • Corollary 1
  • ...and 6 more
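Definition 2.1 and Proposition 4.2 are listed above only by title. As a hedged illustration of the quantities involved (not the paper's statements), the sketch below computes the 1-Wasserstein distance between two distributions on a finite grid of the real line using the one-dimensional identity $W_1(P,Q) = \int |F_P(t) - F_Q(t)|\,dt$. It then checks two standard facts linking W1 to f-divergences on a bounded domain: $W_1 \le \operatorname{diam} \cdot \mathrm{TV}$, where total variation is itself an f-divergence, and Pinsker's inequality $\mathrm{TV} \le \sqrt{\mathrm{KL}/2}$. The grid and the probability vectors are hypothetical.

```python
import numpy as np


def w1_grid(p, q, x):
    """W1 between discrete distributions p, q on sorted 1-D support x,
    via W1 = sum_i |F_p(x_i) - F_q(x_i)| * (x_{i+1} - x_i)."""
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(x)))


def total_variation(p, q):
    """TV = sup_A |P(A) - Q(A)| = 0.5 * sum |p - q|,
    the f-divergence with f(t) = |t - 1| / 2."""
    return 0.5 * float(np.sum(np.abs(p - q)))


def kl(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


x = np.linspace(0.0, 1.0, 11)  # bounded domain with diameter 1
p = np.exp(-((x - 0.3) ** 2) / 0.02)
q = np.exp(-((x - 0.5) ** 2) / 0.05)
p, q = p / p.sum(), q / q.sum()

w1 = w1_grid(p, q, x)
tv = total_variation(p, q)
diam = x[-1] - x[0]

print(f"W1 = {w1:.4f} <= diam*TV = {diam * tv:.4f} "
      f"<= diam*sqrt(KL/2) = {diam * np.sqrt(kl(p, q) / 2):.4f}")
```

The chain shows why a small f-divergence between joint distributions on a bounded domain forces a small W1, which is the direction a bound of this kind needs.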