GMapLatent: Geometric Mapping in Latent Space

Wei Zeng, Xuebin Chang, Jianghao Su, Xiang Gu, Jian Sun, Zongben Xu

TL;DR

GMapLatent tackles cross-domain image generation by correcting latent-space misalignment with a canonical, geometry-aware representation. The method transforms each domain’s latent space into a canonical convex-subdivision domain via barycenter translation, optimal transport merging, and graph-constrained harmonic mapping, then registers the two canonical spaces with a diffeomorphic, cluster-constrained map $f = t_2^{-1} \circ o_2^{-1} \circ \phi_2^{-1} \circ h \circ \phi_1 \circ o_1 \circ t_1$ to achieve precise cross-domain generation. Here $t_i$, $o_i$, and $\phi_i$ denote the translation, optimal transport merging, and graph-constrained harmonic mapping for domain $i$, and $h$ is the registration between the canonical domains. Key contributions include the first integration of diffeomorphic geometric mapping into latent space for cross-domain translation, interpretable canonical latent representations that support curve-to-curve generation, and validation on binary handwritten and color image datasets reporting higher accuracy and competitive or better $FID$ than state-of-the-art methods. The framework advances domain adaptation by exploiting geometric structure in latent spaces, enabling robust, controllable, and semantically aligned cross-domain generation, with potential applicability to large-scale and multimodal settings.
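The composition in $f$ can be read as a pipeline: apply the three source-side maps forward, cross over with $h$, then undo the three target-side maps. The sketch below only illustrates that ordering. The `InvertibleMap` container, the `compose_alignment` helper, and the toy `translation` and `scaling` stages are hypothetical stand-ins, not the paper's actual transforms.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

Map = Callable[[np.ndarray], np.ndarray]


@dataclass
class InvertibleMap:
    """A map paired with its inverse (hypothetical container for illustration)."""
    forward: Map
    inverse: Map


def compose_alignment(t1, o1, phi1, h, phi2, o2, t2) -> Map:
    """Build f = t2^{-1} o o2^{-1} o phi2^{-1} o h o phi1 o o1 o t1."""
    def f(z: np.ndarray) -> np.ndarray:
        z = t1.forward(z)     # source: translation
        z = o1.forward(z)     # source: optimal transport merging
        z = phi1.forward(z)   # source: harmonic map to the canonical domain
        z = h(z)              # registration between canonical domains
        z = phi2.inverse(z)   # target: leave the canonical domain
        z = o2.inverse(z)     # target: undo merging
        return t2.inverse(z)  # target: undo translation
    return f


# Toy invertible stages, assumed purely for demonstration.
def translation(shift) -> InvertibleMap:
    shift = np.asarray(shift, dtype=float)
    return InvertibleMap(lambda z: z + shift, lambda z: z - shift)


def scaling(s: float) -> InvertibleMap:
    return InvertibleMap(lambda z: z * s, lambda z: z / s)
```

Because every stage is invertible, swapping the roles of the two domains (and replacing $h$ by $h^{-1}$) yields the reverse direction of generation with the same building blocks.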

Abstract

Cross-domain generative models based on encoder-decoder AI architectures have attracted much attention for generating realistic images, where domain alignment is crucial for generation accuracy. Domain alignment methods usually operate directly on the initial latent distribution; however, mismatched or mixed clusters can lead to mode collapse and mixture problems in the decoder, compromising the model's generalization capability. In this work, we propose a cross-domain alignment and generation model that introduces a canonical latent space representation based on geometric mapping to align the cross-domain latent spaces in a rigorous and precise manner, thus avoiding mode collapse and mixture in encoder-decoder generation architectures. We name this model GMapLatent. The core of the method is to seamlessly align latent spaces under strict cluster correspondence constraints using the canonical parameterizations of cluster-decorated latent spaces. We first (1) transform the latent space to a canonical parameter domain by composing barycenter translation, optimal transport merging and constrained harmonic mapping, and then (2) compute geometric registration with cluster constraints over the canonical parameter domains. This process realizes a bijective (one-to-one and onto) mapping between the newly transformed latent spaces and produces a precise alignment of cluster pairs. Cross-domain generation is then achieved through the aligned latent spaces embedded in the encoder-decoder pipeline. Experiments on gray-scale and color images validate the efficiency, efficacy and applicability of GMapLatent, and demonstrate that the proposed model outperforms existing models.
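As background for the harmonic mapping step, the following is a minimal sketch of a generic discrete harmonic map with pinned boundary vertices (a Tutte-style embedding with uniform edge weights). It is not the paper's graph-constrained formulation: the function name `harmonic_map`, the uniform weights, and the input format are assumptions made for illustration. Each free vertex ends up at the average of its neighbours, which is the discrete analogue of solving Laplace's equation with Dirichlet boundary conditions.

```python
import numpy as np


def harmonic_map(n_vertices, edges, fixed):
    """Solve a discrete Dirichlet problem on a graph.

    edges : iterable of (i, j) vertex index pairs (undirected).
    fixed : dict mapping a vertex index to its prescribed 2D position.
    Returns an (n_vertices, 2) array of positions.
    Assumes at least one free vertex and that every free vertex is
    connected to some fixed vertex, so the linear system is non-singular.
    """
    # Assemble the uniform-weight graph Laplacian.
    L = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0

    fixed_idx = sorted(fixed)
    free_idx = [v for v in range(n_vertices) if v not in fixed]

    pos = np.zeros((n_vertices, 2))
    pos[fixed_idx] = np.array([fixed[v] for v in fixed_idx], dtype=float)

    # Interior equations: L_ff x_f = -L_fb x_b.
    A = L[np.ix_(free_idx, free_idx)]
    b = -L[np.ix_(free_idx, fixed_idx)] @ pos[fixed_idx]
    pos[free_idx] = np.linalg.solve(A, b)
    return pos
```

With a convex boundary polygon and a planar triangulation, Tutte's theorem guarantees that this kind of map has no fold-overs, which is one reason convex canonical domains are a convenient target for bijective mappings.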

Paper Structure

This paper contains 38 sections, 1 theorem, 12 equations, 13 figures, 3 tables.

Key Result

Theorem 1

Let $\Omega$ be a compact convex domain in $\mathbb{R}^{d}$, $\left\lbrace p_{1},p_{2}, \dots ,p_{n} \right\rbrace$ be a set of $n$ distinct points in $\mathbb{R}^{d}$, and $f:\Omega \longrightarrow \mathbb{R}$ be a continuous density function. For any discrete probability measures on $n$ points, […] where the $\tau$-volume $\tau(W_{i}(\mathbf{h}))$ denotes the probability measure of each power cell […]
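To make the quantity $\tau(W_{i}(\mathbf{h}))$ concrete, the sketch below estimates the measure of each power cell by Monte Carlo sampling. It assumes the cell definition commonly used in this variational setting, $W_i(\mathbf{h}) = \lbrace x \in \Omega : \langle x, p_i \rangle + h_i \ge \langle x, p_j \rangle + h_j \ \text{for all } j \rbrace$, and further assumes $\Omega = [0,1]^d$ with a uniform density. The excerpt above is truncated, so these choices are illustrative assumptions rather than a restatement of the theorem, and the function names are hypothetical.

```python
import numpy as np


def power_cell_labels(x, p, h):
    """Assign each sample to the cell i maximising <x, p_i> + h_i."""
    return np.argmax(x @ p.T + h, axis=1)


def cell_measures(p, h, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the measure of each power cell.

    Assumes Omega = [0, 1]^d with uniform density (an illustrative choice).
    p : (n, d) array of points, h : (n,) array of heights.
    """
    rng = np.random.default_rng(seed)
    x = rng.random((n_samples, p.shape[1]))  # uniform samples in the unit cube
    labels = power_cell_labels(x, p, h)
    return np.bincount(labels, minlength=len(p)) / n_samples
```

Raising one height $h_i$ enlarges the corresponding cell at the expense of its neighbours, which is the mechanism a solver uses to match the cell measures to a prescribed discrete target measure.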

Figures (13)

  • Figure 1: Autoencoder architecture for Chinese MNIST dataset. $f_\theta$ and $g_\varepsilon$ represent the encoding and decoding maps, respectively, where $\theta$ and $\varepsilon$ are their corresponding network parameters.
  • Figure 2: t-SNE embedding results for two handwritten digit datasets [ChineseMnist, deng2012mnist].
  • Figure 3: Canonical representation of latent space.
  • Figure 4: Workflow of generation from Chinese MNIST to Arabic MNIST. Given a latent code from source latent space, $f$ is the desired final mapping to generate the corresponding latent code in target latent space. $t_i$ - latent preprocessing (translation); $o_i$ - optimal transport merging; $\phi_i$ - canonical graph-constrained harmonic mapping (straightening); $h$ - graph-constrained harmonic registration (alignment).
  • Figure 5: Workflow of building canonical structural representation of latent space and generative model for a single domain (one dataset).
  • ...and 8 more figures

Theorems & Definitions (1)

  • Theorem 1: Gu et al. [gu2013variational]