Isometric Representation Learning for Disentangled Latent Space of Diffusion Models

Jaehoon Hahm; Junho Lee; Sunghyun Kim; Joonseok Lee

TL;DR

This paper tackles entanglement in the latent space of diffusion models by introducing Isometric Diffusion, which regularizes the mapping from the latent space $\mathcal{X}$ to a semantic space $\mathcal{H}$ to be close to an isometry. It leverages a spherical approximation of $\mathcal{X}$, defines a scaled isometry condition, and introduces an isometry loss $\mathcal{L}_{\text{iso}}$ that uses a stochastic trace estimator to regularize the encoder of the score model across diffusion timesteps. The method yields a more disentangled latent space, enabling smoother interpolations, more accurate inversions, and more controllable linear editing, as demonstrated on multiple datasets; ablations show the importance of the choice of Riemannian metric and of the timing of the regularization. Overall, Isometric Diffusion offers a geometry-aware enhancement to diffusion models that improves latent interpretability and manipulation without substantially sacrificing generation quality, with potential extensions to conditional generation and video applications.
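
The sketch below illustrates the scaled isometry condition and the stochastic trace estimator. It works with explicit Jacobian matrices instead of a score network, and it is a simplified, hypothetical version. Both spaces use the Euclidean metric, so the spherical Riemannian metric is omitted. The penalty is assumed to be the relaxed distortion ratio $n \cdot \mathbb{E}[\operatorname{tr}(G^2)] / \mathbb{E}[\operatorname{tr}(G)]^2$ with $G = J^\top J$. This ratio is at least 1, and it equals 1 exactly when $G = cI$ for a single constant $c$ at every sampled point. Traces are estimated Hutchinson-style with Rademacher probes $v$, using $\operatorname{tr}(G) = \mathbb{E}\|Jv\|^2$ and $\operatorname{tr}(G^2) = \mathbb{E}\|J^\top J v\|^2$. The function names are illustrative and are not taken from the paper.

```python
import random

# Hypothetical sketch: Euclidean metrics on both spaces, dense Jacobians as lists of rows.

def matvec(J, v):
    """Compute J v for an m x n matrix J."""
    return [sum(a * b for a, b in zip(row, v)) for row in J]


def rmatvec(J, u):
    """Compute J^T u without forming the transpose explicitly."""
    n = len(J[0])
    return [sum(J[i][j] * u[i] for i in range(len(J))) for j in range(n)]


def hutchinson_traces(J, num_probes, rng):
    """Estimate tr(G) and tr(G^2) for G = J^T J with Rademacher probes v.

    Uses tr(G) = E[||J v||^2] and tr(G^2) = E[||J^T J v||^2], valid since E[v v^T] = I.
    """
    n = len(J[0])
    tr_g, tr_g2 = 0.0, 0.0
    for _ in range(num_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        jv = matvec(J, v)        # J v
        gv = rmatvec(J, jv)      # J^T J v
        tr_g += sum(x * x for x in jv)
        tr_g2 += sum(x * x for x in gv)
    return tr_g / num_probes, tr_g2 / num_probes


def scaled_isometry_ratio(jacobians, num_probes=64, seed=0):
    """Relaxed distortion n * E[tr(G^2)] / E[tr(G)]^2 over a batch of Jacobians.

    Returns 1 iff every G equals c * I with the same c; larger values mean more distortion.
    """
    rng = random.Random(seed)
    n = len(jacobians[0][0])
    estimates = [hutchinson_traces(J, num_probes, rng) for J in jacobians]
    mean_tr_g = sum(t for t, _ in estimates) / len(estimates)
    mean_tr_g2 = sum(t2 for _, t2 in estimates) / len(estimates)
    return n * mean_tr_g2 / mean_tr_g ** 2
```

For instance, `scaled_isometry_ratio([[[2.0, 0.0], [0.0, 2.0]]])` evaluates to 1.0. By contrast, `[[[1.0, 0.0], [0.0, 3.0]]]` gives about 1.64, because that map stretches one axis more than the other. Taking the expectations separately over the batch is what makes the scale $c$ global: a batch mixing the identity with $2I$ scores above 1. In a real model, $Jv$ and $J^\top u$ would come from automatic-differentiation Jacobian-vector and vector-Jacobian products rather than from stored matrices. A non-Euclidean metric on $\mathcal{X}$ would also change the operator whose traces are estimated.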

Abstract

The latent space of diffusion models remains largely unexplored, despite their great success and potential in generative modeling. In fact, the latent space of existing diffusion models is entangled, with a distorted mapping from the latent space to the image space. To tackle this problem, we present Isometric Diffusion, which equips a diffusion model with a geometric regularizer that guides it to learn a geometrically sound latent space of the training data manifold. This approach allows diffusion models to learn a more disentangled latent space, which enables smoother interpolation, more accurate inversion, and more precise control over attributes directly in the latent space. Our extensive experiments on image interpolation, image inversion, and linear editing show the effectiveness of our method.
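
Figure 1 contrasts linear interpolation (Lerp) with spherical interpolation (Slerp) between two latents. The sketch below implements both, using the standard Slerp formula $\mathrm{slerp}(x, x'; t) = \frac{\sin((1-t)\theta)}{\sin\theta} x + \frac{\sin(t\theta)}{\sin\theta} x'$, where $\theta$ is the angle between $x$ and $x'$. It assumes plain Python lists as latents, and the helper names are hypothetical. For equal-norm latents, Slerp stays on the sphere of that radius. The Lerp midpoint, by contrast, moves toward the origin, where high-dimensional Gaussian latents rarely lie.

```python
import math

# Hypothetical helpers: latents are plain Python lists of floats.

def lerp(x, y, t):
    """Linear interpolation (Lerp): treats the latent space as Euclidean."""
    return [(1.0 - t) * a + t * b for a, b in zip(x, y)]


def slerp(x, y, t, eps=1e-7):
    """Spherical interpolation (Slerp) along the great circle through x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Clamp so rounding error cannot push acos outside its domain [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly parallel or antipodal latents: sin(theta) ~ 0 makes the
        # Slerp weights unstable, so fall back to Lerp.
        return lerp(x, y, t)
    w_x = math.sin((1.0 - t) * theta) / sin_theta
    w_y = math.sin(t * theta) / sin_theta
    return [w_x * a + w_y * b for a, b in zip(x, y)]
```

Calling `slerp([3.0, 0.0], [0.0, 3.0], 0.5)` returns a point of norm 3. On the same inputs, `lerp` returns `[1.5, 1.5]`, whose norm is about 2.12.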

Paper Structure

This paper contains 29 sections, 15 equations, 16 figures, and 4 tables.

Introduction
Background
Diffusion Model
Analysis on Latent Space $\mathcal{X}$ of Diffusion Models
Intermediate Latent Space $\mathcal{H}$ as a Semantic Space
Path Length Regularizer
Isometric Representation Learning for Diffusion Models
Spherical Approximation of the Latent Space
Isometric Mappings
Isometry Loss for Diffusion Models
Computational Considerations
Experiments
Experimental Settings
Quantitative Comparison
Analysis on the Disentanglement of Latent Space $\mathcal{X}$
...and 14 more sections

Figures (16)

Figure 1: An illustration of latent traversal between two latents ${\bm{x}}$ and ${\bm{x}}'$. Top: naive linear interpolation (Lerp), which assumes a Euclidean space. Middle: spherical interpolation (Slerp) between ${\bm{x}}$ and ${\bm{x}}'$; the direction ${\bm{x}} \rightarrow {\bm{x}}'$ is entangled with an unwanted gender axis, inducing abrupt changes. Bottom: Slerp between the same latents with our Isometric Diffusion, which resolves the unwanted entanglement.
Figure 2: Illustration of $\mathcal{X}$, $\mathcal{H}$, and the local coordinates of these two manifolds. Our isometric loss regularizes the encoder of the score model to map a spherical trajectory in $\mathcal{X}$ to a linear trajectory in $\mathcal{H}$, so that geodesics in $\mathcal{X}$ are mapped to geodesics in $\mathcal{H}$. $e_{\tilde{\theta}}$ denotes the encoder of the score model $s_\theta$. $\Pi_{n-1}$ and $\Phi$ are charts mapping the Riemannian manifolds to local coordinate spaces. ${\bm{z}}$ and ${\bm{z}}'$ denote the local coordinates of $\mathcal{X}$ and $\mathcal{H}$, respectively.
Figure 3: (a) The input $S^2$ manifold. (b–d) Mapped contours in the latent coordinates learned by an autoencoder (b) with reconstruction loss only, (c) with an isometric loss assuming naive Euclidean geometry, and (d) with our isometric loss accounting for the $S^2$ geometry.
Figure 4: RTL with various values of $\lambda_{\text{iso}}$. Stronger regularization drives the ratio toward 1, flattening the trajectories in $\mathcal{H}$.
Figure 5: Image interpolation. Examples of latent traversal between two latents ${\bm{x}}$ and ${\bm{x}}'$ with DDPM ho2020ddpm, trained on $256 \times 256$ CelebA-HQ. The baseline shows unnecessary female $\rightarrow$ male changes, whereas ours transitions more smoothly. For quantitative support, we plot the LPIPS distance between each pair of adjacent frames (blue: baseline, orange: ours).
...and 11 more figures
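
Figures 2 and 3 depend on local coordinates for a spherical latent space. The caption of Figure 2 only states that $\Pi_{n-1}$ is a chart. The sketch below assumes stereographic projection from the north pole of a sphere of radius $r$; for standard Gaussian latents in $n$ dimensions, $r \approx \sqrt{n}$. In these coordinates the round metric pulls back to the conformal form $G({\bm{z}}) = \frac{4r^2}{(1 + \|{\bm{z}}\|^2)^2} I$. Distances measured in the chart therefore differ from naive Euclidean ones, which illustrates the distinction between panels (c) and (d) of Figure 3. The function names are hypothetical.

```python
import math

# Hypothetical chart helpers; stereographic projection is an assumed choice of chart.

def from_chart(z, radius=1.0):
    """Inverse stereographic projection: z in R^(n-1) -> point on the radius-r sphere in R^n."""
    s = sum(a * a for a in z)
    unit = [2.0 * a / (1.0 + s) for a in z] + [(s - 1.0) / (1.0 + s)]
    return [radius * a for a in unit]


def to_chart(x, radius=1.0):
    """Stereographic projection from the north pole (0, ..., 0, r).

    x is assumed to lie on the radius-r sphere; the map is undefined at the pole itself.
    """
    unit = [a / radius for a in x]
    denom = 1.0 - unit[-1]
    return [a / denom for a in unit[:-1]]


def metric_scale(z, radius=1.0):
    """Conformal factor c(z) such that the pulled-back round metric is G(z) = c(z) * I."""
    s = sum(a * a for a in z)
    return 4.0 * radius ** 2 / (1.0 + s) ** 2
```

A finite-difference check confirms the factor. Moving ${\bm{z}}$ by a small step $h$ along one coordinate moves the embedded point by approximately $\sqrt{c({\bm{z}})}\,h$.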
