Isometric Representation Learning for Disentangled Latent Space of Diffusion Models
Jaehoon Hahm, Junho Lee, Sunghyun Kim, Joonseok Lee
TL;DR
This paper tackles entanglement in the latent space of diffusion models by introducing Isometric Diffusion, which regularizes the mapping from latent space ${\mathcal{X}}$ to a semantic space ${\mathcal{H}}$ to be near an isometry. It leverages a spherical approximation of ${\mathcal{X}}$, defines a scaled isometry condition, and introduces an isometry loss ${\mathcal{L}_{\text{iso}}}$ that uses a stochastic trace estimator to regularize the encoder of the score model across diffusion timesteps. The method yields a more disentangled latent space, enabling smoother interpolations, more accurate inversions, and more controllable linear editing, as demonstrated on multiple datasets with ablations showing the importance of the Riemannian metric choice and timing of the regularization. Overall, Isometric Diffusion offers a geometry-aware enhancement to diffusion models that improves latent interpretability and manipulation without substantially sacrificing generation quality, with potential extensions to conditional generation and video applications.
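To make the idea of a stochastic-trace isometry penalty concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a Euclidean metric on both ${\mathcal{X}}$ and ${\mathcal{H}}$, which means it omits the spherical approximation and Riemannian metric mentioned above. It also assumes the penalty takes the normalized form $n\,\mathrm{Tr}(A^2)/\mathrm{Tr}(A)^2$ with $A = J^\top J$, where $J$ is the Jacobian of the map from ${\mathcal{X}}$ to ${\mathcal{H}}$ and $n = \dim {\mathcal{X}}$. This quantity is at least 1 and equals 1 exactly when $A = cI$, i.e. when the map is a scaled isometry. Both traces are estimated with Hutchinson's estimator using Rademacher probes. The names `isometry_loss` and `tanh_encoder_ops`, and the toy tanh "encoder", are hypothetical stand-ins for the score model's encoder.

```python
import numpy as np
from typing import Callable, Optional, Tuple

Vec = np.ndarray


def isometry_loss(
    jvp: Callable[[Vec], Vec],  # v -> J v   (push a latent direction into H)
    vjp: Callable[[Vec], Vec],  # u -> J^T u (pull an H-vector back to X)
    dim: int,                   # n, dimension of the latent space X
    num_probes: int = 8,
    rng: Optional[np.random.Generator] = None,
) -> float:
    """Scaled-isometry penalty n * Tr(A^2) / Tr(A)^2 with A = J^T J.

    The value is >= 1 and equals 1 exactly when A = c * I for some c > 0,
    i.e. when J preserves angles and scales all lengths by the same factor.
    Both traces are estimated with Hutchinson's estimator (Rademacher probes):
    Tr(A) ~ mean(v^T A v) and Tr(A^2) ~ mean(||A v||^2).
    """
    rng = np.random.default_rng() if rng is None else rng
    probes = rng.choice(np.array([-1.0, 1.0]), size=(num_probes, dim))
    tr_a = 0.0
    tr_a2 = 0.0
    for v in probes:
        av = vjp(jvp(v))         # A v = J^T (J v)
        tr_a += float(v @ av)    # v^T A v = ||J v||^2
        tr_a2 += float(av @ av)  # v^T A^2 v = ||A v||^2 (A is symmetric)
    tr_a /= num_probes
    tr_a2 /= num_probes
    # Ratio of the two sample means: simple, though slightly biased.
    return dim * tr_a2 / tr_a**2


def tanh_encoder_ops(W: Vec, x: Vec) -> Tuple[Callable[[Vec], Vec], Callable[[Vec], Vec]]:
    """Hypothetical toy encoder h = tanh(W x); returns its (jvp, vjp) at x."""
    d = 1.0 - np.tanh(W @ x) ** 2  # elementwise tanh' at the pre-activation
    return (lambda v: d * (W @ v)), (lambda u: W.T @ (d * u))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 16)) / 4.0
    x = rng.normal(size=16)
    jvp, vjp = tanh_encoder_ops(W, x)
    print(isometry_loss(jvp, vjp, dim=16, num_probes=64, rng=rng))  # a value >= 1
```

Rademacher probes satisfy $\|v\|^2 = n$, so Cauchy-Schwarz keeps the estimate itself at or above 1. For a diagonal $A$ the estimate is exact.

In an actual score model, `jvp` and `vjp` would presumably come from an autodiff framework's Jacobian-vector and vector-Jacobian products rather than from a hand-written Jacobian. The penalty would then be averaged over sampled timesteps and noisy inputs.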
Abstract
The latent space of diffusion models remains largely unexplored, despite their great success and potential in generative modeling. In fact, the latent space of existing diffusion models is entangled, with a distorted mapping from the latent space to the image space. To tackle this problem, we present Isometric Diffusion, which equips a diffusion model with a geometric regularizer that guides it to learn a geometrically sound latent space of the training data manifold. This approach allows diffusion models to learn a more disentangled latent space, which enables smoother interpolation, more accurate inversion, and more precise control over attributes directly in the latent space. Extensive experiments on image interpolation, image inversion, and linear editing demonstrate the effectiveness of our method.
