Latent Diffusion Inversion Requires Understanding the Latent Space

Mingxing Rao, Bowen Qu, Daniel Moyer

TL;DR

Latent Diffusion Inversion Requires Understanding the Latent Space shows that memorization in latent-domain diffusion models is not evenly distributed across latent codes and is heavily influenced by the decoder’s geometry. The authors introduce a decoder pullback metric $G(z)=J_D(z)^{\top}J_D(z)$ to quantify local distortion and define a per-dimension influence $\mathrm{Infl}_i(z)$ to identify which latent coordinates drive memorization. By masking out the least-memorizing latent dimensions before computing attack statistics, they achieve consistent improvements in score-based membership inference across six datasets and multiple attack methods, which underscores the practical privacy implications. The work emphasizes that encoder–decoder geometry, rather than diffusion dynamics alone, governs memorization in LDMs and motivates geometry-aware analyses and defenses for latent-space inversion threats.
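To make the pullback metric concrete, here is a minimal sketch that evaluates $G(z)=J_D(z)^{\top}J_D(z)$ for a toy decoder. Everything except the formula for $G(z)$ is an assumption made for illustration: `toy_decoder` is a hypothetical stand-in for a real LDM decoder, the Jacobian is approximated by finite differences, $\sqrt{\det G}$ is one possible scalar summary of local distortion, and the diagonal of $G$ is only a simple proxy for a per-dimension contribution. The summary above does not state how the paper defines $\mathrm{Infl}_i(z)$, so this should not be read as the authors' definition.

```python
import numpy as np

def toy_decoder(z):
    """Hypothetical decoder D: R^2 -> R^3, used only for illustration."""
    return np.array([np.sin(z[0]), z[0] * z[1], 3.0 * z[1]])

def jacobian_fd(f, z, eps=1e-6):
    """Central finite-difference Jacobian J_D(z), shape (out_dim, latent_dim)."""
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        cols.append((f(z + step) - f(z - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def pullback_metric(f, z):
    """Decoder pullback metric G(z) = J_D(z)^T J_D(z)."""
    J = jacobian_fd(f, z)
    return J.T @ J

def local_distortion(G):
    """Assumed scalar summary: sqrt(det G), the local volume change."""
    return float(np.sqrt(np.linalg.det(G)))

def per_dim_contribution(G):
    """Assumed proxy: G_ii = ||dD/dz_i||^2, the squared sensitivity to z_i."""
    return np.diag(G).copy()

z = np.array([0.0, 1.0])
G = pullback_metric(toy_decoder, z)
print(G)
print(local_distortion(G), per_dim_contribution(G))
```

For a real decoder the Jacobian would normally come from automatic differentiation rather than finite differences, and the full matrix is usually too large to form explicitly, so Jacobian-vector products are the more practical route.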

Abstract

The recovery of training data from generative models ("model inversion") has been extensively studied for diffusion models in the data domain. The encoder/decoder pair and the corresponding latent codes have largely been ignored by inversion techniques applied to latent-space generative models, e.g., latent diffusion models (LDMs). In this work we describe two key findings: (1) The diffusion model exhibits non-uniform memorization across latent codes, tending to overfit samples located in high-distortion regions of the decoder pullback metric. (2) Even within a single latent code, different dimensions contribute unequally to memorization. We introduce a principled method to rank latent dimensions by their per-dimensional contribution to the decoder pullback metric, identifying those most responsible for memorization. Empirically, removing less-memorizing dimensions when computing attack statistics for score-based membership inference attacks significantly improves performance, with average AUROC gains of 2.7% and substantial increases in TPR@1%FPR (6.42%) across diverse datasets including CIFAR-10, CelebA, ImageNet-1K, Pokémon, MS-COCO, and Flickr. The TPR gain indicates stronger confidence in identifying members under extremely low false-positive tolerance. Our results highlight the overlooked influence of the auto-encoder geometry on LDM memorization and provide a new perspective for analyzing privacy risks in diffusion-based generative models.
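The masking step and the TPR@FPR metric mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the drop fraction, the use of an L2 norm over a residual vector as the attack statistic, and the convention that a higher score means "more likely a member" are all hypothetical choices.

```python
import numpy as np

def keep_mask(influence, drop_frac):
    """Boolean mask that drops the drop_frac lowest-influence dimensions."""
    influence = np.asarray(influence, dtype=float)
    n_drop = int(np.floor(drop_frac * influence.size))
    order = np.argsort(influence)  # ascending: least influential first
    mask = np.ones(influence.size, dtype=bool)
    mask[order[:n_drop]] = False
    return mask

def masked_statistic(residual, mask):
    """Assumed attack statistic: L2 norm over the kept dimensions only."""
    return float(np.linalg.norm(np.asarray(residual, dtype=float)[mask]))

def tpr_at_fpr(member_scores, nonmember_scores, fpr=0.01):
    """TPR at the threshold that flags about `fpr` of non-members.

    Assumes higher scores indicate membership; flip the sign of the
    statistic first if the attack uses the opposite convention.
    """
    thresh = np.quantile(np.asarray(nonmember_scores, dtype=float), 1.0 - fpr)
    return float(np.mean(np.asarray(member_scores, dtype=float) > thresh))

# Toy usage with made-up numbers (not results from the paper).
influence = [0.1, 5.0, 0.2, 3.0]
mask = keep_mask(influence, drop_frac=0.5)
stat = masked_statistic([10.0, 1.0, 10.0, 2.0], mask)
tpr = tpr_at_fpr([0.5, 0.3, 0.9, 0.35], [0.1, 0.2, 0.3, 0.4], fpr=0.25)
print(mask, stat, tpr)
```

The point of the mask is that dimensions carrying little memorization signal mostly add noise to the statistic, so dropping them can sharpen the separation between members and non-members.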

Paper Structure

This paper contains 35 sections, 22 equations, 4 figures, 5 tables, 2 algorithms.

Figures (4)

  • Figure 1: Left: Points mapped to high-distortion regions of the latent space (red) are more vulnerable to inversion and exhibit stronger memorization compared to those in low-distortion regions (blue). Right: Latent grid lines and their decoded counterparts show that high-distortion latent dimensions (red) induce larger changes in data space, whereas low-distortion dimensions (blue) induce smaller changes. This reflects that latent diffusion models memorize high-distortion dimensions more strongly than low-distortion ones.
  • Figure 2: Each violin plot shows the distribution of the decoder-induced local distortion, with representative samples displayed at high-distortion (top) and low-distortion (bottom) regions. CelebA shows an almost uniform distortion distribution, while CIFAR-10 and ImageNet-1K display larger variation across samples. Refer to the appendix for additional samples.
  • Figure 3: Membership inference AUC (measured by SimA) at different times for four datasets, stratified by quartiles of the decoder-induced local distortion (0–25%, 25–50%, 50–75%, 75–100%). Quartile thresholds are computed jointly over members and held-out samples. Attacks are evaluated separately within each quartile, and a random baseline (mean and variance over ten trials) is shown for comparison. Higher-distortion quartiles consistently yield higher attack AUC.
  • Figure 4: Additional examples of images in high-distortion and low-distortion regions.
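The quartile-stratified evaluation described in the Figure 3 caption can be sketched as below. The caption specifies only that thresholds are computed jointly over members and held-out samples and that attacks are evaluated separately within each quartile. The function names, the rank-based AUC computation, and the higher-score-means-member convention are assumptions, and SimA itself is not implemented here.

```python
import numpy as np

def auc(member_scores, nonmember_scores):
    """AUROC as P(member score > non-member score), ties counted as 1/2."""
    m = np.asarray(member_scores, dtype=float)[:, None]
    n = np.asarray(nonmember_scores, dtype=float)[None, :]
    return float(np.mean((m > n) + 0.5 * (m == n)))

def auc_by_distortion_quartile(scores, is_member, distortion):
    """Attack AUC within each quartile of local distortion."""
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    distortion = np.asarray(distortion, dtype=float)
    # Thresholds are computed jointly over members and held-out samples.
    edges = np.quantile(distortion, [0.25, 0.5, 0.75])
    bins = np.searchsorted(edges, distortion, side="right")  # values 0..3
    results = []
    for q in range(4):
        sel = bins == q
        mem = scores[sel & is_member]
        non = scores[sel & ~is_member]
        # AUC is undefined if a quartile lacks members or non-members.
        results.append(auc(mem, non) if mem.size and non.size else float("nan"))
    return results

# Toy usage with made-up numbers (not results from the paper).
distortion = [1, 2, 3, 4, 5, 6, 7, 8]
is_member = [True, False, True, False, True, False, True, False]
scores = [0.4, 0.6, 0.5, 0.5, 0.8, 0.2, 0.9, 0.1]
quartile_aucs = auc_by_distortion_quartile(scores, is_member, distortion)
print(quartile_aucs)
```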