Latent Diffusion Inversion Requires Understanding the Latent Space
Mingxing Rao, Bowen Qu, Daniel Moyer
TL;DR
This paper shows that memorization in latent diffusion models (LDMs) is not evenly distributed across latent codes and is strongly influenced by the decoder’s geometry. The authors introduce a decoder pullback metric $G(z)=J_D(z)^{\top}J_D(z)$, where $J_D(z)$ is the Jacobian of the decoder $D$ at $z$, to quantify local distortion, and define a per-dimension influence $\mathrm{Infl}_i(z)$ to identify which latent coordinates drive memorization. Masking out the least-memorizing latent dimensions before computing attack statistics yields consistent improvements in score-based membership inference across six datasets and multiple attack methods, which underlines the practical privacy implications. The work emphasizes that encoder–decoder geometry, rather than diffusion dynamics alone, governs memorization in LDMs, and it motivates geometry-aware analyses and defenses against latent-space inversion threats.
Abstract
The recovery of training data from generative models (``model inversion'') has been extensively studied for diffusion models in the data domain. The encoder/decoder pair and the corresponding latent codes have largely been ignored by inversion techniques applied to latent-space generative models, e.g., latent diffusion models (LDMs). In this work we describe two key findings: (1) The diffusion model exhibits non-uniform memorization across latent codes, tending to overfit samples located in high-distortion regions of the decoder pullback metric. (2) Even within a single latent code, different dimensions contribute unequally to memorization. We introduce a principled method to rank latent dimensions by their per-dimensional contribution to the decoder pullback metric, identifying those most responsible for memorization. Empirically, removing less-memorizing dimensions when computing attack statistics for score-based membership inference attacks significantly improves performance, with average AUROC gains of 2.7\% and substantial increases in TPR@1\%FPR (6.42\%) across diverse datasets including CIFAR-10, CelebA, ImageNet-1K, Pokémon, MS-COCO, and Flickr. The TPR@1\%FPR gain indicates stronger confidence in identifying members under extremely low false-positive tolerance. Our results highlight the overlooked influence of the auto-encoder geometry on LDM memorization and provide a new perspective for analyzing privacy risks in diffusion-based generative models.
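
To make the pullback-metric idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a toy decoder $D(z)=\tanh(Wz)$ in place of a real LDM decoder, and it assumes that a latent dimension can be scored by the diagonal entry $G_{ii}(z)=\lVert J_D(z)e_i\rVert^2$; the paper's actual influence definition is not given in this excerpt and may differ. The function names (`pullback_metric`, `dimension_scores`, `top_k_mask`, `masked_statistic`) and the squared noise-prediction error used as the attack statistic are likewise illustrative choices.

```python
import numpy as np

def decoder_jacobian(z, W):
    """Analytic Jacobian of the toy decoder D(z) = tanh(W z).

    ASSUMPTION: this toy map stands in for a real LDM decoder.
    Returns an (n, d) matrix equal to diag(1 - tanh^2(W z)) @ W.
    """
    h = np.tanh(W @ z)
    return (1.0 - h ** 2)[:, None] * W

def pullback_metric(z, W):
    """Decoder pullback metric G(z) = J_D(z)^T J_D(z), shape (d, d)."""
    J = decoder_jacobian(z, W)
    return J.T @ J

def dimension_scores(G):
    """ASSUMPTION: score latent dimension i by G_ii = ||J_D(z) e_i||^2."""
    return np.diag(G).copy()

def top_k_mask(scores, k):
    """Boolean mask that keeps the k highest-scoring dimensions (k >= 1)."""
    mask = np.zeros(scores.shape[0], dtype=bool)
    mask[np.argsort(scores)[-k:]] = True
    return mask

def masked_statistic(eps_pred, eps_true, mask):
    """Squared noise-prediction error restricted to the kept dimensions.

    ASSUMPTION: a generic score-based attack statistic, used for illustration.
    """
    diff = (eps_pred - eps_true)[mask]
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
d, n = 8, 32                      # latent and output sizes (toy values)
W = rng.normal(size=(n, d))       # toy decoder weights
z = rng.normal(size=d)            # one latent code

G = pullback_metric(z, W)
scores = dimension_scores(G)
mask = top_k_mask(scores, k=4)    # drop the 4 lowest-scoring dimensions

# Stand-ins for the true and predicted noise of a diffusion model.
eps_true = rng.normal(size=d)
eps_pred = eps_true + 0.1 * rng.normal(size=d)
stat = masked_statistic(eps_pred, eps_true, mask)
```

For a real decoder the Jacobian would typically come from automatic differentiation rather than a closed form, and forming the full $G(z)$ can be expensive in high-dimensional latent spaces, so only the diagonal or a low-rank approximation might be computed in practice.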
