
A Gray-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse

Zhongliang Guo, Chun Tong Lei, Lei Fang, Shuai Zhao, Yifei Qian, Jingyu Lin, Zeyu Wang, Cunjian Chen, Ognjen Arandjelović, Chun Pong Lau

TL;DR

This work tackles the risk of unauthorized LDM-based image editing by proposing the Posterior Collapse Attack (PCA), which targets the VAE encoder. The authors induce two collapse modes in the VAE posterior (diffusion collapse for content preservation and concentration collapse for disruption) through a unified KL-based loss that switches between objectives with a single parameter. PCA requires minimal white-box information (the VAE encoder, which holds less than 4% of LDM parameters) and offers strong transferability across VAE-based LDM variants, achieving better protection efficiency than existing baselines. The approach is validated across multiple SD models, prompts, and defenses, highlighting its practical value for safeguarding digital assets amid rapid advances in generative AI.

Abstract

Recent advancements in Latent Diffusion Models (LDMs) have revolutionized image synthesis and manipulation, raising significant concerns about data misappropriation and intellectual property infringement. While adversarial attacks have been extensively explored as a protective measure against such misuse of generative AI, current approaches are severely limited by their heavy reliance on model-specific knowledge and substantial computational costs. Drawing inspiration from the posterior collapse phenomenon observed in VAE training, we propose the Posterior Collapse Attack (PCA), a novel framework for protecting images from unauthorized manipulation. Through comprehensive theoretical analysis and empirical validation, we identify two distinct collapse phenomena during VAE inference: diffusion collapse and concentration collapse. Based on this discovery, we design a unified loss function that can flexibly achieve both types of collapse through parameter adjustment, each corresponding to different protection objectives in preventing image manipulation. Our method significantly reduces dependence on model-specific knowledge by requiring access to only the VAE encoder, which constitutes less than 4% of LDM parameters. Notably, PCA achieves prompt-invariant protection by operating on the VAE encoder before text conditioning occurs, eliminating the need for empty prompt optimization required by existing methods. This minimal requirement enables PCA to maintain adequate transferability across various VAE-based LDM architectures while effectively preventing unauthorized image editing. Extensive experiments show PCA outperforms existing techniques in protection effectiveness, computational efficiency (runtime and VRAM), and generalization across VAE-based LDM variants. Our code is available at https://github.com/ZhongliangGuo/PosteriorCollapseAttack.
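
As a rough illustration of what a KL-based objective on a VAE encoder posterior can look like, the sketch below evaluates the closed-form KL divergence between a diagonal-Gaussian posterior and a zero-mean isotropic Gaussian target. This is not the authors' loss: the function name `kl_to_target`, the target form $\mathcal{N}(0, v^2 I)$, and the parameter `v` are assumptions made for illustration, and the abstract does not specify how PCA defines its target or optimizes the image perturbation.

```python
import numpy as np

def kl_to_target(mu, logvar, v):
    """KL( N(mu, diag(exp(logvar))) || N(0, v^2 I) ), summed over latent dims.

    mu, logvar: posterior mean and log-variance (stand-ins for encoder outputs).
    v: standard deviation of the hypothetical target distribution.
    """
    mu = np.asarray(mu, dtype=np.float64)
    logvar = np.asarray(logvar, dtype=np.float64)
    var = np.exp(logvar)
    target_var = float(v) ** 2
    # Closed-form KL between two diagonal Gaussians, evaluated per dimension.
    per_dim = (var + mu ** 2) / target_var - 1.0 - logvar + np.log(target_var)
    return 0.5 * float(per_dim.sum())

# Toy posterior standing in for encoder outputs (illustrative values only).
mu = np.zeros(4)
logvar = np.log(np.full(4, 9.0))  # variance 9 in every dimension

kl_wide = kl_to_target(mu, logvar, v=3.0)    # target variance matches the posterior
kl_narrow = kl_to_target(mu, logvar, v=0.1)  # target far narrower than the posterior
```

Under these assumptions, a single scalar `v` decides which kind of posterior scores well: a large `v` favours a spread-out posterior, while a small `v` favours one concentrated near zero. In the toy example, `kl_wide` is zero and `kl_narrow` is large.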

Paper Structure

This paper contains 36 sections, 9 equations, 8 figures, 9 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Comparison of our proposed method with AdvDM, PhotoGuard, MIST, and SDS. Our method requires access only to the VAE encoder, achieving equal or superior performance across multiple scenarios (Fig. \ref{fig:demo}) with comparable semantic degradation in image editing, lower runtime, and lower VRAM usage. All runtime and VRAM measurements were conducted on an NVIDIA RTX 3090 with $T=40$ iterations at 512×512 resolution.
  • Figure 3: Comparison of our method with other baselines under different objectives. The first column shows the editing prompts. The second column shows the original input images and the expected edited outputs. The subsequent columns show images protected by different methods and their corresponding outputs. Here, $x$ denotes the original image to be edited by the LDM, and $f(\cdot)$ denotes LDM-based image editing.
  • Figure 4: Qualitative comparison of image editing results. The first row shows the original image and the protected images. Each of the remaining rows shows the different methods applied to a pineapple image under various editing prompts and protection objectives. Here, $x$ denotes the original image to be edited by the LDM, and $f(\cdot)$ denotes LDM-based image editing.
  • Figure 5: Qualitative comparison of image editing results. The first row shows the original image and the protected images. Each of the remaining rows shows the different methods applied to a portrait photo under various editing prompts and protection objectives. Here, $x$ denotes the original image to be edited by the LDM, and $f(\cdot)$ denotes LDM-based image editing.
  • Figure 6: Transferability results on SD2.0 and SDXL. White-box uses the target model's VAE; black-box uses SD1.5's pretrained weights as surrogate. (Left) Objective 1 results. (Right) Objective 2 results. $\uparrow/\downarrow$ indicate higher/lower is better. All subplots share the same axes: x-axis shows baseline methods; y-axis shows IQA scores.
  • ...and 3 more figures