
Purify Once, Edit Freely: Breaking Image Protections under Model Mismatch

Qichen Zhao, Shengfang Zhai, Xinjian Bai, Qingni Shen, Qiqi Lin, Yansong Gao, Zhonghai Wu

Abstract

Diffusion models enable high-fidelity image editing but can also be misused for unauthorized style imitation and harmful content generation. To mitigate these risks, proactive image protection methods embed small, often imperceptible adversarial perturbations into images before sharing to disrupt downstream editing or fine-tuning. However, in realistic post-release scenarios, content owners cannot control downstream processing pipelines, and protections optimized for a surrogate model may fail when attackers use mismatched diffusion pipelines. Existing purification methods can weaken protections but often sacrifice image quality and rarely examine architectural mismatch. We introduce a unified post-release purification framework to evaluate protection survivability under model mismatch. We propose two practical purifiers: VAE-Trans, which corrects protected images via latent-space projection, and EditorClean, which performs instruction-guided reconstruction with a Diffusion Transformer to exploit architectural heterogeneity. Both operate without access to protected images or defense internals. Across 2,100 editing tasks and six representative protection methods, EditorClean consistently restores editability. Compared to protected inputs, it improves PSNR by 3-6 dB and reduces FID by 50-70 percent on downstream edits, while outperforming prior purification baselines by about 2 dB PSNR and 30 percent lower FID. Our results reveal a purify-once, edit-freely failure mode: once purification succeeds, the protective signal is largely removed, enabling unrestricted editing. This highlights the need to evaluate protections under model mismatch and design defenses robust to heterogeneous attackers.
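To make the reported PSNR gains concrete, the following is a minimal sketch of the standard PSNR formula, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. It assumes 8-bit pixel values (peak 255) and flattened pixel lists; the paper's exact evaluation protocol (reference image, color space, data range) is not given in this excerpt, and the function name `psnr` is illustrative.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumption: pixels are flattened into plain sequences of numbers and
    max_value is the peak intensity (255 for 8-bit images).
    """
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("inputs must be non-empty")
    # Mean squared error over all pixels.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the MSE, halving the MSE raises PSNR by $10 \log_{10} 2 \approx 3.01$ dB. A 3-6 dB improvement therefore corresponds to cutting the mean squared error to roughly one half to one quarter of its previous value.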

Paper Structure (26 sections, 8 equations, 10 figures, 8 tables)

Figures (10)

  • Figure 1: Schematic overview of our post-release purification setting and two purifiers, VAE-Trans (Section \ref{sec:vae-trans}) and EditorClean (Section \ref{sec:editorclean}). The "Standard image editing pipeline" in the figure denotes the defender's surrogate editor used to optimize the protective perturbation. Left (Training): Both purifiers are trained on public image datasets. Right (Inference): Given $x_{\mathrm{adv}}$, the attacker prepends a purifier to obtain a purified image $x_{\mathrm{pur}}$; for EditorClean, we additionally inject light Gaussian noise before reconstruction. The purified image is then passed to a downstream editor for editing.
  • Figure 2: Qualitative comparison of purified image quality across different protection methods and purification strategies (see Table \ref{tab:purified_quality}). Each row shows results for one protection method.
  • Figure 3: Qualitative downstream editing results on SD v1.5 Inpainting (left) and SD v2.0 Inpainting (right). Rows correspond to six protection methods, and columns show edits produced from unpurified protected inputs or from protected inputs after different purification strategies (see Table \ref{tab:edited_result_comparison}).
  • Figure 4: Real-world editor comparison under model mismatch.
  • Figure 5: Extended qualitative examples complementing Figure \ref{fig:cross_arch_llm} with a more complex cartoon image and instruction. Rows show clean and protected inputs (six defenses); columns show edited results from Seedream (Doubao), Qwen-Image, Step1X-Edit, Gemini Pro, and ChatGPT-4o.
  • ...and 5 more figures
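
The inference flow described in the Figure 1 caption (optionally add light Gaussian noise, purify, then hand the result to a downstream editor) can be sketched as a simple composition. This is an illustrative sketch only, not the authors' implementation: images are stood in for by flat lists of floats in [0, 1], and `purifier`, `editor`, `purify_then_edit`, and `noise_sigma` are hypothetical names. The actual noise level is not stated in this excerpt.

```python
import random
from typing import Callable, List

# Stand-in for a real image tensor: flattened pixel values in [0, 1].
Image = List[float]


def add_gaussian_noise(x: Image, sigma: float, seed: int = 0) -> Image:
    """Add zero-mean Gaussian noise with std `sigma`, clamping back to [0, 1]."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in x]


def purify_then_edit(
    x_adv: Image,
    purifier: Callable[[Image], Image],     # hypothetical: maps x_adv to x_pur
    editor: Callable[[Image, str], Image],  # hypothetical downstream editor
    instruction: str,
    noise_sigma: float = 0.0,               # assumed value; "light" noise is unspecified
) -> Image:
    """Prepend a purifier to a downstream editor, as in the Figure 1 inference flow."""
    # Optional noise injection before reconstruction (described for EditorClean).
    x_in = add_gaussian_noise(x_adv, noise_sigma) if noise_sigma > 0 else x_adv
    x_pur = purifier(x_in)
    return editor(x_pur, instruction)
```

The purifier runs once per image; after that, `x_pur` can be passed to any editor with any instruction, which is the "purify once, edit freely" pattern the abstract describes.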