DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing

Tarik Can Ozden, Ozgur Kara, Oguzhan Akcin, Kerem Zaman, Shashank Srivastava, Sandeep P. Chinchali, James M. Rehg

TL;DR

DiffVax defends against malicious diffusion-based image editing by learning an optimization-free immunizer that adds imperceptible perturbations to content. The immunizer is a feed-forward model trained with a two-term loss that both preserves visual quality and disrupts editing attempts, enabling millisecond-scale per-image protection and extending naturally to video. The approach generalizes well to unseen content and models, remains robust under counter-attacks, and scores favorably on user-perceived realism, while staying scalable and using little memory. The work establishes a scalable, real-time defense framework applicable across editing tools and content types, and outlines directions toward universal cross-architecture immunization and temporally aware video protection.

Abstract

Current image immunization defense techniques against diffusion-based editing embed imperceptible noise into target images to disrupt editing models. However, these methods face scalability challenges, as they require time-consuming optimization for each image separately, taking hours for small batches. To address these challenges, we introduce DiffVax, a scalable, lightweight, and optimization-free framework for image immunization, specifically designed to prevent diffusion-based editing. Our approach enables effective generalization to unseen content, reducing computational costs and cutting immunization time from days to milliseconds, achieving a speedup of 250,000x. This is achieved through a loss term that ensures the failure of editing attempts and the imperceptibility of the perturbations. Extensive qualitative and quantitative results demonstrate that our model is scalable, optimization-free, adaptable to various diffusion-based editing tools, robust against counter-attacks, and, for the first time, effectively protects video content from editing. More details are available at https://diffvax.github.io/.

Paper Structure

This paper contains 56 sections, 3 equations, 21 figures, 13 tables, 1 algorithm.

Figures (21)

  • Figure 1: DiffVax is an optimization-free image immunization approach designed to protect images and videos from diffusion-based editing. DiffVax demonstrates robustness across diverse content, providing protection for both in-the-wild (a) unseen images and (b) unseen video content while effectively preventing edits across various editing methods, including inpainting (illustrated with a human in the left column and a non-human foreground object in the right column) and instruction-based edits (right column) with InstructPix2Pix (Brooks et al., 2023).
  • Figure 2: Comparing DiffVax with existing approaches. (a) An attacker performs malicious editing on an original image. (b) Existing defenses immunize images by solving a costly optimization problem for each image individually, taking over 10 minutes per image. (c) DiffVax enables scalable protection by first training an immunizer model (green box) on a diverse dataset. Once trained, the model can immunize unseen images with a single forward pass, producing effective perturbations in approximately 70 milliseconds per image.
  • Figure 3: Overview of DiffVax. Our end-to-end training framework is illustrated in (a). The training process consists of two stages. In Stage 1, immunization is applied to the training image $\mathbf{I}$. In Stage 2, the immunized image $\mathbf{I}_{\mathrm{im}}$ is edited using a Stable Diffusion model $\text{SD}(\cdot)$ with the specified text prompt and mask, and the losses $\mathcal{L}_\mathrm{noise}$ and $\mathcal{L}_\mathrm{edit}$ are computed. During inference (b), the trained immunizer model generates immunization noise (see Inference Stage 1 in (b)), which is applied to the original (target) image using an immunization mask. When a malicious user attempts to attack these immunized images with an editing mask, the editing tool (see Inference Stage 2 in (b)) is unable to produce the intended edited content.
  • Figure 4: Qualitative results with DiffVax. Our method effectively immunizes (a) seen images and generalizes to (b) unseen images with diverse text prompts. Additionally, it extends to (c) unseen human videos, demonstrating its adaptability to new content. Furthermore, it supports various poses and perspectives, from full-body shots (a) to close-up face shots (c).
  • Figure 5: Qualitative comparison of edited images across immunization methods. This figure shows the results of different immunization methods: Random Noise, PhotoGuard-E, PhotoGuard-D, DiffusionGuard, and our proposed method, DiffVax. Results for (a) seen and (b) unseen images are shown, with a different prompt applied to each (listed on the right side). The first column contains the original images, while subsequent columns show the edited outputs under the settings labeled at the top. Note that DiffVax is substantially more effective than PhotoGuard-E, PhotoGuard-D, and DiffusionGuard in degrading the edit.
  • ...and 16 more figures
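
The two-stage training objective described in the Figure 3 caption (a noise term $\mathcal{L}_\mathrm{noise}$ and an edit term $\mathcal{L}_\mathrm{edit}$) can be illustrated with a minimal Python sketch. The exact loss definitions are not given in this section, so everything below is an assumption made for illustration: the L1 form of both terms, the weight `alpha`, the idea that the edit term pushes the edited region toward a blank output, and all function names. The `editor` argument is a hypothetical stand-in for the diffusion editing model $\text{SD}(\cdot)$, and images are flattened grayscale lists to keep the example dependency-free.

```python
from typing import Callable, List

Image = List[float]  # flattened grayscale image, values in [0, 1]
Mask = List[int]     # 1 = pixel belongs to the region, 0 = it does not


def apply_immunization(image: Image, noise: Image, mask: Mask) -> Image:
    """Stage 1: add noise only inside the immunization mask, then clamp to [0, 1]."""
    return [min(1.0, max(0.0, p + n * m)) for p, n, m in zip(image, noise, mask)]


def masked_l1(values: Image, mask: Mask) -> float:
    """Mean absolute value over the pixels selected by the mask."""
    selected = sum(mask)
    if selected == 0:
        return 0.0
    return sum(abs(v) * m for v, m in zip(values, mask)) / selected


def two_term_loss(
    image: Image,
    noise: Image,
    immunize_mask: Mask,
    edit_mask: Mask,
    editor: Callable[[Image, Mask], Image],
    alpha: float = 1.0,  # hypothetical weight balancing the two terms
) -> float:
    """Assumed form: alpha * L_noise + L_edit."""
    immunized = apply_immunization(image, noise, immunize_mask)

    # L_noise (assumed): keep the perturbation small so it stays imperceptible.
    perturbation = [a - b for a, b in zip(immunized, image)]
    loss_noise = masked_l1(perturbation, immunize_mask)

    # Stage 2: run the (stand-in) editor on the immunized image.
    edited = editor(immunized, edit_mask)

    # L_edit (assumed): penalize any content the editor produces in the edited region.
    loss_edit = masked_l1(edited, edit_mask)

    return alpha * loss_noise + loss_edit


def identity_editor(img: Image, mask: Mask) -> Image:
    """Toy editor that returns its input unchanged; a real editor would be a diffusion model."""
    return list(img)


image = [0.5, 0.5, 0.5, 0.5]
noise = [0.1, -0.1, 0.0, 0.0]
immunize_mask = [1, 1, 0, 0]  # region to protect
edit_mask = [0, 0, 1, 1]      # region an attacker would edit

loss = two_term_loss(image, noise, immunize_mask, edit_mask, identity_editor)
```

With these toy inputs the loss evaluates to about 0.6: roughly 0.1 from the noise term and 0.5 from the edit term. In an actual training loop the noise would come from the immunizer network and the loss would be backpropagated through the editor, which this dependency-free sketch does not attempt.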