Raising the Cost of Malicious AI-Powered Image Editing

Hadi Salman, Alaa Khaddaj, Guillaume Leclerc, Andrew Ilyas, Aleksander Madry

TL;DR

This work tackles the risk of malicious AI-powered image editing enabled by diffusion models by proposing image immunization through imperceptible adversarial perturbations. It develops two attack strategies (an encoder attack and a diffusion attack) that make immunized images resistant to realistic edits and variations guided by text prompts. Empirical results show that immunization degrades the quality and similarity of edits, with diffusion-based perturbations generally more effective. The authors advocate a techno-policy approach in which diffusion-model developers provide forward-compatible APIs and collaborate to sustain these defenses against future model iterations.

Abstract

We present an approach to mitigating the risks of malicious image editing posed by large diffusion models. The key idea is to immunize images so as to make them resistant to manipulation by these models. This immunization relies on injection of imperceptible adversarial perturbations designed to disrupt the operation of the targeted diffusion models, forcing them to generate unrealistic images. We provide two methods for crafting such perturbations, and then demonstrate their efficacy. Finally, we discuss a policy component necessary to make our approach fully effective and practical -- one that involves the organizations developing diffusion models, rather than individual users, to implement (and support) the immunization process.

Paper Structure (41 sections, 8 equations, 17 figures, 4 tables, 2 algorithms)

This paper contains 41 sections, 8 equations, 17 figures, 4 tables, 2 algorithms.

Figures (17)

  • Figure 1: Overview of our framework. An adversary seeks to modify an image found online. The adversary describes via a textual prompt the desired changes and then uses a diffusion model to generate a realistic image that matches the prompt (top). By immunizing the original image before the adversary can access it, we disrupt their ability to successfully perform such edits (bottom).
  • Figure 2: Diffusion models offer various capabilities, such as (1) generating images using textual prompts (top left), (2) generating variations of an input image using textual prompts (top right), and (3) editing images using textual prompts (bottom).
  • Figure 3: Overview of our proposed attacks. When applying the encoder attack (left), our goal is to map the representation of the original image to the representation of a target image (gray image). Our (more complex) diffusion attack (right), on the other hand, aims to break the diffusion process by manipulating the whole process to generate an image that resembles a given target image (gray image).
  • Figure 4: Given a source image (e.g., image of a white cow on the beach) and a textual prompt (e.g., "black cow on the beach"), the SDM can generate a realistic image matching the prompt while still similar to the original image (middle column). However, when the source image is immunized, the SDM fails to do so (right-most column). More examples are in Appendix \ref{app:additional_results}.
  • Figure 5: Given a source image (e.g., image of two men watching a tennis game) and a textual prompt (e.g., "two men in a wedding"), the SDM can edit the source image to match the prompt (second column). However, when the source image is immunized using the encoder attack, the SDM fails to do so (third column). Immunizing using the diffusion attack further reduces the quality of the edited image (fourth column). More examples are in Appendix \ref{app:additional_results}.
  • ...and 12 more figures
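
The encoder attack described in the Figure 3 caption (pushing the representation of the original image toward that of a gray target image) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the encoder is a toy linear map `W` rather than the encoder of a real diffusion model, the perturbation is constrained by an L-infinity budget `eps`, and the optimizer is projected signed-gradient descent. The function names `encode`, `latent_loss`, and `encoder_attack` are hypothetical.

```python
def encode(W, x):
    """Toy linear 'encoder' z = W x (a stand-in for a real image encoder)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def latent_loss(W, x, z_target):
    """Squared L2 distance between the latent of x and the target latent."""
    return sum((a - b) ** 2 for a, b in zip(encode(W, x), z_target))


def encoder_attack(W, x, x_target, eps=0.05, step=0.01, iters=50):
    """Return x + delta, where delta (with |delta_i| <= eps) moves the latent
    of x toward the latent of x_target. Assumed setup, not the paper's code."""
    z_target = encode(W, x_target)
    delta = [0.0] * len(x)
    for _ in range(iters):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        residual = [a - b for a, b in zip(encode(W, x_adv), z_target)]
        # Gradient of ||W x_adv - z_target||^2 w.r.t. x_adv is 2 W^T residual.
        grad = [2.0 * sum(W[r][c] * residual[r] for r in range(len(W)))
                for c in range(len(x))]
        for c, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            d = delta[c] - step * sign            # signed-gradient descent step
            d = max(-eps, min(eps, d))            # project onto the L-inf ball
            d = max(-x[c], min(1.0 - x[c], d))    # keep pixel values in [0, 1]
            delta[c] = d
    return [xi + di for xi, di in zip(x, delta)]
```

As a usage example, with `W = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]`, a four-pixel "image" `x = [0.9, 0.1, 0.8, 0.2]`, and a gray target `[0.5] * 4`, calling `encoder_attack(W, x, [0.5] * 4)` returns an image whose pixels differ from `x` by at most `eps` while its latent lies closer to the latent of the gray target. With a real model the gradient would come from automatic differentiation through the encoder rather than from the closed form used here.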