Raising the Cost of Malicious AI-Powered Image Editing
Hadi Salman, Alaa Khaddaj, Guillaume Leclerc, Andrew Ilyas, Aleksander Madry
TL;DR
This work tackles the risk of malicious AI-powered image editing enabled by diffusion models by proposing image immunization through imperceptible adversarial perturbations. It develops two attack strategies, an encoder attack and a diffusion attack, that make immunized images resistant to realistic edits and variations guided by text prompts. Empirical results show that immunization degrades both the quality of the resulting edits and their similarity to edits of the original, non-immunized images, with diffusion-attack perturbations generally more effective. The authors advocate a techno-policy approach in which diffusion-model developers provide forward-compatible APIs and collaborate to keep these defenses effective against future model versions.
Abstract
We present an approach to mitigating the risks of malicious image editing posed by large diffusion models. The key idea is to immunize images so as to make them resistant to manipulation by these models. This immunization relies on injecting imperceptible adversarial perturbations designed to disrupt the operation of the targeted diffusion models, forcing them to generate unrealistic images. We provide two methods for crafting such perturbations and then demonstrate their efficacy. Finally, we discuss a policy component necessary to make our approach fully effective and practical: one that requires the organizations developing diffusion models, rather than individual users, to implement (and support) the immunization process.
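The abstract does not spell out how the perturbations are computed. The following is a minimal sketch, assuming the encoder attack is posed as projected gradient descent (PGD): minimize ||E(x + δ) − z_target||² subject to ||δ||∞ ≤ ε. A small random linear map `W` stands in, hypothetically, for a diffusion model's image encoder, and the latent of a uniform gray image serves as the target. `encoder_attack`, `eps`, `step`, and `iters` are illustrative names and values, not the authors' code or settings.

```python
import numpy as np


def encoder_attack(x, W, z_target, eps=0.06, step=0.01, iters=200):
    """Toy PGD sketch (assumption): find delta with ||delta||_inf <= eps so that
    the hypothetical linear encoder E(v) = W @ v maps x + delta close to z_target."""

    def loss(d):
        # Squared L2 distance between E(x + d) and the target latent.
        r = W @ (x + d) - z_target
        return float(r @ r)

    delta = np.zeros_like(x)
    best_delta, best_loss = delta.copy(), loss(delta)
    for _ in range(iters):
        grad = 2.0 * W.T @ (W @ (x + delta) - z_target)  # d(loss)/d(delta)
        delta = delta - step * np.sign(grad)              # signed-gradient descent step
        delta = np.clip(delta, -eps, eps)                 # project onto the L-inf ball
        delta = np.clip(x + delta, 0.0, 1.0) - x          # keep pixel values in [0, 1]
        current = loss(delta)
        if current < best_loss:                           # keep the best iterate seen
            best_delta, best_loss = delta.copy(), current
    return best_delta, best_loss


# Illustrative usage with random data (not from the paper).
rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.8, size=64)       # flattened "image" with values in [0, 1]
W = rng.normal(size=(8, 64)) / 8.0       # hypothetical linear stand-in for an encoder
z_target = W @ np.full(64, 0.5)          # latent of a uniform gray image
delta, final_loss = encoder_attack(x, W, z_target)
initial_loss = float(np.sum((W @ x - z_target) ** 2))
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Under the same assumptions, the diffusion attack would differentiate through the entire editing pipeline rather than the encoder alone, which is far more expensive; that variant is not sketched here.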
