TarPro: Targeted Protection against Malicious Image Editing

Kaixin Shen; Ruijie Quan; Jiaxu Miao; Jun Xiao; Yi Yang

TL;DR

TarPro addresses the risk of NSFW content in diffusion-based image editing by offering targeted protection that blocks malicious edits while preserving normal editing. It couples a semantic-aware constraint with a perturbation generator that operates in a high-dimensional parameter space, producing imperceptible perturbations $\delta$ with $\|\delta\|_{\infty} \leq \eta$ that selectively neutralize harmful prompts. The method optimizes the dual objective $\min_{\|\delta\|_{\infty} \leq \eta}\{ M[g(x+\delta, y_{mal}), g(x, y_{nor})] + M[g(x+\delta, y_{nor}), g(x, y_{nor})] \}$ and trains a perturbation generator with $\delta_{init} = Enc_\theta(x)$ and $\delta = \tanh(\delta_{init}) \cdot \eta$, which keeps the perturbation within the bound $\eta$ and lets it adapt to each input image. Across three diffusion models and a Midjourney gallery dataset, TarPro achieves state-of-the-art NSFW masking while preserving high SSIM and PSNR for normal edits, and it generalizes zero-shot to unseen prompts. This plug-and-play framework offers a practical, ethical safeguard for AI-generated content, with potential extensions to multimodal and language-guided editing systems.
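To make the two formulas above concrete, here is a minimal sketch in plain Python. It is illustrative only and not the authors' implementation: the names `bound_perturbation`, `dual_objective`, and `toy_editor` are hypothetical, mean squared error stands in for the distance $M$, `toy_editor` stands in for the diffusion editor $g$, and the budget $\eta = 8/255$ is an assumed value.

```python
import math


def bound_perturbation(delta_init, eta):
    """Squash unconstrained values into [-eta, eta]: delta = tanh(delta_init) * eta."""
    return [math.tanh(v) * eta for v in delta_init]


def mse(a, b):
    """Assumed stand-in for the distance M: mean squared error of two flat images."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def dual_objective(g, x, delta, y_mal, y_nor, metric=mse):
    """Sum of the two terms in the objective quoted in the TL;DR."""
    protected = [xi + di for xi, di in zip(x, delta)]
    reference = g(x, y_nor)  # normal edit of the unprotected image
    malicious_term = metric(g(protected, y_mal), reference)
    normal_term = metric(g(protected, y_nor), reference)
    return malicious_term + normal_term


def toy_editor(image, prompt):
    """Hypothetical editor g: shifts pixels by a prompt-dependent amount."""
    shift = 0.5 if prompt == "malicious" else 0.1
    return [p + shift for p in image]


x = [0.2, 0.4, 0.6, 0.8]   # a tiny flattened "image"
eta = 8 / 255              # assumed perturbation budget
delta = bound_perturbation([-3.0, 0.0, 0.5, 10.0], eta)
loss = dual_objective(toy_editor, x, delta, "malicious", "normal")
```

Because $\tanh$ maps every real number into $(-1, 1)$, the constraint $\|\delta\|_{\infty} \leq \eta$ holds by construction, however large the raw generator output is. In the paper's setting $\delta_{init}$ would come from the trained encoder $Enc_\theta(x)$ rather than from a hand-written list.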

Abstract

The rapid advancement of image editing techniques has raised concerns about their misuse for generating Not-Safe-for-Work (NSFW) content. This calls for a targeted protection mechanism that blocks malicious edits while preserving normal editability. Existing protection methods fail to strike this balance: they indiscriminately disrupt all edits yet still allow some harmful content to be generated. To address this, we propose TarPro, a targeted protection framework that prevents malicious edits while leaving benign modifications intact. TarPro achieves this through a semantic-aware constraint that disrupts only malicious content and a lightweight perturbation generator that produces a more stable, imperceptible, and robust perturbation for image protection. Extensive experiments demonstrate that TarPro surpasses existing methods, achieving high protection efficacy with minimal impact on normal edits. Our results highlight TarPro as a practical solution for secure and controlled image editing.

Paper Structure

Table of Contents

Figures (11)