TarPro: Targeted Protection against Malicious Image Editing

Kaixin Shen, Ruijie Quan, Jiaxu Miao, Jun Xiao, Yi Yang

TL;DR

TarPro addresses the risk of NSFW content in diffusion-based image editing by offering targeted protection that blocks malicious edits while preserving normal editing. It couples a semantic-aware constraint with a perturbation generator that operates in a high-dimensional parameter space, producing imperceptible perturbations $\delta$ with $\|\delta\|_{\infty} \leq \eta$ that selectively neutralize harmful prompts. The method optimizes a dual objective, $\min_{\|\delta\|_{\infty} \leq \eta}\{ M[g(x+\delta, y_{mal}), g(x, y_{nor})] + M[g(x+\delta, y_{nor}), g(x, y_{nor})] \}$, and trains a perturbation generator with $\delta_{init} = Enc_\theta(x)$ and $\delta = \tanh(\delta_{init}) \cdot \eta$, so the $\ell_\infty$ bound holds by construction, supporting imperceptibility and adaptability. Across three diffusion models and a Midjourney gallery dataset, TarPro achieves state-of-the-art NSFW masking while preserving high SSIM and PSNR for normal edits, and it generalizes zero-shot to unseen prompts. This plug-and-play framework offers a practical, ethical safeguard for AI-generated content, with potential extensions to multimodal and language-guided editing systems.
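The two formulas above can be illustrated with a small sketch. It is not the authors' implementation: the summary does not define $M$ or $g$, so the code assumes $M$ is a mean squared error and stands in a hypothetical `toy_edit` function for the diffusion editor $g$. Images are flat lists of floats, and the names `bound_perturbation` and `dual_objective` are invented for illustration.

```python
import math


def bound_perturbation(delta_init, eta):
    """Apply delta = tanh(delta_init) * eta, so every entry lies in [-eta, eta]."""
    return [math.tanh(v) * eta for v in delta_init]


def mse(a, b):
    """Mean squared error; an assumed stand-in for the measure M."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def toy_edit(image, prompt):
    """Hypothetical editor g(x, y): a malicious prompt shifts every pixel by 0.1."""
    shift = 0.1 if prompt == "mal" else 0.0
    return [p + shift for p in image]


def dual_objective(edit, x, delta, y_mal, y_nor, metric=mse):
    """Evaluate M[g(x+delta, y_mal), g(x, y_nor)] + M[g(x+delta, y_nor), g(x, y_nor)]."""
    protected = [p + d for p, d in zip(x, delta)]
    target = edit(x, y_nor)  # the normal edit of the clean image
    block_term = metric(edit(protected, y_mal), target)
    preserve_term = metric(edit(protected, y_nor), target)
    return block_term + preserve_term


x = [0.2, 0.5, 0.8]
delta = bound_perturbation([-100.0, 0.0, 100.0], eta=0.05)
loss = dual_objective(toy_edit, x, delta, y_mal="mal", y_nor="nor")
```

Because $|\tanh(\cdot)| \leq 1$, the constraint $\|\delta\|_{\infty} \leq \eta$ holds for any generator output, which is why no explicit projection step appears in the sketch.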

Abstract

The rapid advancement of image editing techniques has raised concerns about their misuse for generating Not-Safe-for-Work (NSFW) content. This necessitates a targeted protection mechanism that blocks malicious edits while preserving normal editability. However, existing protection methods fail to achieve this balance, as they indiscriminately disrupt all edits while still allowing some harmful content to be generated. To address this, we propose TarPro, a targeted protection framework that prevents malicious edits while maintaining benign modifications. TarPro achieves this through a semantic-aware constraint that only disrupts malicious content and a lightweight perturbation generator that produces a more stable, imperceptible, and robust perturbation for image protection. Extensive experiments demonstrate that TarPro surpasses existing methods, achieving a high protection efficacy while ensuring minimal impact on normal edits. Our results highlight TarPro as a practical solution for secure and controlled image editing.

Paper Structure

This paper contains 18 sections, 4 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Left: Demonstration of TarPro's effectiveness in targeted protection. TarPro successfully blocks malicious edits for NSFW (Not-Safe-for-Work) content while preserving the quality and functionality of normal edits. Right: TarPro showcases a marked improvement in preventing NSFW content generation, surpassing the performance of existing untargeted protection methods. The NSFW-Ratio indicates the proportion of edited images that contain NSFW content, as detailed in the metrics section.
  • Figure 2: Framework of TarPro. A perturbation generator produces an imperceptible perturbation $\delta$ added to the original image $x$, leading to a perturbed image $x+\delta$. We use normal prompts $y_{nor}$ and malicious prompts $y_{mal}$ to edit the perturbed image and optimize the perturbation generator through a malicious blocking loss $\mathcal{L}_{adv}$ and a normal preservation loss $\mathcal{L}_{reg}$. See details in the framework section.
  • Figure 3: Visualization comparison between our TarPro and baseline models through malicious and normal prompts (see the qualitative results section).
  • Figure 4: More visualization of TarPro against malicious editing.
  • Figure 5: Quantitative results of perturbations. Higher SSIM and PSNR values indicate less distortion of the original images.
  • ...and 6 more figures
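
Two of the metrics named in the captions above are simple enough to sketch. The code below uses the standard PSNR definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, and computes the NSFW-Ratio as the proportion of edited images flagged as NSFW, following the Figure 1 caption. It is an illustrative sketch, not the paper's evaluation code: images are assumed to be flat lists of floats in $[0, 1]$, the per-image NSFW flags are assumed to come from some external detector, and SSIM is omitted because it needs windowed statistics.

```python
import math


def psnr(original, perturbed, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion."""
    mse = sum((p - q) ** 2 for p, q in zip(original, perturbed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def nsfw_ratio(flags):
    """Proportion of edited images flagged as NSFW (flags are booleans)."""
    return sum(flags) / len(flags)


clean = [0.0, 0.0, 0.0, 0.0]
protected = [0.1, 0.1, 0.1, 0.1]
score = psnr(clean, protected)                   # roughly 20 dB
ratio = nsfw_ratio([True, False, False, True])   # 2 of 4 images flagged
```

Under this reading, a protection method aims for a high PSNR between the original and protected image and a low NSFW-Ratio after malicious edits.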