
Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization

Lingyun Zhang, Yu Xie, Yanwei Fu, Ping Chen

TL;DR

This work tackles the challenge of removing or replacing undesired content in diffusion-based image generation without degrading surrounding regions. It introduces Concept Replacer, comprising a few-shot trained Concept Localizer to pinpoint target regions during denoising and a training-free Dual Prompts Cross-Attention (DPCA) module to substitute the target concept using a replacement prompt. The approach achieves high localization precision and coherent, localized replacements, outperforming existing methods in both localization accuracy and content replacement while preserving non-target regions. The results suggest practical utility for safer, region-specific content control in diffusion pipelines, with potential applications in nudity or violence content mitigation and user-customized content moderation.
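The DPCA step described above uses the replacement prompt only inside the localized region. This summary does not give the paper's exact formulation. The following is a minimal NumPy sketch that assumes DPCA computes two cross-attention outputs, one per prompt, over the same image-feature queries and blends them with the localizer's mask. It uses the replacement-prompt output inside the mask and the input-prompt output elsewhere. The function names and tensor shapes are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    # q: (N, d) image-feature queries; k, v: (T, d) text-token keys and values.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def dual_prompt_cross_attention(q, k_in, v_in, k_rep, v_rep, mask):
    # Hypothetical DPCA sketch. mask: (N,) in [0, 1], where 1 marks the localized target region.
    out_in = cross_attention(q, k_in, v_in)     # conditioned on the input prompt
    out_rep = cross_attention(q, k_rep, v_rep)  # conditioned on the replacement prompt
    m = mask[:, None]
    # Non-target pixels keep the input-prompt output, so they match the original model.
    return (1.0 - m) * out_in + m * out_rep

# Toy usage: 16 image tokens, 8 text tokens, feature dim 4; the first 4 tokens are "target".
rng = np.random.default_rng(0)
N, T, d = 16, 8, 4
q = rng.standard_normal((N, d))
k_in, v_in = rng.standard_normal((T, d)), rng.standard_normal((T, d))
k_rep, v_rep = rng.standard_normal((T, d)), rng.standard_normal((T, d))
mask = np.zeros(N)
mask[:4] = 1.0
out = dual_prompt_cross_attention(q, k_in, v_in, k_rep, v_rep, mask)
```

With a hard 0/1 mask, rows outside the target region are identical to ordinary input-prompt cross-attention. This is the property that lets this kind of blend leave non-target areas untouched without any training.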

Abstract

As large-scale diffusion models continue to advance, they excel at producing high-quality images but often generate unwanted content, such as sexually explicit or violent imagery. Existing methods for concept removal generally guide the image generation process but can unintentionally modify unrelated regions, leading to inconsistencies with the original model. We propose a novel approach for targeted concept replacement in diffusion models, enabling specific concepts to be removed without affecting non-target areas. Our method introduces a dedicated concept localizer that precisely identifies the target concept during the denoising process and is trained with few-shot learning, so it requires minimal labeled data. Within the identified region, we introduce a training-free Dual Prompts Cross-Attention (DPCA) module to substitute the target concept, ensuring minimal disruption to surrounding content. We evaluate our method on concept localization precision and replacement efficiency. Experimental results demonstrate that our method achieves superior precision in localizing target concepts and performs coherent concept replacement with minimal impact on non-target areas, outperforming existing approaches.
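The abstract evaluates two properties: how precisely the target concept is localized and how little non-target areas change. This summary does not name the paper's metrics. As an illustrative sketch only, the code below assumes mask intersection-over-union for localization precision and mean absolute pixel difference outside the target mask for non-target preservation. Both metric choices and the function names `mask_iou` and `non_target_mae` are hypothetical.

```python
import numpy as np

def mask_iou(pred, gt):
    # Intersection-over-union of two boolean masks; defined as 1.0 when both are empty.
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def non_target_mae(original, edited, mask):
    # Mean absolute difference over pixels OUTSIDE the target mask.
    # original, edited: (H, W, C) images; mask: (H, W) boolean target region.
    outside = ~mask.astype(bool)
    if not outside.any():
        return 0.0
    return float(np.abs(original[outside] - edited[outside]).mean())

# Toy usage: the predicted mask covers 6 pixels, the ground truth 4, and they overlap on 4.
gt = np.zeros((4, 4), dtype=bool)
gt[:2, :2] = True
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :3] = True
iou = mask_iou(pred, gt)  # 4 / 6

# An edit confined to the target region leaves the non-target error at zero.
original = np.zeros((4, 4, 3))
edited = original.copy()
edited[:2, :2] = 1.0
mae = non_target_mae(original, edited, gt)
```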

Paper Structure

This paper contains 12 sections, 9 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Framework of Our Method. Our approach is designed to replace a specified target concept during image generation within diffusion models. First, our few-shot trained concept localizer identifies the target concept’s precise location. Next, the Dual Prompts Cross-Attention module replaces the target concept, guided by both the input prompt and the replacement prompt. The replacement prompt conditions only the target concept’s localized area within the image features. Our Dual Prompts Cross-Attention module is training-free and seamlessly replaces the target concept during the denoising phase of diffusion models without affecting non-target regions.
  • Figure 2: Visualization of Cross-Attention Maps for the Target Concept at Different Spatial Resolutions. Cross-attention maps at different spatial resolutions capture distinct types of information about the target concept. Maps $A^{l}_{cross}$ with smaller spatial dimensions primarily capture low-frequency semantic information, while maps $A^{h}_{cross}$ with larger spatial dimensions retain high-frequency, fine-grained details.
  • Figure 3: Segmentation Results on CelebAMask-HQ and Pascal-Car. Our concept localizer is compared with SLiMe for real-image segmentation, showing that our method achieves superior detail accuracy.
  • Figure 4: Concept Localization Results with the Proposed Concept Localizer. Our method effectively pinpoints target concepts during image generation, accurately identifying objects across varying sizes.
  • Figure 5: Nudity Concept Replacement Results. Unlike other methods, our approach identifies the target concept during image generation, allowing precise replacement while preserving the consistency of non-target areas with the original model.
  • ...and 3 more figures
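Figure 2 indicates that low-resolution cross-attention maps carry the coarse semantic location of the target concept, while high-resolution maps preserve fine detail. The paper's concept localizer is a few-shot trained module whose architecture is not described in this summary. The sketch below is only a training-free illustration of combining the two kinds of maps. It assumes square maps whose sizes divide the output size, nearest-neighbour upsampling, min-max normalization, a weighted average, and a fixed threshold. The function names, weights, and the 0.5 threshold are hypothetical.

```python
import numpy as np

def upsample_nearest(a, size):
    # Nearest-neighbour upsampling of a square (h, h) map to (size, size).
    h = a.shape[0]
    if size % h != 0:
        raise ValueError("output size must be a multiple of the map size")
    factor = size // h
    return np.repeat(np.repeat(a, factor, axis=0), factor, axis=1)

def normalize(a):
    # Min-max normalize to [0, 1]; a constant map becomes all zeros.
    span = a.max() - a.min()
    return (a - a.min()) / span if span > 0 else np.zeros_like(a, dtype=float)

def fuse_attention_maps(maps, size, weights=None, threshold=0.5):
    # maps: cross-attention maps for the target token at several resolutions.
    if weights is None:
        weights = [1.0] * len(maps)
    fused = sum(w * normalize(upsample_nearest(m, size)) for w, m in zip(weights, maps))
    fused = fused / sum(weights)
    return fused, fused >= threshold

# Toy usage: a coarse 2x2 map marks the top-right quadrant, and a fine 4x4 map marks its top row.
low = np.array([[0.0, 1.0], [0.0, 0.0]])
high = np.zeros((4, 4))
high[0, 2:] = 1.0
fused, mask = fuse_attention_maps([low, high], size=4)
```

Averaging keeps the full quadrant suggested by the coarse map while giving the highest scores to the pixels that the fine map also confirms. A learned localizer could weight or combine these signals very differently.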