VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation

Feng Han, Chao Gong, Zhipeng Wei, Jingjing Chen, Yu-Gang Jiang

TL;DR

This work addresses the safety gap in autoregressive text-to-image generation by introducing Visual Contrast Exploitation (VCE), a framework that decouples unsafe concepts from content through contrastive image-pair construction and a tailored VSafe-DPO training regime. The approach uses a token-drop mechanism and a token-level average loss to stabilize fine-tuning, and it relies on refined captions generated by a multimodal LLM to produce semantically clean positive examples, enabling precise concept erasure. Across artistic style erasure, object removal, and explicit content erasure, VCE achieves state-of-the-art safety performance while preserving unrelated safe content, as evidenced by strong CLIP-based metrics and substantial reductions in explicit content. The method also transfers to diffusion-based models, suggesting broad applicability for safer image generation in practice.
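
To make the "token-level average loss" and "token-drop" ideas concrete, here is a minimal sketch of a DPO-style preference loss in which the per-token log-probability ratios are averaged rather than summed, and some tokens are masked out of the loss. It is not the authors' VSafe-DPO implementation: the function names, the random drop rule, and the default `beta` are assumptions for illustration only, and the summary above does not state how the paper selects which tokens to drop.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def masked_mean(values, mask):
    # Average only over the tokens the mask keeps.
    kept = [v for v, keep in zip(values, mask) if keep]
    if not kept:
        raise ValueError("mask drops every token")
    return sum(kept) / len(kept)


def random_token_mask(n_tokens, drop_prob, rng):
    # Hypothetical drop rule: drop each token independently at random.
    # The paper's actual token-drop criterion may differ.
    mask = [rng.random() >= drop_prob for _ in range(n_tokens)]
    if not any(mask):
        mask[rng.randrange(n_tokens)] = True  # always keep at least one token
    return mask


def dpo_loss_token_avg(pol_w, ref_w, pol_l, ref_l, mask_w, mask_l, beta=0.1):
    """DPO-style loss with token-averaged log-ratios.

    pol_* / ref_*: per-token log-probabilities of the preferred (w, safe)
    and rejected (l, unsafe) image token sequences under the trained
    policy and the frozen reference model.
    """
    ratio_w = masked_mean([p - r for p, r in zip(pol_w, ref_w)], mask_w)
    ratio_l = masked_mean([p - r for p, r in zip(pol_l, ref_l)], mask_l)
    margin = beta * (ratio_w - ratio_l)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)
```

Standard DPO sums log-ratios over the whole sequence, so with hundreds of image tokens per sample the margin can become very large. Averaging keeps its scale independent of sequence length, which is one plausible reason such a change could stabilize fine-tuning.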

Abstract

Recently, autoregressive image generation models have wowed audiences with their remarkable capability in creating surprisingly realistic images. Models such as GPT-4o and LlamaGen can not only produce images that faithfully mimic renowned artistic styles like Ghibli, Van Gogh, or Picasso, but also potentially generate Not-Safe-For-Work (NSFW) content, raising significant concerns regarding copyright infringement and ethical use. Despite these concerns, methods to safeguard autoregressive text-to-image models remain underexplored. Previous concept erasure methods, primarily designed for diffusion models that operate in denoising latent space, are not directly applicable to autoregressive models that generate images token by token. To address this critical gap, we propose Visual Contrast Exploitation (VCE), a novel framework comprising: (1) an innovative contrastive image pair construction paradigm that precisely decouples unsafe concepts from their associated content semantics, and (2) a sophisticated DPO-based training approach that enhances the model's ability to identify and leverage visual contrastive features from image pairs, enabling precise concept erasure. Our comprehensive experiments across three challenging tasks (artist style erasure, explicit content erasure, and object removal) demonstrate that our method effectively secures the model, achieving state-of-the-art results while erasing unsafe concepts and maintaining the integrity of unrelated safe concepts. The code and models are available at https://github.com/Maplebb/VCE.

Paper Structure

This paper contains 22 sections, 4 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Left: Autoregressive generative models possess the capability to imitate artistic styles and generate nude and violent images. Right: Our approach precisely decouples and erases the Van Gogh style from generated images.
  • Figure 2: Framework of our VCE method. We first generate images from target concepts, which model their semantic space. A caption-and-filter process followed by image generation is then used to generate only safe content, accurately decoupling the unsafe content. Finally, visual contrasts from these image pairs are exploited through our VSafe-DPO training methodology to erase target concepts.
  • Figure 3: Training collapse is observed when using the vanilla DPO loss, while our VSafe-DPO loss demonstrates rapid and stable convergence.
  • Figure 4: Generated images after erasing "Picasso" style. Other artist styles such as "Van Gogh" and "Rembrandt" should be maintained.
  • Figure 5: Generated images after erasing "deer". Other objects such as "cat" and "dog" should be maintained.
  • ...and 3 more figures