Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Boqing Gong, Cho-Jui Hsieh

TL;DR

The paper investigates how negative prompts influence diffusion-based image generation, uncovering a persistent information lag relative to positive prompts and a neutralization mechanism in latent space. By analyzing cross-attention dynamics, it identifies a critical step after which negative prompts effectively steer generation, and reveals Inducing and Momentum effects that explain observed reverse-activation phenomena. It then introduces a timing-aware, training-free inpainting method that applies negative prompts after the critical step to remove undesired objects while preserving the background, demonstrated across multiple datasets with GPT-4V and human evaluation. The work provides both theoretical insight into prompt interactions and a practical, scalable approach for controllable image editing in diffusion models.

Abstract

The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images. Despite the widespread use of negative prompts, their intrinsic mechanisms remain largely unexplored. This paper presents the first comprehensive study to uncover how and when negative prompts take effect. Our extensive empirical analysis identifies two primary behaviors of negative prompts. Delayed Effect: The impact of negative prompts is observed after positive prompts render corresponding content. Deletion Through Neutralization: Negative prompts delete concepts from the generated image through a mutual cancellation effect in latent space with positive prompts. These insights reveal significant potential real-world applications; for example, we demonstrate that negative prompts can facilitate object inpainting with minimal alterations to the background via a simple adaptive algorithm. We believe our findings will offer valuable insights for the community in capitalizing on the potential of negative prompts.
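
To make the idea of applying a negative prompt only after a critical step concrete, here is a minimal sketch. It assumes the common classifier-free guidance convention used in open-source Stable Diffusion implementations, where the negative-prompt noise prediction takes the place of the unconditional one; the paper's exact formulation may differ. The function name `guided_noise` and the parameters `critical_step` and `scale` are hypothetical, chosen for illustration only, and noise predictions are represented as plain lists of floats rather than latent tensors.

```python
from typing import List, Sequence


def guided_noise(
    eps_pos: Sequence[float],
    eps_neg: Sequence[float],
    eps_uncond: Sequence[float],
    step: int,
    critical_step: int,
    scale: float = 7.5,
) -> List[float]:
    """Combine noise predictions with a delayed negative prompt.

    Assumption: guidance has the form baseline + scale * (positive - baseline).
    Before `critical_step` the baseline is the unconditional prediction;
    from `critical_step` onward it is the negative-prompt prediction.
    """
    baseline = eps_neg if step >= critical_step else eps_uncond
    return [b + scale * (p - b) for p, b in zip(eps_pos, baseline)]


# Early step: the negative prompt is ignored.
early = guided_noise([1.0], [0.5], [0.0], step=0, critical_step=5)

# At or after the critical step: the negative prompt is the baseline.
late = guided_noise([1.0], [0.5], [0.0], step=5, critical_step=5)

# Where positive and negative predictions agree, the guidance term cancels.
cancelled = guided_noise([2.0], [2.0], [0.0], step=5, critical_step=5)
```

In this toy setting, the last call shows one simplified reading of mutual cancellation: wherever the positive and negative predictions coincide, their difference is zero and the guidance term contributes nothing. How the critical step is chosen is not specified here, so it is left as an input.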

Paper Structure (13 sections, 6 equations, 9 figures, 1 table)

This paper contains 13 sections, 6 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Illustration of when negative prompts attend to the "right" place. For example, we consider the face of the person the "right" place for the "glasses" token. Every row represents an independent diffusion process: the first and third rows show the tokens in the positive prompt, and the second and fourth rows visualize those in the negative prompt. The positive prompt (+), negative prompt (-), and the corresponding token of the attention map are listed above each row. Each column corresponds to a different diffusion step at which the cross-attention heat maps are visualized. The feature map that attends to the "right" place for the first time is enclosed in a square box.
  • Figure 2: Illustration: Reverse activation. Each column shows an image generated by applying negative prompts only at the specific steps shown at the top of the picture. In these two examples, the diffusion process without a negative prompt does not produce the object mentioned in the negative prompt. Interestingly, introducing a negative prompt in the early stages results in the generation of the specified object, which is marked in the image.
  • Figure 3: Illustration of the effectiveness of negative prompts over time. The x-axis represents the time step, and the y-axis denotes the strength of the negative prompt. In the left figure, there is a peak at the 5th step for the noun-based negative prompt, indicating the critical step. In the right figure, we observe a plateau around the 10th step, as the object has been generated and the negative prompt begins to take effect.
  • Figure 4: Illustration: Heat maps showcasing the outcomes of object removal using negative prompts, with both successes and failures. Successful removals are placed in the first and third rows, while the failed attempts occupy the second and fourth rows. The first column shows the pictures generated without negative prompts, and the second column shows the images generated with negative prompts. The feature map that first targets the relevant location is marked by a red square box. The successful cases exhibit earlier attention to the target areas.
  • Figure 5: Illustration: The energy function in the image generation dynamics. The value at the pixel represents the energy of the point in the data distribution. We mark the background region, clear object region, and blurred object outline region by circles. To generate an object from the background, the model should overcome the energy barrier of the blurred object outline region.
  • ...and 4 more figures