Improving image synthesis with diffusion-negative sampling

Alakh Desai, Nuno Vasconcelos

TL;DR

The paper proposes diffusion-negative prompting (DNP), a strategy built on diffusion-negative sampling (DNS): a new procedure that samples the images least compliant with a text prompt p under the distribution of the diffusion model (DM).

Abstract

For image generation with diffusion models (DMs), a negative prompt n can be used to complement the text prompt p, helping define properties not desired in the synthesized image. While this improves prompt adherence and image quality, finding good negative prompts is challenging. We argue that this is due to a semantic gap between humans and DMs, which makes good negative prompts for DMs appear unintuitive to humans. To bridge this gap, we propose a new diffusion-negative prompting (DNP) strategy. DNP is based on a new procedure to sample images that are least compliant with p under the distribution of the DM, denoted as diffusion-negative sampling (DNS). Given p, one such image is sampled, which is then translated into natural language by the user or a captioning model, to produce the negative prompt n*. The pair (p, n*) is finally used to prompt the DM. DNS is straightforward to implement and requires no training. Experiments and human evaluations show that DNP performs well both quantitatively and qualitatively and can be easily combined with several DM variants.
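
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `sample_dns_image`, `caption`, and `generate` are hypothetical placeholders for the DNS sampler, the captioner (a user or a captioning model), and the DM, and the internals of DNS are not specified here. `guided_noise` shows the convention commonly used for negative prompts in classifier-free guidance, where the negative-prompt prediction takes the place of the unconditional one; that convention is an assumption about the underlying DM, not something stated in the abstract.

```python
from typing import Any, Callable, List, Sequence, Tuple


def guided_noise(
    eps_pos: Sequence[float], eps_neg: Sequence[float], scale: float
) -> List[float]:
    """Combine two noise predictions under the assumed guidance convention.

    Starts at the negative-prompt prediction and moves `scale` times the
    difference toward the positive-prompt prediction.
    """
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


def dnp(
    prompt: str,
    sample_dns_image: Callable[[str], Any],  # hypothetical: DNS sampler
    caption: Callable[[Any], str],           # hypothetical: user or captioning model
    generate: Callable[[str, str], Any],     # hypothetical: DM prompted with (p, n*)
) -> Tuple[Any, str]:
    """Run the three DNP steps: sample a DNS image, caption it, re-prompt."""
    negative_image = sample_dns_image(prompt)   # image least compliant with p
    negative_prompt = caption(negative_image)   # natural-language n*
    image = generate(prompt, negative_prompt)   # final synthesis with (p, n*)
    return image, negative_prompt
```

Under this convention, `scale = 1` returns the positive-prompt prediction unchanged, and larger values push the sample further away from whatever n* describes.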

Paper Structure

This paper contains 17 sections, 20 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: DNP improves the synthesis quality of SD's images (top-right) for prompts ${\bf p}$ (green). A diffusion-negative image, $\bar{\mathcal{I}}$, is sampled using DNS, enabling the user to visualize the negation of ${\bf p}$ under the DM's distribution. The user translates $\bar{\mathcal{I}}$ into a negative prompt ${\bf n}^*$ (red), a process denoted as DNP, and the DM is prompted with the pair $({\bf p},{\bf n}^*)$. This increases the compliance and quality of the synthesized image (bottom-right). Replacing the user with a captioning model is denoted as auto-DNP.
  • Figure 2: Examples of negative prompting in both semantic and non-semantic scenarios. The positive prompt, ${\bf p}$, and the negative prompts, ${\bf n}$, are shown on top of each image.
  • Figure 3: Images synthesized by SD (left) for prompt ${\bf p}$ vs. SD+auto-DNP (right) for the prompt pair (${\bf p}$, ${\bf n}^*$), where ${\bf n}^*$ is the negative prompt that DNP estimates from the DNS image. DNP produces negative prompts that are not intuitive for humans but improve quality and adherence.
  • Figure 4: Exploring the semantic gap. From left to right: 1) images generated for prompt ${\bf p}$, 2) DNS images generated by the DM and the resulting caption ${\bf n}^*$ produced by DNP, 3) images synthesized with ${\bf n}^*$ as the negative prompt alone, and 4) images synthesized with the prompt pair (${\bf p}$, ${\bf n}^*$).