The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization

Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa

Abstract

Text-to-image diffusion models allow users control over the content of generated images. Still, text-to-image generation occasionally fails, requiring users to generate dozens of images under the same text prompt before they obtain a satisfying result. We formulate the lottery ticket hypothesis in denoising: randomly initialized Gaussian noise images contain special pixel blocks (winning tickets) that naturally tend to be denoised into specific content independently. Generation failure in standard text-to-image synthesis is caused by the gap between the optimal and the actual spatial distribution of winning tickets in the initial noisy image. To address this, we implement semantic-driven initial image construction, which creates the initial noise from known winning tickets for each concept mentioned in the prompt. We conduct a series of experiments that verify the properties of winning tickets and demonstrate their generalizability across images and prompts. Our results show that aggregating winning tickets into the initial noise image effectively induces the model to generate the specified object at the corresponding location. Project Page: https://ut-mao.github.io/noise.github.io

Paper Structure (18 sections, 5 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: The concept of the winning ticket (special noisy pixel blocks) in denoising. These pixel blocks are naturally susceptible to being denoised into specific concepts. In this paper, we investigate and verify the properties of winning tickets. We find that winning tickets collected from different images can be used to create new initial images and show significant advantages under different prompts.
  • Figure 2: We create a collection containing a large number of pixel blocks and their scores for each possible category. The scores are extracted from the cross-attention maps computed by the pre-trained Stable Diffusion model. We select winning tickets from this collection in the next phase.
  • Figure 3: The collection contains scores for all lottery tickets, contrasting each category against the others ($c_i$ vs. $c_j$). These scores serve as the critical metrics used to select winning tickets.
  • Figure 4: Qualitative samples from our on-the-fly experiments, under the same specified regions and different prompts. By merely replacing the initial noise image with the constructed one, without interfering with the generation process, the model spontaneously generates the specified concepts, after dozens of denoising steps, in the regions composed of the winning tickets corresponding to those concepts.
  • Figure 5: Results using the same prompt and different specified regions. The region composed of the winning tickets effectively induces the model to generate objects in it.
  • ...and 3 more figures
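
The captions above describe a two-phase idea: score a pool of noise blocks per concept, then assemble an initial noise image from the best-scoring blocks in each user-specified region. The following is a minimal NumPy sketch of the assembly step only, not the authors' implementation. The function names, the array layouts, the one-score-per-concept simplification (the paper contrasts $c_i$ against $c_j$), and the random stand-in scores are all assumptions made for illustration; in the paper the scores come from Stable Diffusion cross-attention maps.

```python
import numpy as np


def select_winning_tickets(scores, concept, k):
    """Return indices of the k blocks with the highest score for `concept`.

    scores: (N, K) array, one hypothetical score per block and concept.
    """
    order = np.argsort(scores[:, concept])[::-1]  # descending by score
    return order[:k]


def build_initial_noise(blocks, scores, layout, rng):
    """Assemble an initial noise image from a pool of scored noise blocks.

    blocks: (N, C, b, b) pool of Gaussian noise blocks (assumed layout).
    scores: (N, K) per-concept scores for each block (assumed layout).
    layout: (Hb, Wb) integer grid; each cell holds a concept index,
            or -1 for background cells that keep plain random noise.
    """
    n, c, b, _ = blocks.shape
    hb, wb = layout.shape
    # Start from ordinary Gaussian noise so background cells are untouched.
    noise = rng.standard_normal((c, hb * b, wb * b))
    for concept in np.unique(layout):
        if concept < 0:
            continue
        cells = np.argwhere(layout == concept)  # row-major cell order
        if len(cells) > n:
            raise ValueError("not enough blocks in the pool for this region")
        picks = select_winning_tickets(scores, concept, len(cells))
        for (row, col), idx in zip(cells, picks):
            noise[:, row * b:(row + 1) * b, col * b:(col + 1) * b] = blocks[idx]
    return noise


# Usage with random stand-in data (scores here are NOT real attention scores).
rng = np.random.default_rng(0)
blocks = rng.standard_normal((64, 4, 8, 8))   # 64 candidate blocks, 4 channels
scores = rng.random((64, 2))                  # two hypothetical concepts
layout = -np.ones((8, 8), dtype=int)          # 8x8 grid of block cells
layout[2:6, 0:3] = 0                          # left region: concept 0
layout[2:6, 5:8] = 1                          # right region: concept 1
noise = build_initial_noise(blocks, scores, layout, rng)
```

The resulting array would then replace the usual random initial latent, with the denoising process itself left unchanged, which matches the "merely replacing the constructed initial noise image" setting described for Figure 4.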