The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization

Jiafeng Mao; Xueting Wang; Kiyoharu Aizawa

Abstract

Text-to-image diffusion models allow users to control the content of generated images. Still, text-to-image generation occasionally fails, requiring users to generate dozens of images under the same text prompt before they obtain a satisfying result. We formulate the lottery ticket hypothesis in denoising: randomly initialized Gaussian noise images contain special pixel blocks (winning tickets) that naturally tend to be denoised into specific content independently. Under this hypothesis, generation failure in standard text-to-image synthesis is caused by the gap between the optimal and actual spatial distributions of winning tickets in the initial noisy images. To close this gap, we implement semantic-driven initial image construction, which creates the initial noise from known winning tickets for each concept mentioned in the prompt. We conduct a series of experiments that verify the properties of winning tickets and demonstrate their generalizability across images and prompts. Our results show that aggregating winning tickets into the initial noise image effectively induces the model to generate the specified object at the corresponding location. Project Page: https://ut-mao.github.io/noise.github.io
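To make the construction step concrete, the following is a minimal NumPy sketch of assembling an initial noise image from per-concept winning tickets. It is an illustration under stated assumptions, not the paper's implementation: the function name `build_initial_noise`, the `(C, H, W)` layout, the block size, the rectangular region format, and the uniform random choice among tickets are all hypothetical, and the ticket bank is assumed to have been collected and selected beforehand.

```python
import numpy as np


def build_initial_noise(shape, regions, ticket_bank, block=1, seed=0):
    """Fill each concept's region with noise blocks drawn from that concept's tickets.

    All names and layouts here are assumptions made for illustration.
    shape:       (C, H, W) of the initial noise image.
    regions:     {concept: (top, left, height, width)}, in multiples of `block`.
    ticket_bank: {concept: array of shape (N, C, block, block)} holding
                 previously selected winning tickets for that concept.
    """
    rng = np.random.default_rng(seed)
    # Start from ordinary Gaussian noise; pixels outside every region stay as drawn.
    noise = rng.standard_normal(shape)
    for concept, (top, left, height, width) in regions.items():
        tickets = ticket_bank[concept]
        # Tile the region block by block with tickets picked uniformly at random.
        for y in range(top, top + height, block):
            for x in range(left, left + width, block):
                pick = tickets[rng.integers(len(tickets))]
                noise[:, y:y + block, x:x + block] = pick
    return noise
```

A call such as `build_initial_noise((4, 64, 64), {"cat": (0, 0, 32, 32)}, bank, block=2)` would return an array that could then be passed to a diffusion sampler as its starting noise, with the generation process itself left unchanged.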

Paper Structure (18 sections, 5 equations, 8 figures, 2 tables)

Introduction
Related Work
Winning Tickets in Denoising
Cross-Attention Layer and Winning Tickets
Lottery Tickets Collection
Winning Tickets Selection
Property Verification of Winning Tickets
Semantic-Driven Initial Image Construction
Collection On the Fly
Collection in Advance
Control Strength Evaluation
Implementation
Combination with related methods
Results
Hyper Parameters
...and 3 more sections

Figures (8)

Figure 1: The concept of the winning ticket (special noisy pixel blocks) in denoising. These pixel blocks are naturally susceptible to being denoised into specific concepts. In this paper, we investigate and verify the properties of winning tickets. We find that winning tickets collected from different images can be used to create new initial images and show significant advantages under different prompts.
Figure 2: We create a collection containing a large number of pixel blocks and their scores for each possible category. The scores are extracted from the cross-attention maps computed by the pre-trained Stable Diffusion model. We select winning tickets from this collection in the next phase.
Figure 3: The collection contains scores for all lottery tickets, contrasting each category against the others ($c_i$ vs. $c_j$). These scores serve as critical metrics in the selection of winning tickets.
Figure 4: Qualitative samples from our on-the-fly experiments, under the same specified regions and different prompts. By merely replacing the initial noise image with the constructed one, without interfering with the generation process, the model spontaneously generates the specified concepts, after dozens of denoising steps, in the regions composed of the winning tickets corresponding to those concepts.
Figure 5: Results using the same prompt and different specified regions. The region composed of the winning tickets effectively induces the model to generate objects in it.
...and 3 more figures
