Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models

Katherine Xu; Lingzhi Zhang; Jianbo Shi

TL;DR

The paper studies how random seeds influence diffusion-based text-to-image generation. The seed determines the initial noise and the noise sampled for reparameterization at intermediate timesteps, and it has a substantial effect on image quality and on interpretable visual properties. The study uses a dataset of over 46 million images generated across two models and 22,512 prompts, and trains a 1,024-way seed classifier to show that seeds are highly distinguishable from their outputs. It further shows that seeds influence interpretable dimensions such as style, layout, grayscale tendency, sky regions, and borders, as well as text artifacts in inpainting, which enables seed-aware control. Practically, it proposes using 'golden' seeds for high-fidelity inference, seed-based sampling to diversify style and composition, and seed selection to reduce inpainting artifacts, all without extra training.
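The idea of picking 'golden' seeds can be illustrated with a minimal sketch, assuming a per-seed quality score (for example, an FID computed over all images generated with that seed) is already available. The function name `rank_seeds` and the scores below are hypothetical placeholders chosen for illustration; they are not values or code from the paper.

```python
def rank_seeds(scores, k=3, lower_is_better=True):
    """Return (best, worst) lists of k seeds, each ordered from most to least extreme.

    `scores` maps a seed number to one quality score for that seed.
    With lower_is_better=True the score behaves like FID.
    """
    # Sort seed numbers from best to worst according to their score.
    ordered = sorted(scores, key=scores.get, reverse=not lower_is_better)
    best = ordered[:k]
    worst = ordered[-k:][::-1]  # worst seed first
    return best, worst


# Hypothetical per-seed scores, made up for this example only.
example_scores = {0: 27.4, 1: 24.1, 2: 30.2, 3: 22.8, 4: 29.0}
best, worst = rank_seeds(example_scores, k=2)
print("best seeds:", best)
print("worst seeds:", worst)
```

For a metric where higher is better, such as a human-preference score, the same helper can be called with `lower_is_better=False`.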

Abstract

Recent advances in text-to-image (T2I) diffusion models have facilitated creative and photorealistic image synthesis. By varying the random seeds, we can generate many images for a fixed text prompt. Technically, the seed controls the initial noise and, in multi-step diffusion inference, the noise used for reparameterization at intermediate timesteps in the reverse diffusion process. However, the specific impact of the random seed on the generated images remains relatively unexplored. In this work, we conduct a large-scale scientific study into the impact of random seeds during diffusion inference. Remarkably, we reveal that the best 'golden' seed achieved an impressive FID of 21.60, compared to the worst 'inferior' seed's FID of 31.97. Additionally, a classifier can predict the seed number used to generate an image with over 99.9% accuracy in just a few epochs, establishing that seeds are highly distinguishable based on generated images. Encouraged by these findings, we examined the influence of seeds on interpretable visual dimensions. We find that certain seeds consistently produce grayscale images, prominent sky regions, or image borders. Seeds also affect image composition, including object location, size, and depth. Moreover, by leveraging these 'golden' seeds, we demonstrate improved image generation such as high-fidelity inference and diversified sampling. Our investigation extends to inpainting tasks, where we uncover some seeds that tend to insert unwanted text artifacts. Overall, our extensive analyses highlight the importance of selecting good seeds and offer practical utility for image generation.
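The mechanism described above, where one seed fixes both the initial noise and the noise drawn at intermediate timesteps, can be sketched with a toy sampler. This is a minimal illustration under stated assumptions, not the paper's pipeline: the "denoiser" is a placeholder that shrinks the latent, the noise schedule is invented, and the names `toy_reverse_diffusion`, `swap_seed`, and `swap_at` are hypothetical. It shows only how the seed enters the sampling loop and says nothing about how strongly each noise source affects real images.

```python
import random


def toy_reverse_diffusion(seed, steps=10, size=8, swap_seed=None, swap_at=None):
    """Toy ancestral sampler in which the seed drives every random draw.

    If swap_seed and swap_at are given, the noise source is replaced by a
    generator seeded with swap_seed when timestep swap_at is reached.
    """
    rng = random.Random(seed)
    # Initial noisy latent x_T, fully determined by the seed.
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for t in range(steps, 0, -1):
        if swap_seed is not None and t == swap_at:
            rng = random.Random(swap_seed)  # switch seeds mid-trajectory
        # Assumed schedule: noise scale shrinks to zero at the final step.
        sigma = 0.1 * (t - 1) / steps
        # Placeholder for the learned denoising mean, plus freshly sampled noise.
        x = [0.9 * xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x


same_a = toy_reverse_diffusion(seed=0)
same_b = toy_reverse_diffusion(seed=0)
other = toy_reverse_diffusion(seed=1)
swapped = toy_reverse_diffusion(seed=0, swap_seed=1, swap_at=5)
print(same_a == same_b)  # identical seeds reproduce the same trajectory
```

In this toy, two runs with the same seed return identical values, while changing the seed or swapping it partway through changes the result.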

Paper Structure (19 sections, 2 equations, 16 figures, 2 tables)

Introduction
Related Work
Understanding Diffusion Seeds
What do seeds control in the reverse diffusion process?
Data Generation
How discriminative are seeds based on their generated images?
Impact of Seeds on Interpretable Dimensions
Practical Applications
High-Fidelity Inference
Controlling Diversity in Style and Composition
Improved Text-based Inpainting
Conclusion
Data Generation
Synthetic Prompts for Image Composition Analysis
Dataset for Inpainting Applications
...and 4 more sections

Figures (16)

Figure 1: Left: Our study reveals that the seed number influences various visual elements in text-to-image generation, such as image quality and style. Right: Certain seeds result in more inserted text in text-based inpainting tasks like object removal.
Figure 2: Left: Overview of how the seed controls the initial noise $x_T$ and intermediate $x_t$ via the sampled noise in multi-step diffusion inference. Right: We swap the seed number at early, mid, and late timesteps of the reverse diffusion process, showing an example with seeds 0 and 1. Interestingly, the seed mostly influences the initial noisy latent, rather than intermediate timesteps.
Figure 3: A visualization of three different types of text prompts used in our study.
Figure 4: Grad-CAM [jacobgilpytorchcam, selvaraju2017gradcam] of our classifier trained to predict the seed used to create an image.
Figure 5: We compare the top three best and worst seeds for SD 2.0 using FID [heusel2017gans_fid] and HPS v2 [wu2023human_hpsv2].
...and 11 more figures