Correcting Diffusion Generation through Resampling

Yujian Liu; Yang Zhang; Tommi Jaakkola; Shiyu Chang

TL;DR

A particle filtering framework that addresses missing object errors and low image quality in diffusion generation by explicitly reducing the distributional discrepancies between generated and real images. It relies on a set of external guidance to gauge the distribution gap, then designs the resampling weights accordingly to correct that gap.

Abstract

Despite diffusion models' superior capabilities in modeling complex distributions, there are still non-trivial distributional discrepancies between generated and ground-truth images, which have resulted in several notable problems in image generation, including missing object errors in text-to-image generation and low image quality. Existing methods that attempt to address these problems mostly do not address their fundamental cause, the distributional discrepancies, and hence achieve sub-optimal results. In this paper, we propose a particle filtering framework that can effectively address both problems by explicitly reducing the distributional discrepancies. Specifically, our method relies on a set of external guidance, including a small set of real images and a pre-trained object detector, to gauge the distribution gap, and then designs the resampling weights accordingly to correct the gap. Experiments show that our method can effectively correct missing object errors and improve image quality in various image generation tasks. Notably, our method outperforms the existing strongest baseline by 5% in object occurrence and 1.0 in FID on MS-COCO. Our code is publicly available at https://github.com/UCSB-NLP-Chang/diffusion_resampling.git.
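As a rough illustration of the resampling idea mentioned above, the sketch below shows a generic multinomial resampling step of the kind used in particle filters. It is not the paper's implementation: the function name `resample`, the string "particles", and the hand-written weights are hypothetical placeholders. In the paper the weights are derived from external guidance (real images and an object detector), which this sketch does not attempt to reproduce.

```python
import random


def resample(particles, weights, rng=None):
    """Generic multinomial resampling (illustrative, not the paper's code).

    Draws len(particles) particles with replacement, each with probability
    proportional to its weight, so high-weight particles are duplicated and
    low-weight particles tend to be dropped.
    """
    if len(particles) != len(weights):
        raise ValueError("particles and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    rng = rng or random.Random()
    # random.Random.choices normalizes the weights internally.
    return rng.choices(particles, weights=weights, k=len(particles))


# Placeholder particles and weights: only "b" has non-zero weight,
# so every resampled particle is "b".
survivors = resample(["a", "b", "c"], [0.0, 1.0, 0.0], random.Random(0))
```

In a diffusion sampler, a step like this would be interleaved with the denoising steps, so that partially denoised samples judged closer to the target distribution are propagated more often.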

Paper Structure (34 sections, 25 equations, 15 figures, 12 tables)

Introduction
Related Works
Methodology
Background and Notation
Problem Formulation
Initial Exploration: A Naive Approach
A Particle Filtering Framework
A Discriminator-Based Approach
A Hybrid Approach
Generalization to Other Generation Settings
Experiments
Text-to-Image Generation
Unconditional & Class-conditioned Generation
Ablation Study
Conclusion
...and 19 more sections

Figures (15)

Figure 1: Illustration of our particle filtering framework.
Figure 2: Calculation of the correction term $\phi_t(\bm X_t | \bm C)$.
Figure 3: FID ($\downarrow$) vs. Object occurrence ($\uparrow$) for all methods. Ideal points should scatter at the bottom right corner. Object occurrence is measured on GPT-Synthetic (left) and MS-COCO (right), and FID is measured on MS-COCO. $K=5, 10, 15$ images are generated for sample selection methods, and the sizes of points indicate the value of $K$ (larger $K$ has larger points). The method that achieves the best combined performance is highlighted in red.
Figure 4: Visualization of generated samples. Missing objects are highlighted in red. Unnatural objects are underlined.
Figure 5: FID (average of 3 runs) on ImageNet-64 (left) and FFHQ (right). Error bars indicate standard deviations.
...and 10 more figures
