Several questions of visual generation in 2024

Shuyang Gu

TL;DR

This paper outlines various problems in the field of visual generation based on the author's personal understanding. The core of these problems lies in how to decompose visual signals.

Abstract

This paper does not propose any new algorithms but instead outlines various problems in the field of visual generation based on the author's personal understanding. The core of these problems lies in how to decompose visual signals, with all other issues being closely related to this central problem and stemming from unsuitable approaches to signal decomposition. This paper aims to draw researchers' attention to the significance of Visual Signal Decomposition.

Paper Structure (13 sections, 6 equations, 3 figures, 1 table)

Question 1: What's the goal of generative models?
Question 2: The problem of visual signal decomposition.
Question 3: The tokenization problem.
Question 4: Is the diffusion model a maximum likelihood model?
Question 5: For diffusion models, how to balance the conflicts among different SNRs?
Question 6: Is there a scaling law for diffusion models?
Acknowledgements

Figures (3)

Figure 1: Illustration of Language and Visual Signal Decomposition. The language decomposition is equivariant, while most image decompositions are non-equivariant.
Figure 2: Illustration of the invalid encoding issue. The top two figures show the reconstruction FID of RQVAE on ImageNet with different depths; the bottom figure shows the visualization results on FFHQ. The reconstruction does not continue to improve after the dominate stage.
Figure 3: The diffusion scaling law relies on an importance-weighted loss. For the left figure, $y = 0.369 \cdot x^{-0.030}$. For the right figure, $y = 0.303 + 1.4 \times 10^{5} \cdot x^{-0.140}$.

Theorems & Definitions (1)

Definition 1
