Several questions of visual generation in 2024

Shuyang Gu

TL;DR

This paper outlines various problems in the field of visual generation based on the author's personal understanding. The core of these problems lies in how to decompose visual signals.

Abstract

This paper does not propose any new algorithms but instead outlines various problems in the field of visual generation based on the author's personal understanding. The core of these problems lies in how to decompose visual signals, with all other issues being closely related to this central problem and stemming from unsuitable approaches to signal decomposition. This paper aims to draw researchers' attention to the significance of Visual Signal Decomposition.

Paper Structure

This paper contains 13 sections, 6 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Illustration of Language and Visual Signal Decomposition. The language decomposition is equivariant, while most image decompositions are non-equivariant.
  • Figure 2: Illustration of the invalid encoding issue. The top two panels show the reconstruction FID of RQVAE on ImageNet at different depths; the bottom panel shows visualization results on FFHQ. Reconstruction does not continue to improve after the dominant stage.
  • Figure 3: The diffusion scaling law relies on an importance-weighted loss. For the left figure, $y = 0.369 \cdot x^{-0.030}$. For the right figure, $y = 0.303 + 1.4 \times 10^{5} \cdot x^{-0.140}$.
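
The two fitted curves quoted in the Figure 3 caption can be evaluated directly to compare their shapes. The sketch below is illustrative only: the caption does not say what $x$ and $y$ measure, so the function names and the sample $x$ values are assumptions, not values from the paper. Only the coefficients come from the caption.

```python
# Coefficients are copied from the Figure 3 caption; everything else
# (function names, sample x values) is a hypothetical illustration.

def left_fit(x: float) -> float:
    """Left-panel fit: y = 0.369 * x^(-0.030), a pure power law."""
    return 0.369 * x ** -0.030


def right_fit(x: float) -> float:
    """Right-panel fit: y = 0.303 + 1.4e5 * x^(-0.140), a power law plus a constant offset."""
    return 0.303 + 1.4e5 * x ** -0.140


if __name__ == "__main__":
    # Hypothetical x values, chosen only to show how each curve behaves as x grows.
    for x in (1e15, 1e18, 1e21):
        print(f"x={x:.0e}  left={left_fit(x):.4f}  right={right_fit(x):.4f}")
```

Both curves decrease monotonically in $x$. The left fit has no constant term and so decays toward zero, while the right fit approaches the offset 0.303 from above as $x$ grows.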

Theorems & Definitions (1)

  • Definition 1