SMaRt: Improving GANs with Score Matching Regularity

Mengfei Xia; Yujun Shen; Ceyuan Yang; Ran Yi; Wenping Wang; Yong-Jin Liu

TL;DR

GAN generators can suffer from vanishing gradients when generated samples lie on low-dimensional manifolds and a positive-measure subset of the generated support falls outside the real data support. The authors propose SMaRt, a plug-in score matching regularity that uses pre-trained diffusion probabilistic models (DPMs) to provide a persistent gradient signal guiding out-of-manifold samples toward the real data manifold. They derive theoretical results on generator loss optimality and the role of score matching, and implement a practical finite-step regularization with lazy updates and a narrowed timestep range to keep training feasible. Empirically, SMaRt improves synthesis quality for StyleGAN2, BigGAN, and Aurora on CIFAR10, LSUN Bedroom, and ImageNet, including an FID reduction from 8.87 to 7.11 for Aurora on ImageNet 64×64, on par with the one-step consistency model.
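The regularization described above can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation: the cosine noise schedule, the timestep range `t_min`/`t_max`, the lazy `interval`, the `weight`, and the stand-in noise predictor `oracle_eps` (the exact denoiser for standard normal data under this schedule) are all assumptions made for the example. In real training, an autodiff framework would backpropagate this loss through the generated samples into the generator, with a pre-trained DPM in place of `oracle_eps`.

```python
import numpy as np

def vp_schedule(t):
    """Hypothetical variance-preserving schedule with alpha_t^2 + sigma_t^2 = 1."""
    alpha = np.cos(0.5 * np.pi * t)
    sigma = np.sin(0.5 * np.pi * t)
    return alpha, sigma

def oracle_eps(x_t, t):
    """Stand-in noise predictor: E[eps | x_t] when the data is N(0, I)."""
    _, sigma = vp_schedule(t)
    return sigma * x_t

def score_regularity(x_gen, eps_model, rng, t_min=0.1, t_max=0.3):
    """Denoising loss evaluated on generated samples x_gen of shape (n, d)."""
    n = x_gen.shape[0]
    t = rng.uniform(t_min, t_max, size=(n, 1))   # narrowed timestep range (assumed values)
    alpha, sigma = vp_schedule(t)
    eps = rng.standard_normal(x_gen.shape)
    x_t = alpha * x_gen + sigma * eps            # diffuse the generated samples
    return float(np.mean(np.sum((eps_model(x_t, t) - eps) ** 2, axis=1)))

def generator_loss(adv_loss, x_gen, eps_model, rng, step, interval=4, weight=1.0):
    """Lazy update: add the regularity only on every `interval`-th step."""
    if step % interval != 0:
        return adv_loss
    return adv_loss + weight * score_regularity(x_gen, eps_model, rng)
```

With this stand-in denoiser, samples shifted away from the data distribution incur a larger regularity value than samples drawn from it, which is the signal the regularizer relies on. Skipping the term on most steps keeps the extra cost of evaluating the diffusion model low.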

Abstract

Generative adversarial networks (GANs) usually struggle to learn from highly diverse data, whose underlying manifold is complex. In this work, we revisit the mathematical foundations of GANs and show theoretically that the native adversarial loss for GAN training is insufficient to fix the problem of subsets of the generated data manifold with positive Lebesgue measure lying outside the real data manifold. Instead, we find that score matching is a promising solution to this issue, thanks to its ability to persistently push generated data points towards the real data manifold. We therefore propose to improve the optimization of GANs with score matching regularity (SMaRt). As empirical evidence, we first design a toy example showing that training GANs with the aid of a ground-truth score function helps reproduce the real data distribution more accurately, and then confirm that our approach consistently boosts the synthesis performance of various state-of-the-art GANs on real-world datasets, with pre-trained diffusion models acting as the approximate score function. For instance, when training Aurora on the ImageNet $64\times64$ dataset, we improve FID from 8.87 to 7.11, on par with the performance of the one-step consistency model. Code is available at https://github.com/thuxmf/SMaRt.
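To make the "persistent push towards the real data manifold" concrete, here is a small sketch using a ground-truth score, in the spirit of the toy example mentioned in the abstract. It assumes an equal-weight isotropic Gaussian mixture, whose score has a closed form; the function names, step size, and number of steps are illustrative choices and not taken from the paper.

```python
import numpy as np

def mixture_score(x, means, std):
    """Score grad_x log p(x) of an equal-weight isotropic Gaussian mixture.

    x has shape (n, d), means has shape (k, d).
    """
    diff = means[None, :, :] - x[:, None, :]              # (n, k, d)
    logits = -np.sum(diff ** 2, axis=2) / (2 * std ** 2)  # (n, k)
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                     # mode responsibilities
    return np.einsum("nk,nkd->nd", w, diff) / std ** 2

def follow_score(x, means, std, step_size, n_steps):
    """Move points along the score with plain gradient ascent on log p."""
    for _ in range(n_steps):
        x = x + step_size * mixture_score(x, means, std)
    return x
```

For a point far from every mode, the score still has a nonzero magnitude and points towards the nearby modes, so repeated steps carry the point onto the data support. This is the property the abstract contrasts with the adversarial loss.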

Paper Structure (23 sections, 8 theorems, 37 equations, 10 figures, 7 tables)

Introduction
Related work
Method
Background on GANs and DPMs
Revisiting GAN Training
Score Matching Regularity
Training Strategy
Experiments
Experimental Setups
Toy Example on Self-designed Dataset
Results on Real Datasets
Analyses
Discussion
Conclusion
Proofs and derivations
...and 8 more sections

Key Result

Theorem 3.1

Let $A,B$ be sets with positive $d$-dimensional Lebesgue measure, i.e., $\mu_d(A)>0,\mu_d(B)>0$. Denote by $q_A(\mathbf x),q_B(\mathbf x)$ two distributions supported on $A,B$, respectively, i.e., $\mathrm{supp}\;q_A=\{\mathbf x\mid q_A(\mathbf x)\neq0\}=A$, $\mathrm{supp}\;q_B=B$. Let $X\backslash

Figures (10)

Figure 1: Motivation of SMaRt. Red and blue surfaces denote the generated and real data manifolds, respectively. The positive-Lebesgue-measure subset of out-of-manifold generated samples leads to a non-optimal constant generator loss, which annihilates the gradient for the generator. The proposed score matching regularity ($\mathcal{L}_{score}$, defined in the paper's diffusion-loss equation) provides complementary guidance, urging this subset to move towards the real data manifold. The generator loss then regains effective guidance, helping the generator distribution converge to the real distribution.
Figure 2: Visualization of the discrete distribution example. The toy data is simulated by a mixture of 49 2-dimensional Gaussian distributions with extremely low variance, and each data sample is a 2-dimensional feature vector. Following [wang2022diffusiongan], we train a small GAN whose generator and discriminator are both MLPs with two 128-unit hidden layers and Leaky ReLU activations. We show (a) the true data samples, (b) samples generated by the vanilla GAN, (c) samples generated by DiffusionGAN [wang2022diffusiongan], and (d) samples generated by our SMaRt. The vanilla GAN and DiffusionGAN fail to place all samples on the discrete data manifold, i.e., their generated samples tend to spread continuously and fall off the grid. In contrast, SMaRt successfully synthesizes discrete samples whose distribution coincides with the ground truth.
Figure 3: Diverse results generated by SMaRt upon StyleGAN2 [Karras2019AnalyzingAI] trained on the LSUN Bedroom 256x256 dataset [yu15lsun]. We randomly sample the global latent code $\mathbf z$ for each image.
Figure 4: Diverse results generated by SMaRt upon (a) Aurora [zhu2023aurora] on the ImageNet 64x64 dataset [dengjia2009] and (b) BigGAN [Brock2018LargeSG] on the ImageNet 128x128 dataset [dengjia2009]. We randomly sample four global latent codes $\mathbf z$ for each label condition $c$, shown in each row.
Figure 5: Visualization of latent interpolation results on (a) LSUN Bedroom 256x256 and (b) ImageNet 64x64. On LSUN Bedroom 256x256, we compare StyleGAN2 [Karras2019AnalyzingAI] and the Consistency Model (CM) [song2023consistency], interpolating in the disentangled latent space. On ImageNet 64x64, we compare Aurora [zhu2023aurora] and CM [song2023consistency]; we fix the label condition $c$ and interpolate only in the disentangled latent space $\mathcal{W}$. Both StyleGAN2 and Aurora synthesize correct interpolation results, owing to their smooth and well-studied latent spaces, whereas CM fails to generate meaningful interpolations because of poor semantic continuity in its latent space.
...and 5 more figures
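The toy dataset described in the Figure 2 caption (a mixture of 49 low-variance 2-dimensional Gaussians) could be simulated along the following lines. The caption does not give the layout or the variance, so the 7x7 grid arrangement, the `spacing`, and the `std` value here are assumptions made for illustration.

```python
import numpy as np

def sample_grid_mixture(n, rng, side=7, spacing=1.0, std=0.01):
    """Draw n points from side*side equal-weight Gaussians centred on a grid.

    Returns the samples of shape (n, 2) and the centres of shape (side*side, 2).
    """
    centers_1d = (np.arange(side) - (side - 1) / 2) * spacing
    grid = np.meshgrid(centers_1d, centers_1d)
    centers = np.stack(grid, axis=-1).reshape(-1, 2)      # (side*side, 2)
    idx = rng.integers(0, len(centers), size=n)           # pick a mode per sample
    samples = centers[idx] + std * rng.standard_normal((n, 2))
    return samples, centers
```

Because the standard deviation is tiny relative to the spacing, the real samples form isolated clusters, which is what makes any continuous spread of generated samples between grid points easy to see in a scatter plot.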

Theorems & Definitions (13)

Theorem 3.1
Theorem 3.2
Theorem 3.3
Proposition 1.1
Theorem 1.2
proof
Theorem 1.3
proof
Theorem 1.4
proof
...and 3 more
