SMaRt: Improving GANs with Score Matching Regularity
Mengfei Xia, Yujun Shen, Ceyuan Yang, Ran Yi, Wenping Wang, Yong-Jin Liu
TL;DR
GANs suffer from gradient vanishing when generated samples lie on low-dimensional manifolds, due to positive-measure gaps between the generated and real data supports. The authors propose SMaRt, a plug-in score-matching regularity that leverages pre-trained diffusion probabilistic models to provide a persistent gradient signal guiding out-of-manifold samples toward the real data manifold. They derive theoretical insights about generator loss optimality and the role of score matching, and implement a practical, finite-step regularization with lazy updates and narrowed timesteps to keep training feasible. Empirically, SMaRt improves synthesis quality across StyleGAN2, BigGAN, and Aurora on CIFAR10, LSUN Bedroom, and ImageNet, including a notable FID reduction from 8.87 to 7.11 for Aurora on ImageNet 64×64, on par with diffusion-based one-step methods.
Abstract
Generative adversarial networks (GANs) usually struggle in learning from highly diverse data, whose underlying manifold is complex. In this work, we revisit the mathematical foundations of GANs, and theoretically reveal that the native adversarial loss for GAN training is insufficient to fix the problem of \textit{subsets with positive Lebesgue measure of the generated data manifold lying out of the real data manifold}. Instead, we find that score matching serves as a promising solution to this issue thanks to its capability of persistently pushing the generated data points towards the real data manifold. We thereby propose to improve the optimization of GANs with score matching regularity (SMaRt). Regarding the empirical evidences, we first design a toy example to show that training GANs by the aid of a ground-truth score function can help reproduce the real data distribution more accurately, and then confirm that our approach can consistently boost the synthesis performance of various state-of-the-art GANs on real-world datasets with pre-trained diffusion models acting as the approximate score function. For instance, when training Aurora on the ImageNet $64\times64$ dataset, we manage to improve FID from 8.87 to 7.11, on par with the performance of one-step consistency model. Code is available at \href{https://github.com/thuxmf/SMaRt}{https://github.com/thuxmf/SMaRt}.
