Nested Annealed Training Scheme for Generative Adversarial Networks

Chang Wan, Ming-Hsuan Yang, Minglu Li, Yunliang Jiang, Zhonglong Zheng

TL;DR

This work establishes a theoretical link between GANs and score-based models by formulating the composite-functional-gradient GAN (CFG) and deriving the CFG discriminator gradient as the difference of real and generated score functions. It then introduces an explicit annealed weighting (annealed CFG) and a nested training scheme (NATS) that preserves the CFG gradient structure while remaining adaptable to diverse GAN architectures. Empirically, both annealed CFG and NATS yield improvements in image quality and diversity (IS and FID) across standard benchmarks and even bolster state-of-the-art models, albeit with some limitations in discriminator analysis and computational demands. Overall, the proposed framework provides a theoretically grounded, practically effective approach to stabilize and enhance GAN training through score-function–driven, annealed, and nested optimization strategies.

Abstract

Recently, researchers have proposed many deep generative models, including generative adversarial networks (GANs) and denoising diffusion models. Although GANs have achieved significant breakthroughs and empirical success, their mathematical underpinnings remain relatively unknown. This paper focuses on a rigorous theoretical framework: the composite-functional-gradient GAN (CFG) [1]. Specifically, we reveal the theoretical connection between the CFG model and score-based models. We find that training the CFG discriminator is equivalent to finding an optimal D(x) whose gradient equals the difference between the score functions of real and synthesized samples. Conversely, training the CFG generator involves finding an optimal G(x) that minimizes this difference. In this paper, we aim to derive an annealed weight preceding the weight of the CFG discriminator. We call the resulting explicit theoretical model the annealed CFG method. Because the annealed CFG method is not readily applicable to state-of-the-art (SOTA) GAN models, we propose a nested annealed training scheme (NATS). This scheme keeps the annealed weight from the CFG method and can be seamlessly adapted to various GAN models, regardless of differences in architecture, loss, or regularization. We conduct thorough experimental evaluations on various benchmark datasets for image generation. The results show that our annealed CFG and NATS methods significantly improve the quality and diversity of the synthesized samples. The improvement is clear in comparisons with the CFG method and SOTA GAN models.
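The link between a discriminator gradient and a difference of score functions can be checked numerically on a toy case. The sketch below is an illustration only, not the paper's derivation: it assumes a discriminator trained to optimality with a logistic loss, whose output then equals the log density ratio log p_*(x) − log p_g(x) (a standard result), and it uses two hypothetical 1D Gaussians in place of the real and synthesized distributions. All function names and parameter values are assumptions made for this example.

```python
import math


def log_normal_pdf(x, mean, std=1.0):
    """Log density of a 1D Gaussian N(mean, std^2)."""
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def score(x, mean, std=1.0):
    """Score function: the derivative of the log density with respect to x."""
    return -(x - mean) / std**2


def optimal_logit(x, real_mean=2.0, gen_mean=0.0):
    """Assumed optimal discriminator output: log p_*(x) - log p_g(x)."""
    return log_normal_pdf(x, real_mean) - log_normal_pdf(x, gen_mean)


def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def score_difference(x, real_mean=2.0, gen_mean=0.0):
    """Score of the real distribution minus score of the generated one."""
    return score(x, real_mean) - score(x, gen_mean)


if __name__ == "__main__":
    for x in (-1.0, 0.5, 3.0):
        grad = numerical_gradient(optimal_logit, x)
        diff = score_difference(x)
        print(f"x={x:+.1f}  grad D={grad:.6f}  score diff={diff:.6f}")
```

In this toy setting the two quantities agree at every point, which is the sense in which following the discriminator gradient moves a synthesized sample toward regions where the real density is relatively higher.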

Paper Structure

This paper contains 16 sections, 4 theorems, 18 equations, 6 figures, 12 tables, 3 algorithms.

Key Result

Proposition 1

Let $p_*(\mathbf{x})$ and $p_g(\mathbf{x})$ denote the distributions of real and synthesized samples, respectively. Then we obtain

Figures (6)

  • Figure 1: Intuitive understanding of our annealed weight mechanisms.
  • Figure 2: The concept derived from a score-based model [song2019generative], where the score is the gradient of the logarithmic density function of real samples. It indicates the direction along which noise samples evolve towards real samples.
  • Figure 3: (a) A failed synthesized sample from CFG and the discriminator gradient at each $M$ step on LSUN Church at 256$\times$256 resolution. (b) A synthesized sample from annealed CFG and the discriminator gradient at each $M$ step on LSUN Church at 256$\times$256 resolution. The visual effect in the $M=1$ column is more ambiguous than in the $M=15$ column. In the CFG method, every $M$ step shares the same gradient weight, so when the gradients accumulate, the features from $M=1$ are covered by the features from $M=15$. Annealed CFG performs better because a geometric annealed weight is set for each $M$ step, subject to $\frac{\mathbf{w}_{1}(\mathbf{x})}{\mathbf{w}_{2}(\mathbf{x})}=\cdots=\frac{\mathbf{w}_{M-1}(\mathbf{x})}{\mathbf{w}_{M}(\mathbf{x})} > 1$, with $M=15$, $\mathbf{w}_{1}(\mathbf{x})=20$, and $\mathbf{w}_{15}(\mathbf{x})=1$. Each $M$ step's gradient is aligned to maximize the retention of features in each $M$ step. This behavior is implicitly consistent with score functions: in the initial steps, the distribution gradient is ambiguous, so a large step size (a larger weight) can be used; as the steps approach the real data distribution, the distribution gradient becomes clear, so a smaller step size (a smaller weight) should be taken.
  • Figure 4: Best FID scores for various GAN architectures (original GAN, LSGAN, WGAN, and HingeGAN) under different settings of $N_d$. The values of $N_d$ considered are 5, 10, 15, and 20. The red line represents the performance of NATS; the other colored lines represent the performance of NTS.
  • Figure 5: Performance of BigGAN trained on the CIFAR10 dataset using different training schemes. Column (a): synthetic samples from BigGAN trained with the CTS. Column (b): synthetic samples from BigGAN trained with the NTS. Column (c): synthetic samples from BigGAN trained with the NATS.
  • ...and 1 more figure
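The geometric annealed weights described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it only reproduces the constraint that consecutive weights share a constant ratio greater than 1, using the caption's values $M=15$, $\mathbf{w}_1=20$, and $\mathbf{w}_{15}=1$. The function name and its parameters are hypothetical, and the weights are treated as scalars for simplicity.

```python
def geometric_annealed_weights(num_steps=15, first_weight=20.0, last_weight=1.0):
    """Return weights w_1..w_M with a constant ratio w_k / w_{k+1} > 1.

    Hypothetical helper: the defaults mirror the caption's M=15, w_1=20, w_15=1.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    if first_weight <= last_weight or last_weight <= 0:
        raise ValueError("expected first_weight > last_weight > 0")
    # Constant ratio r such that w_1 / r^(M-1) = w_M.
    ratio = (first_weight / last_weight) ** (1.0 / (num_steps - 1))
    return [first_weight / ratio**k for k in range(num_steps)]


if __name__ == "__main__":
    weights = geometric_annealed_weights()
    for step, weight in enumerate(weights, start=1):
        print(f"M={step:2d}  weight={weight:.4f}")
```

Early steps receive the largest weights and later steps the smallest, matching the caption's intuition of taking large steps while the distribution gradient is ambiguous and smaller steps near the real data distribution.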

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4