Scaling-based Data Augmentation for Generative Models and its Theoretical Extension

Yoshitaka Koike, Takumi Nakagawa, Hiroki Waida, Takafumi Kanamori

TL;DR

We theoretically prove that data scaling controls the bias-variance trade-off of the estimation error bound, and we propose a learning algorithm, Scale-GAN, that uses data scaling and variance-based regularization.

Abstract

This paper studies stable learning methods for generative models that enable high-quality data generation. Noise injection is commonly used to stabilize learning. However, selecting a suitable noise distribution is challenging. Diffusion-GAN, a recently developed method, addresses this by using the diffusion process with a timestep-dependent discriminator. We investigate Diffusion-GAN and reveal that data scaling is a key component for stable learning and high-quality data generation. Building on our findings, we propose a learning algorithm, Scale-GAN, that uses data scaling and variance-based regularization. Furthermore, we theoretically prove that data scaling controls the bias-variance trade-off of the estimation error bound. As a theoretical extension, we consider GAN with invertible data augmentations. Comparative evaluations on benchmark datasets demonstrate the effectiveness of our method in improving stability and accuracy.
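
As a rough illustration of the data-scaling idea described in the abstract, the sketch below multiplies each sample by a scale factor $s_t$ chosen at a randomly sampled timestep $t$, and returns the pair $(s_t{\bm x}, t)$ that a timestep-dependent discriminator would receive. The function name `scale_batch`, the uniform timestep sampling, and the particular schedule of scale factors are assumptions made for this example and are not taken from the paper. The variance-based regularization is not shown.

```python
import random


def scale_batch(batch, scales, rng):
    """Scale each sample by s_t for a randomly drawn timestep t.

    Returns a list of (scaled_sample, t) pairs. Sampling t uniformly is
    an assumption of this sketch, not a detail confirmed by the paper.
    """
    augmented = []
    for x in batch:
        t = rng.randrange(len(scales))      # hypothetical: uniform over timesteps
        s_t = scales[t]
        augmented.append(([s_t * v for v in x], t))
    return augmented


# Hypothetical schedule of scale factors, one per timestep.
scales = [1.0, 0.8, 0.6, 0.4]
rng = random.Random(0)                      # fixed seed for reproducibility
batch = [[1.0, -2.0], [0.5, 0.25]]
augmented = scale_batch(batch, scales, rng)
```

Because each scale factor is nonzero, the augmentation is invertible: dividing a scaled sample by $s_t$ recovers the original sample, which is the kind of invertible data augmentation that the abstract mentions as a theoretical extension.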

Paper Structure

This paper contains 37 sections, 12 theorems, 98 equations, 13 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Suppose that $\mathcal{X}$ is $[0,1]^d$ or $\mathbb{R}^d$. Let $q({\bm y}), {\bm y}\in\mathcal{X}$ be the probability density of the generated sample ${\bm y}=G({\bm z}), {\bm z}\sim p_z$. Suppose $\|p_0\|_\infty<\infty$ and $\frac{p_0}{p_0+q}\in \mathcal{U}_\delta$ for a $\delta\in[0,1/2)$. Let $s_ Furthermore, the optimal discriminator satisfies $\widetilde{D}(s_t{\bm x},t)=\widetilde{D}({\bm x}

Figures (13)

  • Figure 1: For each data scaling, (a) precision, (b) recall, and (c) averaged norm of discriminator's gradient are depicted. The upper and lower panels correspond to two different seeds.
  • Figure 2: The discriminator's predictions for $s=1.5$. The number of repetitions ranges from 9200 to 9700 from top left to bottom right. Blue (resp. red) dots represent the data (resp. generated samples). Modes with high values change in an oscillatory manner, and the oscillations become more intense, leading to mode collapse.
  • Figure 3: (a) precision, and (b) recall for each scaling strategy, "fix", "linear const", and "adaptive".
  • Figure 4: (a) Learning without scaling: the training data (red dots), generated data (blue dots), and the discriminator's outputs (heatmap) are depicted. Upper panels: $\sigma_{\text{noise}}=0$. Lower panels: $\sigma_{\text{noise}}=0.2$. The number of training iterations ranges from 5000 to 20000 in steps of 5000 from right to left. (b) Learning with scaling: the training data (red dots), generated data (blue dots), and the discriminator's outputs (heatmap) are depicted. Upper panels: "adaptive" scaling with $\sigma_{\text{noise}}=0$. Lower panels: Diffusion-GAN with "adaptive" scaling and $\sigma_{\text{noise}}=0.2$. The number of training iterations ranges from 10000 to 40000 in steps of 10000 from right to left.
  • Figure 5: Averaged cosine similarity between $\nabla{D}(\widetilde{{\bm x}},t)$ and $\nabla{D}({\bm x},0)$ at each iteration on CIFAR-10. The blue solid line is Diffusion-GAN with $\sigma_{\mathrm{noise}}=0.05$, and the orange solid line is Scale-GAN.
  • ...and 8 more figures
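
The quantity plotted in Figure 5, an averaged cosine similarity between two sets of discriminator gradients, can be sketched as follows. This is a minimal illustration on plain Python lists: the helper names and the toy gradient values are hypothetical, and how the gradients $\nabla{D}(\widetilde{{\bm x}},t)$ and $\nabla{D}({\bm x},0)$ are actually obtained and averaged in the paper is not specified here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def averaged_cosine_similarity(grads_a, grads_b):
    """Mean per-sample cosine similarity between two gradient batches."""
    sims = [cosine_similarity(u, v) for u, v in zip(grads_a, grads_b)]
    return sum(sims) / len(sims)


# Toy stand-ins for per-sample gradients (hypothetical values).
grads_augmented = [[1.0, 0.0], [0.0, 2.0]]
grads_original = [[2.0, 0.0], [0.0, -1.0]]
avg_sim = averaged_cosine_similarity(grads_augmented, grads_original)
```

Cosine similarity ignores positive rescaling of either vector, so it compares gradient directions only and not their magnitudes.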

Theorems & Definitions (28)

  • Theorem 1
  • Remark 1
  • Theorem 2
  • Theorem 3
  • Example 1: Scaling
  • Example 2: $\pi/2$ rotation
  • Example 3: Saturation
  • Lemma D.4: Existence and uniqueness of optimal solution
  • proof
  • Lemma D.5
  • ...and 18 more