SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer

Yuhta Takida, Masaaki Imaizumi, Takashi Shibuya, Chieh-Hsin Lai, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji

TL;DR

This work asks whether GAN optimization gives the generator gradients that actually reduce the distance to the target distribution. It develops a theoretical framework, built on Functional Mean Divergence (FM$^*$) and max-Augmented Sliced Wasserstein divergence (max-ASW), to characterize when a discriminator metrizes the distance between distributions, even when the discriminator is not assumed to be optimal. The authors derive three metrizable conditions (direction optimality, separability, and injectivity) and introduce the Slicing Adversarial Network (SAN), which enforces these conditions via two minimal modifications to the discriminator and a new maximization objective. Empirically, SAN improves mode coverage and sample quality on synthetic mixture-of-Gaussians (MoG) and image-generation tasks, and StyleSAN-XL achieves a state-of-the-art FID score amongst GANs for class-conditional generation on ImageNet 256×256. The approach applies to a broad class of existing GANs with only simple changes, and the implementation is publicly available.

Abstract

Generative adversarial networks (GANs) learn a target probability distribution by optimizing a generator and a discriminator with minimax objectives. This paper addresses the question of whether such optimization actually provides the generator with gradients that make its distribution close to the target distribution. We derive metrizable conditions, sufficient conditions for the discriminator to serve as the distance between the distributions by connecting the GAN formulation with the concept of sliced optimal transport. Furthermore, by leveraging these theoretical results, we propose a novel GAN training scheme, called slicing adversarial network (SAN). With only simple modifications, a broad class of existing GANs can be converted to SANs. Experiments on synthetic and image datasets support our theoretical results and the SAN's effectiveness as compared to usual GANs. Furthermore, we also apply SAN to StyleGAN-XL, which leads to state-of-the-art FID score amongst GANs for class conditional generation on ImageNet 256$\times$256. Our implementation is available on https://ytakida.github.io/san.
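As a rough illustration of the discriminative normalized linear layer named in the title, the sketch below writes a discriminator as an inner product $\langle\omega,h(x)\rangle$ with a unit-norm direction $\omega$. The abstract does not spell out the SAN objective, so the Wasserstein-type gap used here, the random stand-in features, and all function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def normalize(omega, eps=1e-12):
    """Rescale a direction vector to (approximately) unit Euclidean norm."""
    return omega / (np.linalg.norm(omega) + eps)

def discriminator(h_x, omega):
    """Inner-product discriminator <omega, h(x)> with a normalized direction."""
    return h_x @ normalize(omega)

def direction_objective(h_data, h_gen, omega):
    """Assumed Wasserstein-type gap: mean output on data minus mean output on generated features."""
    return discriminator(h_data, omega).mean() - discriminator(h_gen, omega).mean()

rng = np.random.default_rng(0)
h_data = rng.normal(loc=1.0, size=(512, 8))  # hypothetical stand-in for h(x), x from the data
h_gen = rng.normal(loc=0.0, size=(512, 8))   # hypothetical stand-in for h(x), x from the generator

# For fixed features the gap equals <omega, m>, with m the difference of feature means.
# Over unit vectors it is maximized by omega = m / ||m|| (Cauchy-Schwarz).
mean_gap = h_data.mean(axis=0) - h_gen.mean(axis=0)
omega_star = normalize(mean_gap)
best = direction_objective(h_data, h_gen, omega_star)
```

Because $\omega$ is constrained to the unit sphere, the gap cannot be inflated by scaling $\omega$; for fixed features, the maximizing direction is simply the normalized difference of feature means.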

Paper Structure (44 sections, 10 theorems, 42 equations, 17 figures, 10 tables, 1 algorithm)

Key Result

Proposition 3.2

For ${\mathcal{F}}(X)\subset L^\infty(X,{\mathbb{R}})$, $\text{IPM}_{{\mathcal{F}}}(\cdot,\cdot):=\max_{f\in{\mathcal{F}}}d_{f}(\cdot,\cdot)\in\mathscr{D}^{\text{FM}}_1$.
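To make the definition of $\text{IPM}_{\mathcal{F}}$ concrete, here is a minimal numerical sketch over a finite function class. This page does not define $d_f$, so the code assumes the usual integral-probability-metric form $d_f(\mu,\nu)=\mathbb{E}_{\mu}[f]-\mathbb{E}_{\nu}[f]$, estimated from samples; the test functions are hypothetical, and the membership claim in $\mathscr{D}^{\text{FM}}_1$ is not illustrated.

```python
import numpy as np

def d_f(f, x_mu, x_nu):
    """Assumed form of d_f: difference of the empirical means of f under the two samples."""
    return f(x_mu).mean() - f(x_nu).mean()

def ipm(function_class, x_mu, x_nu):
    """Empirical IPM: the largest mean discrepancy over a finite function class."""
    return max(d_f(f, x_mu, x_nu) for f in function_class)

# Hypothetical bounded test functions, closed under negation so the maximum is non-negative.
base = [np.tanh, np.sin, lambda x: np.clip(x, -1.0, 1.0)]
function_class = base + [lambda x, g=g: -g(x) for g in base]

rng = np.random.default_rng(0)
x_mu = rng.normal(loc=0.0, size=2000)
x_nu = rng.normal(loc=1.0, size=2000)

same = ipm(function_class, x_mu, x_mu)     # identical samples: every d_f is zero
shifted = ipm(function_class, x_mu, x_nu)  # mean-shifted samples: some f separates them
```

Closing the class under negation makes the maximum symmetric in its two arguments, which is why no absolute value is needed in this sketch.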

Figures (17)

  • Figure 1: Discriminator decomposition into inner-product form $\langle\omega,h(x)\rangle$. Direction $\omega$ projects $h(x)$ onto $\mathbb{R}$. In this figure, $h$ is separable because $F_{\mu_0}^{h,\omega^*}(\xi)\leq F_{\mu_\theta}^{h,\omega^*}(\xi)$ for all $\xi\in\mathbb{R}$ (see Definition 4.2).
  • Figure 2: Outline of Sec. \ref{sec:fmh_is_distance}. Proposition \ref{pr:fm_is_distance_if_injective_and_separable} is a major step toward our main theorem.
  • Figure 3: An example with Wasserstein GAN loss: the metrizable conditions (direction optimality, separability, and injectivity) ensure that Wasserstein GAN loss evaluates the distance between data and generator distributions.
  • Figure 4: Converting GAN to SAN requires only simple modifications to discriminators.
  • Figure 5: Comparison of the learned distributions (at 10,000 iterations) between GAN and SAN with various objectives. In all cases, SANs cover all modes whereas mode collapse occurs in some GAN cases.
  • ...and 12 more figures
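The separability inequality in the Figure 1 caption can be checked on samples, as sketched below. The sketch assumes that $F_{\mu}^{h,\omega}$ denotes the cumulative distribution function of the projected feature $\langle\omega,h(x)\rangle$ for $x\sim\mu$, with $\mu_0$ the data distribution and $\mu_\theta$ the generator distribution. Definition 4.2 in the paper is the authoritative statement; the function names and the finite grid here are illustrative only.

```python
import numpy as np

def empirical_cdf(samples, grid):
    """Fraction of samples less than or equal to each grid point."""
    return np.searchsorted(np.sort(samples), grid, side="right") / samples.size

def is_separable(h_data, h_gen, omega, n_grid=256):
    """Check F_data(xi) <= F_gen(xi) on a grid, for features projected onto omega."""
    proj_data = h_data @ omega
    proj_gen = h_gen @ omega
    lo = min(proj_data.min(), proj_gen.min())
    hi = max(proj_data.max(), proj_gen.max())
    grid = np.linspace(lo, hi, n_grid)
    return bool(np.all(empirical_cdf(proj_data, grid) <= empirical_cdf(proj_gen, grid)))

rng = np.random.default_rng(1)
omega = np.array([1.0, 0.0, 0.0])              # hypothetical unit direction
h_gen = rng.normal(size=(300, 3))              # stand-in generator features
h_data = h_gen + 2.0 * omega                   # data features shifted along omega

forward = is_separable(h_data, h_gen, omega)   # data projections sit to the right
backward = is_separable(h_gen, h_data, omega)  # roles swapped
```

A grid check on finite samples can only suggest that the inequality holds; it does not establish it for all $\xi\in\mathbb{R}$.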

Theorems & Definitions (24)

  • Definition 1.1: Metrizable discriminator
  • Definition 3.1: Functional Mean Divergence (FM)
  • Proposition 3.2
  • Definition 3.3: Functional Mean Divergence${}^*$ (FM${}^*$)
  • Proposition 3.5: Direction optimality connects FM${}^*$ and ${\mathcal{J}}_{\text{W}}$
  • Definition 4.1: Maximum Augmented Sliced Wasserstein Divergence (max-ASW)
  • Definition 4.2: Separable
  • Lemma 4.3
  • Lemma 4.4
  • Proposition 4.5
  • ...and 14 more