Feature Quantization Improves GAN Training

Yang Zhao, Chunyuan Li, Ping Yu, Jianfeng Gao, Changyou Chen

TL;DR

Extensive experimental results show that the proposed FQ-GAN can improve the FID scores of baseline methods by a large margin on a variety of tasks, achieving new state-of-the-art performance.

Abstract

The instability in GAN training has been a long-standing problem despite remarkable research efforts. We identify that instability issues stem from difficulties of performing feature matching with mini-batch statistics, due to a fragile balance between the fixed target distribution and the progressively generated distribution. In this work, we propose Feature Quantization (FQ) for the discriminator, to embed both true and fake data samples into a shared discrete space. The quantized values of FQ are constructed as an evolving dictionary, which is consistent with feature statistics of the recent distribution history. Hence, FQ implicitly enables robust feature matching in a compact space. Our method can be easily plugged into existing GAN models, with little computational overhead in training. We apply FQ to 3 representative GAN models on 9 benchmarks: BigGAN for image generation, StyleGAN for face synthesis, and U-GAT-IT for unsupervised image-to-image translation. Extensive experimental results show that the proposed FQ-GAN can improve the FID scores of baseline methods by a large margin on a variety of tasks, achieving new state-of-the-art performance.
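The abstract describes two mechanisms: true and fake features are mapped into a shared discrete space, and the dictionary evolves to track recent feature statistics. The following is a minimal NumPy sketch of both steps, not the authors' implementation. It assumes nearest-neighbour look-up under squared Euclidean distance and a moving-average dictionary update with a decay factor. The function names (`quantize`, `ema_update`), the sizes, and the random stand-in features are all hypothetical.

```python
import numpy as np

def quantize(h, E):
    """Map each row of h (N, D) to its nearest row of the dictionary E (K, D)."""
    # Squared Euclidean distance between every feature and every dictionary item.
    d2 = ((h[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return E[idx], idx

def ema_update(E, h, idx, decay=0.9):
    """Assumed update rule: move each used item toward the mean of its assigned features."""
    E_new = E.copy()
    for k in np.unique(idx):
        E_new[k] = decay * E[k] + (1.0 - decay) * h[idx == k].mean(axis=0)
    return E_new

rng = np.random.default_rng(0)
E = rng.normal(size=(8, 4))        # K=8 items of dimension D=4 (illustrative sizes)
h_real = rng.normal(size=(16, 4))  # stand-in for discriminator features of real samples
h_fake = rng.normal(size=(16, 4))  # stand-in for discriminator features of generated samples

# Real and fake features share one dictionary, so both land on the same set of centroids.
h_all = np.concatenate([h_real, h_fake])
h_q, idx = quantize(h_all, E)
E_next = ema_update(E, h_all, idx)
```

Because real and fake features are snapped to the same small set of centroids, two samples that fall near the same dictionary item present identical quantized features to the later discriminator layers, which is one way to read the "implicit feature matching" claim.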

Paper Structure

This paper contains 50 sections, 9 equations, 21 figures, 11 tables, 1 algorithm.

Figures (21)

  • Figure 1: The proposed FQ-GAN generates images by leveraging quantized features from a dictionary, rather than producing arbitrary features in a continuous space when judged by the discriminator. The odd columns show images of the same class (real on the top row, fake on the bottom row); the corresponding quantized feature maps are shown in the even column to the right of each. The dictionary items are visualized in 1D as the color bar using t-SNE (van der Maaten & Hinton, 2008). Image regions with similar semantics use the same or similar dictionary items. For example, the bird's neck is in dark red, sky or clear background is in light blue, and grass is in orange.
  • Figure 2: Illustration of FQ-GAN: (a) The neural network architecture. A feature quantization (i.e., dictionary look-up) step $f_{\rm Q}$ is injected into the discriminator of a standard GAN. (b) A visualization example of the dictionary ${\bf E}$ and the look-up procedure. Each circle indicates a quantization centroid. The true sample features ${\boldsymbol{h}}$ ("$\blacksquare$") and fake sample features $\tilde{\boldsymbol{h}}$ ("$\blacktriangle$") are quantized into their nearest centroids ${\boldsymbol{e}}_k$ (represented in the same color in this example), thus performing implicit feature matching.
  • Figure 3: Illustration of FQ construction in CNNs. In this example, the dictionary has 5 items, and the feature map is ${\boldsymbol{h}} \in \mathbb{R}^{5\times 5 \times 5}$. The feature vector at each position is quantized into a dictionary item, e.g., the back-right feature is quantized into a red item.
  • Figure 4: Ablation studies on the impact of hyper-parameters. Image generation quality is measured with FID $\downarrow$ and IS $\uparrow$. (a) Dictionary size $K=2^P$. (b) The discriminator layers at which FQ is applied; the layer ID is shown on the horizontal axis. (c) The decay hyper-parameter $\lambda$ in the dictionary update. (d) The weight $\alpha$ used to incorporate FQ; the dashed horizontal lines mark the standard GAN baseline ($\alpha = 0$).
  • Figure 5: Learning curves on CIFAR-100.
  • ...and 16 more figures
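
The per-position quantization described in the Figure 3 caption can be sketched as follows. This is an illustrative NumPy example under stated assumptions, not the authors' code: the feature map is laid out as (height, width, channels), distances are squared Euclidean, and `quantize_feature_map` is a hypothetical name. The sizes mirror the caption (5 dictionary items, a $5\times 5 \times 5$ feature map); the values are random stand-ins.

```python
import numpy as np

def quantize_feature_map(h, E):
    """Replace the feature vector at each spatial position with its nearest dictionary item.

    h: feature map of shape (H, W, D); E: dictionary of shape (K, D).
    Returns the quantized map (H, W, D) and the chosen item index per position (H, W).
    """
    H, W, D = h.shape
    flat = h.reshape(-1, D)  # one row per spatial position
    d2 = ((flat[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return E[idx].reshape(H, W, D), idx.reshape(H, W)

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 5))     # 5 dictionary items, each of dimension 5
h = rng.normal(size=(5, 5, 5))  # assumed (H, W, D) layout for the 5x5x5 feature map

h_q, idx_map = quantize_feature_map(h, E)
```

Here `idx_map` holds one dictionary index per spatial position; coloring such a map by index would give a picture of the kind the Figure 1 caption describes.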