
Phased Data Augmentation for Training a Likelihood-Based Generative Model with Limited Data

Yuta Mimura

TL;DR

The paper tackles the data-inefficiency problem of training generative models on small datasets by introducing phased data augmentation, which gradually tightens augmentation to nudge the model toward the true data distribution. It applies this strategy to PC-VQ2, a likelihood-based model combining PixelCNNs with VQ-VAE-2, and demonstrates consistent improvements in both quantitative (FID) and qualitative assessments across multiple datasets. The work shows that phased augmentation offers a robust, GAN-free avenue for data-efficient training of complex autoregressive generators, potentially widening the applicability of augmentation techniques beyond GANs. Overall, the approach provides a practical method to improve image synthesis when data are costly or scarce, with broad relevance to likelihood-based architectures.

Abstract

Generative models excel at creating realistic images, yet their dependence on extensive training datasets presents significant challenges, especially in domains where data collection is costly or difficult. Current data-efficient methods largely focus on GAN architectures, leaving a gap in training other types of generative models. Our study introduces "phased data augmentation" as a novel technique that addresses this gap by optimizing training in limited-data scenarios without altering the inherent data distribution. By progressively limiting the augmentation intensity across the learning phases, our method enhances the model's ability to learn from limited data while maintaining fidelity. Applied to a model integrating PixelCNNs with VQ-VAE-2, our approach demonstrates superior performance in both quantitative and qualitative evaluations across diverse datasets. This represents an important step forward in the efficient training of likelihood-based models, extending the usefulness of data augmentation techniques beyond GANs.
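
To make the idea of limiting augmentation intensity across learning phases concrete, the following is a minimal sketch of a phase schedule, not the paper's actual configuration. The `Phase` fields, the `SCHEDULE` values, the epoch counts, and the choice to end with no augmentation at all are assumptions made for illustration; this summary does not specify the real transforms or their ranges.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One training phase and its augmentation limits (all values hypothetical)."""
    epochs: int               # number of epochs spent in this phase
    max_rotation_deg: float   # rotation is sampled from [-max, +max]
    max_shift_frac: float     # translation as a fraction of image size
    allow_flip: bool          # whether horizontal flips are permitted


# Hypothetical schedule: each phase is at most as permissive as the one before,
# and the final phase applies no augmentation (an assumption, not from the paper).
SCHEDULE = [
    Phase(epochs=10, max_rotation_deg=30.0, max_shift_frac=0.20, allow_flip=True),
    Phase(epochs=10, max_rotation_deg=10.0, max_shift_frac=0.10, allow_flip=True),
    Phase(epochs=10, max_rotation_deg=0.0, max_shift_frac=0.05, allow_flip=False),
    Phase(epochs=10, max_rotation_deg=0.0, max_shift_frac=0.00, allow_flip=False),
]


def phase_for_epoch(epoch: int, schedule=SCHEDULE) -> Phase:
    """Return the phase active at a zero-based epoch, clamping to the last phase."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    remaining = epoch
    for phase in schedule:
        if remaining < phase.epochs:
            return phase
        remaining -= phase.epochs
    return schedule[-1]


def sample_augmentation(phase: Phase, rng: random.Random) -> dict:
    """Draw concrete augmentation parameters within the limits of the given phase."""
    return {
        "rotation_deg": rng.uniform(-phase.max_rotation_deg, phase.max_rotation_deg),
        "shift_frac": rng.uniform(-phase.max_shift_frac, phase.max_shift_frac),
        "flip": phase.allow_flip and rng.random() < 0.5,
    }
```

In a training loop, one would call `phase_for_epoch(epoch)` once per epoch and `sample_augmentation(...)` once per image, then apply the sampled parameters with whatever image library is in use. Keeping the schedule as plain data makes it easy to check that limits only tighten from one phase to the next.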

Paper Structure

This paper contains 15 sections, 3 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Overview of a PC-VQ2 training structure and application of phased augmentation to it. The upper figure illustrates the training of VQ-VAE-2, and the lower figure illustrates the individual training of PixelCNNs.
  • Figure 2: Graphical representation of phased data augmentation.
  • Figure 3: Images reconstructed in each phase through a bottom-level PixelCNN training. Each element in the 'Change' row indicates the specific limitation imposed relative to the previous phase, with more detailed explanations provided in Section 4.3.
  • Figure 4: Generated human-face and cat-face images by the trained PC-VQ2 models.
  • Figure 5: Generated images by the trained PC-VQ2 model with no data augmentation.
  • ...and 2 more figures