Style Quantization for Data-Efficient GAN Training
Jian Wang, Xin Lan, Jizhe Zhou, Yuxin Tian, Jiancheng Lv
TL;DR
SQ-GAN introduces a style-space quantization framework that converts the sparse input latent space into a compact, discrete proxy space $\mathcal{W}^q$ by partitioning each style vector in $\mathcal{W}$ into $s$ sub-vectors and quantizing each sub-vector with a learnable codebook. A knowledge-enhanced codebook initialization (CBI), based on optimal transport and CLIP-based semantic alignment, embeds external knowledge into the codebook and yields a semantically rich vocabulary for limited-data GAN training. The approach couples adversarial losses with quantization and consistency regularization (CR) losses, including a novel uniformity regularization that prevents codebook collapse and a quantization-based CR that stabilizes discriminator evaluations under perturbations. Experiments on four datasets show substantial gains in FID, IS, and KID, and ablations confirm the benefits of the chosen code dimension, the uniformity regularization, and CBI. Overall, SQ-GAN offers a data-efficient way to exploit the latent space robustly for high-quality image synthesis under data scarcity.
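The partition-and-quantize step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `quantize_style`, the use of a single codebook shared by all $s$ sub-vectors, and the squared Euclidean nearest-neighbor rule are assumptions made for illustration, and the gradient handling needed to train the codebook is omitted.

```python
import numpy as np

def quantize_style(w, codebook):
    """Quantize each sub-vector of a style vector to its nearest code.

    w        : (batch, s * d) array of style vectors (hypothetical layout).
    codebook : (K, d) array of codes, assumed shared across sub-vectors.
    Returns the quantized vectors (batch, s * d) and code indices (batch, s).
    """
    batch, dim = w.shape
    num_codes, d = codebook.shape
    if dim % d != 0:
        raise ValueError("style dimension must be a multiple of the code dimension")
    s = dim // d

    # Split every style vector into s sub-vectors of dimension d.
    sub = w.reshape(batch, s, d)

    # Squared Euclidean distance from each sub-vector to each code: (batch, s, K).
    diff = sub[:, :, None, :] - codebook[None, None, :, :]
    dist = (diff ** 2).sum(axis=-1)

    # Replace each sub-vector with its nearest code.
    idx = dist.argmin(axis=-1)
    w_q = codebook[idx].reshape(batch, dim)
    return w_q, idx


# Toy example: two codes of dimension 2, one style vector with s = 2 sub-vectors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
w = np.array([[0.1, -0.1, 0.9, 1.2]])
w_q, idx = quantize_style(w, codebook)
```

In this toy example the first sub-vector snaps to the code at the origin and the second to the code at (1, 1), so the quantized vector takes only values present in the codebook.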
Abstract
In limited-data settings, GANs often struggle to navigate and effectively exploit the input latent space. Consequently, images generated from adjacent variables in a sparse input latent space may differ significantly in realism, which leads to suboptimal consistency regularization (CR) outcomes. To address this, we propose \textit{SQ-GAN}, a novel approach that enhances CR by introducing a style space quantization scheme. This method transforms the sparse, continuous input latent space into a compact, structured discrete proxy space, allowing each element to correspond to a specific real data point and thereby improving CR performance. Instead of quantizing the input latent variables directly, we first map them into a less entangled ``style'' space and then apply quantization using a learnable codebook, which enables each quantized code to control distinct factors of variation. Additionally, we minimize the optimal transport distance between the codebook codes and features extracted from the training data by a foundation model, which embeds external knowledge into the codebook and establishes a semantically rich vocabulary that properly describes the training dataset. Extensive experiments demonstrate significant improvements in both discriminator robustness and generation quality with our method.
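To make the codebook alignment idea more concrete, the sketch below computes an entropy-regularized optimal transport cost between a set of codes and a set of feature vectors using Sinkhorn iterations. This is an illustrative stand-in, not the paper's procedure: the abstract does not state which solver, cost function, or marginals are used, so the Sinkhorn approximation, the squared Euclidean cost, the uniform marginals, and the names `sinkhorn_distance`, `eps`, and `n_iters` are all assumptions.

```python
import numpy as np

def sinkhorn_distance(codes, feats, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between codes (K, d) and features (N, d).

    Assumes uniform weights on both sets and a squared Euclidean cost.
    Returns the transport cost and the transport plan of shape (K, N).
    """
    # Pairwise squared Euclidean cost matrix.
    cost = ((codes[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)

    # Uniform marginals over codes and features.
    a = np.full(codes.shape[0], 1.0 / codes.shape[0])
    b = np.full(feats.shape[0], 1.0 / feats.shape[0])

    # Gibbs kernel; a small eps combined with large costs can underflow here.
    kernel = np.exp(-cost / eps)

    # Alternate scaling updates so the plan matches both marginals.
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)

    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum()), plan


# Toy example: codes that already coincide with the features.
codes = np.array([[0.0, 0.0], [1.0, 1.0]])
feats = np.array([[0.0, 0.0], [1.0, 1.0]])
dist, plan = sinkhorn_distance(codes, feats)
```

When the codes coincide with the features, as in the toy example, almost all mass stays on the diagonal of the plan and the cost is close to zero. In a training setting, one would treat this cost as a loss on the codebook and differentiate through it in an autodiff framework; that step is not shown here.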
