CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
Yao Ni, Piotr Koniusz
TL;DR
GANs trained on scarce data suffer from discriminator overfitting and training instability. CHAIN modifies Batch Normalization by replacing the centering step with zero-mean regularization and enforcing a Lipschitz constraint on the scaling step through Adaptive Root Mean Square normalization, paired with adaptive interpolation between normalized and unnormalized features. The approach is underpinned by a PAC-Bayesian, IPM-based generalization analysis that connects reduced gradient norms to improved generalization, and is validated by experiments showing state-of-the-art results on CIFAR-10/100, ImageNet, five low-shot, and seven high-resolution few-shot image datasets. CHAIN is a simple, architecture-agnostic technique that stabilizes GAN training under limited data, and its code is publicly available.
Abstract
Generative Adversarial Networks (GANs) have significantly advanced image generation, but their performance depends heavily on abundant training data. In scenarios with limited data, GANs often struggle with discriminator overfitting and unstable training. Batch Normalization (BN), despite being known for enhancing generalization and training stability, has rarely been used in the discriminator of Data-Efficient GANs. Our work addresses this gap by identifying a critical flaw in BN: the tendency for gradient explosion during the centering and scaling steps. To tackle this issue, we present CHAIN (lipsCHitz continuity constrAIned Normalization), which replaces the conventional centering step with zero-mean regularization and integrates a Lipschitz continuity constraint in the scaling step. CHAIN further enhances GAN training by adaptively interpolating the normalized and unnormalized features, effectively avoiding discriminator overfitting. Our theoretical analyses firmly establish CHAIN's effectiveness in reducing gradients in latent features and weights, improving stability and generalization in GAN training. Empirical evidence supports our theory. CHAIN achieves state-of-the-art results in data-limited scenarios on CIFAR-10/100, ImageNet, five low-shot, and seven high-resolution few-shot image datasets. Code: https://github.com/MaxwellYaoNi/CHAIN
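To make the three ingredients named above more concrete, the following is a minimal NumPy sketch, not the authors' implementation (the released code is at the repository linked above). It assumes features of shape (batch, channels). The function name `chain_like_forward`, the parameters `p` and `eps`, and the rule of clamping the divisor at 1 are all hypothetical choices made for illustration; the clamp is only a stand-in for the Lipschitz constraint, treating the batch statistic as fixed, and the interpolation weight `p` is passed in by hand rather than adapted during training.

```python
import numpy as np


def chain_like_forward(x, p=0.5, eps=1e-5):
    """Illustrative CHAIN-style layer on features of shape (batch, channels).

    Returns the transformed features and a zero-mean penalty term.
    All names and the exact clamping rule are assumptions for this sketch.
    """
    x = np.asarray(x, dtype=np.float64)

    # Zero-mean regularization: instead of subtracting the batch mean
    # (the BN centering step), expose a penalty that a training loss
    # could use to pull the per-channel mean toward zero.
    channel_mean = x.mean(axis=0)
    zero_mean_penalty = float(np.sum(channel_mean ** 2))

    # Root-mean-square statistic per channel (no centering involved).
    rms = np.sqrt(np.mean(x ** 2, axis=0) + eps)

    # Assumed Lipschitz safeguard: never divide by less than 1, so the
    # scaling step cannot amplify features (scaling factor <= 1).
    scale = np.maximum(rms, 1.0)
    x_norm = x / scale

    # Interpolate between unnormalized and normalized features.
    # Here p is a fixed input; the adaptive rule is not reproduced.
    out = (1.0 - p) * x + p * x_norm
    return out, zero_mean_penalty


features = np.array([[3.0, 0.5], [4.0, 0.5]])
out, penalty = chain_like_forward(features, p=1.0)
```

In this sketch, setting `p=0.0` returns the input unchanged, while `p=1.0` applies the clamped RMS scaling in full. The penalty would be added to the discriminator loss with some weight, which is likewise left unspecified here.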
