Dynamically Masked Discriminator for Generative Adversarial Networks
Wentian Zhang, Haozhe Liu, Bing Li, Jinheng Xie, Yawen Huang, Yuexiang Li, Yefeng Zheng, Bernard Ghanem
TL;DR
The paper addresses the instability in GAN training caused by the discriminator lagging behind the evolving distribution of generated data. It frames discriminator learning as online continual learning and introduces Dynamically Masked Discriminator (DMD), which detects when the discriminator slows down and dynamically masks its features to force fast adaptation. Through retardation detection and adaptive masking, DMD yields state-of-the-art FID improvements across StyleGAN-V2/StyleGAN-V3/BigGAN benchmarks and on large-scale datasets, demonstrating the practical benefit of online adaptation for GANs. The approach is plug-and-play and highlights the importance of accommodating non-stationary data distributions in guiding generator training, with broader implications for robust generative modeling.
Abstract
Training Generative Adversarial Networks (GANs) remains a challenging problem. The discriminator trains the generator by learning the distribution of real/generated data. However, the distribution of generated data changes throughout the training process, which is difficult for the discriminator to learn. In this paper, we propose a novel method for GANs from the viewpoint of online continual learning. We observe that the discriminator model, trained on historically generated data, often slows down its adaptation to the changes in the new arrival generated data, which accordingly decreases the quality of generated results. By treating the generated data in training as a stream, we propose to detect whether the discriminator slows down the learning of new knowledge in generated data. Therefore, we can explicitly enforce the discriminator to learn new knowledge fast. Particularly, we propose a new discriminator, which automatically detects its retardation and then dynamically masks its features, such that the discriminator can adaptively learn the temporally-vary distribution of generated data. Experimental results show our method outperforms the state-of-the-art approaches.
