Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions

Saeed Khorram, Mingqi Jiang, Mohamad Shahbazi, Mohamad H. Danesh, Li Fuxin

TL;DR

The paper addresses the difficulty of generating high-quality images for tail classes when class-conditional GANs are trained on long-tailed, multi-class data. It introduces UTLO, a framework in which the generator’s lower-resolution layers are trained unconditionally to learn class-agnostic features, while the higher-resolution layers remain class-conditional and synthesize class-specific detail. Modifications to both the generator and the discriminator enable end-to-end training with a combined objective that balances the conditional and unconditional terms, and the authors propose evaluation metrics tailored to tail-class performance (FID-FS and KID-FS). Across six long-tail benchmarks and multiple GAN architectures, UTLO yields consistent gains in both fidelity and diversity for tail classes. It also mitigates mode collapse and reduces reliance on early stopping, and the results suggest the approach is applicable to other GAN designs.

Abstract

Despite extensive research on training generative adversarial networks (GANs) with limited training data, learning to generate images from long-tailed training distributions remains fairly unexplored. In the presence of imbalanced multi-class training data, GANs tend to favor classes with more samples, leading to the generation of low-quality and less diverse samples in tail classes. In this study, we aim to improve the training of class-conditional GANs with long-tailed data. We propose a straightforward yet effective method for knowledge sharing, allowing tail classes to borrow from the rich information from classes with more abundant training data. More concretely, we propose modifications to existing class-conditional GAN architectures to ensure that the lower-resolution layers of the generator are trained entirely unconditionally while reserving class-conditional generation for the higher-resolution layers. Experiments on several long-tail benchmarks and GAN architectures demonstrate a significant improvement over existing methods in both the diversity and fidelity of the generated images. The code is available at https://github.com/khorrams/utlo.

Paper Structure

This paper contains 22 sections, 3 equations, 16 figures, and 13 tables.

Figures (16)

  • Figure 1: Generating images from rare tail classes in the Flowers-LT dataset with only two training images. Our proposed approach allows for a more diverse set of features such as backgrounds, colors, poses, and object layouts to be infused into the tail classes.
  • Figure 2: Convergence of different methods on CIFAR100-LT ($\rho=100$), where tail classes have as few as 5 training examples. Incorporating our framework into the baseline alleviates overfitting as a result of knowledge sharing from head to rare tail classes.
  • Figure 3: The proposed framework, UTLO, illustrated for the StyleGAN2-ADA architecture. Low- and high-resolution image pathways are used for the unconditional and class-conditional objectives, respectively. $z$ is the input latent code, $y$ indicates the class embeddings, and $c$ is a constant input. Separate style vectors, $w_z$ (class-independent) and $w_{z,y}$ (class-conditional), are generated using the same $z$ and a shared style-mapping network $G_{\text{map}}$, and are then passed to $G_l$ and $G_h$, respectively. The high-resolution generated image $\hat{x} \in \mathbb{R}^{3\times H \times H}$ is passed through the discriminator $D = D_l \circ D_h$ to calculate the conditional objective $\mathcal{L}_{c}$, while the low-resolution image $\hat{x}_l \in \mathbb{R}^{3\times L \times L}$ is passed only through $D_l$ to calculate the unconditional objective $\mathcal{L}_{uc}$. The final objective $\mathcal{L}$ is the combination of the two. While $D_l$ is shared, two separate prediction heads ($\text{FC}$ layers) are used for the unconditional and conditional objectives. The $\text{fromRGB}$ layer increases the dimensionality of the RGB channels to match the input channels of $D_l$.
  • Figure 4: Different class-conditional images generated given the same unconditional low-resolution images (left-most column).
  • Figure 5: Generated images from the LSUN5-LT dataset. Despite only 50 training instances for the tail class kitchen, the proposed UTLO framework produces diverse, high-fidelity images.
  • ...and 11 more figures
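
The data flow described in the Figure 3 caption can be illustrated with a small, dependency-free sketch. This is not the authors' implementation (their code is at the repository linked in the abstract). Every concrete choice below is an assumption made for illustration: the toy sizes, the random linear maps standing in for real network blocks, feeding a zero class embedding to obtain $w_z$, the projection-style conditional head, the non-saturating logistic loss, and the weight `lam` used to combine the two objectives. All names (`ToyUTLO`, `generator_loss`, and so on) are hypothetical.

```python
import math
import random

# Toy sizes, all hypothetical: low/high resolution, style width, class count.
L_RES, H_RES, W_DIM, N_CLASSES = 4, 8, 6, 3


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def make_linear(n_in, n_out, rng):
    """Random linear map on flat lists; a stand-in for a real network block."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]

    def apply(v):
        assert len(v) == n_in
        return [sum(w * x for w, x in zip(row, v)) for row in weights]

    return apply


class ToyUTLO:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        n_low, n_high = 3 * L_RES ** 2, 3 * H_RES ** 2
        # Class embeddings y (random here, learned in a real model).
        self.embed = [[rng.gauss(0.0, 1.0) for _ in range(W_DIM)]
                      for _ in range(N_CLASSES)]
        self.g_map = make_linear(2 * W_DIM, W_DIM, rng)      # shared G_map
        self.g_low = make_linear(W_DIM, n_low, rng)          # G_l
        self.g_high = make_linear(n_low + W_DIM, n_high, rng)  # G_h
        self.d_high = make_linear(n_high, 16, rng)           # D_h
        self.from_rgb = make_linear(n_low, 16, rng)          # fromRGB
        self.d_low = make_linear(16, 8, rng)                 # shared D_l
        self.fc_uc = make_linear(8, 1, rng)                  # unconditional head
        self.fc_c = make_linear(8, W_DIM, rng)               # conditional head

    def generate(self, z, y):
        # Assumption: w_z is obtained by feeding a zero class embedding.
        w_z = self.g_map(z + [0.0] * W_DIM)    # class-independent style
        w_zy = self.g_map(z + self.embed[y])   # class-conditional style
        x_low = self.g_low(w_z)                # low-res output never sees y
        x_high = self.g_high(x_low + w_zy)     # high-res output is conditioned
        return x_low, x_high

    def d_uncond(self, x_low):
        # The low-res image goes through fromRGB and D_l only.
        return self.fc_uc(self.d_low(self.from_rgb(x_low)))[0]

    def d_cond(self, x_high, y):
        # The high-res image goes through D = D_l o D_h, then the conditional head.
        feat = self.d_low(self.d_high(x_high))
        return sum(p * e for p, e in zip(self.fc_c(feat), self.embed[y]))


def generator_loss(model, z, y, lam=1.0):
    """Return (L, L_c, L_uc) with L = L_c + lam * L_uc (assumed combination)."""
    x_low, x_high = model.generate(z, y)
    loss_c = softplus(-model.d_cond(x_high, y))
    loss_uc = softplus(-model.d_uncond(x_low))
    return loss_c + lam * loss_uc, loss_c, loss_uc


model = ToyUTLO(seed=0)
z_rng = random.Random(1)
z = [z_rng.gauss(0.0, 1.0) for _ in range(W_DIM)]
low_a, high_a = model.generate(z, y=0)
low_b, high_b = model.generate(z, y=2)
total, loss_c, loss_uc = generator_loss(model, z, y=0, lam=0.5)
print("same low-res image across classes:", low_a == low_b)
print("same high-res image across classes:", high_a == high_b)
```

Because the low-resolution path only ever receives $w_z$, the same latent code gives the same low-resolution output for every class, which is the behaviour the Figure 4 caption describes. The class label enters only through $w_{z,y}$ in the high-resolution path and through the conditional discriminator head.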