
Training Class-Imbalanced Diffusion Model Via Overlap Optimization

Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, Meng Tang

TL;DR

This work tackles the fidelity bias toward head classes in class-conditioned diffusion models trained on long-tailed data. It introduces DiffROP, a Probabilistic Contrastive Learning framework that penalizes distribution overlap between classes by pushing apart the class-conditional denoising distributions, with separation measured by $D_{KL}\left[p_{\theta}(x_{t-1}|x_t,\mathbf{c}^i)\|p_{\theta}(x_{t-1}|x_t,\mathbf{c}^j)\right]$ for classes $i \neq j$. This term is combined with the standard diffusion objective, together with time-dependent weighting and hinge-like margins. The approach is modular and compatible with classifier-free guidance, and it reports gains in FID, Recall, and Inception Score on CIFAR10LT and CIFAR100LT, particularly by reducing the overlap of tail classes with head classes. The results also show improved utility for data augmentation in downstream long-tailed classification. Overall, DiffROP provides a distribution-level contrastive mechanism for improving diffusion models under long-tailed data regimes.
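To make the role of the KL term and the hinge-like margin concrete, here is a minimal sketch in plain Python. It assumes, as in standard DDPM, that both reverse transitions are Gaussians sharing a fixed variance $\sigma^2 I$, so the KL divergence reduces to a scaled squared distance between the two predicted means. The function names, the `margin` parameter, and the `weight_t` factor are illustrative assumptions, not the paper's exact loss.

```python
def gaussian_kl_same_var(mu_i, mu_j, sigma):
    """KL[N(mu_i, sigma^2 I) || N(mu_j, sigma^2 I)] = ||mu_i - mu_j||^2 / (2 sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
    return sq_dist / (2.0 * sigma ** 2)


def hinge_overlap_penalty(mu_i, mu_j, sigma, margin):
    """Hypothetical hinge: positive only while the KL is below `margin`.

    A small KL means the two class-conditional distributions overlap,
    so the penalty is largest when the predicted means coincide.
    """
    return max(0.0, margin - gaussian_kl_same_var(mu_i, mu_j, sigma))


def total_loss(denoise_loss, mu_i, mu_j, sigma, margin, weight_t):
    """Standard denoising loss plus a (possibly time-dependent) weighted penalty."""
    return denoise_loss + weight_t * hinge_overlap_penalty(mu_i, mu_j, sigma, margin)


# Toy predicted means for the same x_t under two different class labels.
mu_head = [0.0, 0.0, 0.0]
mu_tail = [1.0, 1.0, 1.0]
print(gaussian_kl_same_var(mu_head, mu_tail, sigma=1.0))               # 1.5
print(hinge_overlap_penalty(mu_head, mu_tail, sigma=1.0, margin=2.0))  # 0.5
print(hinge_overlap_penalty(mu_head, mu_head, sigma=1.0, margin=2.0))  # 2.0
```

The hinge stops rewarding separation once the KL exceeds the margin, which keeps the penalty bounded and prevents it from dominating the denoising term.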

Abstract

Diffusion models have made significant advances recently in high-quality image synthesis and related tasks. However, diffusion models trained on real-world datasets, which often follow long-tailed distributions, yield inferior fidelity for tail classes. Deep generative models, including diffusion models, are biased towards classes with abundant training images. To address the observed appearance overlap between synthesized images of tail classes and head classes, we propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes. We show that variants of our probabilistic contrastive learning method can be applied to any class-conditional diffusion model. We show significant improvement in image synthesis using our loss on multiple datasets with long-tailed distributions. Extensive experimental results demonstrate that the proposed method can effectively handle imbalanced data for diffusion-based generation and classification models. Our code and datasets will be publicly available at https://github.com/yanliang3612/DiffROP.

Paper Structure (33 sections, 1 theorem, 18 equations, 8 figures, 5 tables, 1 algorithm)


Key Result

Proposition 3.1

The original DDPM training objective (the KL-divergence loss) can be rewritten as

Figures (8)

  • Figure 1: Motivation of our method.
  • Figure 2: Loss visualization w.r.t. the two estimated Gaussian means $m_1$ and $m_2$. With our hinge-based PCL loss, solutions with large distribution overlap (when $m_1$ is close to $m_2$) are penalized, while the global optimum $(0,2)$ is preserved.
  • Figure 3: Qualitative results for tail classes. Our method produces clearer and more realistic images for less common classes than the baseline methods.
  • Figure 4: (a) Ablation study of the time-dependent parameter $\tau$ on the CIFAR10LT and CIFAR100LT datasets. 'TS' refers to the DiffROP model with time-dependent $\tau$; 'Vanilla' refers to the standard DDPM. (b) Impact of the classifier-free guidance strength $\omega$ on the DiffROP sampling process.
  • Figure 5: Qualitative results for tail classes in CIFAR10LT. Our method produces clearer and more realistic images for less common classes than the baseline methods.
  • ...and 3 more figures
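
The behaviour described in the Figure 2 caption can be reproduced with a toy loss over two estimated means. The sketch below is an assumption-laden illustration, not the paper's actual loss surface: it takes two unit-variance 1D Gaussians with true means 0 and 2, uses the KL to the true distributions as the fit term, and adds a hypothetical hinge on the KL between the two estimates. The names `toy_loss`, `margin`, and `lam` are made up for this example.

```python
TRUE_MEANS = (0.0, 2.0)  # assumed ground-truth means of the two classes


def toy_loss(m1, m2, margin=2.0, lam=1.0):
    """Fit term plus a hinge penalty on the overlap between the two estimates."""
    # KL between unit-variance Gaussians is half the squared mean difference.
    fit = 0.5 * (m1 - TRUE_MEANS[0]) ** 2 + 0.5 * (m2 - TRUE_MEANS[1]) ** 2
    kl_between = 0.5 * (m1 - m2) ** 2
    hinge = max(0.0, margin - kl_between)  # active only when the estimates overlap
    return fit + lam * hinge


# Coarse grid search over (m1, m2) in [-1, 3] with step 0.5.
grid = [-1.0 + 0.5 * k for k in range(9)]
best = min(((a, b) for a in grid for b in grid), key=lambda p: toy_loss(*p))

print(best)                         # (0.0, 2.0)
print(toy_loss(1.0, 1.0))           # 3.0: collapsed means, fit 1.0 plus hinge 2.0
print(toy_loss(1.0, 1.0, lam=0.0))  # 1.0: the same point without the penalty
```

In this toy setting the optimum stays at $(0,2)$ because the margin does not exceed the KL between the true distributions (which is 2 here), so the hinge is inactive there. A larger margin would keep the hinge active at $(0,2)$ and shift the minimum.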

Theorems & Definitions (1)

  • Proposition 3.1: Weight-Biased Decomposition of DDPM Loss Function