
DiffuLT: How to Make Diffusion Model Useful for Long-tail Recognition

Jie Shao, Ke Zhu, Hanxiao Zhang, Jianxin Wu

TL;DR

DiffuLT tackles long-tail recognition by training a diffusion model from scratch on the in-domain long-tailed dataset and using it to synthesize additional samples for tail classes, which yields a more balanced training set. It then filters the generated data and retrains a classifier with a weighted loss that downweights synthetic samples, achieving state-of-the-art results on CIFAR10-LT, CIFAR100-LT, and ImageNet-LT without external data. The method reveals that a diffusion model can act as a dataset-wide lecturer, transferring information across classes, and that generative model quality (FID/IS) correlates with downstream accuracy. This data-centric approach offers a practical, generalizable path for long-tail tasks where external data or pretrained weights are unavailable or undesirable.
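To make the idea of a weighted loss that downweights synthetic samples concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the function name, the `synthetic_weight` value, and the choice to normalize by the sum of weights are all hypothetical.

```python
import numpy as np

def weighted_cross_entropy(logits, labels, is_synthetic, synthetic_weight=0.5):
    """Cross-entropy where synthetic samples count less than real ones.

    synthetic_weight is a hypothetical hyperparameter for illustration;
    it is not a value taken from the paper.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    is_synthetic = np.asarray(is_synthetic, dtype=bool)

    # Numerically stable log-softmax over the class dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Negative log-likelihood of the true class for each sample.
    per_sample = -log_probs[np.arange(len(labels)), labels]

    # Real samples get weight 1, synthetic samples get the smaller weight.
    weights = np.where(is_synthetic, synthetic_weight, 1.0)

    # Assumption: normalize by the total weight so the loss scale stays comparable.
    return float((weights * per_sample).sum() / weights.sum())
```

With `synthetic_weight=1.0` this reduces to the ordinary mean cross-entropy, and with `synthetic_weight=0.0` the synthetic samples are ignored entirely.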

Abstract

This paper proposes a new pipeline for long-tail (LT) recognition. Instead of re-weighting or re-sampling, we utilize the long-tailed dataset itself to generate a balanced proxy that can be optimized through cross-entropy (CE). Specifically, a randomly initialized diffusion model, trained exclusively on the long-tailed dataset, is employed to synthesize new samples for underrepresented classes. Then, we utilize the inherent information in the original dataset to filter out harmful samples and keep the useful ones. Our strategy, Diffusion model for Long-Tail recognition (DiffuLT), represents a pioneering utilization of generative models in long-tail recognition. DiffuLT achieves state-of-the-art results on CIFAR10-LT, CIFAR100-LT, and ImageNet-LT, surpassing the best competitors with non-trivial margins. Abundant ablations make our pipeline interpretable, too. The whole generation pipeline is done without any external data or pre-trained model weights, making it highly generalizable to real-world long-tailed settings.
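As a rough illustration of what a "balanced proxy" means in terms of sample counts, the sketch below computes how many synthetic samples each class would need. The helper name and the default target (the size of the largest class) are assumptions made for this example; the abstract does not specify the exact balancing rule.

```python
from collections import Counter

def synthetic_counts_to_balance(labels, target=None):
    """Return how many synthetic samples each class needs to reach `target`.

    Assumption: if no target is given, every class is topped up to the size
    of the largest class. This rule is illustrative, not taken from the paper.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no synthetic samples.
    return {cls: max(target - n, 0) for cls, n in counts.items()}
```

For example, a toy label list with 4, 2, and 1 samples in three classes would call for 0, 2, and 3 generated samples respectively under this rule.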

Paper Structure

This paper contains 15 sections, 7 equations, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Our approach centers on the dataset, employing the long-tailed dataset to train a diffusion model, thereby creating a balanced proxy for direct optimization.
  • Figure 2: The overall pipeline of our method DiffuLT.