Latent-based Diffusion Model for Long-tailed Recognition

Pengxiao Han; Changkun Ye; Jieming Zhou; Jing Zhang; Jie Hong; Xuesong Li

TL;DR

This paper tackles long-tailed recognition by introducing LDMLR, a three-stage approach that augments minority-class representations with diffusion-generated latent features. By operating in the latent feature space, LDMLR uses a class-conditional DDIM/LDM to produce pseudo-features and then jointly trains a classifier on real and generated embeddings. Empirical results on CIFAR-LT and ImageNet-LT show consistent improvements over strong baselines, with latent augmentation outperforming image-space diffusion and focused tail-class augmentation providing the largest gains. The method is efficient due to latent-space diffusion and demonstrates the potential of diffusion models for enhancing imbalanced visual recognition in practical settings.

Abstract

Long-tailed imbalanced distributions are a common issue in practical computer vision applications. Previous methods for this problem fall into several categories: re-sampling, re-weighting, transfer learning, and feature augmentation. In recent years, diffusion models have shown impressive generative ability in many sub-problems of computer vision. However, their generative power has not yet been explored for long-tailed problems. We propose a new approach, the Latent-based Diffusion Model for Long-tailed Recognition (LDMLR), as a feature augmentation method to tackle the issue. First, we encode the imbalanced dataset into features using the baseline model. Then, we train a Denoising Diffusion Implicit Model (DDIM) on these encoded features to generate pseudo-features. Finally, we train the classifier on the encoded features and pseudo-features from the previous two steps. The proposed method improves classification accuracy on the CIFAR-LT and ImageNet-LT datasets.
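The three stages described in the abstract can be illustrated with a toy NumPy sketch. It is not the authors' implementation. The encoder is replaced by synthetic class-clustered features, and the trained denoising network is replaced by a hypothetical closed-form noise predictor (`toy_eps_model`) that is exact only when each class collapses to a single mean vector. The rule of topping every class up to the head-class count is also an assumption made for illustration. Only the deterministic DDIM update (eta = 0) is the standard one.

```python
import numpy as np

def ddim_sample(eps_model, labels, dim, alpha_bar, rng):
    """Deterministic DDIM sampling (eta = 0): one feature vector per label.

    eps_model(x_t, t, labels) is a class-conditional noise predictor;
    alpha_bar[t] is the cumulative product of (1 - beta) up to step t.
    """
    x = rng.standard_normal((len(labels), dim))  # start from Gaussian noise
    for t in range(len(alpha_bar) - 1, -1, -1):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[t - 1] if t > 0 else 1.0  # no noise left after step 0
        eps = eps_model(x, t, labels)
        # Predict the clean feature, then move to the previous noise level.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x

rng = np.random.default_rng(0)
num_classes, dim = 3, 4
counts = [50, 10, 2]  # toy long-tailed class counts (assumption)

# Stage 1 stand-in: "encoded" features clustered around per-class means.
class_means = rng.standard_normal((num_classes, dim))
real_labels = np.repeat(np.arange(num_classes), counts)
real_feats = class_means[real_labels] + 0.1 * rng.standard_normal((len(real_labels), dim))

# Stage 2 stand-in: closed-form noise predictor instead of a trained network.
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 50))

def toy_eps_model(x_t, t, labels):
    # Exact noise if every sample of a class equals its class mean.
    a_t = alpha_bar[t]
    return (x_t - np.sqrt(a_t) * class_means[labels]) / np.sqrt(1.0 - a_t)

# Generate pseudo-features so each class reaches the head-class count.
pseudo_labels = np.concatenate(
    [np.full(max(counts) - c, k) for k, c in enumerate(counts)]
)
pseudo_feats = ddim_sample(toy_eps_model, pseudo_labels, dim, alpha_bar, rng)

# Stage 3 input: real and generated features used together for the classifier.
train_feats = np.concatenate([real_feats, pseudo_feats])
train_labels = np.concatenate([real_labels, pseudo_labels])
```

In this toy setting the generated features land on the class means, so the sketch only shows the data flow: encode, sample class-conditional features, then train the classifier on the combined set. With a learned noise predictor the samples would instead spread over the encoded feature distribution of each class.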

Paper Structure (16 sections, 12 equations, 3 figures, 5 tables, 1 algorithm)

Introduction
Related Works
Long-tailed Recognition
Generative Models for Feature Augmentation in Long-tailed Recognition
Diffusion Model
Approach
Preliminaries
Stage 1: Image Encoding
Stage 2: Representation Generation
Stage 3: Classifier Training
Experiments
Setup
Results on CIFAR-LT
Results on ImageNet-LT
Analysis
...and 1 more section

Figures (3)

Figure 1: Overview of the proposed framework, LDMLR. The figure shows the three training stages: (a) obtain encoded features by pre-training a convolutional neural network on the long-tailed training set, (b) generate pseudo-features with the diffusion model using the encoded features, and (c) train the fully connected layers using the encoded and pseudo-features. At evaluation, the encoder from (a) and the classifier from (c) are used to predict long-tailed data.
Figure 2: The impact of generation ratio on classification accuracy. The evaluation is conducted on CIFAR-10-LT and CIFAR-100-LT with $\mathrm{IF}=10$.
Figure 3: Encoded and generated features of a tail class (class 9) in CIFAR-10-LT during model training. The generated features (blue points) overlap the encoded features (red points) from the original training set while slightly enriching the feature space.
