Reviving Undersampling for Long-Tailed Learning

Hao Yu; Yingxiao Du; Jianxin Wu

TL;DR

This work tackles long-tailed recognition by shifting the focus to the worst-performing classes, which it evaluates with the harmonic and geometric means of per-class accuracy. It introduces Balanced Training and Merging (BTM), a plug-in pipeline that fine-tunes on multiple few-shot balanced subsets and merges the resulting models by averaging, improving worst-case per-class accuracy with little or no loss in average accuracy. Across Places-LT, ImageNet-LT, and iNaturalist2018, BTM yields substantial gains in harmonic and geometric means and can be combined with methods like GML for further improvements, all while preserving inference efficiency. The approach is lightweight, broadly compatible with existing decoupling strategies, and supported by public code, making it practical for real-world long-tailed learning deployments.
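
As a rough illustration of the balanced-subset step described above, the following Python sketch draws several class-balanced few-shot subsets from a long-tailed label list. The function name `sample_balanced_subset`, the `shots_per_class` parameter, and the toy class counts are assumptions made for illustration; they are not taken from the paper or its released code.

```python
import random
from collections import defaultdict


def sample_balanced_subset(labels, shots_per_class, seed):
    """Return indices of a class-balanced subset of `labels`.

    Each class contributes at most `shots_per_class` samples, so head
    classes are undersampled and tail classes are kept (almost) whole.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    subset = []
    for label in sorted(indices_by_class):
        indices = indices_by_class[label]
        k = min(shots_per_class, len(indices))
        subset.extend(rng.sample(indices, k))
    return subset


# Toy long-tailed dataset (hypothetical counts): 100 / 10 / 3 samples.
labels = [0] * 100 + [1] * 10 + [2] * 3

# Different seeds give different balanced subsets; head classes vary the
# most across subsets, while the smallest class is reused every time.
subsets = [sample_balanced_subset(labels, shots_per_class=3, seed=s)
           for s in range(4)]
```

Because the subset size is bounded by the rarest class, each subset is few-shot, which is why a single subset tends to under-fit and several subsets are drawn.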

Abstract

The training datasets used in long-tailed recognition are extremely unbalanced, resulting in significant variation in per-class accuracy across categories. Prior works mostly used average accuracy to evaluate their algorithms, which easily ignores the worst-performing categories. In this paper, we aim to enhance the accuracy of the worst-performing categories and utilize the harmonic mean and geometric mean to assess the model's performance. We revive the balanced undersampling idea to achieve this goal. Balanced subsets of a long-tailed dataset are few-shot and will surely under-fit, hence balanced undersampling is not used in modern long-tailed learning. However, we find that it produces a more equitable distribution of accuracy across categories, with much higher harmonic and geometric mean accuracy but lower average accuracy. Moreover, we devise a straightforward model ensemble strategy, which does not result in any additional overhead and achieves improved harmonic and geometric mean while keeping the average accuracy almost intact when compared to state-of-the-art long-tailed learning methods. We validate the effectiveness of our approach on widely utilized benchmark datasets for long-tailed learning. Our code is at https://github.com/yuhao318/BTM/.
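
To make the choice of metric concrete, here is a minimal Python sketch comparing the arithmetic, harmonic, and geometric means of per-class accuracy. The two accuracy vectors are invented toy numbers, not results from the paper, and the `eps` clipping that guards against zero accuracy is an assumption of this sketch rather than something the abstract specifies.

```python
import math


def arithmetic_mean(accs):
    """Plain average of per-class accuracies."""
    return sum(accs) / len(accs)


def harmonic_mean(accs, eps=1e-8):
    """Harmonic mean; `eps` (an assumption) avoids division by zero."""
    return len(accs) / sum(1.0 / max(a, eps) for a in accs)


def geometric_mean(accs, eps=1e-8):
    """Geometric mean, computed in log space for numerical stability."""
    return math.exp(sum(math.log(max(a, eps)) for a in accs) / len(accs))


# Hypothetical per-class accuracies for two models over four classes.
model_a = [0.9, 0.9, 0.9, 0.1]  # strong on three classes, one very weak class
model_b = [0.7, 0.7, 0.7, 0.5]  # lower peak accuracy, but more even

for name, accs in [("A", model_a), ("B", model_b)]:
    print(name,
          round(arithmetic_mean(accs), 4),
          round(harmonic_mean(accs), 4),
          round(geometric_mean(accs), 4))
```

In this toy case model A has the higher average accuracy, yet model B scores higher on both the harmonic and geometric means, because those two means are pulled down sharply by a single poorly recognized class.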

Paper Structure (19 sections, 3 equations, 3 figures, 14 tables)

Introduction
Related work
Re-sampling and re-weighting methods.
Decoupling methods.
Ensemble methods.
Other methods.
Method
Harmonic Mean is the Preferred Evaluation Metric
Can We Revive the Undersampling Strategy?
Balanced Training and Merging
Experiments
Datasets, Metrics, and Implementation Details
Comparison with Other Methods
Ablation Studies
Conclusions, Limitations and Future Work
...and 4 more sections

Figures (3)

Figure 1: The two panels present the harmonic and geometric mean of interpolated models between the raw model $f$ ($\lambda=0$) and the fine-tuned model $f^{D}$ ($\lambda=1$), respectively.
Figure 2: The blue curves in the two panels present the harmonic and geometric mean of interpolated models between the fine-tuned model $f^{D_A}$ ($\lambda=0$) and the fine-tuned model $f^{D_B}$ ($\lambda=1$), respectively. The yellow curves show the harmonic and geometric mean of $f^{D_{A\cup B}}$.
Figure 3: Visualization of the change in the distribution of per-class recall (i.e., accuracy). The first panel shows that by performing balanced training on our sampled few-shot datasets and later merging all models together, we are able to greatly improve the performance of the model. The second panel compares the per-class accuracy of our final model with that of MiSLAS.
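
The interpolated models in Figures 1 and 2 can be read as convex combinations of two models controlled by $\lambda$. The sketch below assumes the interpolation is applied directly to model parameters, which the captions do not state explicitly, and represents each model as a plain dictionary of parameter lists. The helper names `interpolate` and `average` and the toy parameter values are hypothetical.

```python
def interpolate(params_a, params_b, lam):
    """Return (1 - lam) * params_a + lam * params_b, parameter by parameter.

    lam = 0 recovers the first model and lam = 1 recovers the second,
    matching the endpoints described in the figure captions.
    """
    return {
        name: [(1.0 - lam) * a + lam * b
               for a, b in zip(params_a[name], params_b[name])]
        for name in params_a
    }


def average(models):
    """Uniformly average the parameters of several models."""
    n = len(models)
    return {
        name: [sum(values) / n
               for values in zip(*(m[name] for m in models))]
        for name in models[0]
    }


# Toy "models" with one parameter tensor each (made-up values).
f_raw = {"classifier.weight": [1.0, 2.0]}
f_ft = {"classifier.weight": [3.0, 6.0]}
f_ft2 = {"classifier.weight": [5.0, 1.0]}

midpoint = interpolate(f_raw, f_ft, lam=0.5)
merged = average([f_raw, f_ft, f_ft2])
```

Under this assumption the merged model has the same architecture and parameter count as any single model, so merging adds no cost at inference time.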
