On The Relationship Between Continual Learning and Long-Tailed Recognition
Mahdiyar Molahasani, Michael Greenspan, Ali Etemad
TL;DR
The paper proves that, under class imbalance, the weights of a model trained on the full long-tailed dataset remain in a bounded neighborhood of the Head-only solution, with a radius of $O\left(1/\sqrt{IF}\right)$, where $IF$ is the imbalance factor. It then reframes long-tailed recognition (LTR) as a continual learning (CL) problem and introduces CLTR, a method that learns the Head classes first and then the Tail classes, using standard CL techniques to avoid forgetting the Head, with a theoretical guarantee that the CL objective upper-bounds the balanced LTR objective. The authors extend the analysis to nonconvex deep networks via the KL property, generalize it to multiple partitions, and provide a general guarantee that off-the-shelf CL methods improve LTR performance. Empirically, CLTR achieves strong results on the CIFAR100-LT, CIFAR10-LT, ImageNet-LT, Caltech256, and LT-CIL benchmarks, corroborating the theory and highlighting the practical value of bridging LTR and CL.
Abstract
Real-world datasets often exhibit long-tailed distributions, where a few dominant "Head" classes have abundant samples while most "Tail" classes are severely underrepresented, leading to biased learning and poor generalization for the Tail. We present a theoretical framework that reveals a previously undescribed connection between Long-Tailed Recognition (LTR) and Continual Learning (CL), the process of learning sequential tasks without forgetting prior knowledge. Our analysis demonstrates that, for models trained on imbalanced datasets, the weights converge to a bounded neighborhood of those trained exclusively on the Head, with the bound scaling as the inverse square root of the imbalance factor. Leveraging this insight, we introduce Continual Learning for Long-Tailed Recognition (CLTR), a principled approach that employs standard off-the-shelf CL methods to address LTR problems by sequentially learning Head and Tail classes without forgetting the Head. Our theoretical analysis further suggests that CLTR mitigates gradient saturation and improves Tail learning while maintaining strong Head performance. Extensive experiments on CIFAR100-LT, CIFAR10-LT, ImageNet-LT, and Caltech256 validate our theoretical predictions, achieving strong results across various LTR benchmarks. Our work bridges the gap between LTR and CL, providing a principled way to tackle imbalanced data challenges with standard existing CL strategies.
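The Head-then-Tail schedule described above can be illustrated with a minimal sketch. It assumes a linear softmax classifier and uses a plain L2 penalty toward the Head-stage weights as a stand-in for an off-the-shelf CL regularizer. The function names (`fit`, `head_then_tail`), the hyperparameters, and the choice of penalty are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit(X, y, W, n_classes, anchor=None, lam=0.0, lr=0.1, steps=300):
    """Gradient descent on mean cross-entropy + (lam / 2) * ||W - anchor||^2.

    `anchor` and `lam` are hypothetical knobs: the penalty pulls W back toward
    previously learned weights, a simple proxy for a CL regularizer.
    """
    Y = np.eye(n_classes)[y]  # one-hot labels
    for _ in range(steps):
        grad = X.T @ (softmax(X @ W) - Y) / len(X)
        if anchor is not None:
            grad = grad + lam * (W - anchor)
        W = W - lr * grad
    return W


def head_then_tail(X, y, head_classes, n_classes, lam=1.0):
    """Stage 1: train on Head samples only. Stage 2: train on Tail samples
    while penalizing drift away from the Head-stage weights."""
    is_head = np.isin(y, head_classes)
    W0 = np.zeros((X.shape[1], n_classes))
    W_head = fit(X[is_head], y[is_head], W0, n_classes)
    W_final = fit(X[~is_head], y[~is_head], W_head.copy(), n_classes,
                  anchor=W_head, lam=lam)
    return W_head, W_final
```

In this sketch, a larger `lam` keeps the final weights closer to the Head-stage solution, while `lam=0` lets the Tail stage overwrite it freely, which is the forgetting behaviour a CL method is meant to prevent. Any real CL technique could be substituted for the L2 anchor.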
