On The Relationship Between Continual Learning and Long-Tailed Recognition

Mahdiyar Molahasani; Michael Greenspan; Ali Etemad

TL;DR

The paper addresses long-tailed recognition (LTR) by proving that, under class imbalance, model weights learned on the full dataset remain in a bounded neighborhood of the Head-only solution, with radius $O\left(1/\sqrt{\mathrm{IF}}\right)$, where IF is the imbalance factor. It then reframes LTR as a continual learning (CL) problem and introduces CLTR, a method that learns Head and then Tail sequentially, using standard CL techniques to avoid forgetting Head, with a theoretical guarantee that the CL objective upper-bounds the balanced LTR objective. The authors extend the analysis to nonconvex deep networks via the Kurdyka–Łojasiewicz (KL) property, generalize it to multiple Head–Tail partitions, and provide a general guarantee that off-the-shelf CL methods improve LTR performance. Empirically, CLTR achieves strong results on the CIFAR100-LT, CIFAR10-LT, ImageNet-LT, Caltech256, and LT-CIL benchmarks, corroborating the theory and highlighting the practical value of bridging LTR and CL.

Abstract

Real-world datasets often exhibit long-tailed distributions, where a few dominant "Head" classes have abundant samples while most "Tail" classes are severely underrepresented, leading to biased learning and poor generalization for the Tail. We present a theoretical framework that reveals a previously undescribed connection between Long-Tailed Recognition (LTR) and Continual Learning (CL), the process of learning sequential tasks without forgetting prior knowledge. Our analysis demonstrates that, for models trained on imbalanced datasets, the weights converge to a bounded neighborhood of those trained exclusively on the Head, with the bound scaling as the inverse square root of the imbalance factor. Leveraging this insight, we introduce Continual Learning for Long-Tailed Recognition (CLTR), a principled approach that employs standard off-the-shelf CL methods to address LTR problems by sequentially learning Head and Tail classes without forgetting the Head. Our theoretical analysis further suggests that CLTR mitigates gradient saturation and improves Tail learning while maintaining strong Head performance. Extensive experiments on CIFAR100-LT, CIFAR10-LT, ImageNet-LT, and Caltech256 validate our theoretical predictions, achieving strong results across various LTR benchmarks. Our work bridges the gap between LTR and CL, providing a principled way to tackle imbalanced data challenges with standard existing CL strategies.
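The two-stage idea described above (learn the Head, then learn the Tail while discouraging drift from the Head solution) can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: the softmax-regression model, the synthetic 2-D data, and the L2 anchor penalty (used here as a stand-in for an off-the-shelf CL regularizer) are all assumptions made for this example, as are the names `train`, `accuracy`, and `lam`.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, W, steps=300, lr=0.1, anchor=None, lam=0.0):
    """Gradient descent on mean cross-entropy, plus an optional
    (lam / 2) * ||W - anchor||^2 penalty that discourages drift.
    The penalty is a hypothetical stand-in for a CL regularizer."""
    W = W.copy()
    Y = np.eye(W.shape[1])[y]  # one-hot labels
    for _ in range(steps):
        grad = X.T @ (softmax(X @ W) - Y) / len(X)
        if anchor is not None:
            grad += lam * (W - anchor)
        W -= lr * grad
    return W

def accuracy(X, y, W):
    return float(np.mean(np.argmax(X @ W, axis=1) == y))

# Synthetic long-tailed data: classes 0 and 1 are Head, class 2 is Tail.
rng = np.random.default_rng(0)
centers = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
counts = [500, 500, 10]
X = np.vstack([rng.normal(c, 1.0, size=(n, 2)) for c, n in zip(centers, counts)])
X = np.hstack([X, np.ones((len(X), 1))])  # append a bias feature
y = np.repeat([0, 1, 2], counts)
head, tail = y < 2, y == 2

# Stage 1: learn the Head classes only.
W_head = train(X[head], y[head], np.zeros((3, 3)))
# Stage 2 (CL-style): learn the Tail while anchored to the Head weights.
W_cl = train(X[tail], y[tail], W_head, anchor=W_head, lam=1.0)
# Baseline: naive fine-tuning on the Tail with no anchor.
W_naive = train(X[tail], y[tail], W_head)

drift_cl = float(np.linalg.norm(W_cl - W_head))
drift_naive = float(np.linalg.norm(W_naive - W_head))
# Accuracies are measured on the training samples; this is only a toy check.
for name, W in [("head-only", W_head), ("anchored", W_cl), ("naive", W_naive)]:
    print(f"{name:>9}: head acc={accuracy(X[head], y[head], W):.3f}, "
          f"tail acc={accuracy(X[tail], y[tail], W):.3f}")
print(f"drift from Head weights: anchored={drift_cl:.3f}, naive={drift_naive:.3f}")
```

In this sketch, raising `lam` keeps the weights closer to the Head solution at the cost of a weaker fit to the Tail; any regularization-based or replay-based CL method could take the place of the anchor term.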

Paper Structure (35 sections, 11 theorems, 79 equations, 8 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Proposed Approach
Training on Long-Tailed Distributions
Convex Case: Baseline Analysis
Extension to Feedforward Deep Networks
Extension to Multiple Head–Tail Partitions
Continual Learning for Long-Tailed Recognition
General Guarantees for Effectiveness of CLTR
Experiments and Results
Experiment Setup
Results
Conclusion and Future Work
Notation
Proofs
...and 20 more sections

Key Result

Theorem 3.3

Given the paper's convexity assumption, if a model is trained in the LTR setting (Definition 3.1), then the weights of the model after training, $\theta^*$, lie within a bounded neighborhood of the weights obtained by training solely on the Head, $\theta^*_H$, whose radius scales as $\mathcal{O}\left(1/\sqrt{\operatorname{IF}}\right)$, where IF is the imbalance factor.
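The qualitative content of this result, that the full-data solution approaches the Head-only solution as the imbalance grows, can be checked numerically on a toy problem. The sketch below is an assumption-laden illustration rather than a reproduction of the paper's setup: it swaps classification for ridge regression (which is strongly convex and has a closed-form minimizer), treats two data groups with different ground-truth weights as stand-ins for Head and Tail, and uses hypothetical names (`ridge_fit`, `head_distance`) and parameter values. It shows only that the distance shrinks with IF and does not verify the exact rate.

```python
import numpy as np

def ridge_fit(X, y, mu):
    """Closed-form minimizer of (1/n) * ||X @ theta - y||^2 + mu * ||theta||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + mu * np.eye(d), X.T @ y / n)

def head_distance(imbalance_factor, n_head=2000, d=5, mu=0.1, seed=0):
    """Distance between the full-data and Head-only ridge solutions
    when the Tail group has n_head / imbalance_factor samples."""
    rng = np.random.default_rng(seed)
    # Different ground-truth weights for the two groups (toy assumption).
    w_head, w_tail = rng.normal(size=d), rng.normal(size=d)
    n_tail = max(1, int(n_head / imbalance_factor))
    X_h = rng.normal(size=(n_head, d))
    X_t = rng.normal(size=(n_tail, d))
    y_h = X_h @ w_head + 0.1 * rng.normal(size=n_head)
    y_t = X_t @ w_tail + 0.1 * rng.normal(size=n_tail)
    theta_head = ridge_fit(X_h, y_h, mu)
    theta_full = ridge_fit(np.vstack([X_h, X_t]), np.concatenate([y_h, y_t]), mu)
    return float(np.linalg.norm(theta_full - theta_head))

distances = {IF: head_distance(IF) for IF in (1, 10, 100, 1000)}
for IF, dist in distances.items():
    print(f"IF={IF:>4}: ||theta* - theta*_H|| = {dist:.4f}")
```

The regularization strength `mu` plays the role of the strong-convexity parameter here; varying it alongside the imbalance factor mirrors the kind of sweep that the logistic-regression experiment in Figure 4 describes.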

Figures (8)

Figure 1: Overview of our framework: (left) theoretical analysis of convergence under the LTR setup, showing how imbalance drives convergence toward head-class solutions; (middle) our proposed CLTR, which reformulates LTR as a CL problem; and (right) an optimization guarantee demonstrating that minimizing the CL objective, which upper-bounds the balanced LTR objective, systematically improves the latter.
Figure 2: Illustration of the Head loss landscape $\mathcal{L}_H$. Left: convex setting (the convexity assumption); the unique minimizer $\theta_H^\star$ is marked in orange. Right: feedforward deep network in the KL($1/2$) setting (the nonconvex assumption); multiple local minimizers $\{\theta_{H,k}^\star\}$ are shown in orange. In both panels, the red region denotes the provable neighborhood that contains the parameter $\theta^\star$ obtained when training on the full long-tailed dataset; its radius scales as $\mathcal{O}(1/\sqrt{\operatorname{IF}})$ by Theorem 3.3 and its deep-network extension. (Top: surface; bottom: contour projection.)
Figure 3: Overview of learning under the LTR scenario and our proposed CLTR approach (symbols described in the text).
Figure 4: Empirical support for Theorem 3.3 and its deep-network extension. Left: logistic regression on MNIST-LT with varying imbalance factor and different $L^2$ regularization strengths $\mu$. Right: the same analysis for ResNet-18 on CIFAR-100, with and without weight decay.
Figure A1: Class cardinality of (a) MNIST-LT, (b) CIFAR100-LT, (c) CIFAR10-LT, (d) ImageNet-LT, and (e) Caltech256.
...and 3 more figures

Theorems & Definitions (18)

Definition 3.1
Theorem 3.3
Theorem 3.5
Theorem 3.7
Proposition 3.8
Theorem 3.10
Theorem 3.3 (restated)
Proof of Theorem 3.3
Lemma B.1
Proof
...and 8 more
