Decoupling Representation and Classifier for Long-Tailed Recognition

Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis

TL;DR

This paper tackles long-tailed recognition by decoupling the learning of representations from the classifier. It systematically analyzes how different sampling strategies affect representation learning and introduces several decoupled classifier mechanisms (cRT, NCM, tau-normalized, LWS) to rebalance decision boundaries without retraining representations. Across ImageNet-LT, Places-LT, and iNaturalist, decoupled learning with instance-balanced representations and classifier balancing achieves state-of-the-art results, often surpassing methods that rely on specialized losses or memory modules. The work provides practical guidance for handling imbalanced data and suggests that simple, well-balanced classifiers can unlock strong tail-class performance. The accompanying code is released to facilitate reproducibility and adoption.

Abstract

The long-tail distribution of the visual world poses great challenges for deep learning based classification models on how to handle the class imbalance problem. Existing solutions usually involve class-balancing strategies, e.g., by loss re-weighting, data re-sampling, or transfer learning from head- to tail-classes, but most of them adhere to the scheme of jointly learning representations and classifiers. In this work, we decouple the learning procedure into representation learning and classification, and systematically explore how different balancing strategies affect them for long-tailed recognition. The findings are surprising: (1) data imbalance might not be an issue in learning high-quality representations; (2) with representations learned with the simplest instance-balanced (natural) sampling, it is also possible to achieve strong long-tailed recognition ability by adjusting only the classifier. We conduct extensive experiments and set new state-of-the-art performance on common long-tailed benchmarks like ImageNet-LT, Places-LT and iNaturalist, showing that it is possible to outperform carefully designed losses, sampling strategies, even complex modules with memory, by using a straightforward approach that decouples representation and classification. Our code is available at https://github.com/facebookresearch/classifier-balancing.
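
Finding (2) says that a classifier can be rebalanced without touching the learned features. As a rough illustration, the sketch below applies $\tau$-normalization to a toy weight matrix. The rescaling rule $\tilde{w}_i = w_i / \lVert w_i \rVert^{\tau}$ is recalled from the paper's method section and is not spelled out in this summary. The function name `tau_normalize`, the toy weights, and the toy feature are assumptions made for illustration, and the sketch omits any bias term.

```python
import numpy as np


def tau_normalize(weights: np.ndarray, tau: float) -> np.ndarray:
    """Rescale each class weight vector w_i to w_i / ||w_i||**tau.

    weights has shape (num_classes, feature_dim). tau = 0 leaves the
    weights unchanged; tau = 1 gives every class a unit-norm weight vector.
    """
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    return weights / norms ** tau


# Hypothetical toy classifier: class 0 plays the role of a head class with a
# large weight norm (5.0), class 1 a tail class with a small norm (1.0).
W = np.array([[3.0, 4.0],
              [1.0, 0.0]])

# Hypothetical feature vector that points mostly along the tail-class weight.
x = np.array([1.0, 0.2])

for tau in (0.0, 1.0):
    scores = tau_normalize(W, tau) @ x  # logits without a bias term
    print(f"tau={tau}: scores={scores}, predicted class={scores.argmax()}")
```

With these toy numbers the prediction moves from the head class at $\tau = 0$ to the tail class at $\tau = 1$, which is the rebalancing effect the abstract describes. In practice $\tau$ would be chosen on held-out data rather than fixed by hand.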

Paper Structure

This paper contains 19 sections, 6 equations, 5 figures, 12 tables.

Figures (5)

  • Figure 1: The performance of different classifiers for each split on ImageNet-LT with ResNeXt-50. Colored markers denote the sampling strategies used to learn the representations.
  • Figure 2: Left: Classifier weight norms for the ImageNet-LT validation set when classes are sorted by descending values of $n_j$. Blue line: classifier weights learned with instance-balanced sampling. Green line: weights after fine-tuning with class-balanced sampling. Gold line: weights after $\tau$ normalization. Brown line: weights obtained by learnable weight scaling. Right: Accuracy with different values of the normalization parameter $\tau$.
  • Figure 3: Sampling weights $p_j$ for ImageNet-LT. Classes are ordered with decreasing $n_j$ on the x-axis. Left: instance-balanced, class-balanced and square-root sampling. Right: Progressively-balanced sampling; as epochs progress, sampling goes from instance-balanced to class-balanced sampling.
  • Figure 4: Illustrations of different classifiers and their corresponding decision boundaries, where $w_i$ and $w_j$ denote the classification weights for classes $i$ and $j$ respectively, $\mathcal{C}_i$ is the classification cone belonging to class $i$ in the feature space, and $m_i$ is the feature mean for class $i$. From left to right: $\tau$-normalized classifiers with $\tau \rightarrow 0$, where classifiers with larger weights have wider decision boundaries; $\tau$-normalized classifiers with $\tau \rightarrow 1$, where the decision boundaries are more balanced across classes; NCM with cosine similarity, whose decision boundary is independent of the classifier weights; NCM with Euclidean distance, whose decision boundaries partition the feature space into Voronoi cells.
  • Figure 5: Accuracy on ImageNet-LT for different backbones.
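
The sampling weights $p_j$ in Figure 3 can be sketched as follows. The sketch assumes the form $p_j = n_j^q / \sum_i n_i^q$, with $q = 1$ for instance-balanced, $q = 0$ for class-balanced, and $q = 1/2$ for square-root sampling, and a linear interpolation from instance-balanced to class-balanced over epochs for progressively-balanced sampling. These formulas are recalled from the paper's method section and are not stated in this summary. The function names and the class counts are hypothetical.

```python
import numpy as np


def sampling_probs(class_counts, q: float) -> np.ndarray:
    """Probability p_j of drawing a sample from class j: n_j**q / sum_i n_i**q.

    q = 1 -> instance-balanced, q = 0 -> class-balanced, q = 0.5 -> square-root.
    """
    counts = np.asarray(class_counts, dtype=float)
    powered = counts ** q
    return powered / powered.sum()


def progressively_balanced_probs(class_counts, epoch: int, total_epochs: int) -> np.ndarray:
    """Linear mix that starts instance-balanced and ends class-balanced."""
    mix = epoch / total_epochs
    instance_balanced = sampling_probs(class_counts, 1.0)
    class_balanced = sampling_probs(class_counts, 0.0)
    return (1.0 - mix) * instance_balanced + mix * class_balanced


# Hypothetical long-tailed class counts n_j, sorted in decreasing order.
counts = [900, 90, 10]

print("instance-balanced:", sampling_probs(counts, 1.0))
print("square-root:      ", sampling_probs(counts, 0.5))
print("class-balanced:   ", sampling_probs(counts, 0.0))
for epoch in (0, 5, 10):
    print(f"progressive, epoch {epoch}/10:",
          progressively_balanced_probs(counts, epoch, total_epochs=10))
```

Square-root sampling sits between the two extremes: the rarest class is drawn more often than under instance-balanced sampling but less often than under class-balanced sampling.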