Class-Balanced Loss Based on Effective Number of Samples

Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie

TL;DR

The paper tackles long-tailed data distributions by introducing the effective number of samples, a measure that accounts for data overlap and diminishing returns as more samples are added. It designs a class-balanced loss that re-weights each class by the inverse of its effective sample count, parameterized by β to smoothly transition from no re-weighting to inverse-frequency weighting. The authors instantiate this framework across softmax CE, sigmoid CE, and focal loss, and demonstrate significant performance gains on long-tailed CIFAR and large-scale datasets like iNaturalist and ImageNet. This approach provides a generic, loss-agnostic mechanism to mitigate class imbalance without heavy re-sampling, with practical impact for real-world, highly imbalanced vision tasks.

Abstract

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
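
The formula in the abstract can be explored numerically. The following is a minimal sketch, not the authors' code: the function name `effective_number` and the sample sizes are illustrative assumptions. It shows the diminishing-returns behaviour the abstract describes: with $\beta = 0$ every class counts as a single sample, and as $\beta$ approaches 1 the effective number approaches the raw count $n$.

```python
def effective_number(n: int, beta: float) -> float:
    """Effective number of samples (1 - beta^n) / (1 - beta), for beta in [0, 1).

    The function name is a hypothetical helper, not taken from the paper.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    if n < 0:
        raise ValueError("n must be non-negative")
    return (1.0 - beta ** n) / (1.0 - beta)


# Illustrative sample sizes: the effective number grows more slowly than n
# and is bounded above by 1 / (1 - beta).
for beta in (0.0, 0.9, 0.99, 0.999):
    values = [round(effective_number(n, beta), 2) for n in (1, 10, 100, 1000)]
    print(f"beta={beta}: {values}")
```

For any fixed $\beta < 1$ the value saturates at $1/(1-\beta)$, so adding samples to an already large class changes its effective number very little.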

Paper Structure

This paper contains 15 sections, 1 theorem, 13 equations, 10 figures, 3 tables.

Key Result

Proposition 1

$E_n = (1-\beta^n)/(1-\beta)$, where $\beta = (N-1)/N$.
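
The closed form can be checked by induction. The sketch below is a reconstruction rather than a quotation of the paper's proof, and it rests on two assumptions consistent with the Figure 2 caption: a single sample has volume 1, so $E_1 = 1$, and the $n$-th sample overlaps previously sampled data with probability $p = E_{n-1}/N$.

$$
\begin{aligned}
E_n &= p\,E_{n-1} + (1-p)\,(E_{n-1} + 1) = 1 + \frac{N-1}{N}\,E_{n-1} = 1 + \beta\,E_{n-1}, \\
E_n &= 1 + \beta\,\frac{1-\beta^{n-1}}{1-\beta} = \frac{1 - \beta + \beta - \beta^{n}}{1-\beta} = \frac{1-\beta^{n}}{1-\beta}.
\end{aligned}
$$

The first line is the expected volume after one more draw; the second substitutes the inductive hypothesis $E_{n-1} = (1-\beta^{n-1})/(1-\beta)$.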

Figures (10)

  • Figure 1: Two classes, one from the head and one from the tail of a long-tailed dataset (iNaturalist 2017 in this example), have drastically different numbers of samples. Models trained on these samples are biased toward dominant classes (black solid line). Re-weighting the loss by inverse class frequency usually yields poor performance (red dashed line) on real-world data with high class imbalance. We propose a theoretical framework to quantify the effective number of samples by taking data overlap into consideration. A class-balanced term is designed to re-weight the loss by inverse effective number of samples. We show in experiments that the performance of a model can be improved when trained with the proposed class-balanced loss (blue dashed line).
  • Figure 2: Given the set of all possible data with volume $N$ and the set of previously sampled data, a new sample with volume $1$ overlaps the previous data with probability $p$ and does not overlap it with probability $1-p$.
  • Figure 3: Visualization of the proposed class-balanced term $(1 - \beta)/(1 - \beta^{n_y})$, where $n_y$ is the number of samples in the ground-truth class. Both axes are in log scale. For a long-tailed dataset where major classes have significantly more samples than minor classes, setting $\beta$ properly re-balances the relative loss across classes and reduces the drastic imbalance of re-weighting by inverse class frequency.
  • Figure 4: Number of training samples per class in artificially created long-tailed CIFAR-100 datasets with different imbalance factors.
  • Figure 5: Classification error rate when trained with and without the class-balanced term. On CIFAR-10, class-balanced loss yields consistent improvement across different $\beta$ and the larger the $\beta$ is, the larger the improvement is. On CIFAR-100, $\beta = 0.99$ or $\beta = 0.999$ improves the original loss, whereas a larger $\beta$ hurts the performance.
  • ...and 5 more figures
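
The class-balanced term from the Figure 3 caption can be turned into per-class loss weights. The following is a hedged sketch under stated assumptions: the helper name `class_balanced_weights`, the class counts, and the rescaling of the weights so that they sum to the number of classes are all illustrative choices, not details given in the text above.

```python
def class_balanced_weights(counts, beta):
    """Per-class weights (1 - beta) / (1 - beta^n_y) for class sizes n_y >= 1.

    Rescaling the weights to sum to the number of classes is an assumption
    made here for readability; the captions above do not specify it.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    if any(n < 1 for n in counts):
        raise ValueError("every class needs at least one sample")
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical long-tailed class sizes, from head to tail.
counts = [5000, 500, 50, 5]
for beta in (0.0, 0.99, 0.9999):
    weights = class_balanced_weights(counts, beta)
    print(f"beta={beta}: {[round(w, 3) for w in weights]}")
```

With $\beta = 0$ all classes receive the same weight (no re-weighting), and larger $\beta$ widens the gap between tail and head classes while keeping it below the gap produced by plain inverse class frequency.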

Theorems & Definitions (4)

  • Definition 1: Effective Number
  • Proposition 1: Effective Number
  • proof
  • proof