Class Uncertainty: A Measure to Mitigate Class Imbalance
Z. S. Baltaci, K. Oksuz, S. Kuzucu, K. Tezoren, B. K. Konar, A. Ozkan, E. Akbas, S. Kalkan
TL;DR
This work argues that class imbalance cannot be fully captured by class cardinality alone and introduces Class Uncertainty, defined as the average predictive uncertainty of training examples estimated via Deep Ensembles. By aggregating per-example uncertainty into class-level measures, the authors demonstrate that Class Uncertainty correlates more strongly with class-wise difficulty than cardinality does and is robust to duplicates, enabling it to guide resampling, reweighting, margin adjustment, and multi-stage training across ten methods. Extensive experiments on long-tailed CIFAR-10/100 and a semantically imbalanced SVCI-20 dataset show that incorporating Class Uncertainty yields substantial or competitive gains versus cardinality-based baselines, with notable improvements on hard, under-represented, or semantically difficult classes. The results suggest a practical path to more robust imbalance mitigation by focusing on predictive uncertainty, and the authors provide code and datasets to support further research, despite the higher computational cost of Deep Ensembles.
Abstract
Class-wise characteristics of training examples affect the performance of deep classifiers. A well-studied example is when the number of training examples per class follows a long-tailed distribution, a situation that is likely to yield sub-optimal performance for under-represented classes. This class imbalance problem is conventionally addressed by approaches relying on the class-wise cardinality of training examples, such as data resampling. In this paper, we demonstrate that considering solely the cardinality of classes does not cover all issues causing class imbalance. To measure class imbalance, we propose "Class Uncertainty" as the average predictive uncertainty of the training examples, and we show that this novel measure captures the differences across classes better than cardinality. We also curate SVCI-20 as a novel dataset in which the classes have an equal number of training examples but differ in their hardness, thereby causing a type of class imbalance that cannot be addressed by the approaches relying on cardinality. We incorporate our "Class Uncertainty" measure into a diverse set of ten class imbalance mitigation methods to demonstrate its effectiveness on long-tailed datasets as well as on our SVCI-20. Code and datasets will be made available.
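
To make the definition concrete, the following is a minimal sketch of how a class-level uncertainty could be computed from ensemble outputs. It is not the authors' implementation. It assumes that per-example predictive uncertainty is the entropy of the ensemble-averaged softmax distribution (the summary above does not specify the exact uncertainty measure), and the function names `predictive_entropy`, `class_uncertainty`, and `uncertainty_weights` are hypothetical. The reweighting rule at the end is likewise only an illustrative assumption, not one of the ten methods evaluated in the paper.

```python
import math
from collections import defaultdict


def predictive_entropy(member_probs):
    """Entropy of the ensemble-averaged class distribution for one example.

    member_probs: list of M probability vectors, one per ensemble member.
    Assumption: entropy of the mean prediction is used as the uncertainty.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    return -sum(q * math.log(q) for q in mean if q > 0.0)


def class_uncertainty(ensemble_probs, labels):
    """Average per-example uncertainty over the training examples of each class."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for member_probs, y in zip(ensemble_probs, labels):
        totals[y] += predictive_entropy(member_probs)
        counts[y] += 1
    return {c: totals[c] / counts[c] for c in counts}


def uncertainty_weights(class_unc):
    """Hypothetical loss weights proportional to class uncertainty (mean weight 1)."""
    total = sum(class_unc.values())
    k = len(class_unc)
    if total == 0.0:
        return {c: 1.0 for c in class_unc}
    return {c: k * u / total for c, u in class_unc.items()}


# Toy data: two classes with equal cardinality, two ensemble members each.
ensemble_probs = [
    [[0.90, 0.10], [0.95, 0.05]],  # class 0, members agree and are confident
    [[0.99, 0.01], [0.97, 0.03]],  # class 0
    [[0.60, 0.40], [0.40, 0.60]],  # class 1, members disagree
    [[0.55, 0.45], [0.50, 0.50]],  # class 1
]
labels = [0, 0, 1, 1]

cu = class_uncertainty(ensemble_probs, labels)
weights = uncertainty_weights(cu)
```

In this toy setting both classes have the same number of examples, so a cardinality-based scheme would treat them identically, whereas the uncertainty-based weights place more emphasis on class 1, whose examples the ensemble predicts less confidently.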
