Identifying Key Challenges of Hardness-Based Resampling
Pawel Pukowski, Venet Osmani
TL;DR
This work examines whether hardness-based resampling can reduce class-wise performance disparities by matching the amount of training data per class to its estimated hardness, an idea motivated by sample complexity. It assesses model-based hardness estimators (AUM, EL2N, Forgetting) on CIFAR-10/100, implements undersampling and four oversampling schemes with a tunable imbalance factor, and evaluates robustness across ensemble sizes. Across extensive experiments on balanced data, hardness-based resampling yields only negligible, non-systematic gains in class-level performance and gap reduction, which calls the practical applicability of the approach into question. The authors identify three core obstacles: the absence of ground-truth hardness, the instability of hardness rankings across estimators and datasets, and the limited expressive power of oversampling. In a pruning case study, they also show that carefully structured imbalance can sometimes improve overall accuracy and fairness, pointing to promising directions such as advanced data generation and alternative regularization strategies.
Abstract
Performance gaps across classes remain a persistent challenge in machine learning and are often attributed to variations in class hardness. One way to quantify class hardness is through sample complexity, the minimum number of samples required to effectively learn a given class. Sample complexity theory suggests that class hardness is driven by differences in the amount of data required for generalization; that is, harder classes need substantially more samples to generalize. Hardness-based resampling is therefore a promising approach to mitigating these performance disparities. While resampling has been studied extensively in data-imbalanced settings, its impact on balanced datasets remains unexplored. This raises the fundamental question of whether resampling is effective because it addresses data imbalance or because it addresses hardness imbalance. We begin addressing this question by introducing class imbalance into balanced datasets and evaluating its effect on performance disparities. We oversample hard classes and undersample easy classes to bring hard classes closer to their sample complexity requirements, while holding the total dataset size constant to keep the comparison fair. We estimate class-level hardness using the Area Under the Margin (AUM) hardness estimator and use it to compute resampling ratios. Using these ratios, we perform hardness-based resampling on the well-known CIFAR-10 and CIFAR-100 datasets. Contrary to theoretical expectations, our results show that hardness-based resampling does not meaningfully affect class-wise performance disparities. To explain this discrepancy, we conduct detailed analyses to identify key challenges unique to hardness-based imbalance, distinguishing it from traditional data-based imbalance. Our insights help explain why theoretical sample complexity expectations fail to translate into practical performance gains, and we provide guidelines for future research.
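The pipeline described in the abstract (estimate class-level hardness with AUM, turn it into resampling ratios, then over- and undersample at a constant dataset size) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The AUM margin follows its standard definition (assigned-class logit minus the largest other logit, averaged over epochs). Everything else is an assumption made for illustration: the function names, the linear mapping from AUM to class shares, the hypothetical mixing parameter `alpha` standing in for an imbalance factor, and random duplication as the oversampling scheme.

```python
import numpy as np


def class_level_aum(logits, labels, num_classes):
    """Mean Area Under the Margin per class (lower AUM = harder class).

    logits: float array of shape (epochs, samples, classes) recorded during training.
    labels: int array of shape (samples,) with the assigned labels.
    """
    idx = np.arange(labels.shape[0])
    assigned = logits[:, idx, labels]           # assigned-class logit, shape (epochs, samples)
    masked = logits.astype(float)               # astype returns a copy
    masked[:, idx, labels] = -np.inf            # hide the assigned class
    largest_other = masked.max(axis=2)          # largest remaining logit
    sample_aum = (assigned - largest_other).mean(axis=0)
    return np.array([sample_aum[labels == c].mean() for c in range(num_classes)])


def hardness_to_counts(class_aum, total, alpha=0.5):
    """Per-class sample counts that sum to `total`.

    ASSUMPTION: shares interpolate linearly between uniform (alpha=0) and
    proportional-to-hardness (alpha=1); the paper may use a different rule.
    """
    class_aum = np.asarray(class_aum, dtype=float)
    k = class_aum.size
    hardness = class_aum.max() - class_aum      # lower AUM -> larger hardness
    uniform = np.full(k, 1.0 / k)
    share = hardness / hardness.sum() if hardness.sum() > 0 else uniform
    raw = ((1.0 - alpha) * uniform + alpha * share) * total
    counts = np.floor(raw).astype(int)
    leftover = total - counts.sum()             # fix rounding by largest remainder
    order = np.argsort(raw - counts)[::-1]
    counts[order[:leftover]] += 1
    return counts


def resample_indices(labels, counts, seed=0):
    """Undersample by random selection, oversample by random duplication.

    Assumes every class has at least one sample.
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for c, target in enumerate(counts):
        pool = np.flatnonzero(labels == c)
        if target <= pool.size:
            picked = rng.choice(pool, size=target, replace=False)
        else:
            extra = rng.choice(pool, size=target - pool.size, replace=True)
            picked = np.concatenate([pool, extra])
        chosen.append(picked)
    return np.concatenate(chosen)


# Usage on synthetic data: three balanced classes, random "training" logits.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 100)
logits = rng.normal(size=(5, labels.size, 3))
aum = class_level_aum(logits, labels, num_classes=3)
counts = hardness_to_counts(aum, total=labels.size)
indices = resample_indices(labels, counts)
```

Because the counts always sum to the original dataset size, any change in class-wise performance can be attributed to how the data is distributed across classes rather than to how much data there is. Random duplication adds no new information about a hard class, which is one way to read the limited expressive power of oversampling noted in the TL;DR.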
