Mix from Failure: Confusion-Pairing Mixup for Long-Tailed Recognition

Youngseok Yoon, Sangwoo Hong, Hyungjun Joo, Yao Qin, Haewon Jeong, Jungwoo Lee

TL;DR

This work tackles long-tailed image recognition by addressing model confusion rather than solely adjusting losses or architectures. It introduces Confusion-Pairing Mixup (CP-Mix), which estimates real-time confusion distributions and augments data by mixing samples from confusion pairs, with an imbalance-aware labeling strategy. Through extensive experiments on CIFAR-LT, ImageNet-LT, Places-LT, and iNaturalist 2018, CP-Mix consistently improves minority-class performance and reduces misclassification between confusing class pairs, while remaining compatible with ensemble methods. The approach is simple to implement, model-agnostic, and offers a practical augmentation technique to enhance generalization under severe class imbalance.

Abstract

Long-tailed image recognition is a computer vision problem that considers a real-world class distribution rather than an artificially uniform one. Existing methods typically sidestep the problem by i) adjusting the loss function, ii) decoupling classifier learning, or iii) proposing a new multi-head architecture of so-called experts. In this paper, we tackle the problem from a different perspective: we augment the training dataset to enhance the sample diversity of minority classes. Specifically, our method, Confusion-Pairing Mixup (CP-Mix), estimates the confusion distribution of the model and handles the data deficiency problem by augmenting samples from confusion pairs in real time. In this way, CP-Mix trains the model to mitigate its weakness and distinguish pairs of classes that it frequently confuses. In addition, CP-Mix uses a novel mixup formulation to handle the bias in decision boundaries that originates from the imbalanced dataset. Extensive experiments demonstrate that CP-Mix outperforms existing methods for long-tailed image recognition and successfully relieves the confusion of the classifier.
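
The pairing idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. All function names are hypothetical. The confusion distribution is assumed to be the normalized off-diagonal counts of a confusion matrix built from the model's current predictions. The labels use standard mixup interpolation with a Beta-distributed coefficient, not the paper's imbalance-aware formulation, which this summary does not spell out.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """counts[i, j] = number of samples with true class i predicted as class j."""
    counts = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(counts, (y_true, y_pred), 1)
    return counts

def confusion_pair_distribution(counts):
    """Turn off-diagonal confusion counts into a distribution over class pairs.

    Assumption for this sketch: the probability of picking pair (i, j) is
    proportional to how often class i is misclassified as class j.
    """
    off_diag = counts.astype(np.float64)
    np.fill_diagonal(off_diag, 0.0)
    total = off_diag.sum()
    if total == 0:
        # No mistakes observed yet: fall back to uniform over distinct pairs.
        off_diag = 1.0 - np.eye(counts.shape[0])
        total = off_diag.sum()
    return off_diag / total

def sample_confusion_pairs(pair_probs, num_pairs, rng):
    """Draw (true_class, predicted_class) pairs from the confusion distribution."""
    flat = rng.choice(pair_probs.size, size=num_pairs, p=pair_probs.ravel())
    return np.stack(np.unravel_index(flat, pair_probs.shape), axis=1)

def mix_confusion_pairs(x, y, pairs, num_classes, alpha, rng):
    """Mix one random sample from each class of every pair (standard mixup).

    Assumes every class appearing in `pairs` has at least one sample in `y`.
    """
    mixed_x, mixed_y = [], []
    for cls_a, cls_b in pairs:
        idx_a = rng.choice(np.flatnonzero(y == cls_a))
        idx_b = rng.choice(np.flatnonzero(y == cls_b))
        lam = rng.beta(alpha, alpha)
        mixed_x.append(lam * x[idx_a] + (1.0 - lam) * x[idx_b])
        label = np.zeros(num_classes)
        label[cls_a] += lam          # plain mixup label, not the paper's
        label[cls_b] += 1.0 - lam    # imbalance-aware labeling
        mixed_y.append(label)
    return np.stack(mixed_x), np.stack(mixed_y)
```

In a training loop, one would presumably refresh the confusion counts from recent predictions (for example once per epoch) so that the sampled pairs track the model's current mistakes. How often to refresh is a design choice left open by this sketch.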

Paper Structure

This paper contains 43 sections, 16 equations, 6 figures, 9 tables, 1 algorithm.

Figures (6)

  • Figure 1: (a) The model's tendency to misclassify examples from minority classes (truck, ship, and dog) as their similar majority classes (car, plane, and cat). The model captures the similarities between classes and wrongly exploits its capacity. (b) The confusion matrix of the model trained on the imbalanced dataset. The $x$ and $y$ axes denote true labels and predictions, respectively, and the number in each cell denotes the number of predictions. There are clear relationships between semantically similar classes.
  • Figure 2: (a) 2D 4-way toy example with 1000 samples in each majority class and 50 in each minority class. The majority classes red and purple are similar to the minority classes grey and yellow, respectively. (b-d) Decision boundaries of the classifiers on a balanced test dataset. (b, c) The ERM classifier has a biased boundary toward minorities, and the Mixup classifier has a more restricted region for minorities since Mixup mainly occurs between and within red and purple points. (d) The ERM classifier regularized by a mixup objective where similar majority and minority classes are mixed. For the regularization, Mixup occurs between red and grey points or between purple and yellow points. It successfully improves the decision boundary while maintaining the structure of the boundary. (e) Accuracy and confusion of the ERM classifier as the imbalance factor varies. Confusion denotes the sum of confusion values between the two pairs of adjacent majority and minority classes: grey points in the red region and yellow points in the purple region. Although Mixup improves generalization under small imbalance, it fails as the imbalance factor increases.
  • Figure 3: (a, b) Confusion matrices of the ERM and Mixup classifiers, respectively, trained on the CIFAR100-LT dataset. The imbalance factor is 200, and classes are grouped into 10 subgroups for better visualization. Mixup exacerbates the confusion of the model and only improves the generalization of majorities. (c) Histogram of confusion values between pairs of classes for the balanced CIFAR100 and CIFAR100-LT-200 datasets. Among 100 samples in each class, the maximum confusion value between two classes increases from 20 to more than 50 as $\rho$ increases to $200$.
  • Figure 4: (a) Confusion matrix of the CP-Mix classifier trained on CIFAR100-LT-200 dataset. It successfully reduces high confusion in the upper right region by sacrificing low confusion in the lower left region, which indicates the number of majorities misclassified as minorities. This results in more balanced accuracies among categories. (b, c) Class-wise accuracy of the classifiers trained on CIFAR100-LT-200 and CIFAR10-LT-200 datasets. CP-Mix significantly improves the accuracies on minority classes, resulting in more balanced sub-group accuracies.
  • Figure 5: Confusion matrices of the ERM, Mixup and CP-Mix classifiers trained on CIFAR100-LT datasets.
  • ...and 1 more figure