DBN-Mix: Training Dual Branch Network Using Bilateral Mixup Augmentation for Long-Tailed Visual Recognition

Jae Soon Baik; In Young Yoon; Jun Won Choi

TL;DR

This work tackles long-tailed visual recognition by addressing both minority-class underrepresentation and classifier bias. It introduces DBN-Mix, which couples bilateral mixup (synthetic samples formed by mixing a sample drawn by a uniform sampler with one drawn by a re-balanced sampler) with class-wise temperature scaling, integrated into a dual-branch network and trained end-to-end. Experiments on CIFAR-LT, ImageNet-LT, and iNaturalist 2018 show significant gains over the baseline and state-of-the-art results in some benchmark categories, with ablations confirming the complementary contributions of bilateral mixup and class-wise temperature scaling. The approach is simple to implement and can also be applied to single-branch architectures.

Abstract

There is growing interest in the challenging visual perception task of learning from long-tailed class distributions. The extreme class imbalance in the training dataset biases the model toward recognizing majority-class data over minority-class data. Furthermore, the lack of diversity in minority-class samples makes it difficult to find a good representation. In this paper, we propose an effective data augmentation method, referred to as bilateral mixup augmentation, which can improve the performance of long-tailed visual recognition. The bilateral mixup augmentation combines two samples generated by a uniform sampler and a re-balanced sampler and augments the training dataset to enhance the representation learning for minority classes. We also reduce the classifier bias using class-wise temperature scaling, which scales the logits differently per class in the training phase. We apply both ideas to the dual-branch network (DBN) framework, presenting a new model, named dual-branch network with bilateral mixup (DBN-Mix). Experiments on popular long-tailed visual recognition datasets show that DBN-Mix improves performance significantly over the baseline and that the proposed method achieves state-of-the-art performance in some categories of benchmarks.
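To make the idea of combining a uniform sampler with a re-balanced sampler concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes standard mixup with a mixing weight drawn from Beta($\alpha$, $\alpha$), and it assumes a re-balanced sampler that draws each sample with probability proportional to $n_j^{-\gamma}$, where $n_j$ is the size of the sample's class. The exact sampler form, and how the paper builds its two mixed pairs $(\hat{x}_c, \hat{y}_c)$ and $(\hat{x}_r, \hat{y}_r)$, are not given in this section. The function names `sample_weights` and `bilateral_mixup` are hypothetical.

```python
import numpy as np

def sample_weights(labels, gamma):
    """Per-sample draw probabilities for the re-balanced sampler.

    ASSUMED form: weight of a sample in class j is n_j ** (-gamma),
    so gamma=0 is uniform sampling and gamma=1 is class-balanced sampling.
    """
    labels = np.asarray(labels)
    counts = np.bincount(labels)                  # n_j for each class j
    w = counts[labels].astype(float) ** (-gamma)  # one weight per sample
    return w / w.sum()

def bilateral_mixup(x, labels, num_classes, alpha=1.0, gamma=1.0, rng=None):
    """Mix one uniformly drawn sample with one re-balanced sample (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    n = len(labels)
    i = rng.integers(n)                                 # uniform sampler
    j = rng.choice(n, p=sample_weights(labels, gamma))  # re-balanced sampler
    lam = rng.beta(alpha, alpha)                        # standard mixup weight
    onehot = np.eye(num_classes)
    x_mix = lam * x[i] + (1.0 - lam) * x[j]
    y_mix = lam * onehot[labels[i]] + (1.0 - lam) * onehot[labels[j]]
    return x_mix, y_mix

# Toy usage: 9 majority samples (class 0) and 1 minority sample (class 1).
labels = np.array([0] * 9 + [1])
x = np.arange(20, dtype=float).reshape(10, 2)
x_mix, y_mix = bilateral_mixup(x, labels, num_classes=2,
                               rng=np.random.default_rng(0))
```

With `gamma=1.0` in this toy setup, the single minority sample is drawn by the re-balanced sampler half of the time, so mixed samples involving the minority class appear far more often than under uniform sampling alone.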

Paper Structure (39 sections, 9 equations, 5 figures, 9 tables)

Introduction
Related Work
Re-sampling and Re-weighting
Two-stage Training Strategy
Ensemble-based Approach
Data Augmentation
Proposed Method
Overview of DBN-Mix
Bilateral Mixup Augmentation
Class-wise Temperature Scaling
Model Inference
Application to Single-Branch Network
Experiments
Long-Tailed Recognition Datasets
Long-tailed CIFAR
...and 24 more sections

Figures (5)

Figure 1: Outputs of the classifiers trained by (a) empirical risk minimization (ERM), (b) mixup, and (c) bilateral mixup. Three-layer neural networks were trained on an imbalanced two-half-moon dataset with an imbalance ratio of 100. Red and blue cross marks correspond to minority-class and majority-class samples, respectively. Black cross marks indicate the training samples generated by the data augmentation.
Figure 2: Overview of the proposed method. Our method uses two bilateral mixup samples, $(\hat{x}_c, \hat{y}_c)$ and $(\hat{x}_r, \hat{y}_r)$, to train two branch networks: a conventional learning branch and a re-balancing branch. In the training phase, we use $(\hat{x}_c, \hat{y}_c)$ for the conventional learning branch and $(\hat{x}_r, \hat{y}_r)$ for the re-balancing branch. In the inference phase, the prediction logits from the two branches are averaged to produce the final output.
Figure 3: Performance versus hyperparameters: (a) $\eta$ and $\epsilon$ for temperature scaling, (b) $\alpha$ for bilateral mixup augmentation, and (c) $\gamma$ for re-balanced sampler. CIFAR-LT-10 (100) dataset was used for evaluation.
Figure 4: Performance versus hyperparameters: (a) $\eta$ and $\epsilon$ for temperature scaling, (b) $\alpha$ for bilateral mixup augmentation, and (c) $\gamma$ for re-balanced sampler. CIFAR-LT-100 (100) dataset was used for evaluation.
Figure 5: t-SNE (maaten2008visualizing) visualization of the penultimate-layer features of the conventional learning branch (first column) and the re-balancing branch (second column), for dual-branch networks trained with DBN-Mix (first row) and BBN (zhou2020bbn) (second row).
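
The inference rule stated in the Figure 2 caption, averaging the logits of the two branches, can be sketched as follows. This is an illustrative NumPy snippet and not the authors' code. The function name `dual_branch_predict` and the toy logits are hypothetical, and the branch networks themselves are assumed to have already produced their logits.

```python
import numpy as np

def dual_branch_predict(logits_conv, logits_rebal):
    """Average the logits of the two branches and pick the top class."""
    avg = (np.asarray(logits_conv, dtype=float)
           + np.asarray(logits_rebal, dtype=float)) / 2.0
    return avg.argmax(axis=-1), avg

# Toy batch of one example with three classes.
logits_conv = [[2.0, 0.0, 0.0]]    # conventional learning branch
logits_rebal = [[0.0, 3.0, 0.0]]   # re-balancing branch
pred, avg = dual_branch_predict(logits_conv, logits_rebal)
```

In this toy case the averaged logits are `[1.0, 1.5, 0.0]`, so the prediction follows the re-balancing branch even though the conventional branch alone would have chosen class 0.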
