Boosting Consistency in Dual Training for Long-Tailed Semi-Supervised Learning
Kai Gan, Tong Wei, Min-Ling Zhang
TL;DR
BOAT tackles long-tailed semi-supervised learning (LTSSL) when the unlabeled data come from an unknown class distribution. It employs a dual-branch framework: a standard branch that concentrates on head-class accuracy and a balanced branch that mitigates tail bias via logit adjustment. The two branches interact during training: the standard branch is aligned to a decoupled unlabeled distribution, and the balanced branch is aligned to pseudo-labels derived from the standard branch. A post-hoc enhancement is then applied at inference. Together, these components give robust performance across diverse unlabeled data distributions. Extensive experiments on CIFAR-10-LT, CIFAR-100-LT, STL10-LT, and ImageNet-127 show state-of-the-art results, particularly under distribution mismatch, and ablations confirm the contributions of alignment, balanced pseudo-labels, and weighting. The work also demonstrates the potential of parameter-efficient fine-tuning (PEFT) for further improvements, indicating practical relevance for real-world LTSSL tasks and future research directions.
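To make the idea of logit adjustment in the balanced branch concrete, the following is a minimal sketch of the generic technique: the log of the class prior, scaled by a temperature, is added to the logits before the softmax cross-entropy is computed. This is an illustration under stated assumptions, not BOAT's exact loss. The function names, the `tau` parameter, and the class counts are hypothetical, and the prior is assumed to be estimated from labeled class counts.

```python
import math

def log_prior(class_counts):
    """Log of the empirical class prior (assumed to come from labeled class counts)."""
    total = sum(class_counts)
    return [math.log(c / total) for c in class_counts]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def plain_cross_entropy(logits, label):
    """Standard cross-entropy for one example."""
    return -math.log(softmax(logits)[label])

def adjusted_cross_entropy(logits, label, class_counts, tau=1.0):
    """Logit-adjusted cross-entropy: add tau * log(prior) to the logits first.

    tau is a hypothetical scaling parameter; tau = 0 recovers the plain loss.
    """
    prior = log_prior(class_counts)
    adjusted = [z + tau * p for z, p in zip(logits, prior)]
    return -math.log(softmax(adjusted)[label])

# Illustrative long-tailed counts: one head class, one medium class, one tail class.
counts = [500, 50, 5]
logits = [1.0, 1.0, 1.0]  # the model is undecided between the classes

plain_loss = plain_cross_entropy(logits, 0)
head_loss = adjusted_cross_entropy(logits, 0, counts)
tail_loss = adjusted_cross_entropy(logits, 2, counts)
print(f"plain={plain_loss:.4f} head={head_loss:.4f} tail={tail_loss:.4f}")
```

With undecided logits, the adjusted loss is larger for a tail-class example than for a head-class example, so training pushes the raw logits of tail classes up to compensate. This is the sense in which the adjustment counteracts the bias toward head classes.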
Abstract
While long-tailed semi-supervised learning (LTSSL) has received tremendous attention in many real-world classification problems, existing LTSSL algorithms typically assume that the class distributions of labeled and unlabeled data are almost identical. Algorithms built upon this assumption can suffer severely when the class distributions of labeled and unlabeled data are mismatched, since they rely on biased pseudo-labels from the model. To alleviate this problem, we propose a simple new method that can effectively utilize unlabeled data from unknown class distributions through Boosting cOnsistency in duAl Training (BOAT). Specifically, we construct a standard branch and a balanced branch to ensure the performance of the head and tail classes, respectively. Throughout the training process, the two branches incrementally converge and interact with each other, eventually resulting in commendable performance across all classes. Despite its simplicity, we show that BOAT achieves state-of-the-art performance on a variety of standard LTSSL benchmarks, e.g., an average 2.7% absolute increase in test accuracy over existing algorithms when the class distributions of labeled and unlabeled data are mismatched. Even when the class distributions are identical, BOAT consistently outperforms many sophisticated LTSSL algorithms. We carry out extensive ablation studies to tease apart the factors that are most important to the success of BOAT. The source code is available at https://github.com/Gank0078/BOAT.
