Three Heads Are Better Than One: Complementary Experts for Long-Tailed Semi-supervised Learning

Chengcheng Ma; Ismail Elezi; Jiankang Deng; Weiming Dong; Changsheng Xu

TL;DR

This work tackles long-tailed semi-supervised learning (LTSSL) by addressing the mismatch between labeled and unlabeled data distributions that biases pseudo-labels toward head classes. It introduces ComPlementary Experts (CPE), a multi-head framework where three experts with distinct logit adjustments model different distribution shapes, complemented by Classwise Batch Normalization (CBN) to stabilize tail-class features. The method achieves state-of-the-art performance on CIFAR-10-LT, CIFAR-100-LT, and STL-10-LT across consistent, uniform, and inverse unlabeled distributions, with notable gains when distributions diverge. The approach demonstrates that combining distribution-specific experts with class-aware normalization yields more reliable pseudo-labels and improved representation learning in LTSSL, with only modest training-time overhead; code is released at the provided repository.
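The idea of experts with distinct logit adjustments can be illustrated with a minimal sketch. It assumes the standard logit-adjusted cross entropy, in which `tau * log(prior)` is added to the logits during training. The class counts, the `tau` values, and all function names are illustrative assumptions, not the settings or code of the released repository.

```python
import math

def class_prior(counts):
    """Empirical class prior of the labeled set."""
    total = sum(counts)
    return [c / total for c in counts]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def logit_adjusted_ce(logits, label, prior, tau):
    """Cross entropy on logits shifted by tau * log(prior).

    tau = 0 recovers the regular cross entropy; a larger tau penalizes
    mistakes on rare classes more strongly during training.
    """
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, prior)]
    return -log_softmax(adjusted)[label]

# Hypothetical long-tailed labeled set: one head, one medium, one tail class.
counts = [5000, 500, 50]
prior = class_prior(counts)

# One adjustment strength per expert (assumed values, for illustration only).
taus = [0.0, 1.0, 2.0]

# Same uninformative logits, scored by each expert's training loss.
logits = [1.0, 1.0, 1.0]
losses_tail = [logit_adjusted_ce(logits, 2, prior, t) for t in taus]
losses_head = [logit_adjusted_ce(logits, 0, prior, t) for t in taus]
```

In this sketch the loss for a tail-class sample grows with `tau` while the loss for a head-class sample shrinks, so a head trained with a larger `tau` is pushed to raise tail-class logits. This is one plausible reading of how different experts could specialize in consistent, uniform, and inverse unlabeled distributions.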

Abstract

We address the challenging problem of Long-Tailed Semi-Supervised Learning (LTSSL), where the labeled data exhibit an imbalanced class distribution and the unlabeled data follow an unknown distribution. Unlike in balanced SSL, the generated pseudo-labels are skewed towards head classes, intensifying the training bias. This phenomenon is amplified when the class distributions of the labeled and unlabeled datasets are mismatched, since even more unlabeled data are then mislabeled as head classes. To solve this problem, we propose a novel method named ComPlementary Experts (CPE). Specifically, we train multiple experts to model various class distributions, each of them yielding high-quality pseudo-labels within one form of class distribution. In addition, we introduce Classwise Batch Normalization for CPE to avoid performance degradation caused by feature distribution mismatch between head and non-head classes. CPE achieves state-of-the-art performance on the CIFAR-10-LT, CIFAR-100-LT, and STL-10-LT benchmarks. For instance, on CIFAR-10-LT, CPE improves test accuracy by over 2.22% compared to baselines. Code is available at https://github.com/machengcheng2016/CPE-LTSSL.
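A rough sketch of the class-aware normalization idea follows. It assumes that each sample is routed to a normalization branch according to whether its pseudo-label falls in a tail-class group, and that each branch uses its own batch statistics. The routing rule, the two-group split, and the function names are assumptions made for illustration; the paper's Classwise Batch Normalization may differ in its details.

```python
import math

def batch_stats(values):
    """Mean and (biased) variance of a list of scalars."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

def normalize(values, mean, var, eps=1e-5):
    """Standard batch-norm style normalization without affine parameters."""
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def classwise_batch_norm(features, pseudo_labels, tail_classes):
    """Normalize one feature channel with separate statistics per class group.

    Samples whose pseudo-label is in `tail_classes` share one set of
    statistics; all remaining samples share another.
    """
    out = [0.0] * len(features)
    for in_tail in (False, True):
        idx = [i for i, y in enumerate(pseudo_labels)
               if (y in tail_classes) == in_tail]
        if len(idx) < 2:
            # Too few samples to estimate statistics: pass through unchanged
            # (a simplification chosen for this sketch).
            for i in idx:
                out[i] = features[i]
            continue
        group = [features[i] for i in idx]
        mean, var = batch_stats(group)
        for i, v in zip(idx, normalize(group, mean, var)):
            out[i] = v
    return out

# Hypothetical channel activations: tail-class features have a higher variance.
features = [0.9, 1.0, 1.1, 1.0, 4.0, 8.0, 6.0, 2.0]
pseudo_labels = [0, 0, 0, 0, 9, 9, 9, 9]
normalized = classwise_batch_norm(features, pseudo_labels, tail_classes={9})
```

With shared statistics, the high-variance tail features would dominate the estimated mean and variance; keeping the groups separate leaves each group approximately zero-mean and unit-variance in this toy example.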

Paper Structure (24 sections, 5 equations, 7 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Semi-supervised learning
Long-tailed learning
Long-tailed Semi-supervised learning
Methodology
Preliminaries
ComPlementary Experts (CPE)
Classwise Batch Normalization for CPE
Experiments
Experimental setting
Implementation details
Main results
Ablation studies and analysis
Conclusion
...and 9 more sections

Figures (7)

Figure 1: Comparison of the F1 score of pseudo-label predictions between ACR and our CPE method under the {"consistent", "uniform", "inverse"} cases. The dataset is CIFAR-10-LT with imbalance ratio $\gamma_l = 100$. In each case, one of the experts in our CPE generates pseudo-labels of higher quality.
Figure 2: Overview of our CPE algorithm together with the CBN mechanism. $I_{\text{MT}}$ and $I_{\text{T}}$ represent the class masks in Eq. (\ref{eq: unsup loss with CBN}). $L_{bal}(\tau)$ is short for the balanced cross entropy loss in Eq. (\ref{eq: logit adjustment}). $L_{ce}(\hat{y})$ denotes the cross entropy loss with pseudo-label $\hat{y}$.
Figure 3: Statistics of extractor features within head and tail classes in CIFAR-10-LT with $(\gamma_l,\gamma_u)=(100,1/100)$. Each dot represents the running mean and standard deviation of a channel in the BN layer. After pseudo-labeling, features of unlabeled data within tail classes have a higher variance and follow a different distribution from that of the head classes.
Figure 4: Case study: The first head, trained with the regular loss, generates imbalanced pseudo-labels (low recall on tail classes), which degrades the representation ability for tail classes and ultimately hurts the test accuracy on tail classes. In contrast, low precision on head classes has much less impact on the test accuracy.
Figure 5: t-SNE visualization of extracted features of unlabeled data in CIFAR-10-LT with $\gamma_l=100$ and $\gamma_u=1/100$. The Classwise Batch Normalization (CBN) mechanism leads to more compact feature clusters, which can be classified more easily (highlighted by red circles).
...and 2 more figures
