BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning
Hongwei Zheng, Linyuan Zhou, Han Li, Jinming Su, Xiaoming Wei, Xiaoming Xu
TL;DR
The paper addresses long-tailed semi-supervised learning (LTSSL) by tackling both data quantity imbalance and class-wise uncertainty. It proposes Balanced and Entropy-based Mix (BEM), which combines three parts: CamMix for localized data mixing; a Class Balanced Mix Bank (CBMB) driven by the effective number of samples $E_c$; and an entropy-based learning (EL) module that balances per-class uncertainty through entropy-based sampling, masking, and a class-balanced loss. Key contributions include the CAM-based mixing region (CamMix), a principled sampling scheme that uses $E_c$ and EMA-estimated class distributions, and an entropy-integrated training objective that jointly accounts for data quantity and uncertainty. Empirically, BEM consistently improves LTSSL baselines on CIFAR10/100-LT, STL10-LT, and ImageNet-127 and achieves state-of-the-art results. It also works as a complementary component to existing re-balancing methods.
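To make the CBMB sampling idea concrete, here is a minimal Python sketch under stated assumptions. It uses the standard effective-number formula $E_c = (1-\beta^{n_c})/(1-\beta)$ (Cui et al., CVPR 2019) and assumes that mixing partners are drawn from class $c$ with probability proportional to $1/E_c$. The EMA update of the class distribution, the `dataset_size` scaling used to turn frequencies into counts, the values of `beta`, `momentum`, and `capacity`, and every function and class name are hypothetical illustrations, not the authors' code.

```python
import random
from collections import deque


def ema_update(dist, batch_labels, momentum=0.99):
    """EMA update of the estimated class-frequency distribution from batch (pseudo-)labels."""
    batch = [0.0] * len(dist)
    for y in batch_labels:
        batch[y] += 1.0 / len(batch_labels)
    return [momentum * d + (1.0 - momentum) * b for d, b in zip(dist, batch)]


def effective_number(n, beta=0.999):
    """Effective number of samples: E = (1 - beta**n) / (1 - beta)."""
    return (1.0 - beta ** n) / (1.0 - beta)


def class_sampling_probs(dist, dataset_size, beta=0.999, eps=1e-8):
    """Per-class probabilities proportional to 1 / E_c, with n_c = dist_c * dataset_size (assumed)."""
    inv = [1.0 / max(effective_number(d * dataset_size, beta), eps) for d in dist]
    total = sum(inv)
    return [w / total for w in inv]


class MixBank:
    """Hypothetical per-class FIFO bank holding candidate mixing partners."""

    def __init__(self, num_classes, capacity=64):
        self.banks = [deque(maxlen=capacity) for _ in range(num_classes)]

    def push(self, sample, label):
        self.banks[label].append(sample)  # oldest entry is evicted when the bank is full

    def sample(self, probs, rng=random):
        # Only classes that currently hold samples can be drawn.
        classes = [c for c, bank in enumerate(self.banks) if bank]
        if not classes:
            raise ValueError("mix bank is empty")
        weights = [probs[c] for c in classes]
        c = rng.choices(classes, weights=weights, k=1)[0]
        return rng.choice(self.banks[c]), c
```

Because probabilities scale with $1/E_c$, tail classes with small estimated counts supply mixing partners more often than their raw frequency would suggest, while the bounded deque keeps each class's bank small and recent. Passing `rng=random.Random(0)` makes the draws reproducible.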
Abstract
Data mixing methods play a crucial role in semi-supervised learning (SSL), but their application to long-tailed semi-supervised learning (LTSSL) remains unexplored. The primary reason is that in-batch mixing fails to address class imbalance. Furthermore, existing LTSSL methods mainly focus on re-balancing data quantity but ignore class-wise uncertainty, which is also vital for class balance. For instance, some classes with sufficient samples may still exhibit high uncertainty due to indistinguishable features. To this end, this paper introduces the Balanced and Entropy-based Mix (BEM), a pioneering mixing approach that re-balances the class distribution in terms of both data quantity and uncertainty. Specifically, we first propose a class balanced mix bank that stores data of each class for mixing. This bank samples data according to the estimated quantity distribution, thus re-balancing data quantity. We then present an entropy-based learning approach to re-balance class-wise uncertainty, comprising an entropy-based sampling strategy, an entropy-based selection module, and an entropy-based class balanced loss. BEM is the first method to leverage data mixing for improving LTSSL, and it can also complement existing re-balancing methods. Experimental results show that BEM significantly enhances various LTSSL frameworks and achieves state-of-the-art performance across multiple benchmarks.
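The entropy-based learning components can be illustrated with a similarly hedged Python sketch. It measures class-wise uncertainty as the mean prediction entropy of samples grouped by pseudo-label, reweights quantity-based class probabilities by $\exp(\alpha H_c)$ so that uncertain classes are sampled more, masks high-entropy predictions, and applies a class-weighted cross-entropy. The exponential reweighting rule, the fixed entropy threshold, the defaults for `alpha` and `max_entropy`, and all function names are assumptions chosen for illustration; the exact formulas BEM uses are not given in this summary.

```python
import math


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p + eps) for p in probs)


def class_mean_entropy(pred_probs, num_classes):
    """Mean prediction entropy per class, grouping samples by their argmax pseudo-label.

    Classes with no predictions default to the maximum entropy log(num_classes).
    """
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for p in pred_probs:
        c = max(range(num_classes), key=lambda k: p[k])
        sums[c] += entropy(p)
        counts[c] += 1
    return [s / n if n else math.log(num_classes) for s, n in zip(sums, counts)]


def entropy_weighted_probs(quantity_probs, class_entropy, alpha=1.0):
    """Hypothetical rule: w_c proportional to q_c * exp(alpha * H_c), then normalized."""
    w = [q * math.exp(alpha * h) for q, h in zip(quantity_probs, class_entropy)]
    total = sum(w)
    return [x / total for x in w]


def entropy_mask(pred_probs, max_entropy):
    """1 keeps a sample whose prediction entropy is at most max_entropy, 0 drops it."""
    return [1 if entropy(p) <= max_entropy else 0 for p in pred_probs]


def weighted_ce(pred_probs, labels, class_weights, eps=1e-12):
    """Batch-mean cross-entropy with a per-class weight applied to each sample's loss."""
    losses = [-class_weights[y] * math.log(p[y] + eps) for p, y in zip(pred_probs, labels)]
    return sum(losses) / len(losses)
```

In this sketch, a class with many samples but confusable features still receives a large $H_c$ and therefore a higher sampling weight, which matches the abstract's point that quantity alone does not capture class imbalance.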
