LayerMatch: Do Pseudo-labels Benefit All Layers?
Chaoqi Liang, Guanglei Yang, Lifeng Qiao, Zitong Huang, Hongliang Yan, Yunchao Wei, Wangmeng Zuo
TL;DR
LayerMatch challenges the assumption that pseudo-labels uniformly benefit all layers in SSL by revealing distinct learning dynamics between the feature extractor and the linear classifier. It introduces Grad-ReLU to block unsupervised-gradient influence on the classifier while preserving it for the feature extractor, and Avg-Clustering to EMA-smooth feature clustering centers, yielding a cohesive LayerMatch objective. Empirically, LayerMatch delivers consistent improvements across CIFAR-10/100, STL-10, and ImageNet-100, achieving an average gain of $2.44\%$ over SOTA and $10.38\%$ over FixMatch. This layer-aware approach highlights the importance of tailoring pseudo-label usage to the learning role of each network component, with practical impact on improving SSL performance under limited labeled data.
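The EMA smoothing of clustering centers mentioned above can be illustrated with a minimal sketch. It is not the paper's Avg-Clustering implementation: the function name `ema_update_centers`, the dictionary layout, and the `momentum` value are assumptions made for illustration, and the only idea taken from the text is that each class center follows an exponential moving average of the features assigned to that class.

```python
def ema_update_centers(centers, batch_features, batch_labels, momentum=0.9):
    """Return new per-class feature centers after one EMA step.

    centers:        dict mapping class id -> list of floats (current center)
    batch_features: list of feature vectors (lists of floats)
    batch_labels:   list of class ids (labels or pseudo-labels), one per feature
    momentum:       weight kept on the old center (hypothetical default)
    """
    # Group the batch features by their assigned class.
    grouped = {}
    for feat, label in zip(batch_features, batch_labels):
        grouped.setdefault(label, []).append(feat)

    # Copy so the caller's centers are not mutated in place.
    updated = {k: list(v) for k, v in centers.items()}
    for label, feats in grouped.items():
        dim = len(feats[0])
        batch_mean = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
        if label not in updated:
            # The first observation of a class initializes its center.
            updated[label] = batch_mean
        else:
            old = updated[label]
            updated[label] = [
                momentum * o + (1.0 - momentum) * m
                for o, m in zip(old, batch_mean)
            ]
    return updated
```

A larger `momentum` makes the centers move more slowly, which trades responsiveness for stability when individual batches carry noisy pseudo-labels.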
Abstract
Deep neural networks have achieved remarkable performance across various tasks when supplied with large-scale labeled data. However, collecting labeled data can be time-consuming and labor-intensive. Semi-supervised learning (SSL), particularly through pseudo-labeling algorithms that iteratively assign pseudo-labels for self-training, offers a promising way to reduce the dependency on labeled data. Previous research generally applies a uniform pseudo-labeling strategy across all model layers, assuming that pseudo-labels exert uniform influence throughout. In contrast, our theoretical analysis and empirical experiments demonstrate that the feature extraction layer and the linear classification layer exhibit distinct learning behaviors in response to pseudo-labels. Based on these insights, we develop two layer-specific pseudo-label strategies, termed Grad-ReLU and Avg-Clustering. Grad-ReLU mitigates the impact of noisy pseudo-labels by removing the detrimental gradient effects of pseudo-labels in the linear classification layer. Avg-Clustering accelerates the convergence of the feature extraction layer towards stable clustering centers by integrating consistent outputs. Our approach, LayerMatch, integrates these two strategies and thereby avoids severe interference from noisy pseudo-labels in the linear classification layer while accelerating the clustering capability of the feature extraction layer. In extensive experiments, our approach consistently demonstrates exceptional performance on standard semi-supervised learning benchmarks, achieving a significant improvement of 10.38% over the baseline method and 2.44% over state-of-the-art methods.
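The layer-specific gradient routing described above can be sketched in a few lines. This is not the paper's Grad-ReLU definition, which this section does not spell out. It only illustrates the stated behavior: the pseudo-label loss does not update the linear classifier, while it still sends a gradient back to the features. The helper names (`linear_ce_grads`, `routed_grads`) and the single-sample setup are assumptions, and gradients are written out by hand for a softmax cross-entropy loss on `logits = W @ feat`.

```python
import math


def softmax(z):
    # Subtract the max for numerical stability.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear_ce_grads(W, feat, target):
    """Cross-entropy gradients for one sample with logits = W @ feat.

    Returns (grad_W, grad_feat), where grad_W has the shape of W.
    """
    logits = [sum(w * f for w, f in zip(row, feat)) for row in W]
    p = softmax(logits)
    # dL/dlogits = softmax(logits) - one_hot(target)
    delta = [p[c] - (1.0 if c == target else 0.0) for c in range(len(W))]
    grad_W = [[d * f for f in feat] for d in delta]
    grad_feat = [
        sum(delta[c] * W[c][j] for c in range(len(W)))
        for j in range(len(feat))
    ]
    return grad_W, grad_feat


def routed_grads(W, feat_l, label, feat_u, pseudo_label):
    """Route gradients per layer (hypothetical helper).

    Classifier: receives only the supervised gradient.
    Features:   receive gradients from both the supervised and pseudo-label losses.
    """
    grad_W_sup, grad_feat_sup = linear_ce_grads(W, feat_l, label)
    # The classifier gradient of the pseudo-label loss is computed but discarded.
    _unused_grad_W, grad_feat_unsup = linear_ce_grads(W, feat_u, pseudo_label)
    return grad_W_sup, grad_feat_sup, grad_feat_unsup
```

In an autograd framework the same effect is usually obtained by computing the unlabeled logits with a detached copy of the classifier weights, so that backpropagation reaches the feature extractor but not the classifier parameters.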
