Semi-Supervised Learning with Multi-Head Co-Training
Mingcai Chen, Yuntao Du, Yi Zhang, Shuwei Qian, Chongjun Wang
TL;DR
This work tackles the practicality barrier of single-view co-training in semi-supervised learning (SSL) by introducing Multi-Head Co-Training, which replaces multiple separate models with a single shared feature extractor and multiple classification heads. Diversity among heads is induced implicitly by a "Weak and Strong Augmentation" strategy, pseudo-labels for each head are produced from its peers' predictions to reduce confirmation bias, and an exponential moving average (EMA) ensemble is used for stable evaluation. The approach achieves state-of-the-art or competitive results on CIFAR-10/100, SVHN, and Mini-ImageNet while greatly reducing parameter count and training time compared with traditional co-training, and a calibration analysis shows improved probability estimates. The method's efficiency and robustness suggest that it can scale SSL to more realistic settings, with potential extensions to other data domains through modality-agnostic augmentation.
Abstract
Co-training, extended from self-training, is one of the frameworks for semi-supervised learning. Without a natural split of features, single-view co-training works at the cost of training extra classifiers, and the algorithm must be delicately designed to prevent the individual classifiers from collapsing into each other. To remove these obstacles, which deter the adoption of single-view co-training, we present a simple and efficient algorithm, Multi-Head Co-Training. By integrating the base learners into a multi-head structure, the model requires only a minimal number of extra parameters. Every classification head in the unified model interacts with its peers through a "Weak and Strong Augmentation" strategy, in which diversity is naturally brought by the strong data augmentation. The proposed method therefore facilitates single-view co-training by (1) promoting diversity implicitly and (2) requiring only a small extra computational overhead. The effectiveness of Multi-Head Co-Training is demonstrated in an empirical study on standard semi-supervised learning benchmarks.
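
To make the interaction between heads concrete, the following is a minimal sketch of how peer heads might produce a pseudo-label for one unlabeled example. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the selection rule, so the agreement rule used here (a head receives a pseudo-label only when all of its peers predict the same class on the weakly augmented view) and the helper names `peer_pseudo_labels` and `unsupervised_loss` are hypothetical. The heads' outputs are passed in as plain probability lists, so the shared feature extractor and the augmentations themselves are left out.

```python
import math
from typing import List, Optional


def argmax(probs: List[float]) -> int:
    """Index of the largest probability (first index wins on ties)."""
    return max(range(len(probs)), key=probs.__getitem__)


def peer_pseudo_labels(weak_probs: List[List[float]]) -> List[Optional[int]]:
    """Propose one pseudo-label per head from the *other* heads.

    weak_probs[m] holds head m's class probabilities on the weakly
    augmented view of a single unlabeled example.
    ASSUMPTION: a head gets a label only if all of its peers agree.
    """
    votes = [argmax(p) for p in weak_probs]
    labels: List[Optional[int]] = []
    for m in range(len(weak_probs)):
        peer_votes = [v for i, v in enumerate(votes) if i != m]
        agreed = len(set(peer_votes)) == 1
        labels.append(peer_votes[0] if agreed else None)
    return labels


def unsupervised_loss(strong_probs: List[List[float]],
                      labels: List[Optional[int]]) -> float:
    """Mean cross-entropy of each head on the strongly augmented view,
    counted only for heads that received a pseudo-label."""
    losses = [-math.log(max(p[y], 1e-12))
              for p, y in zip(strong_probs, labels) if y is not None]
    return sum(losses) / len(losses) if losses else 0.0


# Three heads, three classes, one unlabeled example.
weak = [[0.7, 0.2, 0.1],   # head 0 predicts class 0
        [0.6, 0.3, 0.1],   # head 1 predicts class 0
        [0.2, 0.5, 0.3]]   # head 2 predicts class 1
strong = [[0.4, 0.4, 0.2],
          [0.3, 0.5, 0.2],
          [0.5, 0.3, 0.2]]

labels = peer_pseudo_labels(weak)   # only head 2's peers agree
loss = unsupervised_loss(strong, labels)
```

In this toy example only head 2 is trained on the unlabeled example, because its two peers agree on class 0 while the peers of heads 0 and 1 disagree. Since no head ever learns from its own prediction, a mistaken head cannot directly reinforce its own error, which is the intuition behind using peer predictions to reduce confirmation bias.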
