Semi-Supervised Learning with Multi-Head Co-Training

Mingcai Chen, Yuntao Du, Yi Zhang, Shuwei Qian, Chongjun Wang

TL;DR

This work tackles the practicality barrier of single-view co-training in semi-supervised learning (SSL) by introducing Multi-Head Co-Training, which replaces multiple separate models with a single shared feature extractor and multiple classification heads. Diversity among the heads is induced implicitly by a "Weak and Strong Augmentation" strategy, pseudo-labels for each head are produced from its peers' predictions to reduce confirmation bias, and an EMA ensemble is used for stable evaluation. The approach achieves state-of-the-art or competitive results on CIFAR-10/100, SVHN, and Mini-ImageNet while greatly reducing parameter count and training time compared with traditional co-training, and a calibration analysis shows improved probability estimates. The method's efficiency and robustness suggest it can scale SSL to more realistic settings, with possible extensions to other data modalities through modality-agnostic augmentation.
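
As a rough illustration of how peer heads can supply each other's pseudo-labels, the Python sketch below assumes that a head receives a pseudo-label only when all of its peers predict the same class on the weakly augmented view, and that this label then supervises the head's prediction on the strongly augmented view through cross-entropy. The function names (`peer_pseudo_labels`, `masked_cross_entropy`) and the plain-list probability format are hypothetical, and any confidence threshold or other selection rule the paper may use is not modeled here.

```python
import math
from typing import List, Optional

Probs = List[float]  # one class-probability vector


def argmax(probs: Probs) -> int:
    """Index of the largest probability (the first one on ties)."""
    return max(range(len(probs)), key=lambda k: probs[k])


def peer_pseudo_labels(weak_probs: List[List[Probs]]) -> List[List[Optional[int]]]:
    """Pseudo-labels for every head, built only from its peers' weak-view predictions.

    weak_probs[h][n] is head h's probability vector for unlabeled example n.
    labels[h][n] is the class on which all other heads agree, or None if they
    disagree (the example is then skipped for head h).
    """
    num_heads, num_examples = len(weak_probs), len(weak_probs[0])
    # Hard predictions of every head on every weakly augmented example.
    preds = [[argmax(p) for p in head] for head in weak_probs]
    labels: List[List[Optional[int]]] = []
    for h in range(num_heads):
        row: List[Optional[int]] = []
        for n in range(num_examples):
            # Classes predicted by the peers of head h (head h itself is excluded).
            peer_classes = {preds[j][n] for j in range(num_heads) if j != h}
            row.append(peer_classes.pop() if len(peer_classes) == 1 else None)
        labels.append(row)
    return labels


def masked_cross_entropy(strong_probs: List[Probs], labels: List[Optional[int]]) -> float:
    """Cross-entropy of one head's strong-view predictions against its pseudo-labels.

    Averaged over the whole unlabeled batch, so skipped examples contribute zero.
    """
    total = sum(-math.log(p[y]) for p, y in zip(strong_probs, labels) if y is not None)
    return total / len(strong_probs)


if __name__ == "__main__":
    weak = [
        [[0.9, 0.1], [0.2, 0.8]],  # head 0
        [[0.7, 0.3], [0.6, 0.4]],  # head 1
        [[0.8, 0.2], [0.3, 0.7]],  # head 2
    ]
    labels = peer_pseudo_labels(weak)
    print(labels)  # [[0, None], [0, 1], [0, None]]
    strong_head1 = [[0.5, 0.5], [0.25, 0.75]]
    print(round(masked_cross_entropy(strong_head1, labels[1]), 4))  # 0.4904
```

Because a head never receives its own weak-view prediction as a target, its mistakes are not fed straight back to it; this is one plausible reading of how peer-based pseudo-labeling reduces confirmation bias.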

Abstract

Co-training, extended from self-training, is one of the frameworks for semi-supervised learning. Without a natural split of features, single-view co-training works at the cost of training extra classifiers, and the algorithm has to be carefully designed to prevent the individual classifiers from collapsing into each other. To remove these obstacles, which deter the adoption of single-view co-training, we present a simple and efficient algorithm, Multi-Head Co-Training. By integrating the base learners into a multi-head structure, the model requires only a minimal number of extra parameters. Every classification head in the unified model interacts with its peers through a "Weak and Strong Augmentation" strategy, in which diversity is naturally brought about by the strong data augmentation. Therefore, the proposed method facilitates single-view co-training by (1) promoting diversity implicitly and (2) requiring only a small extra computational overhead. The effectiveness of Multi-Head Co-Training is demonstrated in an empirical study on standard semi-supervised learning benchmarks.
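
To make the parameter saving concrete, the arithmetic below compares classic single-view co-training, which trains one full model per learner, with a single shared feature extractor plus several heads. The backbone size (1,000,000 parameters), the 128-dimensional features, and the use of a single linear layer as each head are hypothetical round numbers chosen for illustration, not figures from the paper; Figure 2 compares heads of different sizes, so the real overhead depends on that design choice.

```python
def linear_head_params(feature_dim: int, num_classes: int) -> int:
    """Weights plus biases of one fully connected classification head (hypothetical head design)."""
    return feature_dim * num_classes + num_classes


def param_counts(backbone: int, head: int, num_heads: int) -> dict:
    """Total parameters of separately co-trained models versus one multi-head model."""
    single_model = backbone + head
    separate = num_heads * single_model       # one full network per learner
    multi_head = backbone + num_heads * head  # shared extractor, one head per learner
    return {
        "separate_models": separate,
        "multi_head": multi_head,
        "extra_over_single_model": multi_head - single_model,
    }


if __name__ == "__main__":
    head = linear_head_params(feature_dim=128, num_classes=10)  # 1290
    print(param_counts(backbone=1_000_000, head=head, num_heads=3))
    # {'separate_models': 3003870, 'multi_head': 1003870, 'extra_over_single_model': 2580}
```

Under these assumed sizes, going from one learner to three adds about 0.26% more parameters instead of roughly tripling the model.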

Paper Structure

This paper contains 22 sections, 11 equations, 3 figures, 7 tables, 1 algorithm.

Figures (3)

  • Figure 1: Diagram of Multi-Head Co-Training with three heads. Images are fed into a shared module (blue box) followed by three classification heads (green boxes). Weakly augmented images (orange lines) are used for pseudo-labeling, and the pseudo-labels guide the predictions on strongly augmented examples (red line). Here, the pseudo-labels for the bottom head are generated and selected according to the classes predicted by the other two heads on the weakly augmented images. Only the co-training process of the bottom head is shown; in practice, the weakly and strongly augmented images are fed into all three heads simultaneously.
  • Figure 2: The error rate and the number of parameters introduced by different heads.
  • Figure 3: Reliability diagrams (top) and confidence histograms (bottom). The models are trained on CIFAR-100 with 10,000 labels, and their predictions on the test set are grouped into 10 interval bins (horizontal axis). The reliability diagram shows, for each bin, the true accuracy, the expected accuracy, and the gap between them. The confidence histogram shows the percentage of examples that fall into each bin. The accuracy and average confidence are indicated by the solid and dashed lines, respectively.
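
The binning behind the reliability diagrams and confidence histograms in Figure 3 can be sketched as follows. This minimal Python version assumes equal-width bins over [0, 1], takes each prediction's confidence to be its top-class probability, and treats a bin's expected accuracy as its average confidence, which is the usual convention; the helper names are hypothetical. The expected calibration error (ECE), the fraction-weighted average gap, is included as a common one-number summary of such a diagram and is not taken from the paper.

```python
from typing import Dict, List


def reliability_bins(confidences: List[float], correct: List[bool], num_bins: int = 10) -> List[Dict[str, float]]:
    """Per-bin statistics for a reliability diagram and a confidence histogram.

    Each bin reports the fraction of examples it holds (histogram height), the
    true accuracy, the average confidence (expected accuracy), and their gap.
    """
    total = len(confidences)
    members: List[List[int]] = [[] for _ in range(num_bins)]
    for i, c in enumerate(confidences):
        # Equal-width bins [k/num_bins, (k+1)/num_bins); confidence 1.0 goes to the last bin.
        members[min(int(c * num_bins), num_bins - 1)].append(i)
    bins = []
    for idx in members:
        if not idx:
            bins.append({"fraction": 0.0, "accuracy": 0.0, "confidence": 0.0, "gap": 0.0})
            continue
        acc = sum(1 for i in idx if correct[i]) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        bins.append({"fraction": len(idx) / total, "accuracy": acc,
                     "confidence": conf, "gap": abs(acc - conf)})
    return bins


def expected_calibration_error(bins: List[Dict[str, float]]) -> float:
    """Gap of each bin weighted by the fraction of examples in that bin."""
    return sum(b["fraction"] * b["gap"] for b in bins)


if __name__ == "__main__":
    bins = reliability_bins([0.95, 0.92, 0.35], [True, False, True])
    print(round(expected_calibration_error(bins), 4))  # 0.5067
```

An overconfident model shows bars below the diagonal in the reliability diagram (accuracy lower than confidence), which the ECE collapses into a single number.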