MaskMatch: Boosting Semi-Supervised Learning Through Mask Autoencoder-Driven Feature Learning

Wenjin Zhang; Keyi Li; Sen Yang; Chenyang Gao; Wanzhao Yang; Sifan Yuan; Ivan Marsic

TL;DR

MaskMatch tackles the underutilization of unlabeled data in semi-supervised learning (SSL) by combining a Masked Autoencoder (MAE) reconstruction objective, synthetic data training (SDT), and a class-specific threshold. It leverages all unlabeled data, including uncertain samples, to learn more robust representations and better decision boundaries, achieving state-of-the-art error rates on CIFAR-100 (2 labels/class), STL-10 (4 labels/class), and Euro-SAT (2 labels/class). Ablation studies show that MAE and SDT contribute substantially to accuracy with modest additional computation. The results highlight the value of combining self-supervised MAE signals with SSL and suggest directions for simplifying the approach and extending it to other architectures.

Abstract

Conventional methods in semi-supervised learning (SSL) often face challenges related to limited data utilization, mainly due to their reliance on threshold-based techniques for selecting high-confidence unlabeled data during training. Various efforts (e.g., FreeMatch) have been made to enhance data utilization by adjusting the thresholds, yet none have managed to use 100% of the available data. To overcome this limitation and improve SSL performance, we introduce MaskMatch, a novel algorithm that fully utilizes unlabeled data to boost semi-supervised learning. MaskMatch integrates a self-supervised learning strategy, i.e., Masked Autoencoder (MAE), that uses all available data to strengthen visual representation learning. This enables the SSL algorithm to leverage all available data, including samples typically filtered out by traditional methods. In addition, we propose a synthetic data training approach to further increase data utilization and improve generalization. These innovations lead MaskMatch to achieve state-of-the-art results on challenging datasets. For instance, on CIFAR-100 with 2 labels per class, STL-10 with 4 labels per class, and Euro-SAT with 2 labels per class, MaskMatch achieves low error rates of 18.71%, 9.47%, and 3.07%, respectively. The code will be made publicly available.
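
To make the data-utilization problem concrete, the following minimal sketch shows how a fixed confidence threshold discards unlabeled samples. It is an illustration only, not the authors' code: the function name `utilization_ratio`, the threshold value 0.95, and the toy probabilities are all assumptions introduced here.

```python
def utilization_ratio(probs, threshold=0.95):
    """Fraction of unlabeled samples kept by confidence thresholding.

    probs: list of per-sample class-probability lists.
    threshold: assumed fixed confidence cutoff (hypothetical value).
    """
    if not probs:
        return 0.0
    # A sample is used for pseudo-labeling only if its top probability
    # reaches the threshold; all other samples are filtered out.
    kept = sum(1 for p in probs if max(p) >= threshold)
    return kept / len(probs)


# Toy batch of four unlabeled samples over three classes (made-up numbers).
batch = [
    [0.97, 0.02, 0.01],  # confident: kept
    [0.50, 0.30, 0.20],  # uncertain: filtered out
    [0.10, 0.85, 0.05],  # below the cutoff: filtered out
    [0.01, 0.01, 0.98],  # confident: kept
]
ratio = utilization_ratio(batch)
```

In this toy batch only half of the samples pass the cutoff. A reconstruction objective that needs no pseudo-label can, in principle, still draw a training signal from the two filtered samples, which is the gap the abstract describes.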

Paper Structure (15 sections, 9 equations, 5 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Semi-supervised Learning
Self-supervised Learning
Preliminaries
MaskMatch
MAE Reconstruction Loss
Synthetic Data Training
Modified Class-specific Threshold
Experiment
Setup
Overall Result
Ablation Study of Loss Terms
Configuration Study for MAE
Conclusion and Future Work

Figures (5)

Figure 1: Conventional SSL algorithms with confidence-based thresholding only make use of a fraction of unlabeled images, as they filter data based on predefined thresholds.
Figure 2: Unlabeled data utilization ratio of popular SSL algorithms on the CIFAR-100 dataset with 2 labels per class.
Figure 3: Diagram of MaskMatch. An in-training model (i.e., Transformer) is trained with three loss terms from unlabeled images. First, the unsupervised loss is defined as the divergence between class probabilities generated from strongly augmented images and the one-hot pseudo-labels derived from the corresponding weakly augmented confident images. Second, the MAE reconstruction loss is computed from all images. We patchify and randomly mask the images. The encoder (i.e., the in-training model without its MLP classifier head) and a decoder are trained by reconstructing these masked patches. The decoder is an auxiliary transformer used only to assist encoder training. Lastly, the synthetic data loss is calculated by training the model on a synthetic dataset, a mixture of unlabeled and labeled images.
Figure 4: Data utilization on CIFAR-100 with 2 labels per class. MaskMatch-Act. and MaskMatch-Theor. represent the actual and theoretical data utilization, respectively.
Figure 5: Error rate on Euro-SAT with 2 labels per class and Semi-Aves when varying the masking ratio. A suitable masking ratio is related to image size.
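
The patchify-and-mask step described in the Figure 3 caption can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the helper names `random_patch_mask` and `masked_mse`, the choice of mean squared error computed over masked patches only (as in the original MAE formulation), and the 0.75 masking ratio are assumptions made here. Patches are represented as flat lists of pixel values.

```python
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a list of booleans; True marks a patch hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_mse(target, recon, mask):
    """Mean squared error averaged over the pixels of masked patches only."""
    total, count = 0.0, 0
    for t_patch, r_patch, is_masked in zip(target, recon, mask):
        if is_masked:
            total += sum((t - r) ** 2 for t, r in zip(t_patch, r_patch))
            count += len(t_patch)
    return total / count if count else 0.0


# Assumed setup: 16 patches per image and a 0.75 masking ratio (hypothetical).
rng = random.Random(0)
mask = random_patch_mask(num_patches=16, mask_ratio=0.75, rng=rng)

# Tiny worked example: two patches of two pixels, only the first is masked,
# so the error in the second (visible) patch does not affect the loss.
loss = masked_mse(
    target=[[1.0, 1.0], [0.0, 0.0]],
    recon=[[0.0, 0.0], [5.0, 5.0]],
    mask=[True, False],
)
```

Because this loss needs no label or pseudo-label, every unlabeled image can contribute to it regardless of prediction confidence. The masking ratio is exposed as a parameter since Figure 5 reports that a suitable value depends on image size.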
