Reinforcement Learning-Guided Semi-Supervised Learning

Marzi Heidari, Hanping Zhang, Yuhong Guo

TL;DR

This work tackles the challenge of limited labeled data in semi-supervised learning by introducing RL-Guided SSL (RLGSSL), which formulates SSL as a one-armed bandit and uses a mixup-based reward to adaptively guide pseudo-labeling. A KL-weighted RL loss, combined with a teacher-student EMA framework and standard supervised/consistency losses, enables dynamic, data-driven learning that balances labeled and unlabeled information. Extensive experiments on CIFAR-10, CIFAR-100, and SVHN show RLGSSL achieving consistent improvements over state-of-the-art SSL methods, especially in low-label regimes, across CNN-13 and WRN-28 backbones. The method demonstrates that reinforcement learning can effectively steer pseudo-label generation and learning stability in SSL, offering a scalable and robust approach for real-world semi-supervised problems.
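To make the idea of a mixup-based reward concrete, here is a minimal sketch, not the authors' implementation. The summary does not state the exact form of the reward, so the negative mean squared error used here, the function name `mixup_reward`, the mixing coefficient `mu`, and the toy linear-softmax `predict` are all assumptions made for illustration.

```python
import numpy as np

def mixup_reward(predict, x_lab, y_lab, x_unl, pseudo, mu=0.5):
    """Hypothetical mixup-based reward (assumed form, not taken from the paper)."""
    # Interpolate inputs and (soft) labels with the same coefficient mu.
    x_mix = mu * x_unl + (1.0 - mu) * x_lab
    y_mix = mu * pseudo + (1.0 - mu) * y_lab
    # Assumed reward: negative mean squared error on the mixed batch,
    # so pseudo-labels consistent with the model's behaviour score higher.
    p_mix = predict(x_mix)
    return -float(np.mean((p_mix - y_mix) ** 2))

# Toy usage with a random linear-softmax classifier (illustrative only).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))

def predict(x):
    logits = x @ W
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x_lab = rng.normal(size=(5, 4))
y_lab = np.eye(3)[rng.integers(0, 3, size=5)]   # one-hot labels
x_unl = rng.normal(size=(5, 4))
pseudo = predict(x_unl)                          # soft pseudo-labels as the "action"
reward = mixup_reward(predict, x_lab, y_lab, x_unl, pseudo)
```

Under this assumed form the reward is never positive and reaches zero only when the prediction on the mixed input matches the mixed label exactly.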

Abstract

In recent years, semi-supervised learning (SSL) has gained significant attention due to its ability to leverage both labeled and unlabeled data to improve model performance, especially when labeled data is scarce. However, most current SSL methods rely on heuristics or predefined rules for generating pseudo-labels and leveraging unlabeled data, and they are limited to exploiting standard loss functions and regularization methods. In this paper, we propose a novel Reinforcement Learning (RL) Guided SSL method, RLGSSL, that formulates SSL as a one-armed bandit problem and deploys an innovative RL loss based on weighted reward to adaptively guide the learning process of the prediction model. RLGSSL incorporates a carefully designed reward function that balances the use of labeled and unlabeled data to enhance generalization performance. A semi-supervised teacher-student framework is further deployed to increase learning stability. We demonstrate the effectiveness of RLGSSL through extensive experiments on several benchmark datasets and show that our approach achieves consistently superior performance compared to state-of-the-art SSL methods.

Paper Structure

This paper contains 20 sections, 7 equations, 2 figures, 4 tables, and 1 algorithm.

Figures (2)

  • Figure 1: Overview of the RLGSSL Framework. The prediction networks ($\theta_S, \theta_T$) serve as the policy functions, and pseudo-labeling ($P_{\theta_T}(X^u)$) acts as the actions. The model has three loss terms in total: RL loss ($\mathcal{L}_{\text{rl}}$), supervised loss ($\mathcal{L}_{\text{sup}}$), and consistency loss ($\mathcal{L}_{\text{cons}}$). The teacher policy function is used to execute the actions and compute the consistency loss, while the student policy function is used for all other aspects.
  • Figure 2: Sensitivity analysis for the two hyperparameters $\lambda_1$ and $\lambda_2$ on CIFAR-100 using 10000 labeled samples: (a) $\lambda_1$, (b) $\lambda_2$.
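
The teacher-student arrangement and the three loss terms named in the Figure 1 caption can be sketched as below. This is a hedged illustration, not the paper's code: the function names `ema_update` and `total_loss`, the `decay` value, and the assignment of $\lambda_1$ to the supervised loss and $\lambda_2$ to the consistency loss are assumptions. The captions only state that an EMA teacher, three losses, and two hyperparameters exist.

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Exponential moving average: the teacher slowly tracks the student weights."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}

def total_loss(l_rl, l_sup, l_cons, lambda_1=1.0, lambda_2=1.0):
    """Assumed weighting: RL loss plus weighted supervised and consistency losses."""
    return l_rl + lambda_1 * l_sup + lambda_2 * l_cons

# Toy usage: parameters stored as a dict of arrays (illustrative only).
teacher = {"w": np.zeros(2)}
student = {"w": np.ones(2)}
teacher = ema_update(teacher, student, decay=0.9)   # teacher moves 10% toward the student
loss = total_loss(l_rl=1.0, l_sup=2.0, l_cons=3.0, lambda_1=0.5, lambda_2=0.1)
```

Because the teacher averages over many student updates, the pseudo-labels it produces change more slowly than the student's own predictions, which is the usual motivation for this design.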