Neural Networks Against (and For) Self-Training: Classification with Small Labeled and Large Unlabeled Sets

Payam Karisani

TL;DR

This work tackles semi-supervised text classification under limited labeled data by addressing semantic drift and overconfident pseudo-labels in self-training. It introduces Robust Self-Training (RST), which uses a hierarchical, iteration-aware pseudo-label strategy and a subsampling-based Score(d) that accounts for prediction uncertainty to select high-quality pseudo-labels. Empirical results on five benchmarks show that RST outperforms ten baselines and provides additive gains when combined with domain-specific language model pretraining, highlighting its practical value for data-scarce NLP tasks. The approach offers a general framework for robust semi-supervised learning with clear mechanisms to stabilize bootstrapping and calibrate confidence in predictions, with potential extensions to cross-lingual scenarios.

Abstract

We propose a semi-supervised text classifier based on self-training using one positive and one negative property of neural networks. One of the weaknesses of self-training is the semantic drift problem, where noisy pseudo-labels accumulate over iterations and consequently the error rate soars. In order to tackle this challenge, we reshape the role of pseudo-labels and create a hierarchical order of information. In addition, a crucial step in self-training is to use the classifier confidence prediction to select the best candidate pseudo-labels. This step cannot be efficiently done by neural networks, because it is known that their output is poorly calibrated. To overcome this challenge, we propose a hybrid metric to replace the plain confidence measurement. Our metric takes into account the prediction uncertainty via a subsampling technique. We evaluate our model in a set of five standard benchmarks, and show that it significantly outperforms a set of ten diverse baseline models. Furthermore, we show that the improvement achieved by our model is additive to language model pretraining, which is a widely used technique for using unlabeled documents. Our code is available at https://github.com/p-karisani/RST.
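
The idea of replacing plain confidence with an uncertainty-aware score can be illustrated with a minimal sketch. It is not the paper's actual metric: the abstract does not define the formula, so the function names, the `penalty` weight, and the "mean confidence minus standard deviation" form below are all assumptions made for illustration. It assumes several classifiers, each trained on a different random subsample of the labeled set, have already produced a probability for the same class on each unlabeled document.

```python
import random
from statistics import mean, pstdev


def subsample(labeled, ratio, rng):
    """Draw a random subset of the labeled set (without replacement)."""
    size = max(1, int(len(labeled) * ratio))
    return rng.sample(labeled, size)


def uncertainty_aware_score(probs, penalty=1.0):
    """Hypothetical score: mean confidence minus a disagreement penalty.

    probs holds the probabilities that several classifiers, each trained
    on a different subsample, assign to the same class for one document.
    """
    return mean(probs) - penalty * pstdev(probs)


def select_pseudo_labels(candidates, top_n, penalty=1.0):
    """Rank unlabeled documents by the score and keep the top_n ids."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: uncertainty_aware_score(item[1], penalty),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:top_n]]


# In a real pipeline, one classifier would be trained per subsample, e.g.:
#   subsets = [subsample(labeled_docs, 0.7, rng) for _ in range(3)]
# Here we use toy predictions from three such classifiers instead.
predictions = {
    "doc_a": [0.99, 0.80, 0.99],  # highest mean, but the classifiers disagree
    "doc_b": [0.90, 0.91, 0.89],  # slightly lower mean, stable across subsamples
    "doc_c": [0.60, 0.62, 0.58],  # low confidence
}
selected = select_pseudo_labels(predictions, top_n=1)
print(selected)  # ['doc_b']
```

With `penalty=0.0` the ranking falls back to plain mean confidence and picks `doc_a`. With the disagreement penalty, the stable `doc_b` is preferred, which is the kind of behavior an uncertainty-aware selection step is meant to produce.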

Paper Structure (14 sections, 5 equations, 3 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Proposed Method
Overcoming Semantic Drift
Addressing Overconfidence
Computational Complexity
Experimental Setup
Datasets
Baselines
Experimental Details
Results and Analysis
Main Results
Empirical Analysis
Conclusions

Figures (3)

Figure 1: (a) F1 of RST and Self-pretraining at varying unlabeled set sizes. (b) The sensitivity of RST to the sample ratio.
Figure 2: The sensitivity of RST to the number of classifiers. We see that our model reaches the highest performance when three classifiers are used.
Figure 3: (a) The sensitivity of RST to the penalty term $\lambda$. (b) The convergence rate of RST when we use the regular cross entropy instead of our loss function. The modified method is denoted by RST (CE).
