Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence

Qianren Mao, Weifeng Jiang, Junnan Liu, Chenghua Lin, Qian Li, Xianqing Wen, Jianxin Li, Jinhu Lu

TL;DR

PS-NET tackles semi-supervised learning for lightweight NLP models under severe label scarcity by combining online distillation from a teacher, mutual learning among multiple small student networks, and curriculum adversarial perturbations for progressive generalization. The framework distills knowledge from a deeper teacher into small student networks (for example, two-layer models) and trains them jointly on labeled and unlabeled data, reinforced by peer interaction and self-augmentation. Empirical results on SSL text classification and extractive summarization show PS-NET outperforms state-of-the-art lightweight SSL baselines such as FLiText and DisCo, while using far fewer parameters. This approach enables efficient deployment on resource-constrained devices and provides a scalable path for SSL in compact models across tasks.

Abstract

The semi-supervised learning (SSL) strategy in lightweight models requires reducing annotated samples and facilitating cost-effective inference. However, the constraint on model parameters, imposed by the scarcity of training labels, limits the SSL performance. In this paper, we introduce PS-NET, a novel framework tailored for semi-supervised text mining with lightweight models. PS-NET incorporates online distillation to train lightweight student models by imitating the teacher model. It also integrates an ensemble of student peers that collaboratively instruct each other. Additionally, PS-NET implements a constant adversarial perturbation schema to further self-augment the models through progressive generalization. Our PS-NET, equipped with a 2-layer distilled BERT, exhibits notable performance enhancements over the SOTA lightweight SSL frameworks FLiText and DisCo in SSL text classification with extremely scarce labelled data.
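
To make the combination of teacher imitation and peer instruction concrete, the following is a minimal sketch of a per-student objective on one unlabelled example. It is not the paper's loss: the KL direction, the unweighted averaging over peers, and the names `peer_and_teacher_loss` and `peer_weight` are assumptions introduced here for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def peer_and_teacher_loss(teacher_logits, student_logits_list, peer_weight=1.0):
    """Hypothetical per-student loss: a distillation term plus a peer term.

    distillation: KL(teacher || student)
    peer term:    mean of KL(peer || student) over all other students
    Both the direction and the weighting are assumptions, not the paper's.
    """
    teacher_p = softmax(teacher_logits)
    student_ps = [softmax(logits) for logits in student_logits_list]
    losses = []
    for i, s in enumerate(student_ps):
        distill = kl_divergence(teacher_p, s)
        peers = [kl_divergence(p, s) for j, p in enumerate(student_ps) if j != i]
        mutual = sum(peers) / len(peers) if peers else 0.0
        losses.append(distill + peer_weight * mutual)
    return losses


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    students = [[1.5, 0.7, -0.8], [0.2, 0.1, 0.0]]
    for idx, loss in enumerate(peer_and_teacher_loss(teacher, students)):
        print(f"student {idx}: loss = {loss:.4f}")
```

In this sketch, a student whose predictions agree with both the teacher and its peers receives a loss near zero, so the peer term only contributes where students disagree.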

Paper Structure

This paper contains 27 sections, 14 equations, 3 figures, 10 tables, 1 algorithm.

Figures (3)

  • Figure 1: Framework of PS-NET. It integrates online distillation within an SSL framework, following phased steps of supervised knowledge optimization and unsupervised knowledge distillation. PS-NET allows the student networks to improve generalization through deep mutual learning (DML) among peers. In each step, PS-NET utilizes CAT, which iteratively generates adversarial noise using gradient-based methods, facilitating continuous self-improvement of the lightweight models.
  • Figure 2: Visualization of the Centered Kernel Alignment (CKA; ZhuW21a) scores of PS-NET in Subfigures (c) and (d), and of its ablation variant, SingleStudent, in Subfigures (a) and (b). All models utilize a 6-layer BERT and are evaluated on the AG News dataset with 10 labelled examples per class.
  • Figure 3: KL divergence between the PS-NET teacher and students under different matching strategies. A smaller KL divergence value indicates less mismatch. Strategy 1 (STR-1) employs two 2-layer students in PS-NET, utilizing the first 2 layers and the last 2 layers of the BERT teacher, respectively. Strategy 2 (STR-2) involves two 6-layer students in PS-NET. Strategy 3 (STR-3) utilizes six 2-layer students in PS-NET. Strategy 4.1 (STR-4.1) and Strategy 4.2 (STR-4.2) each pair a SingleStudent model with the BERT teacher, with the student corresponding to the first 2 layers (STR-4.1) or the last 2 layers (STR-4.2) of the BERT teacher.
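
For readers unfamiliar with the CKA scores in Figure 2, the sketch below computes linear CKA between two sets of layer activations. The captions do not say which CKA variant or implementation the paper uses, so the linear form and the function names here are assumptions. Each input is a list of rows, one row per example and one column per feature.

```python
import math


def center_columns(m):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b, for a (n x d1) and b (n x d2)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data.

    Raises ZeroDivisionError if either input has no variance.
    """
    x, y = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(x, y)
    denominator = math.sqrt(cross_gram_sq_norm(x, x) * cross_gram_sq_norm(y, y))
    return numerator / denominator


if __name__ == "__main__":
    layer_a = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]]
    layer_b = [[0.5, 1.0, 0.0], [1.0, 0.2, 1.0], [1.4, 0.9, 0.3], [2.1, 1.4, 0.8]]
    print(f"CKA(a, a) = {linear_cka(layer_a, layer_a):.4f}")
    print(f"CKA(a, b) = {linear_cka(layer_a, layer_b):.4f}")
```

Linear CKA lies between 0 and 1 and is unchanged when either representation is rescaled by a constant, which is why it is convenient for comparing layers of different widths.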