Semi-Supervised Learning with Balanced Deep Representation Distributions

Changchun Li, Ximing Li, Bingjie Zhang, Wenting Wang, Jihong Ouyang

Abstract

Semi-Supervised Text Classification (SSTC) methods mainly work in the spirit of self-training. They initialize a deep classifier by training it on labeled texts, and then alternately predict pseudo-labels for unlabeled texts and retrain the classifier on the mixture of labeled and pseudo-labeled texts. Naturally, their performance depends heavily on the accuracy of these pseudo-labels. Unfortunately, the pseudo-labels are often inaccurate because of the margin bias problem, which is caused by large differences between the representation distributions of different labels in SSTC. To alleviate this problem, we apply the angular margin loss and perform several Gaussian linear transformations to achieve balanced label angle variances, where a label angle variance is the variance of the label angles of texts within the same label. Constraining all label angle variances to be balanced, with the variances estimated over both labeled and pseudo-labeled texts during self-training loops, can yield more accurate pseudo-labels. With this insight, we propose a novel SSTC method, namely Semi-Supervised Text Classification with Balanced Deep representation Distributions (S2TC-BDD). We implement both multi-class and multi-label classification versions of S2TC-BDD by introducing pseudo-labeling tricks and regularization terms. To evaluate S2TC-BDD, we compare it against state-of-the-art SSTC methods. Empirical results demonstrate the effectiveness of S2TC-BDD, especially when labeled texts are scarce.
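The abstract combines two ingredients: an angular margin loss and per-label Gaussian linear transformations that balance label angle variances. The plain-Python sketch below illustrates one way these pieces could fit together, under assumptions not confirmed by this page: a label angle is the arccosine of the cosine similarity between a text representation and a label vector; each label's angles are treated as Gaussian and rescaled around their mean to a common target variance (hypothetically, the average of the label variances); the margin uses the additive ArcFace form s·cos(θ + m) with hypothetical values s = 16 and m = 0.2; and `avg_dlav` is a hypothetical reading of Avg.DLAV as the mean absolute pairwise difference between label variances. All function names are illustrative, and this is not the authors' implementation.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def label_angle(x: Vector, w: Vector) -> float:
    """Angle in radians between a text representation x and a label vector w."""
    dot = sum(a * b for a, b in zip(x, w))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in w))
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp guards rounding error


def label_angle_stats(reps: List[Vector], labels: List[int],
                      label_vecs: List[Vector]) -> Dict[int, Tuple[float, float]]:
    """Per-label (mean, variance) of the angles between a label vector and its texts."""
    stats = {}
    for k, w in enumerate(label_vecs):
        angles = [label_angle(x, w) for x, y in zip(reps, labels) if y == k]
        if not angles:
            continue  # no (pseudo-)labeled texts for this label yet
        mu = sum(angles) / len(angles)
        stats[k] = (mu, sum((a - mu) ** 2 for a in angles) / len(angles))
    return stats


def gaussian_linear_transform(theta: float, mu: float, var: float,
                              target_var: float) -> float:
    """Map an angle from N(mu, var) to N(mu, target_var) by rescaling around mu.
    Assumes var > 0."""
    return mu + math.sqrt(target_var / var) * (theta - mu)


def avg_dlav(variances: List[float]) -> float:
    """Hypothetical Avg.DLAV: mean absolute difference over all pairs of label variances."""
    pairs = list(combinations(variances, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def balanced_angular_margin_loss(x: Vector, y: int, label_vecs: List[Vector],
                                 stats: Dict[int, Tuple[float, float]],
                                 target_var: float, s: float = 16.0,
                                 m: float = 0.2) -> float:
    """Softmax cross-entropy over s*cos(angle), using variance-balanced angles
    and an additive angular margin m on the true label (ArcFace-style)."""
    logits = []
    for k, w in enumerate(label_vecs):
        mu, var = stats[k]
        theta = gaussian_linear_transform(label_angle(x, w), mu, var, target_var)
        theta = min(max(theta, 0.0), math.pi)  # keep cos monotone on [0, pi]
        if k == y:
            theta = min(theta + m, math.pi)  # enlarge the true label's angle
        logits.append(s * math.cos(theta))
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_z - logits[y]


# Toy 2-D example: two labels whose texts have clearly different angle spreads.
reps = [(1.0, 0.1), (1.0, 0.3), (1.0, -0.2), (0.2, 1.0), (-0.5, 1.0), (0.6, 1.0)]
labels = [0, 0, 0, 1, 1, 1]
label_vecs = [(1.0, 0.0), (0.0, 1.0)]
stats = label_angle_stats(reps, labels, label_vecs)
target_var = sum(v for _, v in stats.values()) / len(stats)
loss = balanced_angular_margin_loss(reps[0], 0, label_vecs, stats, target_var)
```

Because the transformation only stretches or shrinks angles around each label's mean, it equalizes the variances without moving the label centers. In a self-training loop, the per-label statistics would be re-estimated over labeled and pseudo-labeled texts in each round, as the abstract describes.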

Paper Structure

This paper contains 23 sections, 20 equations, 4 figures, and 8 tables.

Figures (4)

  • Figure 1: The average difference of label angle variances (Avg.DLAV) computed in semi-supervised and supervised manners across AG News (Multi-Class Case) and AAPD (Multi-Label Case), respectively.
  • Figure 2: Let solid circles and triangles denote labeled positive and negative texts, and hollow ones denote corresponding unlabeled texts. (a) The large difference between label angle variances results in the margin bias. Many unlabeled texts (in red) can be misclassified. (b) Balancing the label angle variances can eliminate the margin bias. Best viewed in color.
  • Figure 3: Overview of the S2TC-BDD framework. Specifically, we employ the pseudo-labeling tricks of Sharpening and Class-distribution-Aware Pseudo-labeling (CAP) for multi-class and multi-label classification, respectively. Best viewed in color.
  • Figure 4: The accuracy of pseudo-labels during the training procedure with or without BDD loss (w/ BDD and w/o BDD) across AG News (Multi-Class Case, $N_l=100, N_u=20000$) and AAPD (Multi-Label Case, $N_l=200, N_u=20000$), respectively.
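Figure 3 names Sharpening as the pseudo-labeling trick for the multi-class case. A minimal sketch follows, assuming the common temperature-sharpening operator popularized by MixMatch, which raises each predicted probability to the power 1/T and renormalizes; whether S2TC-BDD uses exactly this form is an assumption, and the function name `sharpen` and the temperature 0.5 are hypothetical choices.

```python
from typing import List


def sharpen(probs: List[float], temperature: float = 0.5) -> List[float]:
    """Temperature sharpening: raise each probability to 1/T and renormalize.
    T < 1 pushes the distribution toward its argmax (a lower-entropy pseudo-label)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [q / total for q in powered]


# A soft prediction for one unlabeled text over three labels.
soft = [0.5, 0.3, 0.2]
pseudo = sharpen(soft, temperature=0.5)  # approximately [0.658, 0.237, 0.105]
```

Lowering T toward 0 approaches a one-hot pseudo-label, while T = 1 returns the prediction unchanged.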