Reducing Labeling Costs in Sentiment Analysis via Semi-Supervised Learning

Minoo Jafarlou, Mario M. Kubek

TL;DR

This study applies a transductive label propagation method, based on the manifold assumption, to generate pseudo-labels for unlabeled text. The pseudo-labels are then used to train deep neural networks for text classification.

Abstract

Labeling datasets is a major challenge in machine learning, in terms of both cost and time. This research addresses that challenge by exploring label propagation in semi-supervised learning, which can significantly reduce the number of labels required compared to traditional methods. We employ a transductive label propagation method based on the manifold assumption for text classification. Our approach uses a graph-based method to generate pseudo-labels for unlabeled data, which are then used to train deep neural networks. By propagating labels according to cosine proximity within a nearest-neighbor graph built from network embeddings, we incorporate unlabeled data into supervised learning, thereby reducing labeling costs. Building on previous successes in other domains, this study implements this approach and evaluates its effectiveness in sentiment analysis, offering insights into semi-supervised learning.
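
The graph-based step described in the abstract can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation: the function name `propagate_labels`, the parameters `k`, `alpha`, and `n_iter`, and the specific diffusion update (a commonly used normalized-graph label-spreading iteration) are all illustrative choices that do not come from the paper. Unlabeled points are marked with `-1`.

```python
import numpy as np

def propagate_labels(embeddings, labels, k=2, alpha=0.9, n_iter=50):
    """Hypothetical helper: spread labels over a cosine kNN graph.

    embeddings: (n, d) float array of feature vectors.
    labels: (n,) int array, class index for labeled points, -1 otherwise.
    Returns an (n,) int array of pseudo-labels.
    """
    n = embeddings.shape[0]

    # Cosine similarity = dot product of L2-normalized vectors.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    S = X @ X.T
    np.fill_diagonal(S, -np.inf)  # a point is not its own neighbor

    # Keep only each point's k most similar neighbors (assumed graph rule).
    nn = np.argsort(-S, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    W = np.zeros((n, n))
    W[rows, nn.ravel()] = np.clip(S[rows, nn.ravel()], 0.0, None)
    W = np.maximum(W, W.T)  # symmetrize the affinity matrix

    # Symmetric degree normalization: D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    W_norm = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # One-hot matrix of the known labels; rows of unlabeled points stay zero.
    labeled = labels >= 0
    n_classes = int(labels.max()) + 1
    Y = np.zeros((n, n_classes))
    Y[labeled, labels[labeled]] = 1.0

    # Iterative diffusion: mix neighbor scores with the original labels.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (W_norm @ F) + (1.0 - alpha) * Y

    pseudo = F.argmax(axis=1)
    pseudo[labeled] = labels[labeled]  # never overwrite ground-truth labels
    return pseudo
```

In a pipeline like the one the abstract describes, the returned pseudo-labels would be combined with the original labeled examples to train a downstream classifier. How the embeddings are obtained and how pseudo-labels are weighted during training are not specified in this excerpt and are left out of the sketch.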

Paper Structure

This paper contains 14 sections, 1 equation, 7 figures, 1 algorithm.

Figures (7)

  • Figure 1: Performance comparison of different word embeddings across Baseline, Fully Supervised, and Label Propagation. The metrics compared are Accuracy, F1 Score, and AUC-ROC. The x-axis represents the type of word embedding used, while the y-axis represents the score for each metric.
  • Figure 2: Number of Tokens
  • Figure 3: Performance Metrics of 1D CNN, BiLSTM, and BiGRU Models Using All Three Methods
  • Figure 4: Radar Charts of 1D CNN, BiLSTM, and BiGRU Models for All Three Methods
  • Figure 5: Heatmap of Performance Metrics for Different Configurations on BiGRU
  • ...and 2 more figures