Revisiting the Role of Label Smoothing in Enhanced Text Sentiment Classification

Yijie Gao, Shijing Si, Hua Luo, Haixia Sun, Yugui Zhang

TL;DR

This work examines how label smoothing (LS) affects text sentiment classification. It conducts a systematic, cross-architecture evaluation on eight datasets with three architectures (TextCNN, BERT, RoBERTa) under both training-from-scratch and fine-tuning regimes, applying four LS levels with a KL-divergence loss and soft targets $D_i' = (1 - k\lambda) D_i + \lambda \mathbf{1}$. The findings show that, with a tuned smoothing parameter, LS improves accuracy on almost all datasets for each architecture, accelerates convergence, and yields more separable hidden representations, with the LS1 setting frequently delivering the best performance. These results offer practical guidance for applying LS to sentiment tasks and point to LS as a tool for better calibration and generalization in NLP models. The study also suggests future work on learning more precise sentiment label distributions.
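The soft-target formula and KL-divergence loss mentioned above can be illustrated with a minimal sketch. It assumes that $k$ is the number of classes and $\lambda$ the smoothing parameter, which is the reading under which the smoothed targets still sum to one. The function names, the example logits, and the value $\lambda = 0.05$ are hypothetical and are not taken from the paper.

```python
import math


def smooth_labels(dist, lam):
    """Apply D' = (1 - k*lam) * D + lam * 1 to a label distribution.

    Assumption: k is the number of classes (len(dist)) and lam is the
    smoothing parameter. lam is restricted to [0, 1/k] so the weight on
    the original label, (1 - k*lam), stays non-negative.
    """
    k = len(dist)
    if not 0.0 <= lam <= 1.0 / k:
        raise ValueError("lam must lie in [0, 1/k]")
    return [(1.0 - k * lam) * d + lam for d in dist]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, predicted):
    """KL(target || predicted); zero-probability target entries contribute 0."""
    return sum(t * math.log(t / p) for t, p in zip(target, predicted) if t > 0.0)


# Hypothetical binary sentiment example: the true class is index 1.
one_hot = [0.0, 1.0]
soft = smooth_labels(one_hot, lam=0.05)   # approximately [0.05, 0.95]
probs = softmax([0.2, 1.5])               # made-up model logits
loss = kl_divergence(soft, probs)
print(soft, loss)
```

Setting `lam=0` recovers the original one-hot target, while `lam=1/k` gives a uniform distribution, so the smoothing parameter interpolates between the two.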

Abstract

Label smoothing is a widely used technique in various domains, such as text classification, image classification and speech recognition, known for effectively combating model overfitting. However, there is little fine-grained analysis on how label smoothing enhances text sentiment classification. To fill in the gap, this article performs a set of in-depth analyses on eight datasets for text sentiment classification and three deep learning architectures: TextCNN, BERT, and RoBERTa, under two learning schemes: training from scratch and fine-tuning. By tuning the smoothing parameters, we can achieve improved performance on almost all datasets for each model architecture. We further investigate the benefits of label smoothing, finding that label smoothing can accelerate the convergence of deep models and make samples of different labels easily distinguishable.

Paper Structure (13 sections, 6 equations, 3 figures, 2 tables)

This paper contains 13 sections, 6 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Workflow of our sentiment classification with label smoothing. Here the deep architecture is TextCNN, but the workflow for BERT and RoBERTa is similar.
  • Figure 2: Accuracy on the validation set at different epochs using BERT with varying smoothing parameters.
  • Figure 3: t-SNE plots from BERT baseline versus label smoothing methods.