Revisiting the Role of Label Smoothing in Enhanced Text Sentiment Classification
Yijie Gao, Shijing Si, Hua Luo, Haixia Sun, Yugui Zhang
TL;DR
This work examines how label smoothing (LS) affects text sentiment classification. It conducts a systematic, cross-architecture evaluation on eight datasets with three architectures (TextCNN, BERT, RoBERTa) under two regimes, training from scratch and fine-tuning, applying four LS levels with a KL-divergence loss and soft targets $D_i' = (1 - k\lambda) D_i + \lambda \mathbf{1}$. The findings show that LS improves accuracy on almost all datasets for each architecture, accelerates convergence, and yields more separable hidden representations, with LS1 frequently delivering the top performance. These results offer practical guidance for applying LS to sentiment tasks and highlight LS as a tool for better calibration and generalization in NLP models. The study underscores the potential of LS to improve robustness and efficiency in sentiment analysis, and suggests future work on learning more precise sentiment label distributions.
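As a rough illustration of the soft-target formula and KL-divergence loss mentioned above, the following minimal sketch builds smoothed targets and scores a prediction against them. It is not the authors' implementation. It assumes that $k$ denotes the number of classes (the reading under which the smoothed vector still sums to one), and the function names `smooth_labels`, `log_softmax`, and `kl_divergence`, as well as the example value $\lambda = 0.05$ and the logits, are hypothetical choices made only for this example.

```python
import math

def smooth_labels(one_hot, lam):
    """Return D' = (1 - k*lam) * D + lam * 1, assuming k = number of classes."""
    k = len(one_hot)
    if not 0.0 <= lam <= 1.0 / k:
        raise ValueError("lam must lie in [0, 1/k] so targets stay non-negative")
    return [(1.0 - k * lam) * d + lam for d in one_hot]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def kl_divergence(target, log_probs):
    """KL(target || model) = sum_j t_j * (log t_j - log p_j), with 0*log 0 = 0."""
    return sum(t * (math.log(t) - lp)
               for t, lp in zip(target, log_probs) if t > 0.0)

# Binary sentiment example: the gold label is class 1 (illustrative values).
hard = [0.0, 1.0]
soft = smooth_labels(hard, lam=0.05)      # approximately [0.05, 0.95]
loss = kl_divergence(soft, log_softmax([0.2, 1.5]))
```

With $\lambda = 0$ the targets reduce to the original one-hot vector, and with $\lambda = 1/k$ they become uniform, so under this reading $\lambda$ interpolates between hard labels and a maximally uncertain target.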
Abstract
Label smoothing is a widely used technique in domains such as text classification, image classification, and speech recognition, and is known for effectively combating model overfitting. However, there is little fine-grained analysis of how label smoothing enhances text sentiment classification. To fill this gap, this article performs a set of in-depth analyses on eight text sentiment classification datasets and three deep learning architectures (TextCNN, BERT, and RoBERTa) under two learning schemes: training from scratch and fine-tuning. By tuning the smoothing parameter, we achieve improved performance on almost all datasets for each model architecture. We further investigate the benefits of label smoothing and find that it accelerates the convergence of deep models and makes samples with different labels more easily distinguishable.
