Based on Data Balancing and Model Improvement for Multi-Label Sentiment Classification Performance Enhancement
Zijin Su, Huanzhu Lyu, Yuren Niu, Yiming Liu
TL;DR
To address severe class imbalance in multi-label sentiment classification on the GoEmotions dataset, this work builds a balanced corpus by augmenting GoEmotions with Sentiment140 samples labeled by a RoBERTa-go-emotions classifier and 20k texts generated by GPT-4 mini. It introduces a unified CNN–BiLSTM–attention architecture that uses pre-trained FastText embeddings and a sigmoid multi-label output, with mixed-precision training and per-label thresholds for 28 emotion categories. The key contributions are (i) a data-balancing pipeline that improves minority-emotion recall and F1, and (ii) a lightweight architecture that rivals transformer baselines while reducing compute. Results improve across accuracy, precision, recall, F1-score, and AUC relative to models trained on imbalanced data, which suggests the approach is practical for fine-grained sentiment monitoring in real-world settings.
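As an illustration of how a sigmoid multi-label output combines with per-label thresholds, the following is a minimal sketch and not the authors' implementation. The function names, the three example labels, the logits, and the threshold values are all hypothetical; the work itself uses 28 categories and does not report its thresholds here.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_labels(logits, thresholds, label_names):
    """Return every label whose sigmoid probability meets its own threshold."""
    if not (len(logits) == len(thresholds) == len(label_names)):
        raise ValueError("logits, thresholds, and label_names must have equal length")
    probs = [sigmoid(z) for z in logits]
    return [name for name, p, t in zip(label_names, probs, thresholds) if p >= t]


# Hypothetical 3-label example; values are illustrative, not from the paper.
LABELS = ["joy", "grief", "anger"]
LOGITS = [2.0, -0.5, -3.0]
THRESHOLDS = [0.5, 0.3, 0.5]  # a lower threshold for the rarer "grief" label

PREDICTED = predict_labels(LOGITS, THRESHOLDS, LABELS)
UNIFORM = predict_labels(LOGITS, [0.5] * len(LABELS), LABELS)
```

Because each label is scored independently, a text can receive zero, one, or several emotions. In this example the lower threshold lets the "grief" label fire, whereas a uniform 0.5 cutoff would keep only "joy".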
Abstract
Multi-label sentiment classification plays a vital role in natural language processing by detecting multiple emotions within a single text. However, existing datasets such as GoEmotions often suffer from severe class imbalance, which hampers model performance, especially on underrepresented emotions. To address this, we constructed a balanced multi-label sentiment dataset by combining three sources: the original GoEmotions data, Sentiment140 samples labeled with emotions by a RoBERTa-base-GoEmotions model, and manually annotated texts generated by GPT-4 mini. Our data-balancing strategy ensured an even distribution across 28 emotion categories. Based on this dataset, we developed an enhanced multi-label classification model that combines pre-trained FastText embeddings, convolutional layers for local feature extraction, a bidirectional LSTM for contextual learning, and an attention mechanism that highlights sentiment-relevant words. A sigmoid-activated output layer enables multi-label prediction, and mixed-precision training improves computational efficiency. Experimental results demonstrate significant improvements in accuracy, precision, recall, F1-score, and AUC compared to models trained on imbalanced data, highlighting the effectiveness of our approach.
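To make the balancing goal concrete, the sketch below counts how often each label occurs in a multi-label dataset and reports how many additional examples each label would need to match a target count. This is an assumed bookkeeping step, not the authors' pipeline: the abstract does not specify how the even distribution was reached, and the function names, the tiny dataset, and the choice of the most frequent label as the default target are hypothetical.

```python
from collections import Counter


def label_counts(dataset, label_names):
    """Count occurrences of each label; a text may carry several labels."""
    counts = Counter({name: 0 for name in label_names})
    for _text, labels in dataset:
        counts.update(labels)
    return counts


def augmentation_deficits(counts, target=None):
    """Number of extra examples each label needs to reach the target count.

    Assumption: when no target is given, balance up to the most frequent label.
    """
    if target is None:
        target = max(counts.values())
    return {name: max(0, target - n) for name, n in counts.items()}


# Hypothetical toy dataset of (text, labels) pairs.
DATASET = [
    ("I love this", ["joy"]),
    ("Thanks, so happy", ["joy", "gratitude"]),
    ("Great news", ["joy"]),
    ("I miss her", ["grief"]),
]
LABELS = ["joy", "gratitude", "grief"]

COUNTS = label_counts(DATASET, LABELS)
DEFICITS = augmentation_deficits(COUNTS)
```

In this toy case "joy" appears three times and the other two labels once each, so each minority label would need two more examples. In a setting like the one described, such deficits could guide how many externally labeled or generated texts to add per emotion.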
