SemEval-2017 Task 4: Sentiment Analysis in Twitter using BERT
Rupak Kumar Das, Ted Pedersen
TL;DR
This paper tackles sentiment analysis in Twitter under SemEval-2017 Task 4 (English), focusing on subtasks A–C. It leverages a pre-trained BERT BASE model from HuggingFace and fine-tunes it on the SemEval data, with Naive Bayes as a baseline. The results show BERT outperforms the baseline across all subtasks, particularly in the binary Subtask B (accuracy ~0.897, F1 ~0.848), and demonstrates stronger performance for binary than multi-class settings due to limited data and class imbalance. The work indicates the practical value of transformer-based fine-tuning for short, informal social-media text and provides dataset/code resources for reproducibility.
Abstract
This paper uses BERT, a transformer-based architecture, to address SemEval-2017 Task 4A (English), Sentiment Analysis in Twitter. BERT is a powerful large language model that performs well on classification tasks even when the amount of training data is small. For this experiment, we used the BERT BASE model, which has 12 hidden layers. This model achieves better accuracy, precision, recall, and F1 score than the Naive Bayes baseline model. It performs better on the binary classification subtasks than on the multi-class classification subtasks. We also took ethical issues into account during this experiment, as Twitter data contains personal and sensitive information. The dataset and code used in our experiment can be found in this GitHub repository.
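As an illustration of the four evaluation measures named above, the following is a minimal sketch of how accuracy, precision, recall, and F1 can be computed from gold and predicted labels. It is not taken from the authors' repository: the function name `classification_metrics`, the choice of macro-averaging over classes, and the toy labels are all assumptions made for this example.

```python
from collections import Counter


def classification_metrics(gold, pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; macro-averaging is an assumption,
    not necessarily the averaging used in the paper.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for class g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the true class but was missed

    labels = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy binary example with made-up labels (not data from the paper).
gold = ["positive", "positive", "negative", "negative"]
pred = ["positive", "negative", "negative", "negative"]
metrics = classification_metrics(gold, pred)
print(metrics)
```

The same function applies unchanged to the multi-class subtasks, since it iterates over whatever label set appears in the inputs.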
