SemEval-2017 Task 4: Sentiment Analysis in Twitter
Sara Rosenthal, Noura Farra, Preslav Nakov
TL;DR
SemEval-2017 Task 4 reruns the five subtasks of SemEval-2016 Task 4 in two languages, English and Arabic: overall tweet sentiment, topic-based sentiment on a two-point and on a five-point ordinal scale, and quantification of the sentiment distribution towards a topic on both scales. Adding Arabic and user-profile information broadens the linguistic coverage and the resources available to participants, and deep learning methods dominate the top-performing systems. The task drew 48 participating teams, and the paper reports the best results for overall sentiment, topic-based sentiment, and distribution estimation, with English scores higher than Arabic scores in several subtasks. The work emphasizes the value of topic-aware, language-specific modeling and points to future directions such as more Arabic data, cross-lingual approaches, and the inclusion of irony and emotion detection.
Abstract
This paper describes the fifth year of the Sentiment Analysis in Twitter task. SemEval-2017 Task 4 continues with a rerun of the subtasks of SemEval-2016 Task 4, which include identifying the overall sentiment of the tweet, sentiment towards a topic with classification on a two-point and on a five-point ordinal scale, and quantification of the distribution of sentiment towards a topic across a number of tweets, again on a two-point and on a five-point ordinal scale. Compared to 2016, we made two changes: (i) we introduced a new language, Arabic, for all subtasks, and (ii) we made available information from the profiles of the Twitter users who posted the target tweets. The task continues to be very popular, with a total of 48 teams participating this year.
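To make the quantification subtasks concrete, the following minimal Python sketch estimates the distribution of sentiment towards a topic from per-tweet labels and compares it with a gold distribution. It is an illustration under stated assumptions, not the organizers' scorer: the function names, the smoothing constant, and the choice of Kullback-Leibler divergence as the comparison measure are assumptions of this sketch and are not specified in the text above.

```python
import math
from collections import Counter

# Two-point scale; a five-point scale would simply list five ordered classes.
TWO_POINT = ("positive", "negative")


def label_distribution(labels, classes=TWO_POINT):
    """Relative frequency of each class among the labels for one topic."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    total = len(labels)
    return {c: counts[c] / total for c in classes}


def kl_divergence(true_dist, pred_dist, eps=1e-9):
    """KL(true || pred), with additive smoothing (assumed) to avoid log(0)."""
    classes = list(true_dist)
    norm = 1.0 + eps * len(classes)
    total = 0.0
    for c in classes:
        p = (true_dist[c] + eps) / norm
        q = (pred_dist[c] + eps) / norm
        total += p * math.log(p / q)
    return total


# Hypothetical labels for four tweets about a single topic.
gold = ["positive", "positive", "negative", "positive"]
pred = ["positive", "negative", "negative", "positive"]

gold_dist = label_distribution(gold)
pred_dist = label_distribution(pred)
error = kl_divergence(gold_dist, pred_dist)
print(gold_dist, pred_dist, error)
```

The point of the sketch is that quantification is scored on the estimated class proportions for a topic rather than on individual tweets, so a system can misclassify single tweets and still estimate the distribution well when its errors cancel out.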
