A Syntax-Injected Approach for Faster and More Accurate Sentiment Analysis
Muhammad Imran, Olga Kellert, Carlos Gómez-Rodríguez
TL;DR
To address the speed bottleneck in syntax-based sentiment analysis (SA), the paper proposes SELSP, a dependency parser cast as a sequence-labeling task and trained with DistilBERT on UD English and Spanish data. The parser uses Relative Encoding to represent dependency trees as per-token labels and CoDeLin to decode those labels into UD trees, which feed a sentiment-analysis engine that applies sentiment dictionaries and syntax-based rules to predict polarity. Empirical results across English and Spanish datasets show that SELSP-based SA achieves substantial speed gains, often three to eighteen times faster than Stanza and VADER, while maintaining or exceeding their accuracy in most configurations; RoBERTa-based transformers yield higher accuracy but slower inference. The work demonstrates that fast, explainable SA is feasible without domain-specific training data and that in-domain sentiment dictionaries can boost performance, making the approach appealing for real-world deployment.
Abstract
Sentiment Analysis (SA) is a crucial aspect of Natural Language Processing (NLP), addressing subjective assessments in textual content. Syntactic parsing is useful in SA because explicit syntactic information can improve accuracy while providing explainability, but it tends to be a computational bottleneck in practice because parsing algorithms are slow. This paper addresses this bottleneck by using a SEquence Labeling Syntactic Parser (SELSP) to inject syntax into SA. By treating dependency parsing as a sequence labeling problem, we greatly increase the speed of syntax-based SA. SELSP is trained and evaluated on a ternary polarity classification task, where it is both faster and more accurate than conventional parsers like Stanza and than heuristic approaches that use shallow syntactic rules for SA, like VADER. This increased speed and improved accuracy make SELSP particularly appealing to SA practitioners in both research and industry. In addition, we test several sentiment dictionaries with SELSP to see which one improves performance in polarity prediction tasks. The results show that dictionaries that capture variation in polarity judgments provide better results than dictionaries that ignore it. Finally, we compare SELSP with Transformer-based models trained on a 5-label classification task and show that SELSP is considerably faster than these models in polarity prediction tasks.
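To make the idea of dependency parsing as sequence labeling concrete, the following minimal sketch encodes each token's head as a relative offset paired with its dependency relation, so that a tree becomes one label per token. The label format, the function names `encode` and `decode`, and the example sentence are illustrative assumptions; this is not the CoDeLin API and may differ in detail from the encoding SELSP actually uses.

```python
from typing import List, Tuple

# Assumed label format: (head position minus token position, dependency relation).
Label = Tuple[int, str]


def encode(heads: List[int], rels: List[str]) -> List[Label]:
    """Turn a dependency tree into one label per token.

    heads[i] is the 1-based head position of token i+1 (0 marks the root).
    """
    return [(head - pos, rel)
            for pos, (head, rel) in enumerate(zip(heads, rels), start=1)]


def decode(labels: List[Label]) -> Tuple[List[int], List[str]]:
    """Recover head positions and relations from per-token labels."""
    heads = [pos + offset for pos, (offset, _) in enumerate(labels, start=1)]
    rels = [rel for _, rel in labels]
    return heads, rels


# Hypothetical example: "The movie was great"
tokens = ["The", "movie", "was", "great"]
heads = [2, 4, 4, 0]                      # "great" is the root
rels = ["det", "nsubj", "cop", "root"]

labels = encode(heads, rels)
for token, label in zip(tokens, labels):
    print(token, label)
```

Because every token receives exactly one label, a standard token classifier can predict the whole tree in a single pass over the sentence, which is the source of the speed advantage described above; `decode(labels)` then restores the head indices needed to build the tree.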
