Dancing in the syntax forest: fast, accurate and explainable sentiment analysis with SALSA
Carlos Gómez-Rodríguez, Muhammad Imran, David Vilares, Elena Solera, Olga Kellert
TL;DR
The paper addresses the need for scalable sentiment analysis that exploits linguistic structure without heavy computation. It introduces SALSA, which uses fast syntactic parsing and the parsing-as-sequence-labeling paradigm to combine accuracy with explainability. It outlines three approaches (rule-based, multitask learning, and an integrated tree-based method) and relies on multilingual resources (Universal Dependencies), with an initial focus on Spanish and English and on datasets such as SemEval 2022 Task 10 and Rest-Mex 2023. Funded by an ERC Proof-of-Concept grant, the project targets SMEs and aims to bridge research and commercialization by delivering efficient, explainable sentiment analysis tools.
Abstract
Sentiment analysis is a key technology for companies and institutions to gauge public opinion on products, services or events. However, for large-scale sentiment analysis to be accessible to entities with modest computational resources, it needs to be performed in a resource-efficient way. While some efficient sentiment analysis systems exist, they tend to apply shallow heuristics, which do not take into account syntactic phenomena that can radically change sentiment. Conversely, alternatives that take syntax into account are computationally expensive. The SALSA project, funded by the European Research Council under a Proof-of-Concept Grant, aims to leverage recently developed fast syntactic parsing techniques to build sentiment analysis systems that are lightweight and efficient, while still providing accuracy and explainability through the explicit use of syntax. We intend our approaches to be the backbone of a working product that SMEs can use in production.
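
To make the idea of casting parsing as sequence labeling concrete, the following minimal sketch encodes a dependency tree as one label per token (relative head offset plus relation name) and decodes it back. The encoding scheme, the function names `encode` and `decode`, and the example sentence are assumptions chosen for illustration; they are not necessarily what SALSA uses.

```python
from typing import List, Tuple

# Hypothetical toy sentence. Heads are 1-based token positions; 0 marks the root.
TOKENS = ["The", "food", "was", "not", "good"]
HEADS = [2, 5, 5, 5, 0]
RELS = ["det", "nsubj", "cop", "advmod", "root"]


def encode(heads: List[int], rels: List[str]) -> List[str]:
    """Turn a dependency tree into one '<offset>_<relation>' label per token."""
    labels = []
    for i, (head, rel) in enumerate(zip(heads, rels), start=1):
        # A token can never be its own head, so offset 0 is free to mark the root.
        offset = 0 if head == 0 else head - i
        labels.append(f"{offset:+d}_{rel}")
    return labels


def decode(labels: List[str]) -> List[Tuple[int, str]]:
    """Recover (head, relation) pairs from the per-token labels."""
    arcs = []
    for i, label in enumerate(labels, start=1):
        offset_str, rel = label.split("_", 1)
        offset = int(offset_str)
        head = 0 if offset == 0 else i + offset
        arcs.append((head, rel))
    return arcs


if __name__ == "__main__":
    for token, label in zip(TOKENS, encode(HEADS, RELS)):
        print(f"{token}\t{label}")
```

Because every token receives exactly one label, a standard sequence tagger can in principle predict the whole tree in a single pass over the sentence. In this toy tree the negation "not" attaches directly to "good", which is the kind of structural cue a syntax-aware sentiment system can use and a shallow heuristic may miss.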
