Revisiting Structured Sentiment Analysis as Latent Dependency Graph Parsing
Chengjie Zhou, Bobo Li, Hao Fei, Fei Li, Chong Teng, Donghong Ji
TL;DR
This paper reframes Structured Sentiment Analysis (SSA) as latent dependency graph parsing: flat sentiment spans are treated as latent subtrees and parsed with a constrained TreeCRF in a two-stage framework. The encoder–scorer architecture combines rich token representations with higher-order and headed-span scoring to jointly predict expression, holder, and target spans while marginalizing over latent internal structures. The approach achieves state-of-the-art results on five benchmarks, with particular gains in long-span boundary detection and overall sentiment graph accuracy, and it remains end-to-end trainable. By explicitly modeling the internal structure of spans, the latent-tree formulation may also improve interpretability and robustness on complex sentiment structures, with potential practical impact on negation and hedge detection in real-world text.
Abstract
Prior studies have cast Structured Sentiment Analysis (SSA) as a bi-lexical dependency graph parsing problem. Multiple formulations have been proposed to construct the graph, but they share several intrinsic drawbacks: (1) the internal structures of spans are neglected, so only the boundary tokens of spans are used for relation prediction and span recognition, which limits the model's expressiveness; (2) long spans account for a significant proportion of the SSA datasets, which further exacerbates the neglect of internal structure. In this paper, we treat SSA as dependency parsing over partially observed dependency trees, regarding flat spans that lack tree annotations as latent subtrees so that their internal structures can be modeled. We propose a two-stage parsing method and leverage TreeCRFs with a novel constrained inside algorithm to model latent structures explicitly. The method also takes advantage of jointly scoring graph arcs and headed spans for global optimization and inference. Extensive experiments on five benchmark datasets show that our method performs significantly better than all previous bi-lexical methods, achieving a new state of the art.
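
To make the idea of a constrained inside pass concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes a simplified span-based (binary bracketing) setting rather than the dependency TreeCRF used in the paper: every span that crosses an observed sentiment span is masked out, and the inside recursion sums over all remaining latent structures. The names `constrained_inside`, `crosses`, and `score` are hypothetical.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values, default=NEG_INF)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def crosses(i, j, a, b):
    """True if half-open span [i, j) partially overlaps span [a, b)."""
    return i < a < j < b or a < i < b < j


def constrained_inside(n, score, observed):
    """Log-partition over binary bracketings of n tokens.

    score(i, j) is a log-potential for span [i, j) (assumed interface).
    observed is a list of annotated spans (a, b); any candidate span that
    crosses one of them is forbidden, so each observed span is guaranteed
    to appear in every surviving tree while its interior stays latent.
    """
    inside = {}
    for width in range(1, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            if any(crosses(i, j, a, b) for a, b in observed):
                inside[i, j] = NEG_INF  # masked by the constraint
            elif width == 1:
                inside[i, j] = score(i, j)
            else:
                # Sum over all split points of the two child spans.
                splits = [inside[i, k] + inside[k, j] for k in range(i + 1, j)]
                inside[i, j] = score(i, j) + logsumexp(splits)
    return inside[0, n]


if __name__ == "__main__":
    zero = lambda i, j: 0.0
    # With zero scores, exp(log Z) counts the compatible trees.
    print(math.exp(constrained_inside(4, zero, [])))
    print(math.exp(constrained_inside(4, zero, [(1, 3)])))
```

With all scores set to zero, the exponentiated result counts the bracketings that are compatible with the constraints, which gives a quick sanity check on the masking. In a trained model the scores would come from the encoder, and the log-partition would serve as the normalizer of the training loss.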
