Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis

Hao Liu; Lijun He; Jiaxi Liang; Zhihan Ren; Haixia Bi; Fan Li

TL;DR

DASCO tackles Multimodal Aspect-Based Sentiment Analysis (MABSA) by addressing three challenges, Sentiment Cue Perception (SCP), Multimodal Information Misalignment (MIM), and Semantic Noise Elimination (SNE), in two phases: (1) continued pretraining with two aspect-aware objectives, aspect-oriented enhancement (AOE) and aspect-level sentiment-sensitive cognition (ASSC), together with image-text matching (ITM), augmented by GPT-4o-generated scene graphs to improve cue perception and cross-modal alignment; (2) a syntactic-semantic dual-branch architecture that builds target-specific scopes and lets them interact adaptively via contrastive graph learning, filtering noise and sharpening aspect-sentiment reasoning. The framework reports state-of-the-art results on three subtasks, Joint Multimodal Aspect-Sentiment Analysis (JMASA), Multimodal Aspect Term Extraction (MATE), and Multimodal Aspect Sentiment Classification (MASC), across the Twitter datasets; the gains are attributed to the integrated scene graphs, aspect-sensitive pretraining, and cross-graph contrastive learning. It also demonstrates improved efficiency and competitiveness against large language models on MASC. Overall, DASCO offers a solution for fine-grained multimodal sentiment analysis with practical implications for cross-modal understanding and downstream affective tasks, with code and datasets to be released.

Abstract

Multimodal Aspect-Based Sentiment Analysis (MABSA) seeks to extract fine-grained information from image-text pairs to identify aspect terms and determine their sentiment polarity. However, existing approaches often fall short of simultaneously addressing three core challenges: Sentiment Cue Perception (SCP), Multimodal Information Misalignment (MIM), and Semantic Noise Elimination (SNE). To overcome these limitations, we propose DASCO (Dependency Structure Augmented Scoping Framework), a fine-grained scope-oriented framework that enhances aspect-level sentiment reasoning by leveraging dependency parsing trees. First, we design a multi-task pretraining strategy for MABSA on our base model, combining aspect-oriented enhancement, image-text matching, and aspect-level sentiment-sensitive cognition. This improves the model's perception of aspect terms and sentiment cues while achieving effective image-text alignment, addressing the SCP and MIM challenges. Furthermore, we incorporate dependency trees as a syntactic branch combined with a semantic branch, guiding the model to selectively attend to critical contextual elements within a target-specific scope while filtering out irrelevant noise, thereby addressing the SNE problem. Extensive experiments on two benchmark datasets across three subtasks demonstrate that DASCO achieves state-of-the-art performance in MABSA, with notable gains in JMASA (+2.3% F1 and +3.5% precision on Twitter2015). The source code is available at https://github.com/LHaoooo/DASCO.
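
To make the idea of a target-specific scope over a dependency tree concrete, the following is a minimal sketch and not DASCO's actual scoping mechanism, which this summary does not specify. It assumes a scope can be approximated as all tokens within a fixed number of hops of the aspect term in the undirected dependency tree. The function name `dependency_scope`, the `max_hops` parameter, and the hand-written parse are all hypothetical illustrations.

```python
from collections import deque


def dependency_scope(heads, aspect_indices, max_hops=2):
    """Return sorted token indices within `max_hops` of any aspect token.

    `heads[i]` is the index of token i's head in the dependency tree,
    or -1 for the root. The tree is treated as undirected.
    NOTE: hop-distance scoping is an illustrative assumption, not the
    method described in the paper.
    """
    n = len(heads)
    adjacency = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adjacency[child].append(head)
            adjacency[head].append(child)

    # Breadth-first search outward from every aspect token at once.
    distance = {i: 0 for i in aspect_indices}
    queue = deque(aspect_indices)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return sorted(distance)


# Hypothetical hand-written parse of a toy sentence with two aspects.
tokens = ["The", "pizza", "was", "great", "but", "service", "slow"]
heads = [1, 3, 3, -1, 3, 6, 3]

pizza_scope = dependency_scope(heads, [1], max_hops=1)
service_scope = dependency_scope(heads, [5], max_hops=1)
print([tokens[i] for i in pizza_scope])
print([tokens[i] for i in service_scope])
```

With a one-hop limit, the scope for "pizza" includes "great" but not "slow", and the scope for "service" includes "slow" but not "great". This is the kind of separation between aspects that scope-based noise filtering aims for.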
