PRISM of Opinions: A Persona-Reasoned Multimodal Framework for User-centric Conversational Stance Detection

Bingbing Wang, Zhixin Bai, Zhengda Jin, Zihan Wang, Xintong Song, Jingjie Lin, Sixuan Li, Jing Li, Ruifeng Xu

TL;DR

The paper addresses the gap in multimodal conversational stance detection by introducing U-MStance, a dataset that injects user-level information into multimodal conversations, and PRISM, a framework that distills longitudinal user personas, performs rationalized cross-modal grounding via chain-of-thought, and uses mutual task reinforcement to jointly optimize stance detection and stance-aware response generation. PRISM demonstrates strong gains over baselines, including LLM-heavy approaches, and shows robust generalization across targets and backbones. The work highlights the practical impact of personalizing stance understanding in social media analysis and offers a scalable approach to realistic, user-centric multimodal reasoning.

Abstract

The rapid proliferation of multimodal social media content has driven research in Multimodal Conversational Stance Detection (MCSD), which aims to interpret users' attitudes toward specific targets within complex discussions. However, existing studies remain limited by: **1) pseudo-multimodality**, where visual cues appear only in source posts while comments are treated as text-only, misaligning with real-world multimodal interactions; and **2) user homogeneity**, where diverse users are treated uniformly, neglecting personal traits that shape stance expression. To address these issues, we introduce **U-MStance**, the first user-centric MCSD dataset, containing over 40k annotated comments across six real-world targets. We further propose **PRISM**, a **P**ersona-**R**easoned mult**I**modal **S**tance **M**odel for MCSD. PRISM first derives longitudinal user personas from historical posts and comments to capture individual traits, then aligns textual and visual cues within conversational context via Chain-of-Thought to bridge semantic and pragmatic gaps across modalities. Finally, a mutual task reinforcement mechanism is employed to jointly optimize stance detection and stance-aware response generation for bidirectional knowledge transfer. Experiments on U-MStance demonstrate that PRISM yields significant gains over strong baselines, underscoring the effectiveness of user-centric and context-grounded multimodal reasoning for realistic stance understanding.
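The abstract does not spell out the mutual task reinforcement objective, so the following is only a minimal sketch of what jointly optimizing a classification task and a generation task can look like in general. It assumes a simple weighted sum of a stance cross-entropy loss and a response negative log-likelihood. The function names, the trade-off weight `lam`, and the toy probabilities are hypothetical and are not taken from the paper.

```python
import math

def cross_entropy(probs, gold_index):
    """Negative log-probability assigned to the gold stance class."""
    return -math.log(probs[gold_index])

def sequence_nll(token_probs):
    """Mean negative log-likelihood over the gold response tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def joint_loss(stance_probs, gold_stance, response_token_probs, lam=0.5):
    """Weighted sum of the two task losses.

    lam is a hypothetical trade-off weight (an assumption, not from the paper).
    """
    l_stance = cross_entropy(stance_probs, gold_stance)
    l_gen = sequence_nll(response_token_probs)
    return l_stance + lam * l_gen

# Toy inputs: predicted probabilities over three stance classes (gold class at
# index 0) and the probabilities the generator assigned to each gold token.
loss = joint_loss([0.7, 0.2, 0.1], 0, [0.9, 0.8, 0.95], lam=0.5)
print(f"joint loss: {loss:.4f}")
```

Because both terms are computed from the same model outputs in a shared training step, gradients from each task reach the shared parameters. This is one common way to obtain transfer between tasks, and PRISM's actual mechanism may differ.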

Paper Structure

This paper contains 20 sections, 8 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Comparison of a conventional approach for MCSD (left) with our proposed PRISM framework (right).
  • Figure 2: Data construction pipeline of U-MStance.
  • Figure 3: The distribution of stance categories in our U-MStance.
  • Figure 4: Overview of our PRISM framework.
  • Figure 5: Case study comparing PRISM with baseline models.
  • ...and 1 more figure