Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline
Qi Jia, Baoyu Fan, Cong Xu, Lu Liu, Liang Jin, Guoguang Du, Zhenhua Guo, Yaqian Zhao, Xuanjing Huang, Rengang Li
TL;DR
This work defines the novel task of Multi-modal Sentiment Analysis for Comment Response of Video Induced (MSA-CRVI), which aims to infer viewers' induced sentiment from comments in the context of micro videos. It introduces the CSMV dataset, a large TikTok-based benchmark with 8,210 micro videos and 107,267 comments labeled for opinion and emotion, and develops the VC-CSA baseline that grounds comments to video through multi-scale temporal representations, a consensus semantic learning module, and a golden feature grounding mechanism. Empirical results show that VC-CSA outperforms text-only and existing multi-modal baselines, highlighting the essential role of video context in interpreting comment-driven sentiment. The work lays groundwork for broader applications in public sentiment analysis and advertising effectiveness, and points to future extensions including audio features and larger datasets.
Abstract
Existing video multi-modal sentiment analysis mainly focuses on the sentiment expressed by people within the video, yet often neglects the sentiment induced in viewers while they watch. Viewers' induced sentiment is essential for inferring the public response to videos and has broad applications in analyzing public societal sentiment, advertising effectiveness, and other areas. Micro videos and their related comments provide a rich application scenario for analyzing viewers' induced sentiment. In light of this, we introduce a novel research task, Multi-modal Sentiment Analysis for Comment Response of Video Induced (MSA-CRVI), which aims to infer opinions and emotions from the comments made in response to micro videos. Meanwhile, we manually annotate a dataset named Comment Sentiment toward to Micro Video (CSMV) to support this research. To our knowledge, it is the largest video multi-modal sentiment dataset in terms of scale and video duration, containing 107,267 comments and 8,210 micro videos with a total video duration of 68.83 hours. Because inferring the induced sentiment of a comment requires leveraging the video content, we propose the Video Content-aware Comment Sentiment Analysis (VC-CSA) method as a baseline to address the challenges inherent in this new task. Extensive experiments demonstrate that our method achieves significant improvements over other established baselines.
