Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline

Qi Jia; Baoyu Fan; Cong Xu; Lu Liu; Liang Jin; Guoguang Du; Zhenhua Guo; Yaqian Zhao; Xuanjing Huang; Rengang Li

TL;DR

This work defines the novel task of Multi-modal Sentiment Analysis for Comment Response of Video Induced (MSA-CRVI), which aims to infer viewers' induced sentiment from comments in the context of micro videos. It introduces the CSMV dataset, a large TikTok-based benchmark with 8,210 micro videos and 107,267 comments labeled for opinion and emotion, and develops the VC-CSA baseline, which grounds comments to video through multi-scale temporal representations, a consensus semantic learning module, and a golden feature grounding mechanism. Empirical results show that VC-CSA outperforms text-only and existing multi-modal baselines, highlighting the essential role of video context in interpreting comment-driven sentiment. The work lays groundwork for broader applications in public sentiment analysis and advertising effectiveness, and points to future extensions including audio features and larger datasets.

Abstract

Existing video multi-modal sentiment analysis mainly focuses on the sentiment expressed by people within the video, yet often neglects the sentiment induced in viewers while they watch. The induced sentiment of viewers is essential for inferring the public response to videos and has broad application in analyzing public societal sentiment, the effectiveness of advertising, and other areas. Micro videos and their related comments provide a rich application scenario for analyzing viewers' induced sentiment. In light of this, we introduce a novel research task, Multi-modal Sentiment Analysis for Comment Response of Video Induced (MSA-CRVI), which aims to infer opinions and emotions from the comments made in response to micro videos. Meanwhile, we manually annotate a dataset named Comment Sentiment toward to Micro Video (CSMV) to support this research. To our knowledge, it is the largest video multi-modal sentiment dataset in terms of scale and video duration, containing 107,267 comments and 8,210 micro videos with a total video duration of 68.83 hours. Because inferring the induced sentiment of a comment requires leveraging the video content, we propose the Video Content-aware Comment Sentiment Analysis (VC-CSA) method as a baseline to address the challenges inherent in this new task. Extensive experiments demonstrate that our method shows significant improvements over other established baselines.
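
The scale figures quoted in the abstract can be turned into per-video averages with simple arithmetic. The short sketch below is only an illustration derived from the three numbers stated above; the function name `dataset_averages` is hypothetical and the averages themselves are not reported in this summary.

```python
# Figures quoted in the abstract.
NUM_COMMENTS = 107_267
NUM_VIDEOS = 8_210
TOTAL_HOURS = 68.83


def dataset_averages(num_comments, num_videos, total_hours):
    """Return (mean comments per video, mean video length in seconds)."""
    comments_per_video = num_comments / num_videos
    seconds_per_video = total_hours * 3600 / num_videos
    return comments_per_video, seconds_per_video


comments_per_video, seconds_per_video = dataset_averages(
    NUM_COMMENTS, NUM_VIDEOS, TOTAL_HOURS
)
print(f"{comments_per_video:.1f} comments per video")  # 13.1
print(f"{seconds_per_video:.1f} seconds per video")    # 30.2
```

These are plain means, so they say nothing about how unevenly comments or durations are distributed across videos.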

Paper Structure (17 sections, 7 equations, 8 figures, 5 tables)

Introduction
Related work
Dataset
Data collection
Data annotation
Comparison of dataset statistics
Ethics
Method
Multi-scale Temporal Representation
Consensus Semantic Learning
Golden Feature Grounding
Fusion and Classifier
Experiments
Conclusion
Dataset Details
...and 2 more sections

Figures (8)

Figure 1: Figure (a) describes the setting of traditional multi-modal sentiment analysis, which aims to determine the speaker's sentiment based on the given multi-modal information. Figure (b) illustrates the example of our proposed task. Two comments are highlighted in the figure and hold different induced sentiments toward the related video. For easy comprehension, a description of the video content is presented in a gray box. This description does not serve as input.
Figure 2: The architecture of Video Content-aware Comment Sentiment Analysis (VC-CSA). We mainly design Multi-scale Temporal Representation, Consensus Semantic Learning and Golden Feature Grounding modules to address the new challenges of the proposed task.
Figure 3: The distribution of the numbers of micro videos and comments under each hashtag.
Figure 4: The distribution of the number of labels in our CSMV dataset.
Figure 5: The distribution of the number of comments under a single micro video.
...and 3 more figures
