MSEVA : A System for Multimodal Short Videos Emotion Visual Analysis

Qinglan Wei; Yaqi Zhou; Longhui Xiao; Yuan Zhang

TL;DR

The authors build the first multimodal dataset of short video news covering hot events and propose MSEVA, a novel system for emotion analysis of short videos that achieves good results on the Bili-news dataset.

Abstract

YouTube Shorts, a section launched by YouTube in 2021, competes directly with short video platforms such as TikTok and reflects the rising demand for short video content among online users. Social media platforms are often flooded with short videos that capture different perspectives and emotions on hot events. These videos can go viral and significantly affect the public's mood and views. However, affective computing for short videos has been a neglected area of research. Monitoring public emotion through these videos takes considerable time and effort, and even then may not be enough to prevent undesirable outcomes. In this paper, we create the first multimodal dataset of short video news covering hot events. We also propose an automatic technique for segmenting and transcribing audio. In addition, by optimizing the multimodal affective computing model, we improve its accuracy by about 4.17%. Finally, we propose MSEVA, a novel system for emotion analysis of short videos. MSEVA achieves good results on the Bili-news dataset and applies the multimodal emotion analysis method in a real-world setting. It can support timely public opinion guidance and help stop the spread of negative emotions. Data and code from our investigations can be accessed at: http://xxx.github.com.

Paper Structure (15 sections, 9 figures, 12 tables)

This paper contains 15 sections, 9 figures, 12 tables.

Introduction
Related Work
Datasets of Multimodal Emotion Analysis
General Multimodal Analysis of Short Videos
General Multimodal Emotion Analysis of Videos
Our Work
Bili-News Dataset Construction
Optimize Multimodal Emotion Analysis Model
The Construction of the MSEVA System
Experiments
Statistical Analysis of Bili-news Dataset
Ablation Study of Automatic Segmentation and Transcription Module
Performance and Computational Efficiency Analysis of V2EM-RoBERTa Model
The Test of the MSEVA System Analysis
Conclusion and Future Work

Figures (9)

Figure 1: Examples of emotional influence from state media on we-media
Figure 2: The process of the automatic segmentation and transcription method
Figure 3: The architecture of the V2EM-RoBERTa multimodal emotion analysis model
Figure 4: The architecture of the multimodal short videos emotion visual analysis (MSEVA) system
Figure 5: Example of the similar face-area resolution after our data format preprocessing module (left: input during training on the IEMOCAP dataset; right: input from a short video during inference on the Bili-news dataset)
...and 4 more figures
