MSEVA : A System for Multimodal Short Videos Emotion Visual Analysis

Qinglan Wei, Yaqi Zhou, Longhui Xiao, Yuan Zhang

TL;DR

We create the first multimodal dataset of short video news covering hot events and propose MSEVA, a novel system for emotion analysis of short videos, which achieves good results on the bili-news dataset.

Abstract

YouTube Shorts, a section launched by YouTube in 2021, is a direct competitor to short video platforms such as TikTok, and it reflects the rising demand for short video content among online users. Social media platforms are often flooded with short videos that capture different perspectives and emotions on hot events. These videos can go viral and have a significant impact on the public's mood and views. However, affective computing for short videos has been a neglected area of research. Monitoring the public's emotions through these videos requires a lot of time and effort, and even that may not be enough to prevent undesirable outcomes. In this paper, we create the first multimodal dataset of short video news covering hot events. We also propose an automatic technique for segmenting and transcribing audio. In addition, we optimize the multimodal affective computing model, improving its accuracy by about 4.17%. Moreover, we propose MSEVA, a novel system for emotion analysis of short videos. The MSEVA system achieves good results on the bili-news dataset and applies the multimodal emotion analysis method in the real world. It can support timely guidance of public opinion and help stop the spread of negative emotions. Data and code from our investigations can be accessed at: http://xxx.github.com.

Paper Structure (15 sections, 9 figures, 12 tables)

Figures (9)

  • Figure 1: Examples of emotional influence from state media on we media
  • Figure 2: The process of the automatic segmentation and transcription method
  • Figure 3: The architecture of V2EM-Roberta multimodal emotion analysis model
  • Figure 4: The architecture of the multimodal short videos emotion visual analysis (MSEVA) system
  • Figure 5: An example of the similar face-area resolution after our data format preprocessing module (left: the input during training on the IEMOCAP dataset; right: the input from a short video during inference on the Bili-news dataset)
  • ...and 4 more figures