Delving Deep into Engagement Prediction of Short Videos

Dasong Li, Wenjie Li, Baili Lu, Hongsheng Li, Sizhuo Ma, Gurunandan Krishnan, Jian Wang

TL;DR

This study delves deep into the intricacies of predicting engagement for newly published videos with limited user interactions and proposes two metrics: normalized average watch percentage (NAWP) and engagement continuation rate (ECR) to describe the engagement levels of short videos.

Abstract

Understanding and modeling the popularity of User Generated Content (UGC) short videos on social media platforms presents a critical challenge with broad implications for content creators and recommendation systems. This study delves deep into the intricacies of predicting engagement for newly published videos with limited user interactions. Surprisingly, our findings reveal that Mean Opinion Scores from previous video quality assessment datasets do not strongly correlate with video engagement levels. To address this, we introduce a substantial dataset comprising 90,000 real-world UGC short videos from Snapchat. Rather than relying on view count, average watch time, or rate of likes, we propose two metrics: normalized average watch percentage (NAWP) and engagement continuation rate (ECR) to describe the engagement levels of short videos. Comprehensive multi-modal features, including visual content, background music, and text data, are investigated to enhance engagement prediction. With the proposed dataset and two key metrics, our method demonstrates its ability to predict engagements of short videos purely from video content.

Paper Structure

This paper contains 16 sections, 6 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Sample frames of the short videos in our dataset. The frame samples are cropped to exclude sensitive content such as human faces and watermarks for display.
  • Figure 2: (a), (b), (e): Distributions of average watch time (AWT), average watch percentage (AWP), and engagement continuation rate (ECR), respectively. ECR, calculated as the probability that watch time exceeds 5 seconds, $\mathbb{P}(\text{watch} > 5\,\text{s})$, is more duration-independent. (c): We fit the top 3% of average watch times to derive a universal metric for videos of different durations. (d): Further normalization is achieved by fitting a line, resulting in the normalized average watch percentage (NAWP). A color mapping encodes the distribution densities in (a), (b), (d), and (e). (f), (g): Distributions of NAWP and ECR. Both metrics follow a bimodal distribution, reflecting a distinctive property of short-video platforms: users either swiftly skip videos they find uninteresting or spend relatively longer on videos that interest them. (h): The strong correlation between ECR and NAWP. (i): The distribution of like rate.
  • Figure 3: The effectiveness of comprehensive multi-modal features in enhancing engagement prediction. Blue bars represent features that were incrementally incorporated and improved SRCC, while a gray bar indicates a modification that was not adopted. Incorporating these multi-modal features into our network leads to progressively better performance than previous VQA features.
  • Figure 4: Overview of multi-modal feature extraction. The learnable multilayer perceptron (MLP) that processes the extracted features is omitted for simplicity.
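
The Figure 2 caption defines ECR as the probability that watch time exceeds 5 seconds. A minimal sketch of how that quantity, together with a plain average watch percentage, might be estimated from per-view watch times is given below. The function names, the per-view input format, and the choice to leave AWP uncapped are assumptions made for illustration and are not taken from the paper. NAWP is not reproduced here because its fitting procedure is not specified in this summary.

```python
from typing import Sequence


def engagement_continuation_rate(
    watch_times_s: Sequence[float], threshold_s: float = 5.0
) -> float:
    """Empirical estimate of P(watch > threshold_s) over the views of one video.

    The 5-second default follows the Figure 2 caption; the input format
    (one watch time in seconds per view) is an assumption.
    """
    if len(watch_times_s) == 0:
        raise ValueError("at least one view is required")
    continued = sum(1 for t in watch_times_s if t > threshold_s)
    return continued / len(watch_times_s)


def average_watch_percentage(
    watch_times_s: Sequence[float], duration_s: float
) -> float:
    """Average watch time divided by video duration.

    Assumption: the ratio is not capped at 1.0, so replays or loops
    can push it above 1.0.
    """
    if len(watch_times_s) == 0:
        raise ValueError("at least one view is required")
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    average_watch_time = sum(watch_times_s) / len(watch_times_s)
    return average_watch_time / duration_s


if __name__ == "__main__":
    # Hypothetical watch times (seconds) for a 10-second video.
    views = [1.0, 2.5, 6.0, 12.0, 8.0]
    print("ECR:", engagement_continuation_rate(views))
    print("AWP:", average_watch_percentage(views, duration_s=10.0))
```

In this hypothetical example, three of the five views last longer than 5 seconds, so the estimated ECR is 3/5. Because the threshold is fixed in seconds and not expressed as a fraction of the video length, the estimate does not depend directly on duration, which is consistent with the caption's description of ECR as more duration-independent.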