SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity

Yijie Xu, Bolun Zheng, Wei Zhu, Hangjia Pan, Yuchen Yao, Ning Xu, Anan Liu, Quan Zhang, Chenggang Yan

TL;DR

SMTPD introduces a large-scale, multilingual, multi-modal benchmark for temporal popularity prediction on YouTube with aligned 30-day popularity sequences. The authors propose a baseline framework that combines visual (ResNet-101), textual (BERT-Multilingual), numerical, and categorical features, fused into a temporal regression model based on an LSTM and trained with a Composite Gradient Loss. Experimental results show that temporal alignment and early popularity signals substantially improve prediction accuracy across languages, outperforming SMPD baselines in temporal forecasting. The dataset and baseline enable cross-language, time-aligned analysis of social media popularity and support development of more effective prediction models for content optimization and digital marketing.

Abstract

The social media popularity prediction task aims to predict the popularity of posts on social media platforms, which benefits application scenarios such as content optimization, digital marketing, and online advertising. Although many studies have made significant progress, few pay much attention to integrating popularity prediction with temporal alignment. In this paper, by exploring YouTube's multilingual and multi-modal content, we construct a new social media temporal popularity prediction benchmark, namely SMTPD, and propose a baseline framework for temporal popularity prediction. Through data analysis and experiments, we verify that temporal alignment and early popularity play crucial roles in social media popularity prediction, both deepening the understanding of the temporal dynamics of popularity in social media and offering guidance for developing more effective prediction models in this field. Code is available at https://github.com/zhuwei321/SMTPD.
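
To make the time-aligned setup from the TL;DR concrete, the sketch below shows one way a sample with four content sections and a 30-day popularity sequence could be represented, and how an early-popularity observation could be separated from the days to be predicted. The class name `Sample`, its field names, and the helper `split_early` are hypothetical and chosen for illustration only; they are not taken from the SMTPD code base.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Length of the aligned popularity sequence, as stated in the TL;DR.
SEQUENCE_DAYS = 30


@dataclass
class Sample:
    """Hypothetical container for one post; field names are illustrative."""
    thumbnail_path: str      # visual content
    title: str               # textual content
    numerical: List[float]   # numerical metadata (exact fields not stated here)
    category: str            # categorical content
    popularity: List[float]  # one popularity score per day after posting

    def __post_init__(self) -> None:
        # Temporal alignment: every sample covers the same number of days.
        if len(self.popularity) != SEQUENCE_DAYS:
            raise ValueError(
                f"expected {SEQUENCE_DAYS} daily scores, got {len(self.popularity)}"
            )


def split_early(sample: Sample, observed_days: int = 1) -> Tuple[List[float], List[float]]:
    """Split the sequence into observed early scores and future targets."""
    if not 0 <= observed_days < SEQUENCE_DAYS:
        raise ValueError("observed_days must be in [0, SEQUENCE_DAYS)")
    early = sample.popularity[:observed_days]
    future = sample.popularity[observed_days:]
    return early, future
```

With `observed_days=0` the split corresponds to predicting the whole sequence from content alone, while a positive value models the case where early popularity is available as an extra input.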

Paper Structure

This paper contains 24 sections, 14 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Content sections and popularity trend of SMTPD. In the sample panel, a sample involves four sections of content, with temporal popularity. The box-plot panel depicts box plots of daily popularity scores, illustrating variations in popularity distribution at different time points. The distribution consistently demonstrates a decay pattern over time.
  • Figure 2: The PC Matrix
  • Figure 3: The SRC Matrix
  • Figure 5: The statistics based on category. One panel counts the number of samples in each category, and the other shows the average popularity score of samples in each category.
  • Figure 6: Language analysis. The left panel shows the proportions of the languages, with "Other" encompassing 90 different languages. The right panel shows the average popularity in different languages, revealing geographic biases. The language of each sample is determined from its title.
  • ...and 1 more figure