Contrastive Learning for Implicit Social Factors in Social Media Popularity Prediction

Zhizhen Zhang, Ruihong Qiu, Xiaohui Xie

TL;DR

This work addresses social media post popularity prediction by incorporating, beyond content quality, implicit social factors that arise from platform dynamics. It introduces PPCL, an end-to-end framework consisting of a multimodal Post Encoder, a structured User Encoder, and a Popularity Predictor, augmented by three supervised contrastive tasks (CRD, UISD, UID) that capture Content Relevance, User Influence Similarity, and User Identity signals, respectively. Through hierarchical labels, model-level augmentations, and carefully designed batch sampling, PPCL achieves consistent improvements on the Social Media Popularity Dataset (SMPD) across multiple data regimes, indicating better data efficiency and richer representations. The findings underscore the practical value of platform-induced signals for more accurate and robust popularity prediction in real-world social media systems.
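The TL;DR credits part of PPCL's gains to carefully designed batch sampling. The exact algorithm is not given in this summary, so the following is only a sketch of one standard option, P×K sampling from metric learning, under the assumption that a user-identity task treats posts by the same user as positives. The function name `sample_pk_batch` and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def sample_pk_batch(user_ids: Sequence[int], users_per_batch: int,
                    posts_per_user: int, rng: random.Random) -> List[int]:
    """Hypothetical P x K sampler: pick P users, then K posts from each.

    Every post in the returned batch shares its user with K - 1 other posts,
    so a user-identity contrastive loss always finds in-batch positives.
    """
    if posts_per_user < 2:
        raise ValueError("posts_per_user must be >= 2 to create positives")
    # Group post indices by the user who published them.
    by_user: Dict[int, List[int]] = defaultdict(list)
    for idx, uid in enumerate(user_ids):
        by_user[uid].append(idx)
    # Only users with at least K posts can fill their slot without repeats.
    eligible = [u for u, posts in by_user.items() if len(posts) >= posts_per_user]
    if len(eligible) < users_per_batch:
        raise ValueError("not enough users with enough posts for this batch shape")
    batch: List[int] = []
    for uid in rng.sample(eligible, users_per_batch):
        batch.extend(rng.sample(by_user[uid], posts_per_user))
    return batch


# Usage: user 9 has a single post, so it can never be drawn with K = 2.
post_owner = [7, 7, 7, 3, 3, 9, 5, 5]
print(sample_pk_batch(post_owner, users_per_batch=2, posts_per_user=2,
                      rng=random.Random(0)))
```

The design choice here is to guarantee positives by construction rather than hoping random batches contain them; the cost is that users with fewer than K posts are never sampled for this objective.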

Abstract

On social media sharing platforms, some posts are inherently destined for popularity. Understanding why, and predicting popularity before a post is published, therefore holds significant practical value. Previous work predominantly focuses on improving post content extraction to obtain better predictions. However, certain factors introduced by the social platforms themselves also affect post popularity, and these have not been extensively studied. For instance, users are more likely to engage with posts from people they follow, which can raise the popularity of those posts. We term these factors, which are unrelated to the explicit attractiveness of the content, implicit social factors. Through an analysis of users' post browsing behavior (also validated on public datasets), we propose three implicit social factors related to popularity: content relevance, user influence similarity, and user identity. To model these factors, we introduce three supervised contrastive learning tasks. Because the tasks differ in objective and data type, we assign them to different encoders and control their gradient flows to achieve joint optimization. We also design corresponding sampling and augmentation algorithms to improve the effectiveness of contrastive learning. Extensive experiments on the Social Media Popularity Dataset validate the superiority of the proposed method and confirm the important role of implicit social factors in popularity prediction. We open-source the code at https://github.com/Daisy-zzz/PPCL.git.
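The supervised contrastive tasks in the abstract build on the general idea of supervised contrastive learning: samples that share a label are pulled together in embedding space and all other samples in the batch are pushed apart. The sketch below implements the generic SupCon loss of Khosla et al. (2020) in plain Python. It is not PPCL's exact formulation; the function names, the default temperature, and the use of integer labels (for example, user IDs for a user-identity task) are assumptions.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def supcon_loss(embeddings: Sequence[Sequence[float]],
                labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Generic supervised contrastive loss (the 'L_out' form of Khosla et al.).

    Samples sharing a label are positives for each other; every other sample
    in the batch acts as a negative. Anchors without any positive are skipped.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of anchor i to every other sample.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                  for a in range(n) if a != i}
        # Log of the softmax denominator, computed stably with the max trick.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Mean negative log-probability of this anchor's positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Usage: labels that match the embedding geometry give a lower loss.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supcon_loss(emb, [0, 0, 1, 1]))  # aligned labels
print(supcon_loss(emb, [0, 1, 0, 1]))  # mismatched labels
```

Anchors with no same-label partner in the batch contribute nothing to this loss, which is why batch composition matters for supervised contrastive objectives.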

Paper Structure

This paper contains 39 sections, 25 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Popularity distributions across three data attributes in SMPD, validating, respectively, the impact of the proposed social factors on popularity: Content Relevance (CR), User Influence Similarity (UIS), and User Identity (UI).
  • Figure 2: An overview of the proposed PPCL. The model architecture consists of a post encoder, a user encoder, and a popularity predictor. Three contrastive learning losses, i.e., $\mathcal{L}_{CRD}$, $\mathcal{L}_{UISD}$, and $\mathcal{L}_{UID}$, are then added to optimize each of these components for modeling the proposed implicit social factors.
  • Figure 3: Visualization of the features output by $Enc_{pop}$ for PPCL and for its variant without contrastive learning (w/o CL), with colors indicating post popularity values.
  • Figure A1: Parameter sensitivity analysis.
  • Figure A2: Visualization of post popularity. The left-to-right horizontal axis represents predicted popularity from small to large, and the bottom-to-top vertical axis represents popularity labels from small to large.
  • ...and 1 more figure