Contrastive Learning for Implicit Social Factors in Social Media Popularity Prediction
Zhizhen Zhang, Ruihong Qiu, Xiaohui Xie
TL;DR
This work addresses the prediction of social media post popularity by incorporating implicit social factors that arise from platform dynamics, beyond content quality. It introduces PPCL, an end-to-end framework with a multimodal Post Encoder, a structured User Encoder, and a Popularity Predictor, augmented by three supervised contrastive tasks (CRD, UISD, UID) to capture Content Relevance, User Influence Similarity, and User Identity signals. Through hierarchical labels, model-level augmentations, and carefully designed batch sampling, PPCL demonstrates consistent improvements on the Social Media Popularity Dataset (SMPD) across multiple data regimes, highlighting data efficiency and richer representations. The findings underscore the practical impact of platform-induced signals for more accurate and robust popularity prediction in real-world social media systems.
Abstract
On social media sharing platforms, some posts are inherently destined for popularity. Therefore, understanding the reasons behind this phenomenon and predicting popularity before a post is published holds significant practical value. Previous work predominantly focuses on enhancing post content extraction for better prediction results. However, certain factors introduced by social platforms also affect post popularity, and these have not been extensively studied. For instance, users are more likely to engage with posts from individuals they follow, potentially influencing the popularity of these posts. We term these factors, which are unrelated to the explicit attractiveness of content, implicit social factors. Through an analysis of users' post browsing behavior (also validated on public datasets), we propose three implicit social factors related to popularity: content relevance, user influence similarity, and user identity. To model the proposed social factors, we introduce three supervised contrastive learning tasks. Because the tasks differ in objective and data type, we assign them to different encoders and control their gradient flows to achieve joint optimization. We also design corresponding sampling and augmentation algorithms to improve the effectiveness of contrastive learning. Extensive experiments on the Social Media Popularity Dataset validate the superiority of our proposed method and confirm the important role of implicit social factors in popularity prediction. We open-source the code at https://github.com/Daisy-zzz/PPCL.git.
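As a rough illustration of what a supervised contrastive task computes, the following is a minimal sketch of a generic SupCon-style batch loss in plain Python. It is not taken from the PPCL code and may differ from the authors' formulation; the function name, the `temperature` value, and the toy embeddings and labels are all assumptions made for illustration. In a user-identity task, for example, the labels could be user IDs, so that posts by the same user are pulled together in embedding space.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon-style loss for one batch (illustrative, not the PPCL loss).

    embeddings: list of equal-length lists of floats, one per sample.
    labels: list of hashable class labels, one per sample.
    """
    # L2-normalise each vector so the dot product is a cosine similarity.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other samples in the batch that share the anchor's label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarities to every other sample in the batch.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp over all non-anchor samples, stabilised by the max logit.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two tight clusters of 2-D embeddings (hypothetical data).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_misaligned = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

The loss is small when samples sharing a label are already close together and large when the labels cut across the clusters, which is the signal that pushes an encoder to organize its representations by the labeled factor.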
