Social-Media Based Personas Challenge: Hybrid Prediction of Common and Rare User Actions on Bluesky
Benjamin White, Anastasia Shimorina
TL;DR
This paper tackles the prediction of both common and rare user actions on Bluesky with a hybrid pipeline that combines lookup-based prediction, persona-specific LightGBM models, a rare-action classifier that fuses textual and temporal representations, and replies generated with GPT-4.1-mini. The results indicate that lookup-based prediction works best for high-confidence, recurring interactions, that per-cluster models handle frequent actions effectively, and that a specialized classifier can address low-frequency events using temporal and textual signals. Key findings include a macro-F1 score of about 0.56 for rare actions and a cosine similarity of approximately 0.83 for generated text, with overall improvements over pure transformer baselines. The work advances practical social-media behavior modeling by showing how temporal, semantic, and generation components can be integrated to simulate user actions and replies in a persona-driven setting, and it achieved top performance in the SocialSim 2025 challenge.
Abstract
Understanding and predicting user behavior on social media platforms is crucial for content recommendation and platform design. While existing approaches focus primarily on common actions like retweeting and liking, the prediction of rare but significant behaviors remains largely unexplored. This paper presents a hybrid methodology for social media user behavior prediction that addresses both frequent and infrequent actions across a diverse action vocabulary. We evaluate our approach on a large-scale Bluesky dataset containing 6.4 million conversation threads spanning 12 distinct user actions across 25 persona clusters. Our methodology combines four complementary components: (i) a lookup database built from historical response patterns; (ii) persona-specific LightGBM models with engineered temporal and semantic features for common actions; (iii) a specialized hybrid neural architecture that fuses textual and temporal representations for rare action classification; and (iv) generation of text replies. Our persona-specific models achieve an average macro-F1 score of 0.64 for common action prediction, while our rare action classifier achieves a macro-F1 score of 0.56 across 10 rare actions. These results demonstrate that effective social media behavior prediction requires tailored modeling strategies that recognize fundamental differences between action types. Our approach achieved first place in the SocialSim: Social-Media Based Personas challenge organized at the Social Simulation with LLMs workshop at COLM 2025.
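To make the interplay between component (i) and the learned models concrete, the following is a minimal sketch of lookup-first routing with a model fallback. It is an illustration under stated assumptions, not the implementation used in this work: the class `LookupPredictor`, the function `predict_action`, the key structure, the thresholds `min_support` and `min_confidence`, and the action labels are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Hashable, Iterable, Optional, Tuple


class LookupPredictor:
    """Hypothetical lookup table over historical (key, action) pairs.

    A key could be any hashable context, e.g. (user, parent_author).
    The thresholds are illustrative assumptions, not values from the paper.
    """

    def __init__(self, min_support: int = 3, min_confidence: float = 0.8):
        self.min_support = min_support
        self.min_confidence = min_confidence
        self._counts = defaultdict(Counter)

    def fit(self, history: Iterable[Tuple[Hashable, str]]) -> "LookupPredictor":
        # Count how often each action followed each key in the history.
        for key, action in history:
            self._counts[key][action] += 1
        return self

    def predict(self, key: Hashable) -> Optional[str]:
        # Return the majority action only when the pattern is frequent
        # and consistent enough; otherwise signal "no confident answer".
        counts = self._counts.get(key)
        if not counts:
            return None
        action, n = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= self.min_support and n / total >= self.min_confidence:
            return action
        return None


def predict_action(
    key: Hashable,
    lookup: LookupPredictor,
    fallback: Callable[[Hashable], str],
) -> str:
    # Lookup first; defer to a learned model (here any callable) otherwise.
    hit = lookup.predict(key)
    return hit if hit is not None else fallback(key)


if __name__ == "__main__":
    # Toy history with made-up users and action labels.
    history = (
        [(("alice", "bob"), "like")] * 4
        + [(("alice", "bob"), "repost")]
        + [(("carol", "dave"), "like")]
    )
    lookup = LookupPredictor().fit(history)
    model = lambda key: "model_prediction"  # stand-in for a per-persona model

    print(predict_action(("alice", "bob"), lookup, model))   # confident lookup hit
    print(predict_action(("carol", "dave"), lookup, model))  # too little support
    print(predict_action(("erin", "frank"), lookup, model))  # unseen key
```

In this sketch the fallback is an arbitrary callable, so a persona-specific gradient-boosted model or a rare-action classifier could be plugged in without changing the routing logic.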
