SoMeR: Multi-View User Representation Learning for Social Media
Siyi Guo, Keith Burghardt, Valeria Pantè, Kristina Lerman
TL;DR
SoMeR introduces a general-purpose, multi-view framework for representing social media users. It fuses temporal activity, post text, profile information, and network interactions into a transformer-based embedding, trained jointly with network link prediction and contrastive objectives. The approach handles data sparsity through a triplet-based history representation and self-supervised pretraining, enabling few-shot adaptation to downstream tasks. It is validated on three socio-political problems (detecting information operation (IO) driver accounts, measuring online polarization, and predicting participation in hate subreddits), showing robust performance, clear contributions from each view, and scalability to large datasets. The work advances socio-political user analysis by providing a generalizable, scalable embedding space that can inform interventions while addressing privacy and ethical considerations.
Abstract
Social media user representation learning aims to capture user preferences, interests, and behaviors in low-dimensional vector representations. These representations are critical to a range of social problems, including predicting user behaviors and detecting inauthentic accounts. However, existing methods either are designed for commercial applications or rely on specific features such as text content, activity patterns, or platform metadata, and so fail to model user behavior holistically across modalities. To address these limitations, we propose SoMeR, a Social Media user Representation learning framework that incorporates temporal activities, text content, profile information, and network interactions to learn comprehensive user portraits. SoMeR encodes each user's post stream as a sequence of time-stamped textual features, uses transformers to embed these sequences along with profile data, and trains jointly with link prediction and contrastive learning objectives to capture user similarity. We demonstrate SoMeR's versatility through three applications: 1) identifying information operation driver accounts, 2) measuring online polarization after major events, and 3) predicting future user participation in Reddit hate communities. SoMeR provides new solutions for understanding user behavior in the socio-political domain, enabling more informed decisions and interventions.
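To make the joint training objective concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract says only that link prediction and contrastive objectives are trained jointly. The specific loss forms (binary cross-entropy on embedding dot products, and an InfoNCE-style contrastive loss), the function names, the `temperature` value, and the `alpha` weighting are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def link_prediction_loss(emb, pairs):
    """Mean binary cross-entropy over (i, j, label) user pairs.

    Assumption: the link score is the dot product of the two user embeddings,
    and label is 1 if the users interact in the network, else 0.
    """
    total = 0.0
    for i, j, label in pairs:
        z = dot(emb[i], emb[j])
        total += softplus(z) - label * z  # BCE with logits
    return total / len(pairs)


def contrastive_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (an assumed form of the contrastive objective).

    Row i of `positives` is the positive for row i of `anchors`;
    every other row in the batch acts as a negative.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(a)


def joint_loss(emb, pairs, anchors, positives, alpha=0.5):
    """Weighted sum of the two objectives; alpha is a hypothetical weight."""
    return (alpha * link_prediction_loss(emb, pairs)
            + (1.0 - alpha) * contrastive_loss(anchors, positives))
```

In a real system the embeddings would come from the transformer encoder and the loss would be minimized by gradient descent. This sketch only shows how the two signals could be combined into a single scalar.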
