SoMeR: Multi-View User Representation Learning for Social Media

Siyi Guo, Keith Burghardt, Valeria Pantè, Kristina Lerman

TL;DR

SoMeR introduces a universal multi-view social media user representation framework that fuses temporal activity, post text, profile information, and network interactions into a transformer-based embedding learned via joint network link prediction and contrastive objectives. The approach handles data sparsity through a triplet-based history representation and self-supervised pretraining, enabling few-shot adaptation to downstream tasks. It is validated on three socio-political problems—IO-driver detection, online polarization, and hate-subreddit participation prediction—showing robust performance and clear contributions from each view, with scalability to large datasets. The work advances socio-political user analysis by providing a generalizable, scalable embedding space capable of informing interventions while addressing privacy and ethical considerations.

Abstract

Social media user representation learning aims to capture user preferences, interests, and behaviors in low-dimensional vector representations. These representations are critical to a range of social problems, including predicting user behaviors and detecting inauthentic accounts. However, existing methods are either designed for commercial applications, or rely on specific features like text contents, activity patterns, or platform metadata, failing to holistically model user behavior across different modalities. To address these limitations, we propose SoMeR, a Social Media user Representation learning framework that incorporates temporal activities, text contents, profile information, and network interactions to learn comprehensive user portraits. SoMeR encodes user post streams as sequences of time-stamped textual features, uses transformers to embed this along with profile data, and jointly trains with link prediction and contrastive learning objectives to capture user similarity. We demonstrate SoMeR's versatility through three applications: 1) Identifying information operation driver accounts, 2) Measuring online polarization after major events, and 3) Predicting future user participation in Reddit hate communities. SoMeR provides new solutions to better understand user behavior in the socio-political domains, enabling more informed decisions and interventions.
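The joint training described in the abstract can be illustrated with a minimal sketch. The exact loss definitions are not given in this section, so everything below is an assumption made for illustration: link prediction is modeled as binary cross-entropy on a sigmoid of the dot product, the contrastive term is an InfoNCE-style loss over cosine similarity, and the names `temperature` and `weight` are hypothetical. It is not the authors' implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    # Assumes both vectors are non-zero.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def link_prediction_loss(u, v, linked):
    """Binary cross-entropy on sigmoid(u . v); `linked` says whether an edge exists."""
    p = 1.0 / (1.0 + math.exp(-dot(u, v)))
    return -math.log(p) if linked else -math.log(1.0 - p)


def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss (assumed form): pull the positive close, push negatives away."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


def joint_loss(anchor, positive, negatives, weight=1.0):
    """Link prediction loss over all pairs plus a weighted contrastive term."""
    link = link_prediction_loss(anchor, positive, True)
    link += sum(link_prediction_loss(anchor, n, False) for n in negatives)
    return link + weight * contrastive_loss(anchor, positive, negatives)


# Toy user embeddings: `similar` points the same way as `anchor`, `dissimilar` does not.
anchor = [1.0, 0.0]
similar = [0.9, 0.1]
dissimilar = [-1.0, 0.2]

good = joint_loss(anchor, similar, [dissimilar])   # linked pair is the similar one
bad = joint_loss(anchor, dissimilar, [similar])    # linked pair is the dissimilar one
```

In this toy setup the loss is lower when the linked (positive) user is the one whose embedding already lies close to the anchor, which is the behavior a similarity-preserving objective is meant to reward.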

Paper Structure (42 sections, 8 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Model Architecture of SoMeR. We format a user's posting history into triplets of time, feature, and value, which undergo encoding via a Triplet Encoder, a transformer-based contextual learning module and a fusion attention layer, becoming a user history embedding that is then concatenated to the user profile embedding. Through training with two self-supervised objectives - network link prediction and contrastive loss - our method effectively captures user similarity in the latent space.
  • Figure 2: Users shifted in the embedding space after the SCOTUS abortion ruling. Points in (a) are encoded with user post histories from January 1st to May 2nd, 2022. Points in (b) are encoded with the post histories of the same users from June 24th to November 8th, 2022. Points in (a) and (b) are projected into the same embedding space.
  • Figure 3: Users with the same ideology moved closer after the SCOTUS abortion ruling, and users with different ideologies moved apart. The color represents the percent change in the mean of these nearest neighbor metrics across populations from the baseline period to the after-ruling period. *** indicates that the means differ significantly between the two time periods with p-value $< 0.0001$.
  • Figure 4: Comparison of changes observed in the embedding spaces learned by ablated models and the full model. Temporal features contributed little, whereas textual features were the larger factor. The color represents the percent change in the mean of these four metrics across populations from the baseline period to the after-ruling period. *** indicates that the means differ significantly between the two time periods with p-value $< 0.0001$.
  • Figure 5: t-SNE of embedding spaces learned from synthetic datasets that simulate (a) a simple scenario with three clusters, (b) a hard scenario with 10 clusters and more noise, (c) a scenario in which temporal patterns vary across clusters, and (d) a scenario in which feature values vary across clusters.
  • ...and 3 more figures
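
The triplet formatting mentioned in the Figure 1 caption (time, feature, value) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the input layout of `posts`, the feature names `sentiment` and `toxicity`, and the choice of hours since the first post as the time unit are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Triplet:
    time: float    # hours since the user's first post (assumed time unit)
    feature: str   # feature name, e.g. "sentiment" (hypothetical)
    value: float   # value of that feature for the post


def history_to_triplets(posts):
    """Flatten a post history into (time, feature, value) triplets sorted by time.

    `posts` is a list of (timestamp, {feature_name: value}) pairs; this
    input layout is an assumption made for the sketch.
    """
    if not posts:
        return []
    ordered = sorted(posts, key=lambda p: p[0])
    t0 = ordered[0][0]
    triplets = []
    for ts, features in ordered:
        hours = (ts - t0).total_seconds() / 3600.0
        # A post contributes one triplet per feature it actually has.
        for name, value in sorted(features.items()):
            triplets.append(Triplet(hours, name, float(value)))
    return triplets


posts = [
    (datetime(2022, 6, 24, 12, 0), {"sentiment": -0.4, "toxicity": 0.1}),
    (datetime(2022, 6, 24, 9, 0), {"sentiment": 0.2}),
]
triplets = history_to_triplets(posts)
```

Because each post emits triplets only for the features it has, a history with missing features simply yields a shorter sequence rather than a padded feature matrix, which is one plausible reading of how a triplet representation copes with sparse data.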