Social-LLM: Modeling User Behavior at Scale using Language Models and Social Network Data

Julie Jiang, Emilio Ferrara

TL;DR

Social-LLM tackles scalable, inductive modeling of user behavior by combining content cues from user profiles with first-order social interactions via a Siamese-LM framework and edge-type-aware transformations. Training cost is linear in the number of edges and nodes, and the resulting embeddings are reusable for downstream detection tasks, even for unseen users. Across seven real-world Twitter datasets covering politics, morality, and hate, Social-LLM generally outperforms content-based, network-based, and hybrid baselines; ablations show the importance of edge directionality, mixed edge types, and optional tweet embeddings. The approach also supports embedding visualization and scalable deployment, offering a practical tool for computational social science research and large-scale social network analysis.

Abstract

The proliferation of social network data has unlocked unprecedented opportunities for extensive, data-driven exploration of human behavior. The structural intricacies of social networks offer insights into various computational social science issues, particularly concerning social influence and information diffusion. However, modeling large-scale social network data comes with computational challenges. Though large language models make it easier than ever to model textual content, many advanced network representation methods struggle with scalability and efficient deployment to out-of-sample users. In response, we introduce a novel approach tailored for modeling social network data in user detection tasks. This innovative method integrates localized social network interactions with the capabilities of large language models. Operating under the premise of social network homophily, which posits that socially connected users share similarities, our approach is designed to address these challenges. We conduct a thorough evaluation of our method across seven real-world social network datasets, spanning a diverse range of topics and detection tasks, showcasing its applicability to advance research in computational social science.

Paper Structure

This paper contains 33 sections, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Distribution of user bot scores prior to user preprocessing for the Immigration-Hate datasets.
  • Figure 2: Ablation study of user tweet embeddings on the Ukr-Rus-Suspended dataset (Experiment 5).
  • Figure 3: Sensitivity to embedding dimension $d$ (Experiment 6).
  • Figure 4: Visualization of Social-LLM embeddings on select datasets.