Y Social: an LLM-powered Social Media Digital Twin

Giulio Rossetti; Massimo Stella; Rémy Cazabet; Katherine Abramski; Erica Cau; Salvatore Citraro; Andrea Failla; Riccardo Improta; Virginia Morini; Valentina Pansanella

TL;DR

Y Social introduces a modular, LLM-powered digital twin for online social platforms, enabling controlled experiments on user interactions, content diffusion, and policy impact. It combines a REST API backend, an LLM interrogation server, and a simulation client to run heterogeneous AI agents with configurable personas. The paper demonstrates a political-debate case study and discusses the platform's potential to support multidisciplinary research, from network science to NLP and psychology. This work advances the ability to study human-AI co-evolution, algorithmic bias, and information dynamics in a reproducible, tunable virtual OSN.

Abstract

In this paper we introduce Y, a new-generation digital twin designed to replicate an online social media platform. Digital twins are virtual replicas of physical systems that allow for advanced analyses and experimentation. In the case of social media, a digital twin such as Y provides a powerful tool for researchers to simulate and understand complex online interactions. Y leverages state-of-the-art Large Language Models (LLMs) to replicate sophisticated agent behaviors, enabling accurate simulations of user interactions, content dissemination, and network dynamics. By integrating these aspects, Y offers valuable insights into user engagement, information spread, and the impact of platform policies. Moreover, the integration of LLMs allows Y to generate nuanced textual content and predict user responses, facilitating the study of emergent phenomena in online environments. To better characterize the proposed digital twin, in this paper we describe the rationale behind its implementation, provide examples of the analyses that can be performed on the data it generates, and discuss its relevance for multidisciplinary research.

Paper Structure (19 sections, 5 figures, 1 algorithm)

Introduction
Related Works
LLM-enhanced Social Simulations
Y Social - Digital Twin
y_server: Social Media platform primitives
Introducing Algorithmic Bias: Recommender System(s)
y_client: Simulating Agents' social interactions
LLM-powered agents
Orchestrating a Simulation
Case study: Political Debate Arena
Simulation configuration
Generated Data Examples
Leveraging the Y Digital Twin to boost multidisciplinary research
Network Science
Social AI
...and 4 more sections

Figures (5)

Figure 1: Y architecture. The Digital Twin is composed of a y_server, which exposes a REST API for simulation synchronization and data storage, and a y_client, which implements the simulation logic and the interface to the LLM service(s) simulating agents' behaviors. The y_client can be deployed on the same machine as the y_server (Host A), on a machine running multiple clients (Host B), or on a different machine that also hosts the LLM service (Host C). Moreover, by design, a generic y_client can leverage one or more LLMs, either self-hosted or commercial.
Figure 2: Y database diagram. The Digital Twin simulation data is stored in a database whose main tables are: user_mgmt, describing the agents' profiles; post, collecting the agents' generated content (often characterised by many-to-many relations, e.g., with the mentions, hashtags, reactions and emotions tables); websites and articles, storing up-to-date information on the shared news; follow, storing the established/removed social relationships; rounds, simulating the system's temporal clock, which is used to enforce y_client(s) synchronization.
Figure 3: Simulation setup. (a) Hourly activity rates of LLM agents (fitted on BlueSky Social data); (b) Representativeness of political leaning classes in the simulated agent population; (d) Age distribution per political leaning.
Figure 4: Generated content statistics. (a) CCDF of generated contents (posts, comments, news, hashtags, mentions) per agent; (b) temporal distribution of agents' reactions (like/dislike) to peer contents; (c) Most used hashtags; (d) Most frequently elicited emotions in generated texts.
Figure 5: Discussion threads. Examples of (a) LLM-generated discussion thread leveraging agent profiles and interests; (b) Thread based on a piece of news accessed via RSS feeds; (c) Cumulative decreasing distribution of thread lengths - i.e., number of comments per post; (d) CCDF of the number of times a post has been recommended for agents to read.
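
Several of the panels above (Figure 4a, Figure 5d) report complementary cumulative distribution functions (CCDFs). As a reading aid, the following is a minimal sketch of how an empirical CCDF can be computed from per-agent counts. It is not taken from the Y codebase: the function name `empirical_ccdf`, the sample data, and the choice of the P(X >= x) convention (rather than P(X > x)) are all assumptions made for illustration.

```python
from collections import Counter


def empirical_ccdf(values):
    """Return (xs, ccdf) where ccdf[i] is the fraction of values >= xs[i].

    Assumption: the P(X >= x) convention is used; the paper may use P(X > x).
    """
    n = len(values)
    if n == 0:
        return [], []
    counts = Counter(values)
    xs = sorted(counts)          # distinct observed values, ascending
    ccdf = []
    remaining = n                # number of observations >= current x
    for x in xs:
        ccdf.append(remaining / n)
        remaining -= counts[x]
    return xs, ccdf


# Hypothetical number of posts per agent (illustrative data only).
posts_per_agent = [1, 1, 2, 3, 3, 3, 8]
xs, ccdf = empirical_ccdf(posts_per_agent)
for x, p in zip(xs, ccdf):
    print(f"P(posts >= {x}) = {p:.3f}")
```

Plotting `xs` against `ccdf` on log-log axes gives the usual view for heavy-tailed activity distributions, where a few agents account for most of the generated content.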
