GRAPHIA: Harnessing Social Graph Data to Enhance LLM-Based Social Simulation
Jiarui Ji, Zehua Zhang, Zhewei Wei, Bin Tong, Guan Wang, Bo Zheng
TL;DR
Graphia is a unified reinforcement-learning framework that uses social-graph data to supervise the post-training of large language models for social simulation, bridging micro-level interactions and macro-level network dynamics. The framework trains two specialized agents, Graphia-Q for destination selection and Graphia-E for edge generation, together with an activity predictor. It also introduces two evaluation paradigms: TDGG for micro-level fidelity and IDGG for macro-level realism. Across three real-world networks, Graphia yields consistent gains in micro-level alignment (destination selection and edge quality) and in macro-level fidelity (structural similarity and replication of emergent phenomena such as power laws and echo chambers). It also supports counterfactual simulations for analyzing platform incentives. The work shows that graph-structured supervision can substantially close the gap between agent behaviors and evolving network dynamics in LLM-based social simulation, offering a scalable path toward richer, more realistic synthetic social data.
Abstract
Large language models (LLMs) have shown promise in simulating human-like social behaviors. Social graphs provide high-quality supervision signals that encode both local interactions and global network structure, yet they remain underutilized for LLM training. To address this gap, we propose Graphia, the first general LLM-based social graph simulation framework that leverages graph data as supervision for LLM post-training via reinforcement learning. Using GNN-based structural rewards, Graphia trains specialized agents to predict whom to interact with (destination selection) and how to interact (edge generation). These agents are then used within dedicated graph generation pipelines. We evaluate Graphia under two settings. The first is Transductive Dynamic Graph Generation (TDGG), a micro-level task evaluated with our proposed node-wise interaction alignment metrics. The second is Inductive Dynamic Graph Generation (IDGG), a macro-level task evaluated with our proposed metrics for aligning emergent network properties. On three real-world networks, Graphia improves micro-level alignment over the strongest baseline by 6.1% in the composite destination selection score, 12% in edge classification accuracy, and 27.9% in edge content BERTScore. For macro-level alignment, it achieves 41.11% higher structural similarity and 32.98% better replication of social phenomena such as power laws and echo chambers. Graphia also supports counterfactual simulation, generating plausible behavioral shifts under platform incentives. Our results show that social graphs can serve as high-quality supervision signals for LLM post-training, closing the gap between agent behaviors and network dynamics in LLM-based simulation. Code is available at https://github.com/Ji-Cather/Graphia.git.
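The abstract reports gains in replicating power laws but does not define the macro-level metric at this point. As a hedged illustration only, one common way to check whether a generated graph reproduces a heavy-tailed degree distribution is to fit a power-law exponent to the real and generated degree sequences and compare the two fits. The sketch uses the standard approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman (2009). The function names (powerlaw_alpha, degree_sequence, alpha_relative_error), the use of total degree, the fixed x_min, and the relative-error comparison are hypothetical choices made for illustration. They are not Graphia's published metric.

```python
import math
from collections import Counter


def powerlaw_alpha(degrees, x_min=1):
    """Approximate discrete power-law MLE (Clauset, Shalizi & Newman, 2009):
    alpha ~= 1 + n / sum(ln(x / (x_min - 0.5))) over all degrees x >= x_min."""
    if x_min < 1:
        raise ValueError("x_min must be >= 1 for the discrete estimator")
    tail = [d for d in degrees if d >= x_min]
    if not tail:
        raise ValueError("no degrees at or above x_min")
    # Every term is positive because d >= x_min > x_min - 0.5.
    log_sum = sum(math.log(d / (x_min - 0.5)) for d in tail)
    return 1.0 + len(tail) / log_sum


def degree_sequence(edges):
    """Hypothetical helper: total (in + out) degree per node from (src, dst) edges."""
    deg = Counter()
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return list(deg.values())


def alpha_relative_error(real_edges, generated_edges, x_min=1):
    """Hypothetical comparison: relative gap between fitted exponents."""
    a_real = powerlaw_alpha(degree_sequence(real_edges), x_min)
    a_gen = powerlaw_alpha(degree_sequence(generated_edges), x_min)
    return abs(a_gen - a_real) / a_real


if __name__ == "__main__":
    real = [(0, 1), (0, 2), (0, 3), (1, 2)]
    generated = [(0, 1), (0, 2), (0, 3), (2, 3)]
    print(alpha_relative_error(real, generated))
```

In practice, x_min is usually selected by minimizing the Kolmogorov-Smirnov distance between the data and the fitted model rather than being fixed. A fixed value keeps the sketch short. A lower relative error would indicate that the generated graph's degree tail more closely matches the real one.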
