GRAPHIA: Harnessing Social Graph Data to Enhance LLM-Based Social Simulation

Jiarui Ji, Zehua Zhang, Zhewei Wei, Bin Tong, Guan Wang, Bo Zheng

TL;DR

Graphia presents a unified reinforcement-learning framework that uses social-graph data to supervise post-training of large language models for social simulation, bridging micro-level interactions and macro-level network dynamics. It defines two evaluation settings, Transductive Dynamic Graph Generation (TDGG) for micro-level fidelity and Inductive Dynamic Graph Generation (IDGG) for macro-level realism, and trains two specialized agents (Graphia-Q for destination selection and Graphia-E for edge generation), with an activity predictor supporting IDGG. Across three real-world networks, Graphia yields consistent gains in micro-level alignment (destination selection and edge quality) and macro-level fidelity (structural similarity and replication of emergent phenomena such as power laws and echo chambers), and supports counterfactual simulations to analyze platform incentives. The work demonstrates that graph-structured supervision can substantially close the gap between agent behaviors and evolving network dynamics in LLM-based social simulations, offering a scalable path to richer, more realistic synthetic social data.

Abstract

Large language models (LLMs) have shown promise in simulating human-like social behaviors. Social graphs provide high-quality supervision signals that encode both local interactions and global network structure, yet they remain underutilized for LLM training. To address this gap, we propose Graphia, the first general LLM-based social graph simulation framework that leverages graph data as supervision for LLM post-training via reinforcement learning. With GNN-based structural rewards, Graphia trains specialized agents to predict whom to interact with (destination selection) and how to interact (edge generation), and then uses these agents in purpose-built graph generation pipelines. We evaluate Graphia under two settings: Transductive Dynamic Graph Generation (TDGG), a micro-level task with our proposed node-wise interaction alignment metrics; and Inductive Dynamic Graph Generation (IDGG), a macro-level task with our proposed metrics for aligning emergent network properties. On three real-world networks, Graphia improves micro-level alignment by 6.1% in the composite destination selection score, 12% in edge classification accuracy, and 27.9% in edge content BERTScore over the strongest baseline. For macro-level alignment, it achieves 41.11% higher structural similarity and 32.98% better replication of social phenomena such as power laws and echo chambers. Graphia also supports counterfactual simulation, generating plausible behavioral shifts under platform incentives. Our results show that social graphs can serve as high-quality supervision signals for LLM post-training, closing the gap between agent behaviors and network dynamics for LLM-based simulation. Code is available at https://github.com/Ji-Cather/Graphia.git.
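
To make the macro-level idea of "replicating power laws" concrete, the following is a minimal sketch of how one might compare the in-degree distributions of a real and a generated interaction graph. It is not Graphia's metric or code: the function names (`in_degrees`, `powerlaw_alpha`), the `(source, destination, timestamp)` edge format, and the toy edge lists are all assumptions made for illustration. The exponent is estimated with the standard approximate maximum-likelihood formula for discrete power laws, alpha ≈ 1 + n / Σ ln(x_i / (x_min − 0.5)), which is only a rough estimate for small `x_min` and tiny graphs like these.

```python
import math
from collections import Counter


def in_degrees(edges):
    """Count incoming interactions per destination node.

    `edges` is an iterable of (source, destination, timestamp) tuples;
    this edge format is an assumption for the sketch.
    """
    return Counter(dst for _src, dst, _t in edges)


def powerlaw_alpha(degrees, x_min=1):
    """Approximate discrete power-law exponent via maximum likelihood.

    Uses alpha ~ 1 + n / sum(ln(x / (x_min - 0.5))) over degrees >= x_min.
    """
    tail = [d for d in degrees if d >= x_min]
    if not tail:
        raise ValueError("no degrees at or above x_min")
    log_sum = sum(math.log(d / (x_min - 0.5)) for d in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical toy data: one hub-dominated graph and one flatter graph.
real_edges = [
    ("a", "hub", 1), ("b", "hub", 2), ("c", "hub", 3),
    ("d", "hub", 4), ("a", "b", 5), ("c", "d", 6),
]
generated_edges = [
    ("a", "b", 1), ("b", "c", 2), ("c", "d", 3),
    ("d", "a", 4), ("a", "hub", 5), ("b", "hub", 6),
]

alpha_real = powerlaw_alpha(in_degrees(real_edges).values())
alpha_generated = powerlaw_alpha(in_degrees(generated_edges).values())

# A smaller gap means the generated graph's degree tail is closer to the real one.
print(f"real alpha={alpha_real:.3f}, generated alpha={alpha_generated:.3f}, "
      f"gap={abs(alpha_real - alpha_generated):.3f}")
```

A comparison like this only checks one emergent property; the abstract's macro-level evaluation also covers structural similarity and echo chambers, which this sketch does not attempt.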

Paper Structure

This paper contains 27 sections, 18 equations, 6 figures, 12 tables, 1 algorithm.

Figures (6)

  • Figure 1: Graphia training, generation, and evaluation pipeline illustrated on a collaboration network. (a) The left panel details the training mechanisms for the specialized LLM-based agents: Graphia-Q for destination selection (top-left) and Graphia-E for edge generation (bottom-left). These agents leverage text-rich node profiles and interaction memories, with rewards designed to optimize their respective tasks. (b) The right panel outlines the graph generation pipeline, built on the trained LLM-based agents, for the TDGG and IDGG tasks. TDGG focuses on micro-level node behavior, while IDGG, supported by an activity predictor, models the macro-level social graph.
  • Figure 2: LLM-as-a-judge for edge generation.
  • Figure 3: The social fidelity score for TDGG and IDGG tasks. Notably, Graphia exceeds Graphia-seq across all metrics, underscoring the necessity of graph data for enhancing LLM-based social graph simulation. (a) Graphia outperforms baselines in edge generation and matches 32B models in destination selection; (b) Graphia achieves superior performance in graph structure and phenomenon replication, outperforming deep-learning and LLM-based social graph generators.
  • Figure 4: Impact of broadcast incentives on message propagation in the Weibo networks.
  • Figure 5: The social fidelity score for the TDGG task on three social network datasets.
  • ...and 1 more figure