Parametric Social Identity Injection and Diversification in Public Opinion Simulation

Hexi Wang, Yujia Zhou, Bangde Du, Qingyao Ai, Yiqun Liu

Abstract

Large language models (LLMs) have recently been adopted as synthetic agents for public opinion simulation, offering a promising alternative to costly and slow human surveys. Despite their scalability, current LLM-based simulation methods fail to capture social diversity, producing flattened inter-group differences and overly homogeneous responses within demographic groups. We identify this limitation as a Diversity Collapse phenomenon in LLM hidden representations, where distinct social identities become increasingly indistinguishable across layers. Motivated by this observation, we propose Parametric Social Identity Injection (PSII), a general framework that injects explicit, parametric representations of demographic attributes and value orientations directly into intermediate hidden states of LLMs. Unlike prompt-based persona conditioning, PSII enables fine-grained and controllable identity modulation at the representation level. Extensive experiments on the World Values Survey using multiple open-source LLMs show that PSII significantly improves distributional fidelity, reducing KL divergence to real-world survey data while increasing response diversity. This work provides new insights into representation-level control of LLM agents and advances scalable, diversity-aware public opinion simulation. Code and data are available at https://github.com/halsayxi/PSII.
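
To make the idea of representation-level injection concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest possible form: an identity vector is scaled and added element-wise to a hidden-state vector, with optional Gaussian noise (the Figure 2 caption mentions noise addition, but the exact formula is not given here). The function name `inject_identity`, the parameters `alpha` and `noise_std`, and all vectors are hypothetical.

```python
import random

def inject_identity(hidden, identity, alpha=0.5, noise_std=0.0, rng=None):
    """Return hidden + alpha * identity (+ optional Gaussian noise), element-wise.

    Assumption: plain additive injection; the paper's actual rule may differ.
    """
    if len(hidden) != len(identity):
        raise ValueError("hidden and identity must have the same dimension")
    if rng is None:
        rng = random.Random()
    return [
        h + alpha * v + (rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0)
        for h, v in zip(hidden, identity)
    ]

# Toy 4-dimensional hidden state shared by two agents (hypothetical values).
hidden = [0.2, -0.1, 0.4, 0.0]
# Two hypothetical identity vectors, e.g. for two demographic profiles.
identity_a = [1.0, 0.0, -1.0, 0.0]
identity_b = [0.0, 1.0, 0.0, -1.0]

# Without injection both agents share the same state; after injection they differ.
state_a = inject_identity(hidden, identity_a, alpha=0.5)
state_b = inject_identity(hidden, identity_b, alpha=0.5)

# Per-agent noise with a seeded generator keeps the perturbation reproducible.
noisy_a = inject_identity(hidden, identity_a, noise_std=0.1, rng=random.Random(0))
```

In a real model the same operation would be applied to intermediate layer activations rather than to a standalone list; the choice of layers and scaling is left open here because this section does not specify it.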

Paper Structure

This paper contains 31 sections, 9 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Layer-wise scatter plots of final-token hidden states for 500 simulated agents (top) and an illustration of Diversity Collapse in Transformer hidden states (bottom). In the top panels, red points denote baseline methods and gray points denote PSII-generated agents; the reported scores measure the average spatial dispersion of representations in each layer. The bottom panel depicts the Diversity Collapse phenomenon.
  • Figure 2: Overview of the Parametric Social Identity Injection (PSII) mechanism. From left to right: Identity Construction, including agent profile construction and identity vector construction; Parametric Injection, including noise addition and hierarchical injection; Performance Evaluation of the simulated agents.
  • Figure 3: Response distributions for a randomly selected question from each of the four categories. PSII closely matches human-like diversity, while baseline methods often concentrate on a few options.
  • Figure 4: Language distribution in the WVS dataset.
  • Figure 5: Effects of injecting demographic attributes into different network layers, showing how layer selection affects simulation accuracy (KL divergence) and diversity (normalized entropy).
  • ...and 4 more figures
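
The two metrics named in the Figure 5 caption, KL divergence and normalized entropy, can be sketched as follows. This is an illustrative sketch under stated assumptions: the divergence is taken as KL(human || simulated) in nats, entropy is normalized by log K for K answer options, and every count below is invented for the example rather than taken from the World Values Survey.

```python
import math

def normalize(counts):
    """Turn raw answer counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against options that q never selects."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def normalized_entropy(p):
    """Shannon entropy divided by log(K): 0 for a single option, 1 for uniform."""
    k = len(p)
    if k < 2:
        return 0.0
    h = sum(pi * math.log(1.0 / pi) for pi in p if pi > 0)
    return h / math.log(k)

# Hypothetical answer counts for one four-option survey question.
human = normalize([120, 310, 400, 170])
collapsed = normalize([5, 20, 960, 15])    # simulated answers piled on one option
diverse = normalize([110, 290, 420, 180])  # simulated answers spread like the survey

for name, sim in [("collapsed", collapsed), ("diverse", diverse)]:
    print(name, kl_divergence(human, sim), normalized_entropy(sim))
```

Under these assumptions, a simulation that concentrates on one option yields a larger KL divergence to the human distribution and a lower normalized entropy than one whose answers are spread similarly to the survey.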