Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks

Yun-Shiuan Chuang, Krirk Nirunwiroj, Zach Studdiford, Agam Goyal, Vincent V. Frigo, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers

TL;DR

This study assessed whether LLM alignment with human behavior can be improved by integrating information from empirically-derived human belief networks, and suggested a novel path for human-LLM belief alignment.

Abstract

Creating human-like large language model (LLM) agents is crucial for faithful social simulation. Having LLMs role-play based on demographic information sometimes improves human likeness but often does not. This study assessed whether LLM alignment with human behavior can be improved by integrating information from empirically-derived human belief networks. Using data from a human survey, we estimated a belief network encompassing 64 topics loading on nine non-overlapping latent factors. We then seeded each LLM-based agent with an opinion on one topic, and assessed the alignment of its expressed opinions on the remaining test topics with the corresponding human data. Role-playing based on demographic information alone did not align LLM and human opinions, but seeding the agent with a single belief greatly improved alignment for topics related in the belief network, and not for topics outside the network. These results suggest a novel path for human-LLM belief alignment in work seeking to simulate and understand patterns of belief distributions in society.

Paper Structure (35 sections, 5 figures, 6 tables)

This paper contains 35 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: An LLM agent $i'$ is constructed as the "digital twin" of a human respondent $i$, based on their demographic information and belief network estimated from a belief survey. We then evaluate the alignment between the opinions generated by the agent ($o_{i'}$) and those expressed by the corresponding human respondent ($o_{i}$).
  • Figure 2: (a) The belief network estimated by factor analysis from human respondents' responses on the Controversial Beliefs Survey. The nine central nodes are the orthogonal latent factors, and the leaves (rectangles) are the 64 individual topics $x$. The training topics $x_{\text{train}}$ are highlighted with grey backgrounds. (b) Factor loading matrix between two latent factors and their topics. Figure \ref{fig:fa_loading_matrix} shows the full factor loading matrix, and Table \ref{tab:list_topic} gives the full statement of each topic.
  • Figure 3: LLM agent construction conditions with different levels of respondent information. (a) "No-Demo" baseline condition, where the LLM role-plays without demographic information and we directly query the LLM about its opinion on the query topic ($x_{\text{query}}$). (b) "Demo" baseline condition with demographic information ($d$). (c) "Train [same category]" baseline condition with a training topic opinion ($o_{\text{train}}$ on $x_{\text{train}}$) from the same topic category as the query topic (in this example, they both belong to the "Partisan" category). (d) "Demo+Train [same category]" condition with demographic information plus a training topic opinion ($o_{\text{train}}$ on $x_{\text{train}}$) from the same topic category as the query topic. (e) "Demo+Train [random category]" baseline condition with demographic information, along with a training topic opinion from a randomly selected topic category other than that of the query topic ($o_{\text{train}}^{\dagger}$ on $x_{\text{train}}^{\dagger}$) (in this example, the training topic is from the "Ghost" category). (f) "Demo+Train+Query" upper-bound condition with both the training topic opinion (from the same category) and the query topic opinion ($o_{\text{query}}$ on $x_{\text{query}}$).
  • Figure 4: The scree plot of the factor analysis solution.
  • Figure 5: The factor loading matrix of the Controversial Beliefs Survey. The columns indicate the nine factors, and the rows are the 64 topics. Red indicates topics that load highly on a factor, gray indicates near-zero loading, and blue indicates loading in the negative direction. We focus on the Ghost and Partisan categories, highlighted by the green box and the violet box, respectively. The topics in the Ghost category have minimal loading on the Partisan factor and vice versa (highlighted by the black boxes). The full statement of each topic is in Table \ref{tab:list_topic} (§\ref{app:list_topic}).
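
The six agent construction conditions in Figure 3 differ only in which pieces of respondent information are shown to the LLM before it is asked about the query topic. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the `Respondent` class, the `build_prompt` function, and all prompt wording are hypothetical, and only the condition names and the symbols $d$, $x_{\text{train}}$, $o_{\text{train}}$, $x_{\text{query}}$, and $o_{\text{query}}$ come from the caption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Respondent:
    """Hypothetical container for one human respondent's information."""
    demographics: str              # d
    train_same: Tuple[str, str]    # (x_train, o_train), same category as the query topic
    train_random: Tuple[str, str]  # (x_train†, o_train†), a different category
    query_opinion: str             # o_query on x_query


# Which information each condition reveals, following Figure 3:
# (demographics, same-category train, random-category train, query opinion)
CONDITIONS = {
    "No-Demo": (False, False, False, False),
    "Demo": (True, False, False, False),
    "Train [same category]": (False, True, False, False),
    "Demo+Train [same category]": (True, True, False, False),
    "Demo+Train [random category]": (True, False, True, False),
    "Demo+Train+Query": (True, True, False, True),
}


def build_prompt(condition: str, r: Respondent, query_topic: str) -> str:
    """Assemble a role-play prompt; the wording is an assumption for illustration."""
    use_demo, use_same, use_random, use_query = CONDITIONS[condition]
    parts = []
    if use_demo:
        parts.append(f"Role-play a person with this profile: {r.demographics}.")
    if use_same:
        topic, opinion = r.train_same
        parts.append(f'You believe the statement "{topic}" is {opinion}.')
    if use_random:
        topic, opinion = r.train_random
        parts.append(f'You believe the statement "{topic}" is {opinion}.')
    if use_query:
        # Upper bound: the agent is told the respondent's own answer.
        parts.append(f'You believe the statement "{query_topic}" is {r.query_opinion}.')
    parts.append(f'What is your opinion on the statement "{query_topic}"?')
    return "\n".join(parts)


if __name__ == "__main__":
    # Placeholder values only; these are not items from the survey.
    respondent = Respondent(
        demographics="<demographic description>",
        train_same=("<Partisan-category statement>", "<opinion>"),
        train_random=("<Ghost-category statement>", "<opinion>"),
        query_opinion="<opinion>",
    )
    for name in CONDITIONS:
        print(f"--- {name} ---")
        print(build_prompt(name, respondent, "<Partisan-category query statement>"))
```

Encoding each condition as a row of flags keeps the comparison explicit: the "Demo+Train [same category]" and "Demo+Train [random category]" rows differ in a single position, which is the contrast the belief-network manipulation depends on.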