Systematic Biases in LLM Simulations of Debates

Amir Taubenfeld, Yaniv Dover, Roi Reichart, Ariel Goldstein

TL;DR

The paper investigates how inherent LLM biases affect the faithful emulation of diverse political personas in debate simulations. It combines a round-robin debate setup with self-generated data fine-tuning using QLoRA and a self-alignment approach to manipulate viewpoints. Key findings show that agents tend to converge toward the base model's biases, and that fine-tuning can steer agents to align with new biases, sometimes at a cost to general benchmarks, highlighting limitations for realistic human-dynamics simulations. The work emphasizes the need for bias-mitigation techniques to improve believability in AI-driven social science simulations and to avoid distorted representations of human behavior.

Abstract

The emergence of Large Language Models (LLMs) has opened exciting possibilities for constructing computational simulations designed to replicate human behavior accurately. Current research suggests that LLM-based agents become increasingly human-like in their performance, sparking interest in using these AI agents as substitutes for human participants in behavioral studies. However, LLMs are complex statistical learners without straightforward deductive rules, making them prone to unexpected behaviors. Hence, it is crucial to study and pinpoint the key behavioral distinctions between humans and LLM-based agents. In this study, we highlight the limitations of LLMs in simulating human interactions, particularly focusing on LLMs' ability to simulate political debates on topics that are important aspects of people's day-to-day lives and decision-making processes. Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases despite being directed to debate from certain political perspectives. This tendency results in behavioral patterns that seem to deviate from well-established social dynamics among humans. We reinforce these observations using an automatic self-fine-tuning method, which enables us to manipulate the biases within the LLM and demonstrate that agents subsequently align with the altered biases. These results underscore the need for further research to develop methods that help agents overcome these biases, a critical step toward creating more realistic simulations.

Paper Structure (24 sections, 12 figures, 3 tables)

This paper contains 24 sections, 12 figures, 3 tables.

Figures (12)

  • Figure 1: (a) The prompt used to generate the background stories for the Democratic agents includes their positions on the four controversial topics discussed in our experiments. The wording of the prompt is based on the survey question that the Pew Research Center survey (pewproblems2023) asks human participants about each topic, ensuring that the Democratic and Republican agents adopt polarized views on these issues. (b) An example of a background story of one of the agents. This story was generated automatically by feeding the LLM the prompt described in (a). We opted to develop comprehensive identities for each agent across all topics simultaneously rather than creating an individual agent for each topic. This strategy simplified our experimental design and provided a complete representation for each agent.
  • Figure 2: At each iteration, an agent (a) is prompted with its background story, the topic of the debate, and the history of the conversation so far, and is asked to complete either (b) its next reply in the conversation or (c) a survey question measuring its current attitude on the debated topic. Note that, for consistency, the prompt uses the term "debate" in all the experiments in this paper. However, we did experiment with other terms such as "conversation" and did not see significant differences.
  • Figure 3: Evolution of attitude scores in three-way debates on four controversial topics. The X-axis shows the number of chat exchanges in the debate. The Y-axis displays the average attitude scores derived from 40 separate experiments on each topic, including standard error bars. Our methodology for monitoring attitude scores is detailed in Section \ref{subsection:agents-interaction}. The Default agent, symbolizing the inherent biases of the base LLM, maintains a consistent position throughout the debate. Interestingly, the views of the partisan agents gradually align more closely with those of the Default agent. In all the sub-figures except "illegal immigration", the Default agent shows a bias toward the Democratic perspective, leading the Republican agent to change its opinion significantly over the course of the debate. Furthermore, it is notable that the lines representing the partisan agents never intersect with the line of the Default agent. This suggests that the LLM's default biases can act as a deterrent against one party's inclination to compromise with the other. Supplementary Section \ref{sec:other-models} presents analogous findings with other underlying models.
  • Figure 4: Evolution of attitude scores in two-way debates between Republican and Democrat agents. The graphs feature a dashed line that shows the Default agent's viewpoint before the beginning of the debates, taken from Figure \ref{fig:democrat-republican-neutral}. Recall that the Default agent's viewpoint represents the inherent biases of the LLM. Remarkably, even though the Default agent does not participate in the two-way debates illustrated here, the partisan agents continue to converge toward the inherent biases of the model.
  • Figure 5: This graph illustrates a series of three-way debates involving two Republican agents and a Default agent. Notably, even during conversations with other Republicans, the agents tend to align with the position of the Default agent. This trend is apparent even when the Default agent is not participating in the dialogue (Supplementary Figure \ref{fig:echo-chamber-republican-no-neutral}). The same phenomenon is also evident in experiments conducted with Democrat agents (Supplementary Figure \ref{fig:echo-chamber-democrat}), where a similar pattern of gravitation toward the Default agent's stance is observed.
  • ...and 7 more figures
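
The interaction loop described in the Figure 2 caption and the averaging described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the agent dictionary layout, the choice to survey every agent once before the debate and once after each full round, and the stub functions standing in for LLM calls are all assumptions made for the example.

```python
import math
from statistics import mean, stdev


def round_robin_debate(agents, topic, n_rounds, reply_fn, survey_fn):
    """Run one debate in which agents speak in a fixed order each round.

    reply_fn and survey_fn are hypothetical callables that would wrap an LLM:
    both receive the agent, the topic, and the conversation history so far.
    Returns the history and, per agent, a list of attitude scores
    (one before the debate plus one after each round; an assumption).
    """
    history = []
    scores = {a["name"]: [survey_fn(a, topic, history)] for a in agents}
    for _ in range(n_rounds):
        for agent in agents:
            # Each agent sees its background story, the topic, and the history.
            history.append((agent["name"], reply_fn(agent, topic, history)))
        for agent in agents:
            scores[agent["name"]].append(survey_fn(agent, topic, history))
    return history, scores


def mean_and_stderr(values):
    """Mean and standard error of the mean across repeated experiments."""
    m = mean(values)
    se = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return m, se


# Deterministic stand-ins for the LLM calls (purely illustrative).
def stub_reply(agent, topic, history):
    return f"{agent['name']} speaks on {topic} (turn {len(history) + 1})"


def stub_survey(agent, topic, history):
    return agent["score"]  # a real survey_fn would parse the model's answer


if __name__ == "__main__":
    demo_agents = [
        {"name": "Republican", "story": "...", "score": 8.0},
        {"name": "Democrat", "story": "...", "score": 2.0},
        {"name": "Default", "story": "", "score": 4.0},
    ]
    _, demo_scores = round_robin_debate(
        demo_agents, "illegal immigration", 2, stub_reply, stub_survey
    )
    print(demo_scores)
```

To reproduce a curve like those in Figure 3, one would repeat `round_robin_debate` many times per topic (the caption reports 40 experiments) and apply `mean_and_stderr` to the scores collected at each step.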