Conformity in Large Language Models

Xiaochen Zhu, Caiqi Zhang, Tom Stafford, Nigel Collier, Andreas Vlachos

TL;DR

The study probes conformity bias in LLMs by adapting Asch-style social-influence experiments to multi-turn dialogues, defining a critical-subject framework with $CL_p$ and $RL_p$ to quantify conformity and resistance as the number of confederates $p$ increases. Across state-of-the-art LLMs and diverse objective and subjective datasets, the authors observe pervasive conformity: the conformity level $CL_p$ grows and the resistance level $RL_p$ declines with larger $p$, and initial confidence inversely predicts conformity (statistically significant at $p<0.001$ in key tests, where this $p$ denotes a p-value rather than the number of confederates). Two prompt-based interventions, Devil's Advocate (DA) and Question Distillation (QD), substantially mitigate conformity across tasks and also show promise against sycophancy in QA settings. The results highlight practical pathways to safer, more robust LLMs for information retrieval and collaborative reasoning, with training-free remedies that can be deployed in real-world, multi-agent contexts.
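To make the two metrics concrete, here is a minimal sketch of how $CL_p$ and $RL_p$ could be computed for a fixed number of confederates $p$. The exact definitions are not given in this summary, so the code rests on assumptions: $CL_p$ is taken to be the fraction of trials in which the model's final answer matches the confederates' majority answer, $RL_p$ the fraction in which it keeps its initial (vanilla-round) answer, and only trials where the initial answer differs from the majority are counted. The `Trial` record and the `conformity_rates` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question asked with p confederates (hypothetical record layout)."""
    initial_answer: str   # model's answer in the vanilla round, no confederates
    majority_answer: str  # answer stated by the p confederates
    final_answer: str     # model's answer after seeing the confederates


def conformity_rates(trials):
    """Return (CL_p, RL_p, other) as fractions of the critical trials.

    Assumption: a trial is 'critical' only when the model's initial answer
    differs from the majority answer, so conforming and resisting are
    mutually exclusive outcomes.
    """
    critical = [t for t in trials if t.initial_answer != t.majority_answer]
    if not critical:
        return 0.0, 0.0, 0.0
    n = len(critical)
    conformed = sum(t.final_answer == t.majority_answer for t in critical)
    resisted = sum(t.final_answer == t.initial_answer for t in critical)
    other = n - conformed - resisted  # switched to some third answer
    return conformed / n, resisted / n, other / n
```

Running this once per value of $p$ would give the three proportions (conformity, resistance, other) that a stacked bar chart of conformity against $p$ needs. Whether the paper restricts the count to critical trials in exactly this way is an assumption here.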

Abstract

The conformity effect describes the tendency of individuals to align their responses with the majority. Studying this bias in large language models (LLMs) is crucial, as LLMs are increasingly used in various information-seeking and decision-making tasks as conversation partners to improve productivity. Thus, conformity to incorrect responses can compromise their effectiveness. In this paper, we adapt psychological experiments to examine the extent of conformity in popular LLMs. Our findings reveal that all tested models exhibit varying levels of conformity toward the majority, regardless of their initial choice or correctness, across different knowledge domains. Notably, we are the first to show that LLMs are more likely to conform when they are more uncertain in their own prediction. We further explore factors that influence conformity, such as training paradigms and input characteristics, finding that instruction-tuned models are less susceptible to conformity, while increasing the naturalness of majority tones amplifies conformity. Finally, we propose two interventions, Devil's Advocate and Question Distillation, to mitigate conformity, providing insights into building more robust language models.

Paper Structure

This paper contains 18 sections, 2 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: An example of LLMs conforming to an incorrect majority answer. We asked the model "What is the oldest college in Cambridge?". Though the model's answer in the vanilla round is correct, "Peterhouse", it shifts to the majority's wrong answer, "King's College", in the multi-party dialogue scenario, demonstrating the conformity effect.
  • Figure 2: Conformity level for Llama-3-8B-Instruct in various question-answering tasks. The stacked bar plots show the proportion of resistance level $RL_p$ (green), conformity level $CL_p$ (red), and other responses (blue) across $p$ ranging from 2 to 10 in four objective datasets (Commonsense QA, MMLU, PopQA, and BBH Object Counting) and two subjective datasets (Politiscale and OpinionsQA). The figure illustrates how conformity behavior exists across different knowledge domains.
  • Figure 3: Performance of Llama-3-8B-Instruct on MMLU in dialogues comprising confederates with Unanimous vs. Diverse incorrect answers.
  • Figure 4: Conformity levels across different models and participant numbers with different tones on MMLU.
  • Figure 5: Conformity level across pre-trained and instruction-tuned models with Unanimous-Plain on MMLU.
  • ...and 9 more figures