Conformity in Large Language Models

Xiaochen Zhu; Caiqi Zhang; Tom Stafford; Nigel Collier; Andreas Vlachos

TL;DR

The study probes conformity bias in LLMs by adapting Asch-style social-influence experiments to multi-turn dialogues. It defines a critical-subject framework in which the conformity level $CL_p$ and the resistance $RL_p$ quantify how often a model yields to or resists the majority as the number of confederates $p$ increases. Across state-of-the-art LLMs and diverse objective and subjective datasets, the authors observe pervasive conformity: $CL_p$ grows and $RL_p$ declines with larger $p$, and lower initial confidence predicts more conformity (statistically significant at the 0.001 level in key tests; this significance level is a p-value, unrelated to the confederate count $p$). Two prompt-based interventions, Devil's Advocate (DA) and Question Distillation (QD), substantially mitigate conformity across tasks and also show promise against sycophancy in QA settings. Because both remedies are training-free, they can be deployed in real-world, multi-agent contexts, offering a practical path to safer, more robust LLMs for information retrieval and collaborative reasoning.

Abstract

The conformity effect describes the tendency of individuals to align their responses with the majority. Studying this bias in large language models (LLMs) is crucial, as LLMs are increasingly used in various information-seeking and decision-making tasks as conversation partners to improve productivity. Thus, conformity to incorrect responses can compromise their effectiveness. In this paper, we adapt psychological experiments to examine the extent of conformity in popular LLMs. Our findings reveal that all tested models exhibit varying levels of conformity toward the majority, regardless of their initial choice or correctness, across different knowledge domains. Notably, we are the first to show that LLMs are more likely to conform when they are more uncertain in their own prediction. We further explore factors that influence conformity, such as training paradigms and input characteristics, finding that instruction-tuned models are less susceptible to conformity, while increasing the naturalness of majority tones amplifies conformity. Finally, we propose two interventions, Devil's Advocate and Question Distillation, to mitigate conformity, providing insights into building more robust language models.
