Who Has The Final Say? Conformity Dynamics in ChatGPT's Selections
Clarissa Sabrina Arlinghaus, Tristan Kenneweg, Barbara Hammer, Günter W. Maier
TL;DR
This work investigates whether ChatGPT-4o exhibits social conformity in a high-stakes hiring task. Using a hidden-profile paradigm, the authors run a baseline condition plus two conformity studies (one with eight opposing opinions, one with a single opposing opinion) and measure suitability, final selection, certainty, and explicit conformity. They find substantial conformity to social consensus: adjustment is near-universal under eight opposing opinions and meaningful, though reduced, in the dyadic setting. Normative conformity plays a dominant role, especially when the model faces disagreement. The results challenge the view of LLMs as neutral advisors and highlight the need to elicit AI judgments before exposing the model to human opinions, along with careful prompt design and alignment to preserve independent, epistemically robust decision-making in collaborative contexts.
Abstract
Large language models (LLMs) such as ChatGPT are increasingly integrated into high-stakes decision-making, yet little is known about their susceptibility to social influence. We conducted three preregistered conformity experiments with GPT-4o in a hiring context. In a baseline study, GPT consistently favored the same candidate (Profile C), reported moderate expertise (M = 3.01) and high certainty (M = 3.89), and rarely changed its choice. In Study 1 (GPT + 8), GPT faced unanimous opposition from eight simulated partners and almost always conformed (99.9%), reporting lower certainty and significantly elevated self-reported informational and normative conformity (p < .001). In Study 2 (GPT + 1), GPT interacted with a single partner and still conformed in 40.2% of disagreement trials, reporting lower certainty and more normative conformity. Across studies, the results demonstrate that GPT does not act as an independent observer but adapts to perceived social consensus. These findings highlight the risks of treating LLMs as neutral decision aids and underline the need to elicit AI judgments before exposing the model to human opinions.
