Who Has The Final Say? Conformity Dynamics in ChatGPT's Selections

Clarissa Sabrina Arlinghaus, Tristan Kenneweg, Barbara Hammer, Günter W. Maier

TL;DR

This work investigates whether ChatGPT (GPT-4o) exhibits social conformity in a high-stakes hiring task. Using a hidden-profile paradigm, the authors run a baseline study without social influence plus two conformity studies, one with eight simulated partners (GPT + 8) and one with a single partner (GPT + 1), each with agreement and disagreement conditions. They measure initial suitability judgments, final selections, self-reported expertise and certainty, and explicit (informational and normative) conformity. They find substantial conformity to social consensus: near-universal adjustment when eight partners disagree (99.9%) and reduced but still meaningful conformity in the dyadic setting (40.2% of disagreement trials). Normative conformity plays a dominant role, especially under disagreement. The results challenge the view of LLMs as neutral advisors and highlight the need to elicit AI judgments before exposing them to human opinions, along with careful prompt design and alignment to preserve independent, epistemically robust decision-making in collaborative contexts.

Abstract

Large language models (LLMs) such as ChatGPT are increasingly integrated into high-stakes decision-making, yet little is known about their susceptibility to social influence. We conducted three preregistered conformity experiments with GPT-4o in a hiring context. In a baseline study, GPT consistently favored the same candidate (Profile C), reported moderate expertise (M = 3.01) and high certainty (M = 3.89), and rarely changed its choice. In Study 1 (GPT + 8), GPT faced unanimous opposition from eight simulated partners and almost always conformed (99.9%), reporting lower certainty and significantly elevated self-reported informational and normative conformity (p < .001). In Study 2 (GPT + 1), GPT interacted with a single partner and still conformed in 40.2% of disagreement trials, reporting less certainty and more normative conformity. Across studies, results demonstrate that GPT does not act as an independent observer but adapts to perceived social consensus. These findings highlight risks of treating LLMs as neutral decision aids and underline the need to elicit AI judgments prior to exposing them to human opinions.
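
The conformity figures above (e.g., conformity in 40.2% of disagreement trials in Study 2) and the change rates in Figure 2 reduce to simple proportions over logged runs. The sketch below is illustrative only and is not the authors' analysis code. It assumes a hypothetical record schema (`Run` with `initial`, `partner_choice`, and `final` fields), and it assumes that a run counts as conforming when the simulated partner(s) disagreed with GPT's initial preference and GPT's final selection switched to the partners' option. The toy records are invented for demonstration and are not results from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One logged run (hypothetical schema, not the authors' data format)."""
    study: str           # e.g. "GPT + 8" or "GPT + 1"
    condition: str       # "Agreement" or "Disagreement"
    initial: str         # profile GPT rated more suitable before seeing others
    partner_choice: str  # profile endorsed by the simulated partner(s)
    final: str           # profile GPT selected after seeing the opinion(s)


def changed_decision(run: Run) -> bool:
    """True if the final selection differs from the initial preference."""
    return run.final != run.initial


def conformed(run: Run) -> bool:
    """Assumed definition: partners disagreed and GPT switched to their option."""
    return run.initial != run.partner_choice and run.final == run.partner_choice


def conformity_rate(runs: list[Run], study: str) -> float:
    """Share of disagreement runs in `study` that ended in conformity."""
    trials = [r for r in runs if r.study == study and r.condition == "Disagreement"]
    if not trials:
        raise ValueError(f"no disagreement runs for {study!r}")
    return sum(conformed(r) for r in trials) / len(trials)


def change_summary(runs: list[Run]) -> dict[tuple[str, str], tuple[int, int]]:
    """(changed, total) counts per (study, condition), like the bars in Figure 2."""
    total = Counter((r.study, r.condition) for r in runs)
    changed = Counter((r.study, r.condition) for r in runs if changed_decision(r))
    return {key: (changed[key], n) for key, n in total.items()}


# Toy records for illustration only; these are not results from the study.
runs = [
    Run("GPT + 1", "Disagreement", initial="C", partner_choice="A", final="A"),
    Run("GPT + 1", "Disagreement", initial="C", partner_choice="B", final="C"),
    Run("GPT + 1", "Agreement", initial="C", partner_choice="C", final="C"),
    Run("GPT + 8", "Disagreement", initial="C", partner_choice="D", final="D"),
]
print(conformity_rate(runs, "GPT + 1"))  # 0.5
print(change_summary(runs))
```

Separating "changed decision" from "conformed" matters in the agreement condition: there a change moves away from the partners, so it would appear in Figure 2's change counts without counting as conformity.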

Paper Structure

This paper contains 8 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Overview of the four candidate profiles presented to GPT for the position of long-haul pilot. Each profile contained a mix of positive and negative attributes to allow for meaningful pairwise comparisons in the decision tasks.
  • Figure 2: Percentage of runs that changed their decision (Changed Decision, red) versus those maintaining their initial choice (No Change, blue) across all studies and conditions. Each bar represents one study-condition combination, with the total number of runs (N) shown on the y-axis and within-condition percentages and frequencies (n) displayed inside the bars.
  • Figure 3: Mean ratings (1–5) of expertise, certainty, and informational and normative conformity across all pair combinations, separately for the Agreement (blue) and Disagreement (red) conditions of Study 1 (GPT + 8). Black dashed lines indicate baseline values from the initial study without social influence. Error bars represent 95% confidence intervals.
  • Figure 4: Mean ratings (1–5) of expertise, certainty, and informational and normative conformity across all pair combinations, separately for the Agreement (blue) and Disagreement (red) conditions of Study 2 (GPT + 1). Black dashed lines indicate baseline values from the initial study without social influence. Error bars represent 95% confidence intervals.
  • Figure 5: Flow of decisions from pairwise comparisons (Pair) to initial preferences (Suitability) and final selections (Selection), displayed across all studies and conditions. Each colored stream represents one pair of options (e.g., A vs B / B vs A), illustrating how GPT’s preferences evolved under agreement and disagreement.
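
Figures 3 and 4 report mean 1–5 ratings with 95% confidence intervals per condition. The sketch below shows one common way to compute such intervals. It assumes a normal approximation (z ≈ 1.96), which may differ from the interval the authors actually used (for example a t-based or bootstrap interval). The function names `mean_ci95` and `summarize` are hypothetical, and the toy ratings are invented for illustration, not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci95(ratings: list[int]) -> tuple[float, float, float]:
    """Mean and 95% CI of 1-5 ratings, using a normal approximation."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for a standard error")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    m = float(mean(ratings))
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return m, m - half_width, m + half_width


def summarize(by_condition: dict[str, list[int]]) -> dict[str, tuple[float, float, float]]:
    """Apply mean_ci95 to each condition, e.g. "Agreement" and "Disagreement"."""
    return {cond: mean_ci95(vals) for cond, vals in by_condition.items()}


# Toy certainty ratings for illustration only; not data from the paper.
toy = {"Agreement": [4, 5, 4, 4, 5], "Disagreement": [3, 2, 3, 3, 2]}
for cond, (m, lo, hi) in summarize(toy).items():
    print(f"{cond}: M = {m:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With many runs per condition the normal and t intervals are nearly identical; for small samples a t critical value would give slightly wider intervals.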