Large Language Models are biased to overestimate profoundness

Eugenio Herrera-Berg, Tomás Vergara Browne, Pablo León-Villagrá, Marc-Lluís Vives, Cristian Buc Calderon

TL;DR

This study investigates whether large language models (LLMs) assign profound meaning to statements, including pseudo-profound BS, and how prompting and Reinforcement Learning from Human Feedback (RLHF) influence these judgments. The authors compare GPT-4 and several other LLMs against human ratings across five datasets and multiple prompting strategies. They find that LLM ratings track human ratings and distinguish statement types, but that LLMs tend to overestimate profoundness (except Tk-Instruct, which underestimates it). Few-shot prompts mitigate this bias for GPT-4, whereas chain-of-thought prompts have limited impact. The work highlights potential biases introduced by RLHF and discusses the cognitive mechanisms that may underlie these disparities, offering guidance for improving model interpretability and alignment. Overall, the findings emphasize both the capacity and the limits of current LLMs in tasks requiring assessment of meaning, with implications for AI safety and alignment research.

Abstract

Recent advancements in natural language processing by large language models (LLMs), such as GPT-4, have been suggested to approach Artificial General Intelligence. And yet, it is still under dispute whether LLMs possess reasoning abilities similar to those of humans. This study evaluates GPT-4 and various other LLMs in judging the profoundness of mundane, motivational, and pseudo-profound statements. We found a significant statement-to-statement correlation between the LLMs and humans, irrespective of the type of statements and the prompting technique used. However, LLMs systematically overestimate the profoundness of nonsensical statements, with the exception of Tk-Instruct, which uniquely underestimates the profoundness of statements. Only few-shot learning prompts, as opposed to chain-of-thought prompting, draw LLMs' ratings closer to those of humans. Furthermore, this work provides insights into the potential biases induced by Reinforcement Learning from Human Feedback (RLHF), namely an increase in the bias to overestimate the profoundness of statements.
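
The two quantities the abstract relies on, a statement-to-statement correlation and a systematic over- or underestimation relative to humans, can be illustrated with a minimal sketch. The ratings below are hypothetical values on an assumed 1-5 scale, not data from the paper, and the helper names `pearson` and `mean_bias` are illustrative rather than the authors' analysis code.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_bias(model, human):
    """Average (model - human) difference; positive means overestimation."""
    return mean(m - h for m, h in zip(model, human))


# Hypothetical per-statement mean ratings (illustrative only, not from the paper).
human_ratings = [1.2, 1.5, 2.0, 3.1, 3.8]
model_ratings = [2.0, 2.4, 3.1, 4.0, 4.6]

r = pearson(model_ratings, human_ratings)
bias = mean_bias(model_ratings, human_ratings)
print(f"correlation = {r:.3f}, mean bias = {bias:+.2f}")
```

In this toy case the model orders the statements almost exactly as the humans do (high correlation) while rating every statement higher (positive bias), which is the pattern the abstract describes: the two measures are independent, so a model can agree with humans on ranking and still be biased in level.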

Paper Structure

This paper contains 9 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Distribution of ratings per statement type in humans and the LLMs.
  • Figure 2: Profoundness ratings across all evaluation prompts for each LLM.
  • Figure 3: Overview of the profoundness assessment in humans, and the LLMs with 1-shot and 3-shot learning. Ratings are adjusted to be centered around the midpoint (3).
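
The midpoint adjustment mentioned in the Figure 3 caption can be sketched as a simple shift, assuming it means subtracting the midpoint (3) from each rating. The function name `center` and the example ratings are hypothetical, not taken from the paper.

```python
MIDPOINT = 3  # midpoint stated in the Figure 3 caption


def center(ratings, midpoint=MIDPOINT):
    """Shift ratings so the scale midpoint maps to zero."""
    return [r - midpoint for r in ratings]


# Hypothetical ratings: below, at, and above the midpoint.
centered = center([1, 3, 5])
print(centered)
```

After this shift, positive values indicate a statement rated above the midpoint and negative values one rated below it, which makes over- and underestimation easier to compare across raters.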