Comparing the Framing Effect in Humans and LLMs on Naturally Occurring Texts

Gili Lior, Liron Nacchace, Gabriel Stanovsky

TL;DR

This work investigates whether the framing effect observed in humans extends to large language models (LLMs) when processing naturally occurring text. The authors introduce WildFrame, a dataset of 1,000 base statements drawn from real Amazon reviews, each reframed to opposite sentiment using multiple prompts, with five human judgments per reframed item; eleven LLMs are evaluated on their alignment with human sentiment shifts rather than raw accuracy. Key findings show that all models exhibit framing sensitivity with a strong correspondence to human judgments ($r \geq 0.52$ overall), and both humans and models are more influenced by positive framing than negative framing. The results raise important questions about whether advancing LLMs should mimic human cognitive biases or prioritize framing-invariant fairness and consistency, highlighting a nuanced trade-off depending on application domain. WildFrame thus provides a principled benchmark for diagnosing how model architectures, training, and prompting influence susceptibility to framing, informing future design choices for task-specific alignment and robustness.

Abstract

Humans are influenced by how information is presented, a phenomenon known as the framing effect. Prior work suggests that LLMs may also be susceptible to framing, but it relied on synthetic data and did not compare to human behavior. To address this gap, we introduce WildFrame, a dataset for evaluating LLM responses to positive and negative framing in naturally occurring sentences, alongside human responses on the same data. WildFrame consists of 1,000 real-world texts selected to convey a clear sentiment; we then reframe each text in either a positive or negative light and collect human sentiment annotations. Evaluating eleven LLMs on WildFrame, we find that all models respond to reframing in a human-like manner ($r\geq0.52$), and that both humans and models are influenced more by positive than negative reframing. Notably, GPT models are the least correlated with human behavior among all tested models. These findings raise a discussion around the goals of state-of-the-art LLM development and whether models should align closely with human behavior, to preserve cognitive phenomena such as the framing effect, or instead mitigate such biases in favor of fairness and consistency.
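
As a rough illustration of how a human-model correlation of this kind could be computed, the sketch below calculates a Pearson coefficient between the fraction of annotators who shifted sentiment on each reframed statement and a binary flag for whether a model shifted. The variable names, the toy values, and the choice to represent the human signal as a per-item fraction are assumptions made for illustration; the paper's exact aggregation is not specified here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (NOT from WildFrame): for each reframed statement,
# how many of the five annotators shifted sentiment, and whether a model did.
ANNOTATORS_PER_ITEM = 5
human_shift_counts = [5, 0, 3, 1, 4, 2]
model_shifted = [1, 0, 1, 0, 1, 0]

# Assumption: the human signal is the fraction of annotators who shifted.
human_shift_fraction = [c / ANNOTATORS_PER_ITEM for c in human_shift_counts]

r = pearson_r(human_shift_fraction, model_shifted)
print(f"Pearson r between human and model sentiment shifts: {r:.2f}")
```

A higher value of `r` means the model tends to shift sentiment on the same statements where more human annotators shift.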

Paper Structure

This paper contains 35 sections, 13 figures, 3 tables.

Figures (13)

  • Figure 1: The WildFrame data construction process. In step (a), we extract statements based on their syntactic structure, aiming for statements with clear negative or positive sentiment. Next, in (b), we reframe each statement by adding a suffix or prefix that conveys the opposite sentiment. Finally, in (c), five annotators mark the sentiment of the reframed statement, and we count how many annotators shift sentiment, i.e., label the reframed statement with the sentiment opposite to that of the base statement. Red marks the negative parts of a statement and green marks the positive parts.
  • Figure 2: Distribution of sentiment scores before and after applying opposite-sentiment framing, as detailed in Section \ref{sec:adding-framing}. Prior to framing, base sentences exhibit a clear polarity (positive or negative), whereas the application of opposite framing introduces ambiguity, shifting the sentiment scores toward a less distinct polarity.
  • Figure 3: Pairwise agreement rates between LLMs and human annotators. The matrix is divided into model-to-model (top-left), human-to-human (bottom-right), and cross-comparisons. The "In-house" row represents labels from controlled graduate student annotators, serving as a quality anchor. The visual density shows that the model cluster is significantly tighter (higher agreement) than the human cluster, which displays the natural variance expected in subjective tasks.
  • Figure 4: Percentage of reframed statements that result in a sentiment shift from positive to negative (or vice versa). Red represents negative-base statements reframed as positive, and green represents positive-base statements reframed as negative. Horizontal lines show the mean across models. We find that both LLMs and humans are influenced by opposite framing, with a stronger effect for positive reframing.
  • Figure 5: Pearson correlation coefficients between human sentiment shifts and predictions from various LLMs after applying opposite sentiment framing. Higher values indicate stronger alignment between the model's behavior and human annotations.
  • ...and 8 more figures
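
The quantities summarized in Figures 3 and 4 can be sketched in a few lines of Python. The example below computes the percentage of reframed statements whose label flips to the opposite of the base sentiment, grouped by base polarity, and the pairwise agreement rate between labelers. All names and data are hypothetical and chosen for illustration only; this is a minimal sketch of the general idea, not the authors' evaluation code.

```python
from itertools import combinations


def shift_rate_by_base(items):
    """Percentage of items labeled opposite to their base sentiment, per base polarity.

    `items` is a list of (base_sentiment, label_after_reframing) pairs.
    """
    counts = {}
    for base, label in items:
        opposite = "negative" if base == "positive" else "positive"
        shifted, total = counts.get(base, (0, 0))
        counts[base] = (shifted + (label == opposite), total + 1)
    return {base: 100.0 * s / t for base, (s, t) in counts.items()}


def pairwise_agreement(labels_by_rater):
    """Fraction of items on which each pair of raters assigns the same label."""
    rates = {}
    for a, b in combinations(sorted(labels_by_rater), 2):
        labels_a, labels_b = labels_by_rater[a], labels_by_rater[b]
        if len(labels_a) != len(labels_b):
            raise ValueError("all raters must label the same items")
        matches = sum(x == y for x, y in zip(labels_a, labels_b))
        rates[(a, b)] = matches / len(labels_a)
    return rates


# Hypothetical toy data (NOT from WildFrame).
items = [
    ("negative", "positive"), ("negative", "positive"), ("negative", "negative"),
    ("positive", "negative"), ("positive", "positive"), ("positive", "positive"),
]
labels_by_rater = {
    "model_a": ["positive", "positive", "negative", "negative"],
    "model_b": ["positive", "positive", "negative", "positive"],
    "human_1": ["positive", "negative", "negative", "positive"],
}

shift = shift_rate_by_base(items)
agreement = pairwise_agreement(labels_by_rater)
for base, pct in shift.items():
    print(f"{base}-base statements that shifted: {pct:.1f}%")
for pair, rate in agreement.items():
    print(f"agreement {pair}: {rate:.2f}")
```

Here the rate for negative-base statements corresponds to positive reframing and the rate for positive-base statements to negative reframing, matching the grouping described for Figure 4.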