Comparing the Framing Effect in Humans and LLMs on Naturally Occurring Texts
Gili Lior, Liron Nacchace, Gabriel Stanovsky
TL;DR
This work investigates whether the framing effect observed in humans extends to large language models (LLMs) when they process naturally occurring text. The authors introduce WildFrame, a dataset of 1,000 base statements drawn from real Amazon reviews, each reframed toward the opposite sentiment using multiple prompts, with five human judgments per reframed item. Eleven LLMs are evaluated on how closely their sentiment shifts align with human shifts rather than on raw accuracy. All models exhibit framing sensitivity that correlates with human judgments ($r \geq 0.52$ overall), and both humans and models are influenced more by positive framing than by negative framing. The results raise the question of whether LLMs should mimic human cognitive biases or prioritize framing-invariant fairness and consistency, a trade-off that depends on the application domain. WildFrame thus provides a benchmark for diagnosing how model architecture, training, and prompting influence susceptibility to framing, informing design choices for task-specific alignment and robustness.
Abstract
Humans are influenced by how information is presented, a phenomenon known as the framing effect. Prior work suggests that LLMs may also be susceptible to framing, but it has relied on synthetic data and has not compared model behavior with human behavior. To address this gap, we introduce WildFrame, a dataset for evaluating LLM responses to positive and negative framing in naturally occurring sentences, alongside human responses on the same data. WildFrame consists of 1,000 real-world texts selected to convey a clear sentiment; we then reframe each text in either a positive or negative light and collect human sentiment annotations. Evaluating eleven LLMs on WildFrame, we find that all models respond to reframing in a human-like manner ($r\geq0.52$), and that both humans and models are influenced more by positive than by negative reframing. Notably, GPT models are the least correlated with human behavior among all tested models. These findings open a discussion about the goals of state-of-the-art LLM development: whether models should align closely with human behavior, preserving cognitive phenomena such as the framing effect, or instead mitigate such biases in favor of fairness and consistency.
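To make the reported correlation concrete, the following is a minimal sketch of one way a human-versus-model comparison of this kind could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: it assumes that each reframed item has a base sentiment label, several human labels, and several model labels, that the per-item "shift rate" is the fraction of labels that differ from the base sentiment, and that agreement is measured as the Pearson correlation between human and model shift rates. The function names `shift_rate` and `pearson_r` and the toy data are hypothetical.

```python
from math import sqrt


def shift_rate(base_label, reframed_labels):
    """Fraction of judgments whose sentiment differs from the base label.

    Assumption: a 'shift' means the label given after reframing is not
    the original sentiment of the base statement.
    """
    if not reframed_labels:
        raise ValueError("need at least one judgment")
    flipped = sum(label != base_label for label in reframed_labels)
    return flipped / len(reframed_labels)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy items; these are NOT data from WildFrame.
items = [
    {"base": "negative",
     "human": ["positive", "positive", "negative", "positive", "positive"],
     "model": ["positive", "positive", "positive"]},
    {"base": "positive",
     "human": ["positive", "positive", "negative", "positive", "positive"],
     "model": ["positive", "positive", "negative"]},
    {"base": "negative",
     "human": ["negative", "positive", "negative", "positive", "negative"],
     "model": ["negative", "positive", "negative"]},
]

human_rates = [shift_rate(it["base"], it["human"]) for it in items]
model_rates = [shift_rate(it["base"], it["model"]) for it in items]

print("human shift rates:", human_rates)
print("model shift rates:", model_rates)
print("Pearson r:", pearson_r(human_rates, model_rates))
```

Under these assumptions, a higher `pearson_r` value means the model tends to change its sentiment judgment on the same items where human annotators do, which is the sense of "human-like" response to reframing used above.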
