Language Model Goal Selection Differs from Humans' in an Open-Ended Task

Gaia Molinaro, Dave August, Danielle Perszyk, Anne G. E. Collins

TL;DR

This work directly assesses the validity of LLMs as proxies for human goal selection in a controlled, open-ended learning task borrowed from cognitive science, finding substantial divergence from human behavior.

Abstract

As large language models (LLMs) are integrated into human decision-making, they are increasingly choosing goals autonomously rather than only completing human-defined ones, on the assumption that their choices will reflect human preferences. However, human-LLM similarity in goal selection remains largely untested. We directly assess the validity of LLMs as proxies for human goal selection in a controlled, open-ended learning task borrowed from cognitive science. Across four state-of-the-art models (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5, and Centaur), we find substantial divergence from human behavior. While people gradually explore and learn to achieve goals with diversity across individuals, most models exploit a single identified solution (reward hacking) or show surprisingly low performance, with distinct patterns across models and little variability across instances of the same model. Even Centaur, explicitly trained to emulate humans in experimental settings, poorly captures people's goal selection. Chain-of-thought reasoning and persona steering provide limited improvements. These findings highlight the uniqueness of human goal selection, cautioning against replacing it with current models in applications such as personal assistance, scientific discovery, and policy research.

Paper Structure (28 sections, 11 figures, 1 table)

This paper contains 28 sections, 11 figures, 1 table.

Figures (11)

  • Figure 1: Task structure. Top: screenshots from the visual version of the task developed for human participants. Bottom: schematic representation of goals and their characteristics, with hierarchical relationships highlighted. Reproduced with permission from molinaro2024latent.
  • Figure 2: Example action and goal selection choices. Each subplot represents a single human participant or model simulation (two examples per type to illustrate variability) over the practice and learning phases of the task (separated by a dotted line). Each dot shows the index of the particular sequence of actions selected. Ingredients representing pre-made potions were labeled 4-7 for clearer visualization. Human participants tended to first focus on one goal, then rehearse all of them in cycles. Models often focused on one or a few potions, either maximizing rewards or repeating themselves despite negative feedback. Gemini 2.5 Pro stood out for a much more human-like pattern of behavior, although its learning tended to proceed faster, without human-like strategic hypothesis testing.
  • Figure 3: Example goal position choices in humans and models. Each column illustrates the goal selection of a single human participant or model output, with two examples per type. Each dot corresponds to a particular trial number and shows, in color and on the y-axis, the position on the screen (for humans) or the index in the list (for models) of the selected goal. Humans, and to a lesser extent Gemini 2.5 Pro, frequently cycled through goals to practice the correct solutions. By contrast, models tended to stick to one potion, often the first listed.
  • Figure 4: Performance across task phases. Top: average performance in the practice, early learning, late learning, and test blocks (note that Centaur's out-of-distribution score was 0). Bottom, first subplot: learning curve. Bottom, following subplots: sorted individual participant scores, with the x-axis normalized by the number of participants, such that it represents the proportion of participants with a score equal to or lower than the current y. Error bars and shading indicate the S.E.M.
  • Figure 5: Distributions of goal and action selection behaviors. Sorted individual scores over the normalized subject number for various aspects of goal (first five subplots) and action selection within repeated goals (right-most subplot) in humans and models.
  • ...and 6 more figures
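
The captions for Figures 4 and 5 describe two standard summaries: individual scores sorted and plotted against a subject index normalized by the number of participants, and error bars given as the S.E.M. The sketch below illustrates how such quantities are commonly computed. It is not the authors' analysis code: the function names `sorted_score_curve` and `sem` and the example scores are hypothetical, and using the sample standard deviation (n - 1 denominator) for the S.E.M. is an assumption.

```python
import math
from statistics import stdev


def sorted_score_curve(scores):
    """Sort scores ascending and pair them with a normalized subject index.

    x[i] = (i + 1) / n is the proportion of participants whose score is
    equal to or lower than y[i], as described in the Figure 4 caption.
    """
    ordered = sorted(scores)
    n = len(ordered)
    x = [(i + 1) / n for i in range(n)]
    return x, ordered


def sem(scores):
    """Standard error of the mean: sample SD divided by sqrt(n).

    Assumes the sample (n - 1) standard deviation; the paper does not
    state which estimator was used.
    """
    return stdev(scores) / math.sqrt(len(scores))


# Made-up scores for four hypothetical participants (not data from the paper).
example_scores = [0.2, 0.9, 0.5, 0.7]
x, y = sorted_score_curve(example_scores)
```

Normalizing the x-axis this way allows groups of different sizes (for example, human participants and model simulations) to be overlaid on the same plot.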