Evaluating the Goal-Directedness of Large Language Models

Tom Everitt, Cristina Garbacea, Alexis Bellot, Jonathan Richens, Henry Papadatos, Siméon Campos, Rohin Shah

TL;DR

This work defines and quantifies goal-directedness (GD) for large language models as their propensity to apply relevant capabilities toward an explicit goal, distinct from raw task performance. It introduces a formal, capability-conditioned GD metric and an open-source evaluation framework implemented in a Blocksworld environment, with composite tasks covering information gathering, cognitive effort, plan execution, and their combination. Across state-of-the-art models, most are not fully goal-directed; GD remains relatively stable across tasks and is only modestly sensitive to motivational prompts. GD also diverges from regret and from context-length effects, which supports treating it as a meaningful agentic property rather than a proxy for performance. The framework supports monitoring of LLM progress and can guide deliberate design choices around agentic properties, with important safety and ethics implications for autonomous AI systems.

Abstract

To what extent do LLMs use their capabilities towards their given goal? We take this as a measure of their goal-directedness. We evaluate goal-directedness on tasks that require information gathering, cognitive effort, and plan execution, where we use subtasks to infer each model's relevant capabilities. Our evaluations of LLMs from Google DeepMind, OpenAI, and Anthropic show that goal-directedness is relatively consistent across tasks, differs from task performance, and is only moderately sensitive to motivational prompts. Notably, most models are not fully goal-directed. We hope our goal-directedness evaluations will enable better monitoring of LLM progress, and enable more deliberate design choices of agentic properties in LLMs.

Paper Structure

This paper contains 49 sections, 1 equation, 11 figures, 4 algorithms.

Figures (11)

  • Figure 1: How motivated are LLMs to do their tasks well? Do they sometimes slack off? Left: the model uses many measurements to get an accurate estimate. Right: as part of a larger task, it fails to fully employ this capability.
  • Figure 2: Tasks and subtasks
  • Figure 3: Goal-directedness of models across main evaluation tasks. No model is fully goal-directed on Information Gathering and the Combined Task. Goal-directedness remains relatively consistent across tasks. Models that failed to understand a task have been dropped.
  • Figure 4: Measurements per block. Models consistently spend less time estimating block heights when it is part of a larger task, as opposed to when it is the only task.
  • Figure 5: Regret on main tasks (lower is better). The left-to-right trend is weak, so task performance differs from goal-directedness.
  • ...and 6 more figures

Theorems & Definitions (1)

  • Definition 3.1: Goal-directedness