
Mental Modeling of Reinforcement Learning Agents by Language Models

Wenhao Lu, Xufeng Zhao, Josua Spisak, Jae Hee Lee, Stefan Wermter

TL;DR

This study empirically examines how well large language models can build a mental model of agents, termed agent mental modelling, by reasoning about an agent's behaviour and its effect on states from the agent's interaction history, and provides new insights into the capabilities and limitations of modern LLMs.

Abstract

Can emergent language models faithfully model the intelligence of decision-making agents? Though modern language models already exhibit some reasoning ability and can, in theory, express any probability distribution over tokens, it remains underexplored how the world knowledge these pretrained models have memorized can be utilized to comprehend an agent's behaviour in the physical world. This study empirically examines, for the first time, how well large language models (LLMs) can build a mental model of agents, termed agent mental modelling, by reasoning about an agent's behaviour and its effect on states from the agent's interaction history. This research may unveil the potential of leveraging LLMs for elucidating RL agent behaviour, addressing a key challenge in eXplainable reinforcement learning (XRL). To this end, we propose specific evaluation metrics and test them on selected RL task datasets of varying complexity, reporting findings on agent mental model establishment. Our results disclose that LLMs are not yet capable of fully mentally modelling agents through inference alone without further innovations. This work thus provides new insights into the capabilities and limitations of modern LLMs.
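
To make the evaluation setup more concrete, below is a minimal sketch (not the authors' code) of how such a probe could look: roll out a short interaction history in an RL environment, place an indexed history in the prompt, and ask the LLM to predict the effect of an action on the state. The environment choice (MountainCar via gymnasium), the random stand-in policy, the prompt format, the tolerance check, and the placeholder query_llm call are all assumptions for illustration; the paper's own prompts and metrics may differ.

```python
# Hedged sketch of an "agent mental modelling" probe: can an LLM predict the
# next state from an indexed interaction history? Uses a random policy on
# MountainCar as a stand-in agent; `query_llm` is a hypothetical placeholder.
import gymnasium as gym
import numpy as np

def rollout(env_name="MountainCar-v0", steps=10, seed=0):
    """Collect a short (state, action, next_state) history with a random policy."""
    env = gym.make(env_name)
    obs, _ = env.reset(seed=seed)
    history = []
    for _ in range(steps):
        action = env.action_space.sample()
        next_obs, _, terminated, truncated, _ = env.step(action)
        history.append((obs.tolist(), int(action), next_obs.tolist()))
        obs = next_obs
        if terminated or truncated:
            break
    env.close()
    return history

def build_prompt(history, query_state, query_action):
    """Format an indexed history plus a next-state query (assumed prompt format)."""
    lines = ["You observe an agent in MountainCar (state = [position, velocity])."]
    for i, (s, a, s_next) in enumerate(history):
        lines.append(f"Step {i}: state={s}, action={a}, next state={s_next}")
    lines.append(f"Query: state={query_state}, action={query_action}.")
    lines.append("Predict the next state as a JSON list of two floats.")
    return "\n".join(lines)

def within_tolerance(pred, truth, atol=0.01):
    """Simple per-element tolerance check; the paper's exact metrics may differ."""
    return np.allclose(np.asarray(pred), np.asarray(truth), atol=atol)

if __name__ == "__main__":
    hist = rollout()
    *context, (s, a, s_next) = hist        # hold out the last transition as the query
    prompt = build_prompt(context, s, a)
    print(prompt)
    # prediction = query_llm(prompt)       # hypothetical LLM call
    # print(within_tolerance(prediction, s_next))
```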

Paper Structure

This paper contains 36 sections, 24 figures, 6 tables, and 1 algorithm.

Figures (24)

  • Figure 1: A conception of LLMs approximating the agent's mental model to facilitate end-users' understanding of the agent.
  • Figure 2: An overview of the LLM-Xavier workflow for offline evaluation of LLMs' understanding of an RL agent.
  • Figure 3: Comparative plots of LLMs' performance on various tasks with different history sizes (with indexed history in prompts): top for the MountainCar task, middle for the Acrobot task, bottom for the Pendulum task with continuous action prediction. A description of these scenarios can be found in the appendix (full task description).
  • Figure 4: Comparison of models' performance in predicting absolute action values and action bins for the Pendulum task. Hatching indicates numeric prediction accuracy ("No Bins"); reduced transparency indicates using indexed history in prompts.
  • Figure 5: Dynamics of LLMs' performance on predicting individual state elements for the MountainCar task (with indexed history in prompts).
  • ...and 19 more figures