Enhancing Agent Learning through World Dynamics Modeling

Zhiyuan Sun, Haochen Shi, Marc-Alexandre Côté, Glen Berseth, Xingdi Yuan, Bang Liu

TL;DR

The results show that LLMs guided by DiVE make more informed decisions, achieving rewards comparable to human players in the Crafter environment and surpassing methods that require prior task-specific training in the MiniHack environment.

Abstract

Large language models (LLMs) have been increasingly applied to tasks in language understanding and interactive decision-making, with their impressive performance largely attributed to the extensive domain knowledge embedded within them. However, the depth and breadth of this knowledge can vary across domains. Many existing approaches assume that LLMs possess a comprehensive understanding of their environment, often overlooking potential gaps in their grasp of actual world dynamics. To address this, we introduce Discover, Verify, and Evolve (DiVE), a framework that discovers world dynamics from a small number of demonstrations, verifies the accuracy of these dynamics, and evolves new, advanced dynamics tailored to the current situation. Through extensive evaluations, we assess the impact of each component on performance and compare the dynamics generated by DiVE to human-annotated dynamics. Our results show that LLMs guided by DiVE make more informed decisions, achieving rewards comparable to human players in the Crafter environment and surpassing methods that require prior task-specific training in the MiniHack environment.

Paper Structure (49 sections, 2 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: The knowledge gap between LLMs and downstream domains. Although LLMs have a broad understanding of the world, they may struggle to grasp the complex dynamics of specific downstream domains.
  • Figure 2: Overall pipeline of DiVE. Left: Learning basic game dynamics from offline demonstrations (Section \ref{sec:method:offline}). We highlight the incorrect game dynamics identified by the Verifier (labeled by $\times$); they are evidence that LLMs hallucinate false facts, perhaps because they memorized Minecraft data. Right: Learning situational strategies from online interactions (Section \ref{sec:method:online}). For simplicity, we omit the verbalization process in the right figure.
  • Figure 3: Recall of learned dynamics over discovery steps, presented with mean and standard deviation, in the Crafter environment.
  • Figure 4: Precision of learned dynamics before and after verification in the Crafter environment.
  • Figure 5: Precision of verified dynamics over verified steps.
  • ...and 2 more figures
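
Figures 3 to 5 report recall and precision of learned dynamics. As a rough illustration of what those two metrics measure, the sketch below scores a set of learned dynamics against a reference set. It assumes dynamics can be compared by exact string match, which is a simplification; the paper's actual matching procedure is not described in this section, and the function name `precision_recall` and the example dynamics are hypothetical.

```python
def precision_recall(learned, reference):
    """Score learned dynamics against a reference (e.g. human-annotated) set.

    Assumption: two dynamics count as the same only if their strings match
    exactly. A real evaluation would likely need a softer matching rule.
    """
    learned, reference = set(learned), set(reference)
    correct = learned & reference
    # Precision: fraction of learned dynamics that are correct.
    precision = len(correct) / len(learned) if learned else 0.0
    # Recall: fraction of reference dynamics that were recovered.
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical Crafter-style dynamics, for illustration only.
reference = {
    "collect wood requires a nearby tree",
    "place table requires wood",
    "make wood pickaxe requires a nearby table",
}
learned = {
    "collect wood requires a nearby tree",
    "place table requires wood",
    "collect diamond requires a wood pickaxe",  # incorrect dynamic
}

precision, recall = precision_recall(learned, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this reading, discovering more dynamics can raise recall, while filtering out incorrect ones (the role the captions attribute to verification) raises precision.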