Monte Carlo Planning with Large Language Model for Text-Based Game Agents

Zijing Shi, Meng Fang, Ling Chen

TL;DR

This work introduces MC-DML, a Monte Carlo planning framework that integrates a Large Language Model with dynamic in-trial and cross-trial memory to guide action evaluation in text-based games. By embedding LLM reasoning into the MCTS expansion via a learned prior and memory-informed signals, MC-DML addresses the limitations of traditional planning-then-learning approaches and the brittle exploration of pure LLM policies. Empirical results on the Jericho benchmark show that MC-DML achieves superior performance in the initial planning phase across multiple games, including challenging bottleneck scenarios, and ablations demonstrate the critical role of both memory components and dynamic pruning. The approach advances language-grounded planning by enabling more sample-efficient, memory-aware decision making in partially observable, high-branching environments, with potential implications for broader language-conditioned planning tasks.

Abstract

Text-based games provide valuable environments for language-based autonomous agents. However, planning-then-learning paradigms, such as those combining Monte Carlo Tree Search (MCTS) and reinforcement learning (RL), are notably time-consuming due to extensive iterations. Additionally, these algorithms perform uncertainty-driven exploration but lack language understanding and reasoning abilities. In this paper, we introduce the Monte Carlo planning with Dynamic Memory-guided Large language model (MC-DML) algorithm. MC-DML leverages the language understanding and reasoning capabilities of Large Language Models (LLMs) alongside the exploratory advantages of tree search algorithms. Specifically, we enhance LLMs with in-trial and cross-trial memory mechanisms, enabling them to learn from past experiences and dynamically adjust action evaluations during planning. We conduct experiments on a series of text-based games from the Jericho benchmark. Our results demonstrate that the MC-DML algorithm significantly enhances performance across various games at the initial planning phase, outperforming strong contemporary methods that require multiple iterations. This demonstrates the effectiveness of our algorithm, paving the way for more efficient language-grounded planning in complex environments.

Paper Structure

This paper contains 36 sections, 3 equations, 2 figures, 6 tables, 1 algorithm.

Figures (2)

  • Figure 1: An example bottleneck state from the game Zork1.
  • Figure 2: A comparison of the PUCT and MC-DML algorithms. PUCT trains its policy through imitation learning from self-play data. In contrast, MC-DML uses an LLM as the initial policy. During planning, the LLM learns from past failure trajectories and adjusts the action value estimates. This approach more closely aligns with the human thought process.
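
The contrast in Figure 2 can be illustrated with a minimal sketch. It assumes the standard PUCT selection score, Q(s,a) + c · P(a|s) · sqrt(N(s)) / (1 + N(s,a)); the exact scoring rule used by MC-DML is not stated in this section. The function name `puct_select`, the action strings, and all prior values are hypothetical: the two prior dictionaries stand in for what an LLM might return before and after a failure trajectory is added to its prompt.

```python
import math

def puct_select(q, prior, visits, c_puct=1.0):
    """Pick the action with the highest standard PUCT score.

    q      : dict action -> current value estimate Q(s, a)
    prior  : dict action -> prior probability P(a | s)
    visits : dict action -> visit count N(s, a)
    """
    total = sum(visits.values())  # N(s), the parent visit count

    def score(action):
        exploration = c_puct * prior[action] * math.sqrt(total) / (1 + visits[action])
        return q[action] + exploration

    return max(prior, key=score)

# Illustrative values only (not taken from the paper).
actions = ["go north", "take lamp", "open door"]
q = {a: 0.0 for a in actions}
visits = {a: 1 for a in actions}

# Hypothetical LLM prior before any failure has been observed.
prior_before = {"go north": 0.6, "take lamp": 0.3, "open door": 0.1}
# Hypothetical LLM prior after a failure memory is added to the prompt.
prior_after = {"go north": 0.1, "take lamp": 0.7, "open door": 0.2}

choice_before = puct_select(q, prior_before, visits)
choice_after = puct_select(q, prior_after, visits)
```

In this sketch the only thing that changes between the two calls is the prior, which is the role the caption assigns to the memory-guided LLM: the tree statistics stay the same, but the search is steered toward a different action.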