PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making

Jonathan Light, Sixue Xing, Yuanzhe Liu, Weiqin Chen, Min Cai, Xiusi Chen, Guanzhi Wang, Wei Cheng, Yisong Yue, Ziniu Hu

TL;DR

This work proposes a framework PIANIST for decomposing the world model into seven intuitive components conducive to zero-shot LLM generation, and shows that this method works well on two different games that challenge the planning and decision making skills of the agent.

Abstract

Effectively extracting the world knowledge in LLMs for complex decision-making tasks remains a challenge. We propose PIANIST, a framework for decomposing the world model into seven intuitive components conducive to zero-shot LLM generation. Given only a natural language description of the game and of how input observations are formatted, our method can generate a working world model for fast and efficient MCTS simulation. We show that our method works well on two different games that challenge the agent's planning and decision-making skills, for both language-based and non-language-based action taking, without any training on domain-specific data or an explicitly defined world model.

Paper Structure

This paper contains 18 sections, 3 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of PIANIST. Starting with just the game description, the LLM generates a complete multi-agent, partial information world model, which can then be used for planning via search.
  • Figure 2: Integrating PIANIST components with MCTS. The realization function samples a hidden state for simulation, while the transition, action, and partition functions are used to expand new states. States are selected based on UCT values, aggregated across information sets for partial information. Though the diagram shows values for a single player, in practice, values for all players are inferred and updated simultaneously. See App. \ref{sec:mcts_details} for details and Fig. 3 for the generation order.
  • Figure 3: Directed generation graph for PIANIST. We display the sequential generation order for the various components of PIANIST, with dependencies shown by directed arrows. Generating and testing objects in this order minimizes the probability of execution failure. The initial information set representation is given by the environment to allow a unified interface with the environment. Modularization also means we can test each component individually.
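
The search loop described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the toy one-step betting game, the function signatures, the reward values, and the exploration constant are all assumptions made for illustration. Only the roles of the realization, action, transition, and partition functions, and the idea of keying UCT statistics by information set rather than by full hidden state, come from the caption. The sketch tracks values for a single player, whereas the caption notes that the actual method updates values for all players at once.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy game (an assumption, not from the paper): a hidden card
# is "high" or "low"; the player sees only the public history and chooses
# "bet" or "fold". A state is the pair (hidden_card, public_history).

def realization(info_set, rng):
    """Sample a full hidden state consistent with an information set."""
    return (rng.choice(["high", "low"]), info_set)

def actions(state):
    """Legal actions in a state; empty once the one-step game has ended."""
    _, history = state
    return [] if history else ["bet", "fold"]

def transition(state, action):
    """Apply an action and return (next_state, reward)."""
    card, history = state
    if action == "fold":
        reward = 0.0
    else:
        reward = 1.0 if card == "high" else -0.5
    return (card, history + (action,)), reward

def partition(state):
    """Map a full state to the information set the player can observe."""
    _, history = state
    return history

def uct_select(info_set, legal, visits, values, c=1.4):
    """Pick the action with the highest UCT score at an information set."""
    total = sum(visits[(info_set, a)] for a in legal)

    def score(a):
        n = visits[(info_set, a)]
        if n == 0:
            return math.inf  # try every action at least once
        mean = values[(info_set, a)] / n
        return mean + c * math.sqrt(math.log(total) / n)

    return max(legal, key=score)

def search(root_info_set, n_sims=2000, seed=0):
    """Run simulations and return the most-visited root action."""
    rng = random.Random(seed)
    visits = defaultdict(int)    # keyed by (information set, action)
    values = defaultdict(float)  # summed returns, same keys
    for _ in range(n_sims):
        # Each simulation starts from a freshly sampled hidden state.
        state = realization(root_info_set, rng)
        path, ret = [], 0.0
        while actions(state):
            info = partition(state)
            a = uct_select(info, actions(state), visits, values)
            state, r = transition(state, a)
            path.append((info, a))
            ret += r
        # Back up the return; states that share an information set
        # share statistics because the keys ignore the hidden card.
        for key in path:
            visits[key] += 1
            values[key] += ret
    legal = actions(realization(root_info_set, rng))
    return max(legal, key=lambda a: visits[(root_info_set, a)])
```

Calling `search(())` starts from the empty public history. In this toy game betting has an expected reward of 0.25 against 0 for folding, so the search should concentrate its visits on "bet" even though the hidden card is never observed during selection.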