Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration

Yun Qu; Boyuan Wang; Yuhang Jiang; Jianzhun Shao; Yixiu Mao; Cheems Wang; Chang Liu; Xiangyang Ji

TL;DR

This paper introduces LEMAE, a systematic approach that channels informative, task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration. It outperforms existing SOTA approaches on challenging benchmarks by a large margin.

Abstract

With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, the redundant effort caused by exploring without proper guidance poses a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, that channels informative, task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from the LLM into symbolic key states that are critical for task fulfillment, in a discriminative manner and at low LLM inference cost. To unleash the power of key states, we design Subspace-based Hindsight Intrinsic Reward (SHIR), which guides agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to track transitions between key states in a specific task for organized exploration. By reducing redundant exploration, LEMAE outperforms existing SOTA approaches on challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.
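To make the idea of key-state discriminators and a denser hindsight reward more concrete, the following is a minimal sketch and not the authors' implementation. The discriminator `on_switch`, the box bounds, the choice of subspace indices `dims`, and the distance-reduction form of the reward are all assumptions made for illustration; the abstract only states that SHIR guides agents toward key states by increasing reward density.

```python
import numpy as np

# Hypothetical key-state discriminator (assumption): returns True when the
# agent's (x, y) position lies inside an axis-aligned box, e.g. "on a switch".
def on_switch(state, low=(0.4, 0.4), high=(0.6, 0.6)):
    pos = np.asarray(state)[:2]
    return bool(np.all(pos >= low) and np.all(pos <= high))

def hindsight_rewards(traj, discriminator, dims, scale=1.0):
    """Assign dense rewards to the steps that precede the first key-state hit.

    traj: array of shape (T, state_dim) holding visited states.
    dims: indices of the state subspace assumed relevant to the key state.
    Returns an array of T - 1 per-transition intrinsic rewards.
    """
    traj = np.asarray(traj, dtype=float)
    rewards = np.zeros(len(traj) - 1)
    hits = [t for t, s in enumerate(traj) if discriminator(s)]
    if not hits:
        return rewards  # no key state was reached, so no hindsight signal
    t_hit = hits[0]
    goal = traj[t_hit, dims]  # the key state actually reached, in the subspace
    dist = np.linalg.norm(traj[:, dims] - goal, axis=1)
    # Reward each earlier transition by how much it reduced the distance.
    rewards[:t_hit] = scale * (dist[:t_hit] - dist[1:t_hit + 1])
    return rewards

if __name__ == "__main__":
    # Third state dimension is irrelevant to the key state and is ignored.
    demo = np.array([[0.0, 0.0, 9.0], [0.25, 0.25, 9.0],
                     [0.5, 0.5, 9.0], [0.9, 0.9, 9.0]])
    print(hindsight_rewards(demo, on_switch, dims=[0, 1]))
```

Restricting the distance to `dims` is what keeps state components unrelated to the key state from diluting the signal in this sketch; how the paper selects the subspace is not stated in this summary.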

Paper Structure (45 sections, 1 theorem, 11 equations, 17 figures, 9 tables, 2 algorithms)


Introduction
Preliminary
Related Works
LLM in Decision Making.
Efficient Multi-Agent Exploration.
Method
Devil is in the Key States
Key States Localization with LLM
Key State-Guided Exploration
Subspace-based Hindsight Intrinsic Reward
Key States Memory Tree
Experiments
Multiple-Particle Environment (MPE)
StarCraft Multi-Agent Challenge (SMAC)
Compatibility with Various Algorithms
...and 30 more sections

Key Result

Proposition 4.1

Consider the one-dimensional asymmetric random walk problem, where an agent starts at $x=0$ and aims to reach $x=N$, with $N\in\mathbb{N}^+$ and $N>1$. The initial policy is asymmetric and random with probabilities $p\in (0.5,1)$ and $1-p$ for right and left movements, respectively. Without prior knowledge, the ex
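The statement above is truncated in this summary and is left as is. As a hedged illustration of the setup only, the sketch below simulates the walk and estimates the mean number of steps until $x=N$ is first reached. It assumes the walk is unbounded on the left and uses hypothetical helper names (`hitting_time`, `mean_hitting_time`). For that unbounded walk, the textbook mean first hitting time is $N/(2p-1)$, which serves as a sanity check here; it is not claimed to be the quantity the proposition goes on to state.

```python
import random

def hitting_time(n, p, rng, max_steps=1_000_000):
    """Steps taken by a walk from x = 0 until it first reaches x = n.

    Moves right with probability p and left with probability 1 - p.
    max_steps is a safety cap (assumption) to guarantee termination.
    """
    x, steps = 0, 0
    while x < n and steps < max_steps:
        x += 1 if rng.random() < p else -1
        steps += 1
    return steps

def mean_hitting_time(n, p, episodes=2000, seed=0):
    """Monte Carlo estimate of the mean first hitting time of x = n."""
    rng = random.Random(seed)
    total = sum(hitting_time(n, p, rng) for _ in range(episodes))
    return total / episodes

if __name__ == "__main__":
    n, p = 10, 0.8
    print("simulated:", mean_hitting_time(n, p))
    print("N / (2p - 1):", n / (2 * p - 1))
```

Lowering `p` toward 0.5 makes the estimate grow quickly, which is the kind of wasted effort that guidance toward intermediate states is meant to reduce.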

Figures (17)

Figure 1: (a) The map of the task Pass. Two agents are initially positioned in the left room, requiring cooperation to explore the rooms, uncover the hidden switches, and move to the right room. (b) The key states ($\kappa_1$ and $\kappa_2$) generated by LLM for the task Pass, where the superscripts $A,B$ of $\kappa_i$ denote two agents Alice and Bob. (c) Visitation Map (log scale) of SOTA baseline method CMAE. (d) Visitation Map (log scale) of our method LEMAE. Our method exhibits a significant reduction in redundant exploration. Furthermore, an organic division of labor among agents emerges.
Figure 1: Ablation studies on Self-Check mechanism and LLMs. We compare the performance of two LLMs (GPT-4-turbo and GPT-3.5-turbo), recording the Acceptance Rate ($r_{acc}$) and Execution Rate ($r_{exe}$) in ten runs of the generated discriminator functions. w/o denotes the absence of our Self-Check mechanism.
Figure 2: Overview of the training process. (a) Key States Localization with LLM: We devise a set of prompts to guide LLM in localizing key states based on task-specific information. Refinements of the response are achieved through iterative self-checks by LLM. Subsequently, discriminator functions are derived from the final response to discriminate key states within trajectories. (b) Key States-Guided Exploration: Using the achieved key states chain within the processed trajectory, we look up KSMT to get the most probable next key states. By sampling from them as the subgoal for the concluding sub-trajectory, we integrate intrinsic rewards into the overall trajectory using SHIR.
Figure 3: Evaluating LEMAE against baseline methods on four MPE maps with sparse rewards, using test win rate as the evaluation metric. The acceleration rate refers to how much faster LEMAE finds the success state compared to CMAE.
Figure 4: (a) Key states discrimination functions generated on task Pass. (b) The map of Secret-Room with key states: $\kappa_1$ represents occupying the left switch to open all doors, while $\kappa_2$, $\kappa_3$, and $\kappa_4$ represent exploring right rooms 1, 2, and 3, respectively. The directional arrows symbolize the transitional relationships within KSMT. (c) The key states number curve in Secret-Room shows that LEMAE can identify all key states and proficiently prune task-irrelevant ones.
...and 12 more figures
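The Figure 2 and Figure 4 captions describe looking up KSMT with the chain of key states reached so far and sampling a next key state as the subgoal. A minimal sketch of that lookup, under stated assumptions, could look as follows. The class name `KeyStateTree`, the count-based bookkeeping, and the count-weighted sampling are illustrative choices, not the paper's data structure or selection rule.

```python
import random
from collections import defaultdict

class KeyStateTree:
    """Prefix tree over key-state chains, stored as per-prefix child counts."""

    def __init__(self):
        # counts[prefix][next_key] = times next_key followed this prefix,
        # where prefix is a tuple of key-state ids reached in order.
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, chain):
        """Record one trajectory's ordered chain of reached key states."""
        for i, nxt in enumerate(chain):
            self.counts[tuple(chain[:i])][nxt] += 1

    def sample_subgoal(self, chain, rng=random):
        """Sample a next key state for the given chain, or None if unseen."""
        children = self.counts.get(tuple(chain))
        if not children:
            return None
        keys = list(children)
        weights = [children[k] for k in keys]
        # Assumption: sample in proportion to observed transition counts.
        return rng.choices(keys, weights=weights, k=1)[0]

if __name__ == "__main__":
    tree = KeyStateTree()
    tree.update(["k1", "k2"])
    tree.update(["k1", "k3"])
    tree.update(["k1", "k2"])
    print(tree.sample_subgoal(["k1"]))  # "k2" or "k3"
```

Keying on the full prefix rather than only the last key state means the same key state can have different successors depending on how it was reached, which matches the tree structure the captions describe.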

Theorems & Definitions (2)

Proposition 4.1
proof
