Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game

Zijing Shi, Meng Fang, Shunfeng Zheng, Shilong Deng, Ling Chen, Yali Du

TL;DR

This work tackles ad hoc teamwork (AHT) in language-driven multi-agent settings by introducing AvalonPlay, a multi-round Avalon-based benchmark in which a learner must deduce teammates' hidden roles from limited information. It presents CodeAct, a general LLM agent that combines memory retrieval, code-driven reasoning, and a self-debugging interpreter to adapt rapidly to new teammates without predesigned coordination protocols. Experimental results show that CodeAct outperforms semantic prompting strategies (CoT, ReAct) and that GPT-4 most effectively facilitates AHT, though memory forgetting and hallucinations remain pervasive challenges. The study highlights the importance of factual memory and programmable reasoning for robust, on-the-fly collaboration, and outlines future work on autonomous communication and fact verification.

Abstract

Multi-agent collaboration with Large Language Models (LLMs) demonstrates proficiency in basic tasks, yet its efficiency in more complex scenarios remains unexplored. In gaming environments, these agents often face situations without established coordination protocols, requiring them to make intelligent inferences about teammates from limited data. This problem motivates the area of ad hoc teamwork, in which an agent may cooperate with a variety of teammates to achieve a shared goal. Our study focuses on the ad hoc teamwork problem where the agent operates in an environment driven by natural language. Our findings reveal the potential of LLM agents in team collaboration, while highlighting issues related to hallucinations in communication. To address these issues, we develop CodeAct, a general agent that equips an LLM with enhanced memory and code-driven reasoning, enabling the repurposing of partial information for rapid adaptation to new teammates.

Paper Structure (31 sections, 8 figures, 4 tables)

This paper contains 31 sections, 8 figures, 4 tables.

Figures (8)

  • Figure 1: A flowchart of the AvalonPlay benchmark showing team sides and roles on the left and a detailed round pipeline on the right. Each round includes four stages: leader assignment, team selection, discussion and voting, and quest execution.
  • Figure 2: An overview of the proposed CodeAct agent as the leader during team selection. We begin by establishing a memory retrieval system that distills information from past interactions, enabling the agent to access relevant information. Then, we integrate code-driven reasoning with action to determine teammate roles effectively. Finally, we employ an interpreter to execute the generated code, equipping the agent with self-debug capabilities.
  • Figure 3: An example of the CodeAct agent's prompt.
  • Figure 4: An example of the CodeAct agent's output.
  • Figure 5: The results of different models acting as the good and evil sides, playing games against each other. Under each setting, 30 games were conducted, totaling 150 quests.
  • ...and 3 more figures
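
The Figure 2 caption describes an interpreter that executes generated code and gives the agent a way to debug it. A minimal sketch of such a generate-execute-retry loop is shown here. It is an illustration under stated assumptions, not the authors' implementation: `run_with_self_debug`, `stub_generator`, the player names, and the suspicion scores are all hypothetical, and the stub stands in for an LLM call. A real system would also run generated code in a sandbox rather than a bare `exec`.

```python
from typing import Callable, Optional


def run_with_self_debug(
    generate_code: Callable[[str, Optional[str]], str],
    task: str,
    max_attempts: int = 3,
) -> dict:
    """Run generated code; on failure, pass the error back to the generator."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        source = generate_code(task, error)
        namespace: dict = {}
        try:
            exec(source, namespace)  # hypothetical: no sandboxing in this sketch
            return namespace
        except Exception as exc:
            # Keep a short error summary to use as feedback for the next attempt.
            error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(
        f"no runnable code after {max_attempts} attempts; last error: {error}"
    )


def stub_generator(task: str, error: Optional[str]) -> str:
    """Stand-in for an LLM: the first draft is buggy, the retry is fixed."""
    if error is None:
        # The first draft references names that were never defined.
        return "team = sorted(players, key=lambda p: suspicion[p])[:2]"
    return (
        "suspicion = {'P1': 0.1, 'P2': 0.8, 'P3': 0.3}\n"
        "team = sorted(suspicion, key=suspicion.get)[:2]\n"
    )


result = run_with_self_debug(stub_generator, "select a two-player team")
print(result["team"])
```

In this sketch the first draft fails with a `NameError`, the error text is fed back to the generator, and the second draft picks the two players with the lowest assumed suspicion scores.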