Leveraging Large Language Model for Heterogeneous Ad Hoc Teamwork Collaboration

Xinzhu Liu; Peiyan Li; Wenju Yang; Di Guo; Huaping Liu

TL;DR

This work tackles heterogeneous ad hoc teamwork by enabling an ad hoc robot to join unknown teammates at any time without prior coordination. It introduces a decentralized framework that employs a training-free, LLM-based hierarchical planner with Interactive Reflection of Thoughts (IRoT) to generate sub-tasks and sub-skills, guided by visual semantic perception and inter-agent communication. Through a ProcTHOR-10K-based tidying-up benchmark and physical robot experiments, the approach demonstrates improved success, efficiency, and adaptability across diverse teammate policies and joining times, with ablations validating the IRoT components. The results indicate strong generalization to different tasks and practical viability for real-world open environments, paving the way for future integration with humans and more open-ended scenes.

Abstract

Compared with the widely investigated homogeneous multi-robot collaboration, heterogeneous robots with different capabilities can provide more efficient and flexible collaboration for more complex tasks. In this paper, we consider a more challenging heterogeneous ad hoc teamwork collaboration problem where an ad hoc robot joins an existing heterogeneous team for a shared goal. Specifically, the ad hoc robot collaborates with unknown teammates without prior coordination, and it is expected to generate an appropriate cooperation policy to improve the efficiency of the whole team. To solve this challenging problem, we leverage the remarkable potential of the large language model (LLM) to establish a decentralized heterogeneous ad hoc teamwork collaboration framework that focuses on generating a reasonable policy for an ad hoc robot to collaborate with the original heterogeneous teammates. A training-free hierarchical dynamic planner is developed using the LLM together with the newly proposed Interactive Reflection of Thoughts (IRoT) method for the ad hoc agent to adapt to different teams. We also build a benchmark testing dataset to evaluate the proposed framework in the heterogeneous ad hoc multi-agent tidying-up task. Extensive comparison and ablation experiments are conducted on the benchmark to demonstrate the effectiveness of the proposed framework. We have also deployed the proposed framework on physical robots in a real-world scenario. The experimental videos can be found at https://youtu.be/wHYP5T2WIp0.

Paper Structure (50 sections, 2 equations, 17 figures, 4 tables)

Introduction
Related Work
Heterogeneous Multi-Agent Tasks
Ad Hoc Teamwork
LLMs for Embodied Multi-Robot Tasks
Problem Formulation
Methodology of Ad Hoc Teamwork
Preliminary Information
Communication
Visual Semantic Perception
Dynamic Sub-Task Planner with IRoT
Generation of Candidate Sub-Tasks
Evaluation and Ranking
Interactive Reflection
Dynamic Sub-skill Planner
...and 35 more sections

Figures (17)

Figure 1: The illustration of heterogeneous ad hoc teamwork collaboration. The heterogeneous ad hoc agent joins any original team at any time from any location, and then the new team collaborates to finish the task. As illustrated in (a), at the start, an original team is randomly selected from the group of candidate teams to execute a task. When $t = t_0$, an ad hoc agent joins the team from any given location. The ad hoc agent seamlessly integrates into the team without prior coordination. Then, the new team collaborates and finally finishes the task. As illustrated in (b), the proposed framework can be applied to the physical validation experiments with real robots.
Figure 1: One example of analysis output of the dynamic sub-task planner with IRoT.
Figure 2: The overview of the proposed ad hoc teamwork framework. The ad hoc agent exchanges effective information with the original team through the communication interface. The visual semantic perception takes the RGB-D and poses as input to build the top-down semantic map and the hierarchical scene graph. The dynamic sub-task planner based on the LLM utilizes the newly proposed IRoT method to generate the next sub-task intention for the ad hoc agent. The dynamic sub-skill planner predicts the next sub-skill for the ad hoc agent. Then, low-level actions are generated to execute the planned sub-skill. The state assessment module evaluates the new state after the execution of low-level actions in the corresponding sub-skill and generates effective information to help re-plan the sub-task or sub-skill. The left part shows the overall structure. The right part takes a closer look at the hierarchical planner and illustrates the entire planning process of the sub-task planner. The sub-task planner with the LLM generates three candidate sub-tasks (A, B, and C) in the generation stage and ranks them as B, A, and C in the evaluation stage. Two rounds of judgement in the interactive reflection stage are shown in the right part. The selected sub-task B is judged to be infeasible in the first round. Then sub-task A is judged to be reasonable in the second round and chosen as the current planned sub-task. The sub-skill planner takes the chosen sub-task as input and generates the analysis and planned sub-skill with the LLM.
Figure 2: Example of the analysis and output of the dynamic sub-skill planner.
Figure 3: The examples of communication messages.
...and 12 more figures
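
The Figure 2 caption walks through the three stages of the sub-task planner: generate candidates, rank them, then judge them one round at a time until one is accepted. A minimal Python sketch of that control flow follows. It is only an illustration under stated assumptions: in the paper these stages are carried out by the LLM together with inter-agent communication, whereas here the scoring and feasibility checks are plain callables, and the names `select_sub_task`, `score`, and `is_feasible` are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Optional, Tuple


def select_sub_task(
    candidates: List[str],
    score: Callable[[str], float],
    is_feasible: Callable[[str], bool],
) -> Tuple[Optional[str], int]:
    """Rank candidates by score (highest first), then judge one per round.

    Returns the first candidate judged feasible together with the number
    of reflection rounds used, or (None, rounds) if all are rejected.
    `score` and `is_feasible` are stand-ins (assumptions) for the
    LLM-based evaluation and interactive reflection stages.
    """
    ranked = sorted(candidates, key=score, reverse=True)
    rounds = 0
    for sub_task in ranked:
        rounds += 1
        if is_feasible(sub_task):
            return sub_task, rounds
    return None, rounds


# Toy values chosen to reproduce the situation in the Figure 2 caption:
# candidates A, B, C are ranked B, A, C, and B is judged infeasible.
toy_scores = {"A": 0.6, "B": 0.9, "C": 0.2}
toy_infeasible = {"B"}

chosen, rounds = select_sub_task(
    ["A", "B", "C"],
    score=lambda t: toy_scores[t],
    is_feasible=lambda t: t not in toy_infeasible,
)
print(chosen, rounds)
```

With these toy values the loop rejects B in the first round and accepts A in the second, matching the two rounds of judgement described in the caption. Returning `None` when every candidate is rejected is a design choice of this sketch; the source excerpt does not say how the planner behaves in that case.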
