Heterogeneous Robot Collaboration in Unstructured Environments with Grounded Generative Intelligence

Zachary Ravichandran, Fernando Cladera, Ankit Prabhu, Jason Hughes, Varun Murali, Camillo Taylor, George J. Pappas, Vijay Kumar

TL;DR

SPINE-HT addresses heterogeneous robot collaboration in unstructured environments by grounding large language model (LLM) reasoning in the physical and semantic context of a robot team. It introduces a three-stage loop (grounded subtask generation with plan validation, capability-based assignment via a linear-programming formulation, and online refinement from semantic-map feedback) that operates on a directed acyclic graph (DAG) of subtasks to respect dependencies. The system combines open-set semantic mapping, a feedback-driven world model, and closed-loop plan validation, and is demonstrated on four robot platforms in simulation and field experiments. Compared with prior LLM-enabled planners, it improves planning efficiency and achieves nearly twice the mission success rate in simulation; in real-world missions it reaches an 87% success rate. These results indicate practical potential for scalable, language-driven multi-robot teamwork in open worlds.

Abstract

Heterogeneous robot teams operating in realistic settings often must accomplish complex missions requiring collaboration and adaptation to information acquired online. Because robot teams frequently operate in unstructured environments -- uncertain, open-world settings without prior maps -- subtasks must be grounded in robot capabilities and the physical world. While heterogeneous teams have typically been designed for fixed specifications, generative intelligence opens the possibility of teams that can accomplish a wide range of missions described in natural language. However, current large language model (LLM)-enabled teaming methods typically assume well-structured and known environments, limiting deployment in unstructured environments. We present SPINE-HT, a framework that addresses these limitations by grounding the reasoning abilities of LLMs in the context of a heterogeneous robot team through a three-stage process. Given language specifications describing mission goals and team capabilities, an LLM generates grounded subtasks that are validated for feasibility. Subtasks are then assigned to robots based on capabilities such as traversability or perception and refined given feedback collected during online operation. In simulation experiments with closed-loop perception and control, our framework achieves nearly twice the success rate of prior LLM-enabled heterogeneous teaming approaches. In real-world experiments with a Clearpath Jackal, a Clearpath Husky, a Boston Dynamics Spot, and a high-altitude UAV, our method achieves an 87% success rate in missions requiring reasoning about robot capabilities and refining subtasks with online feedback. More information is provided at https://zacravichandran.github.io/SPINE-HT.
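
To make the subtask DAG and capability-based assignment summarized above more concrete, here is a minimal Python sketch. It is not the authors' implementation: the paper describes a linear-programming formulation, whereas this sketch substitutes brute-force enumeration, which is only practical for very small teams. The robot names, capability labels, subtasks, and the helper functions `assign_wave` and `plan` are all hypothetical.

```python
from graphlib import TopologicalSorter
from itertools import combinations, permutations

# Hypothetical team description: robot name -> set of capability labels.
ROBOTS = {
    "uav": {"aerial_survey"},
    "jackal": {"ground_inspect"},
    "spot": {"ground_inspect", "rough_terrain"},
}

# Hypothetical subtasks: name -> (required capability, prerequisite subtasks).
SUBTASKS = {
    "map_area": ("aerial_survey", []),
    "inspect_road": ("ground_inspect", ["map_area"]),
    "inspect_trail": ("rough_terrain", ["map_area"]),
}


def assign_wave(ready, robots, subtasks):
    """Return the one-to-one {subtask: robot} map covering the most ready subtasks.

    Brute-force stand-in for an LP-based assignment (assumption, not the
    paper's method): try every pairing and keep the largest feasible one.
    """
    best = {}
    k = min(len(ready), len(robots))
    for tasks in combinations(ready, k):
        for team in permutations(robots, k):
            # Keep only pairs where the robot has the required capability.
            wave = {t: r for t, r in zip(tasks, team) if subtasks[t][0] in robots[r]}
            if len(wave) > len(best):
                best = wave
    return best


def plan(robots, subtasks):
    """Assign subtasks wave by wave so that no subtask precedes its prerequisites."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in subtasks.items()})
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    pending, schedule = [], []
    while sorter.is_active():
        # Subtasks whose prerequisites are all done become ready.
        pending.extend(sorted(sorter.get_ready()))
        wave = assign_wave(pending, robots, subtasks)
        if not wave:
            raise ValueError(f"no capable robot for any of {pending}")
        schedule.append(wave)
        for task in wave:
            pending.remove(task)
            sorter.done(task)  # unlocks subtasks that depend on this one
    return schedule


if __name__ == "__main__":
    for i, wave in enumerate(plan(ROBOTS, SUBTASKS), start=1):
        print(f"wave {i}: {wave}")
```

In this toy instance the aerial survey is scheduled first because both ground inspections depend on it, and the rough-terrain subtask can only go to the one robot that lists that capability. The sketch does not cover validation of LLM-generated subtasks or online refinement from map feedback.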

Paper Structure

This paper contains 26 sections, 1 equation, 11 figures, 10 tables.

Figures (11)

  • Figure 1: SPINE-HT takes as input mission and team specifications in natural language. SPINE-HT then generates grounded subtasks that are validated for realizability while preserving dependencies, assigns subtasks based on robot capabilities, then refines subtasks given robot feedback acquired during online operation.
  • Figure 2: SPINE-HT takes as input natural language specifying mission goals and team capabilities. Our framework then uses an LLM to generate grounded subtasks with their dependency ordering. Subtasks are then assigned to robots based on capability. Feedback comprising semantic map updates and subtask outcomes is collected and used for subtask adaptation.
  • Figure 3: Our framework aggregates updates from heterogeneous robots into a common semantic map used for subtask refinement.
  • Figure 4: Evaluation environments: a real semi-urban office park (top) and three simulation environments covering urban, semi-urban, and rural settings (bottom).
  • Figure 5: Example real-world result where the planner must infer grounded subtasks, reason about robot capabilities, and refine subtasks online based on robot feedback.
  • ...and 6 more figures