FLEET: Formal Language-Grounded Scheduling for Heterogeneous Robot Teams

Corban Rivera, Grayson Byrd, Meghan Booker, Bethany Kemp, Allison Gaines, Emma Holmes, James Uplinger, Celso M de Melo, David Handelman

TL;DR

FLEET coordinates heterogeneous robot teams from natural-language instructions by pairing an LLM front-end with a formal scheduler. The front-end outputs a task graph with durations and a capability-aware fitness matrix; the scheduler, a MILP with an auction-based fallback, minimizes makespan under precedence and resource constraints. Execution is closed-loop: robots stream status to a world model, and deviations trigger replanning. The task graph, fitness matrix, and schedule remain available as interpretable artifacts. In PARTNR simulations and in hardware trials with two Spots, FLEET outperforms purely generative planners, particularly on heterogeneous tasks, and ablations show that the MILP and fitness contributions are complementary. The results point to a practical path toward reliable, language-guided multi-robot coordination in open-world environments, with safety and efficiency benefits on real hardware.

Abstract

Coordinating heterogeneous robot teams from free-form natural-language instructions is hard. Language-only planners struggle with long-horizon coordination and hallucination, while purely formal methods require closed-world models. We present FLEET, a hybrid decentralized framework that turns language into optimized multi-robot schedules. An LLM front-end produces (i) a task graph with durations and precedence and (ii) a capability-aware robot–task fitness matrix; a formal back-end solves a makespan-minimization problem while the underlying robots execute their free-form subtasks with agentic closed-loop control. Across multiple free-form language-guided autonomy coordination benchmarks, FLEET improves success over state-of-the-art generative planners on two-agent teams across heterogeneous tasks. Ablations show that mixed-integer linear programming (MILP) primarily improves temporal structure, while LLM-derived fitness is decisive for capability-coupled tasks; together they deliver the highest overall performance. We demonstrate the translation to real-world challenges with hardware trials using a pair of quadruped robots with disjoint capabilities.
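
To make the scheduling step concrete, the following is a minimal sketch of makespan minimization over a task graph and a fitness matrix. It is not the paper's MILP: it swaps in an exhaustive search over assignments with a fixed task order, which is only practical for tiny instances and does not optimize the ordering of tasks on each robot. The function name `schedule`, the `min_fitness` threshold, the task names, and all durations and fitness values are assumptions made for illustration, loosely modeled on the glass-washing example from the PARTNR heterogeneous task set.

```python
from itertools import product


def schedule(durations, preds, fitness, min_fitness=0.5):
    """Brute-force stand-in for a MILP scheduler (illustrative only).

    durations: {task: seconds}, keys listed in a topological order (assumed).
    preds:     {task: [predecessor tasks]} from the task graph.
    fitness:   {agent: {task: score in [0, 1]}}, the fitness matrix.
    Returns (makespan, assignment, start_times), or None if infeasible.
    """
    tasks = list(durations)
    agents = list(fitness)
    best = None
    for choice in product(agents, repeat=len(tasks)):
        assign = dict(zip(tasks, choice))
        # Reject assignments that give a task to an agent judged unfit for it.
        if any(fitness[a][t] < min_fitness for t, a in assign.items()):
            continue
        agent_free = {a: 0.0 for a in agents}
        start, finish = {}, {}
        for t in tasks:
            a = assign[t]
            # A task starts once its predecessors are done and its agent is idle.
            ready = max((finish[p] for p in preds.get(t, [])), default=0.0)
            start[t] = max(ready, agent_free[a])
            finish[t] = start[t] + durations[t]
            agent_free[a] = finish[t]
        makespan = max(finish.values())
        if best is None or makespan < best[0]:
            best = (makespan, assign, start)
    return best


# Hypothetical instance: two glasses to move, then wash; only the human can wash.
durations = {"move_glass_1": 4.0, "move_glass_2": 4.0, "wash_glasses": 3.0}
preds = {"wash_glasses": ["move_glass_1", "move_glass_2"]}
fitness = {
    "robot": {"move_glass_1": 1.0, "move_glass_2": 1.0, "wash_glasses": 0.0},
    "human": {"move_glass_1": 1.0, "move_glass_2": 1.0, "wash_glasses": 1.0},
}
makespan, assign, start = schedule(durations, preds, fitness)
print(makespan, assign, start)
```

In this toy instance the search splits the two moves across the agents so they run in parallel, and the fitness matrix forces the wash step onto the human. A real MILP formulation would encode the same precedence and capability constraints as linear inequalities and hand them to a solver.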

Paper Structure

This paper contains 21 sections, 6 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: FLEET — Formal Language-grounded Execution and Efficient Teaming is a hybrid generative–formal framework for natural-language multi-robot tasking. Free-form operator instructions (bottom examples) are ingested by an LLM that (3) decomposes the command into a task graph and constraints and (4) estimates a robot–task fitness matrix. A formal mixed-integer linear programming (MILP) scheduler solves a makespan-minimization problem under precedence, capacity, and spatial constraints to produce a multi-robot schedule (6). Robots execute the plan (7) while streaming status and perception to a World Model (8-9); deviations (delays, failures, new detections) trigger closed-loop replanning back to the LLM and scheduler (10). The architecture supports heterogeneous teams (e.g., IR and RGB/VLM Spots) and yields interpretable artifacts—task graph, fitness matrix, and schedule—that explain decisions.
  • Figure 2: PARTNR free-form language-guided benchmarks. The PARTNR free-form language multi-agent benchmark builds on habitat-sim and introduces several categories of free-form language-guided tasks to be completed by one or more agents. The categories include "Constraint Free", where the subtasks are separable and do not necessarily depend on each other; "Heterogeneous", where the agents have disjoint capabilities that must be leveraged correctly to complete the tasks; and "Temporal", where the tasks have an implied dependency structure among the subtasks. This figure illustrates a task from the Heterogeneous task set where agents are acting on the command "Take all of the glasses from the bedroom to the kitchen and wash them". In this example, the human agent can clean and the quadruped robots cannot.
  • Figure 3: Hardware trial: Maneuver with implied dependencies. (a) Operator instruction. (b) Planner output: schedule with 3 m segments and enforced alternation. (c–f) Execution frames showing alternating advances.
  • Figure 4: Hardware trial: Cross-modal inspection. The environment included a heat pad under a traffic cone at point of interest 1 and a bucket of ice at point of interest 2. The robot team was asked to visually and thermally characterize both points of interest, with the additional constraint that only one robot could analyze a point of interest at a time. The robots have disjoint capabilities: Spot-IR can only provide thermal analysis, and Spot-RGB/VLM is the only robot that can provide visual question answering. (a) Natural-language instruction. (b) Planner schedule with AND-dependencies (visual+thermal). (c–f) Robot QA responses (RGB/IR). (g–j) Execution frames. The formal scheduler releases steps on dependency completion, reducing idle time and handoff latency.
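
The release-on-completion behavior described in the Figure 4 caption can be sketched as a small readiness check over AND-dependencies. This is an illustrative assumption about how such a check might look, not the paper's implementation; the helper `ready_steps` and the step names are hypothetical, and the one-robot-per-point-of-interest constraint is not modeled here.

```python
def ready_steps(preds, done):
    """Return unfinished steps whose predecessors have ALL completed (AND semantics)."""
    return sorted(
        step for step, required in preds.items()
        if step not in done and set(required) <= done
    )


# Hypothetical dependency structure for one point of interest:
# the combined characterization needs both the visual and the thermal result.
preds = {
    "visual_poi1": [],
    "thermal_poi1": [],
    "characterize_poi1": ["visual_poi1", "thermal_poi1"],
}

initial = ready_steps(preds, set())
after_visual = ready_steps(preds, {"visual_poi1"})
after_both = ready_steps(preds, {"visual_poi1", "thermal_poi1"})
print(initial, after_visual, after_both)
```

Because a step is released the moment its last predecessor finishes, a robot does not wait on a fixed time slot, which is the source of the reduced idle time the caption mentions.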